“Data science” can mean many things to many people. Some people believe that data scientists are computer programmers; some think data science is deeply mathematical and statistics-based; and some believe data scientists are researchers who test hypotheses and implement change.
All of these beliefs are correct, because data scientists wear many hats. At GovEx, we use data science tools to examine civic issues and propose solutions; we call this work advanced analytics. Our advanced analytics projects are led by data scientists who are passionate about using these methods to drive policy and service change in cities.
GovEx recently wrapped up an advanced analytics project in Syracuse, New York, in which we ultimately presented the city with a model that evaluates differences between actual and predicted housing sale prices. Over the course of the project, the team met virtually each week, cleaned and merged data, and produced visualizations, recommendations, and insights. We capped off the project with a predictive machine learning model.
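The core comparison in that kind of model, actual sale price against predicted sale price, can be illustrated with a short sketch. This is not the model GovEx delivered: it assumes predicted prices already exist, and the record layout (`Sale`, `parcel_id`) and the helper `price_gaps` are hypothetical names chosen for illustration. The sketch simply ranks sales by how far the actual price strays from the prediction.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    # Hypothetical record layout; real city data will use its own fields.
    parcel_id: str
    actual_price: float
    predicted_price: float


def price_gaps(sales):
    """Return (parcel_id, actual - predicted, relative gap) tuples,
    sorted so the largest absolute gaps come first."""
    rows = []
    for s in sales:
        gap = s.actual_price - s.predicted_price
        # Relative gap is undefined when the prediction is zero.
        pct = gap / s.predicted_price if s.predicted_price else float("nan")
        rows.append((s.parcel_id, gap, pct))
    rows.sort(key=lambda r: abs(r[1]), reverse=True)
    return rows


if __name__ == "__main__":
    # Illustrative numbers only, not Syracuse data.
    sample = [Sale("A", 120000.0, 100000.0), Sale("B", 90000.0, 95000.0)]
    for parcel, gap, pct in price_gaps(sample):
        print(f"{parcel}: {gap:+,.0f} ({pct:+.1%})")
```

Sorting by the absolute gap surfaces the sales the model explains least well, which is usually where a team starts asking questions about the data.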
Here are four tenets for structuring advanced analytics projects and teams that the GovEx team considers crucial:
Gather a robust team of problem solvers who want to learn
Ensure that your team is well-rounded in both thinking and skills, and that its members are adept at problem solving. When an issue pops up, as one likely will (this is data science, after all), your team needs to be able to tackle it and move forward. Perhaps the most important quality your data science team members need, though, is a willingness to learn. For this type of project, we encourage a can-do attitude, a willingness to learn as you go, and the ability to ask questions when necessary. While data science topics can seem quite complex at first, anything can be learned over time.
Remain thoughtful and curious about your data
GovEx spent many months working with Syracuse to learn the data, test its quality, discuss novel patterns within it, and merge datasets. This step in the advanced analytics process, although lengthy, was crucial because it gave us an understanding of the relevant background information. Without the knowledge we gained from those conversations, the analysis might not have succeeded, because we would have lacked clarity about what the variables meant.
Parse out clear tasks and a timeline, but try to maintain some flexibility
Even when the tasks you want to complete are clear and distinct, the road toward completing them may twist and turn more than you expect. Build a timeline around those tasks, but leave room to adjust it as the project evolves.
Be prepared to do things again, and again. And then, maybe, again
Data science projects are often cyclical, meaning that one or more parts of a project are commonly repeated. During the Syracuse project, to confirm that our analysis was accurate, we needed to run multiple tests, build several versions of our machine learning model, and double-check the math behind our work. We did this to make the final product as strong as possible, to improve our model's accuracy, and to check our own understanding. When structuring a data science project, we recommend building in time for repetition.