Predictive Analytics: Better than human intuition?

In 2016, ProPublica found that the formula used in Broward County to assign defendants risk scores indicating their likelihood of committing a violent crime within two years was “particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants,” and that “white defendants were mislabeled as low risk more often than black defendants.” The formula was developed by a for-profit company, Northpointe, which disputed the results of the analysis. Northpointe’s assessment is based on 137 questions, none of which explicitly asks about race.

If race was not an explicit input in Broward County, how were black defendants disproportionately and falsely flagged as likely future offenders? Is this a cautionary tale about the future of predictive analytics?

I spoke with Dr. Elizabeth Linos, Vice President and Head of Research and Evaluation at Behavioral Insights Team North America, to explore the future of algorithms and their use in cities.

Dr. Linos believes that data is certainly powerful in supporting evidence-based decision-making. But precisely because predictive analytics and algorithms are so powerful, it’s important to examine the nuances inherent in these tools.

Predictive analytics “sounds like it’s predictive about something in the future – fires, code violations, drug activity – but when these models use historical data, they are effectively identifying patterns in that data, and not necessarily predicting the future.” For instance, if a police department uses arrest history in a model to identify where foot patrol should be increased, the model will point to communities with historically high arrest rates. But that historical arrest data may reflect a legacy of stop-and-frisk policies that drove up arrests of people of color in those communities. The example illustrates that algorithms cannot distinguish between patterns rooted in historical practices that may have been unjust toward certain groups and patterns in individual behavior. They simply identify patterns in the data.

Building a better model

When determining which inputs to use in an algorithm–and when and whether race should be a factor–cities should consider what they want to get out of the model. As Dr. Linos puts it, “is the goal to get the best predictor, or is the goal to weight previous injustices in a way that equalizes future outcomes?” In other words, do you want to predict which communities are likely to have higher arrests, or is your goal to adjust policing strategies in a way that serves rather than targets certain communities? The decision of whether to use race is less a statistical question than one of policy and context. According to Dr. Linos, it’s not necessarily better to take race out of the equation, but “we need to understand what we are looking at, and how we act on it.”

“Is it better than human intuition?”

The value of a predictive model lies in whether its results are better than human intuition, and in most instances they probably are.

As discussed above, overpolicing a neighborhood with a history of high arrest rates as a result of an algorithm can reinforce historical bias, which can be damaging to the community. In such cases, it may be worth examining whether the algorithm’s predictions are more accurate than the flip of a coin, to assess whether the model is worth using at all given the harm it could cause. Results may well be more accurate than a coin toss or a linear regression, but Dr. Linos says how much better has not really been studied. On the other hand, if the alternative to the predictive model is human decision making with its own entrenched biases, the algorithm may be the better tool. Whichever tool city leaders use to make decisions, Dr. Linos notes, it’s critical to remember that “you direct what the machine does–algorithms do not direct policy decisions.”
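To make that comparison concrete, here is a minimal sketch, in Python, of scoring a hypothetical risk model against a coin-flip baseline on held-out data. The synthetic data, the logistic-regression model, and the accuracy metric are illustrative assumptions, not a description of Northpointe’s tool or any city’s system.

```python
# Minimal sketch: compare a hypothetical risk model against a coin-flip baseline.
# All data here is synthetic; the model is a stand-in, not any real city's system.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for historical records: two numeric features, one binary outcome.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hypothetical risk model, fit on the historical portion of the data.
model = LogisticRegression().fit(X_train, y_train)
model_acc = accuracy_score(y_test, model.predict(X_test))

# Coin-flip baseline: predict each outcome at random with probability 0.5.
coin_preds = rng.integers(0, 2, size=len(y_test))
coin_acc = accuracy_score(y_test, coin_preds)

print(f"model accuracy:     {model_acc:.2f}")
print(f"coin-flip accuracy: {coin_acc:.2f}")
```

A gap between the two numbers says the model finds some signal; how much of that signal reflects real behavior versus historical practice is exactly the question the accuracy score cannot answer on its own.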

Taking action

The theory behind using a model rather than a person to make a judgment is that the model is less biased. Of course, as discussed above, a model built on historical data is really identifying patterns in that data, biases included. It’s important for city leaders to understand the results of the model, and that means understanding causation before taking action. In the example of predicting arrests, city leaders have a choice: increase foot patrol in neighborhoods shown to have had high arrest rates in the past, or examine the behavior of individual police officers that may be driving that pattern, to understand what caused certain neighborhoods to have disproportionate interactions with police in the first place.

It’s also important to test the accuracy of the model. For governments, predictive analytics is often touted as a solution for fixing city problems, from prioritizing fire code inspections and restaurant inspections to criminal justice decisions. Governments are increasingly using algorithms to determine sentencing, identify police officers who may have adverse interactions with the public, plan police patrols, set bail, and make parole decisions. In the criminal justice field in particular, the future of one individual can be decided based on population-level data, and that data may carry implicit bias tied to race, socioeconomic status, and gender.

Dr. Linos points out that algorithms are “more powerful” and potentially less biased when they do not include datasets that are directly linked to what is being predicted. To test these models, use “a blend of predictive models and random sample” to schedule code inspections, restaurant inspections, increased police presence, and so on, and monitor whether the model gives you better results than the random sample. Experiment with non-traditional inputs that may simply be reinforcing historical behavior, and see if the results are more accurate than a coin toss.
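As one way to picture the kind of test Dr. Linos describes, the sketch below splits a hypothetical inspection schedule between the model’s top-ranked cases and a random sample, then compares hit rates. The field names, the 50/50 split, and the simulated outcomes are assumptions made for illustration, not a prescribed experimental design.

```python
# Hedged sketch of a blended schedule: half of the slots go to the model's
# highest-scoring cases, half to a random sample, and hit rates are compared.
import random

def blended_schedule(cases, risk_score, n_slots, seed=0):
    """Fill half of n_slots with the model's top-ranked cases, half at random."""
    rng = random.Random(seed)
    half = n_slots // 2
    ranked = sorted(cases, key=risk_score, reverse=True)
    model_picks = ranked[:half]
    remaining = [c for c in cases if c not in model_picks]
    random_picks = rng.sample(remaining, half)
    return model_picks, random_picks

def hit_rate(picks, had_violation):
    """Share of scheduled cases where a violation was actually found."""
    return sum(had_violation(c) for c in picks) / len(picks)

# Illustrative fake data: each case has a model score and a simulated outcome,
# with higher scores only weakly associated with violations.
random.seed(1)
cases = []
for i in range(200):
    score = random.random()
    cases.append({"id": i, "score": score,
                  "violation": random.random() < 0.2 + 0.4 * score})

model_picks, random_picks = blended_schedule(
    cases, risk_score=lambda c: c["score"], n_slots=40)

print("model-ranked hit rate: ", hit_rate(model_picks, lambda c: c["violation"]))
print("random-sample hit rate:", hit_rate(random_picks, lambda c: c["violation"]))
```

If the model-ranked half does not consistently outperform the random half, that is a signal the inputs may simply be echoing historical behavior rather than predicting anything useful.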

How do you use predictive analytics in your city? Tweet us! @gov_ex @elizabethlinos @B_I_Tweets @KatKlosek