Software now widely used by courts to predict which criminals are likely to commit future crimes may be no more accurate than untrained people with little to no criminal justice expertise, a new study finds.
Predictive algorithms now regularly make recommendations regarding music, ads, health care, stock trades, auto insurance, and bank loans, among other things. In the criminal justice system, such algorithms have been used to predict where crimes will likely occur, who is likely to commit violent crimes, who is likely to fail to appear at their court hearings, and who is likely to repeat criminal behavior in the future.
One criminal risk assessment tool, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), has been used to assess more than 1 million offenders since it was developed in 1998, and to predict recidivism, or repeat criminal behavior, since 2000. Supporters of such systems argue that automated techniques are more accurate and less biased than humans. However, previous research suggested COMPAS's predictions might be racially biased, underpredicting recidivism among white defendants and overpredicting it among black defendants.
To investigate further whether algorithms can be more fair and accurate than humans at predicting recidivism, computer scientists recruited 400 workers through Amazon's online Mechanical Turk crowdsourcing marketplace, presumably none of them criminal justice experts. Each worker saw descriptions of 50 people from a pool of 1,000 defendants from Broward County, Florida, who were awaiting trial in 2013 and 2014. These descriptions contained seven features about each defendant, including their sex, age, and previous criminal history, but not their race.
The crowdsourced workers were then asked to rate the risk that each defendant would commit a misdemeanor or felony within two years of their most recent arrest. These results were then compared to ones from COMPAS.
Although the crowdsourced workers considered considerably fewer variables than COMPAS, their pooled judgments were accurate in 67 percent of the cases presented, about the same as COMPAS's 65.2 percent accuracy.
"Considering that COMPAS uses 137 variables in its predictions, and that it is commercial software presumably built on much more data than we had access to, this result was surprising," says study senior author Hany Farid, a computer scientist at Dartmouth College in Hanover, New Hampshire.
Further analysis found that a strategy that only looked at two variables—a defendant's age and total number of prior convictions—was about as accurate as COMPAS. A spokesperson for Equivant, the Ohio-based firm behind COMPAS, stated the company was not giving interviews. Equivant posted a statement about the new research shortly before its release, calling it “highly misleading.”
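To make the two-variable finding concrete, here is a minimal sketch of the kind of simple classifier the researchers describe: a logistic regression on just age and prior-conviction count. The Broward County data are not reproduced here, so the sketch trains on synthetic defendants generated by an assumed, purely illustrative rule (younger defendants with more priors reoffend more often); the feature scaling constants and accuracy are likewise illustrative, not the study's.

```python
import math
import random

random.seed(0)

def make_defendant():
    """Generate a synthetic (age, priors) pair and a reoffense label.
    The generative rule is an assumption for illustration only."""
    age = random.randint(18, 70)
    priors = random.randint(0, 15)
    p = 1 / (1 + math.exp(-(-0.08 * (age - 35) + 0.35 * (priors - 4))))
    return (age, priors), 1 if random.random() < p else 0

train = [make_defendant() for _ in range(800)]
test = [make_defendant() for _ in range(200)]

def features(age, priors):
    # Roughly center and scale the two inputs so gradient descent behaves.
    return (age - 35) / 10, (priors - 4) / 5

# Fit logistic regression on the two features with plain gradient descent.
w_age, w_pri, b = 0.0, 0.0, 0.0
lr = 0.1
for _ in range(500):
    g_age = g_pri = g_b = 0.0
    for (age, pri), y in train:
        x1, x2 = features(age, pri)
        p = 1 / (1 + math.exp(-(w_age * x1 + w_pri * x2 + b)))
        err = p - y
        g_age += err * x1
        g_pri += err * x2
        g_b += err
    n = len(train)
    w_age -= lr * g_age / n
    w_pri -= lr * g_pri / n
    b -= lr * g_b / n

# Accuracy on held-out synthetic defendants.
correct = 0
for (age, pri), y in test:
    x1, x2 = features(age, pri)
    pred = 1 if w_age * x1 + w_pri * x2 + b > 0 else 0
    correct += pred == y
print(f"two-feature accuracy: {correct / len(test):.2f}")
```

The point of the sketch is not the exact number it prints but how little machinery is involved: two inputs and a linear decision boundary, versus the 137 variables COMPAS reportedly uses.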
"We believe that the most important implication of our work is that the courts should consider how much credibility to give these types of prediction algorithms—you can imagine that a judge would weigh a risk assessment made from a big-data machine-learning algorithm differently than a risk assessment made from people responding to an online survey," Farid says. "We further believe that there should be more transparency in the use of algorithms in making such critical, life-altering decisions."
"We are not saying in any way that big data, machine learning, artificial intelligence should be abandoned," Farid says. "We are simply saying that their use should be deployed in a careful, thoughtful, and transparent manner, specifically when the results of such algorithms can have life-altering implications."
However, the researchers found that results from both the crowdsourced workers and COMPAS were similarly unfair to black defendants. Farid did note there appear to be differences in the base rates of recidivism across race, with black defendants reoffending at a rate of 51 percent as compared with 39 percent for white defendants, but "these base rates may themselves be the result of racial biases in the criminal justice system—for example, black people are almost four times as likely as white people to be arrested for drug offenses. So what we may be seeing is a ripple effect in policing and prosecution that disproportionately impacts African-Americans."
"On a nationwide scale, black people are more likely to have prior crimes on their record than white people are—black people in America are incarcerated in state prisons at a rate that is 5.1 times that of white Americans, for example," says study lead author Julia Dressel at Dartmouth College. "Within the dataset used in our study, white defendants had an average of 2.59 prior crimes, whereas black defendants had an average of 4.95 prior crimes. The racial bias that appears in both the algorithmic and human predictions is a result of this discrepancy."
In the future, there may be ways to test the effectiveness of this kind of software before it goes on the market. "We can imagine that an organization like the National Institute of Standards and Technology (NIST) can undertake the task of creating standards and benchmarks that any software would have to meet," Farid says. "Such a system would require access to the type of data that we used in our study, but at a larger and more diverse scale."
"We think that studies similar to ours should be performed for all such algorithms," Farid says. "We would further welcome access to larger and more diverse data sets to help us understand the efficacy of these algorithms and, possibly, develop more accurate algorithms."
Dressel and Farid detailed their findings on 17 January 2018 in the journal Science Advances.