New York City releases data about all police stops. The stops since 2014 (when Stop and Frisk ended) are especially relevant.
An extreme gradient boosting model takes features of each stop, the suspect and the officer to predict whether the stop is likely to result in an arrest (and therefore whether it is warranted).
The model reduces the number of unwarranted stops by 75% while reducing the number of arrests by less than 20%.
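The trade-off above can be illustrated with a small sketch. It assumes the model's predicted arrest probabilities are turned into decisions with a single cutoff, which the page does not state; the function name `stop_tradeoff`, the cutoff value, and the toy data are all hypothetical. "Unwarranted" follows the page's definition: a stop that does not end in an arrest.

```python
def stop_tradeoff(probs, arrested, threshold):
    """Return (share of non-arrest stops avoided, share of arrests lost)
    if only stops with predicted probability >= threshold were made."""
    n_unwarranted = sum(1 for a in arrested if not a)
    n_arrests = sum(1 for a in arrested if a)
    # Stops that would be skipped because the predicted probability is too low
    avoided = sum(1 for p, a in zip(probs, arrested) if not a and p < threshold)
    lost = sum(1 for p, a in zip(probs, arrested) if a and p < threshold)
    return avoided / n_unwarranted, lost / n_arrests


# Made-up predictions and outcomes (1 = arrest), for illustration only
probs = [0.05, 0.10, 0.15, 0.20, 0.60, 0.08, 0.70, 0.12, 0.90, 0.30]
arrested = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0]

unwarranted_cut, arrests_cut = stop_tradeoff(probs, arrested, threshold=0.25)
print(f"unwarranted stops avoided: {unwarranted_cut:.0%}")
print(f"arrests lost: {arrests_cut:.0%}")
```

Raising the cutoff avoids more unwarranted stops but loses more arrests, so a figure like the one quoted above corresponds to one point on that curve.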
Even after Stop and Frisk ended, most stops do not end in arrests.
Although as many as 40% of stops end in arrests in some areas of Manhattan, that proportion is near 10% all across Brooklyn.
Although the proportion of stops that end in arrests is similar across racial groups, about 80% of people who are stopped are Black or Hispanic. Hover over the plot below to see proportions.
[Plot series: total stops; unnecessary stops; arrests; stops due only to suspect's clothing]
Given some features of the subject and the stop, the tool can accurately predict the likelihood of arrest. The predictions below are for a uniformed officer stopping a suspect outdoors at 10pm in the 106th precinct; these features can be changed in the full model.
The model significantly reduced the proportion of innocent people being stopped while keeping law enforcement effective. The results were consistent across races (the model did not reduce stops more for white people than for other groups).
"We can live in a world where the police don't kill people by limiting police interventions, improving community interactions, and ensuring accountability."
Location
Time of Stop
Age of Suspect
Precinct
Stop in Transit or Housing Authority
Weight of Suspect
Stop Indoors or Out
Period of Observation
Officer Wearing Uniform
The model was successful because it captured non-linear relationships in the data. For example, the relationship between time and arrests was highly non-linear. Below, you can see that the model captured that relationship effectively. The fits are from a cross-validated sample, so the model is not just fitting noise.
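One way to check a fit like this is to compare, hour by hour, the observed arrest rate with the mean out-of-fold prediction. The sketch below assumes the out-of-fold predictions are already available as a list; the function name `rates_by_hour` and the toy data are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def rates_by_hour(hours, arrested, oof_pred):
    """Map each hour to (observed arrest rate, mean out-of-fold prediction)."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # [arrests, sum of predictions, count]
    for hour, outcome, pred in zip(hours, arrested, oof_pred):
        t = totals[hour]
        t[0] += outcome
        t[1] += pred
        t[2] += 1
    return {h: (t[0] / t[2], t[1] / t[2]) for h, t in sorted(totals.items())}


# Made-up stops: hour of day, outcome (1 = arrest), out-of-fold prediction
hours = [2, 2, 14, 14, 22, 22]
arrested = [0, 1, 0, 0, 1, 1]
oof_pred = [0.4, 0.6, 0.1, 0.2, 0.7, 0.9]

by_hour = rates_by_hour(hours, arrested, oof_pred)
for hour, (observed, predicted) in by_hour.items():
    print(f"{hour:02d}:00  observed={observed:.2f}  predicted={predicted:.2f}")
```

Because each prediction comes from a model that never saw that stop during training, close agreement between the two columns across hours suggests the non-linear pattern is real rather than memorized.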
When a police officer gives a reason to stop a suspect, the odds of an arrest when that reason is used should be higher than when it isn't used. A classification of reasons by this criterion is provided below.
ongoing investigation
actions of engaging in a violent crime
inappropriate attire for season
wearing clothes commonly used in a crime
acting as a lookout
report by victim
suspicion of weapons
proximity to scene of offense
carrying suspicious object
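The criterion for judging stop reasons can be expressed as an odds ratio: the odds of arrest when a reason is cited, divided by the odds when it is not. A value above 1 means the reason is associated with more arrests. This is a minimal sketch on made-up data; the function name `odds_ratio` is hypothetical, and the page does not say whether the actual classification used raw odds ratios or model-adjusted ones.

```python
def odds_ratio(reason_used, arrested):
    """Odds of arrest when the reason is cited, divided by the odds when it is not."""
    a = b = c = d = 0
    for used, outcome in zip(reason_used, arrested):
        if used and outcome:
            a += 1  # reason cited, arrest
        elif used:
            b += 1  # reason cited, no arrest
        elif outcome:
            c += 1  # reason not cited, arrest
        else:
            d += 1  # reason not cited, no arrest
    # Undefined (ZeroDivisionError) if b or c is zero; real data may need smoothing
    return (a * d) / (b * c)


# Made-up stops: whether the reason was cited, and whether an arrest followed
reason_used = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
arrested = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

ratio = odds_ratio(reason_used, arrested)
print(f"odds ratio: {ratio:.1f}")
```

In this toy example the odds of arrest are 3 to 1 when the reason is cited and 1 to 5 when it is not, so the reason would count as informative under the criterion.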
My name is Aleksandr Sinayev. I hold a PhD in Quantitative Psychology and have done several years of freelance statistical consulting. I like using math and statistics for finding clean solutions to sticky problems. Learn more about me here!
R
Python
Shiny
XGBoost
Plotly
data.table
ggplot2