POLICE STOPPER

Using Machine Learning to Reduce Unnecessary Police Stops

Summary

Data

New York City releases data about all police stops. The stops since 2014 (when Stop and Frisk ended) are especially relevant.

Model

An extreme gradient boosting (XGBoost) model uses features of the stop, the suspect, and the officer to predict whether a stop is likely to result in an arrest (and therefore whether it is warranted).
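To make the modelling idea concrete, here is a minimal sketch of gradient boosting for a binary arrest outcome in plain Python. It is not XGBoost and not this project's pipeline: it uses first-order gradient boosting with one-split trees (stumps) on the logistic loss, whereas XGBoost adds second-order leaf weights, regularization and deeper trees. The feature layout (hour of stop, suspect age) and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares fit of a one-split tree (stump) to the pseudo-residuals."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback: a constant stump (every row goes left) if no split improves the fit.
    best = (sum((r - mean_all) ** 2 for r in residuals), 0, float("inf"), mean_all, mean_all)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, t, l_mean, r_mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.3):
    """First-order gradient boosting on the logistic loss (y must contain both 0s and 1s)."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # start from the overall log-odds of arrest
    F = [base] * len(y)           # current log-odds for each training stop
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of the log loss w.r.t. the log-odds is y - p.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + learning_rate * predict_stump(stump, row) for fi, row in zip(F, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(predict_stump(s, row) for s in stumps))


# Hypothetical toy stops: [hour of stop, suspect age]; 1 = arrest.
X = [[2, 30], [3, 45], [22, 25], [23, 50], [10, 30], [11, 45], [14, 25], [15, 50]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = fit_boosted_stumps(X, y)
print(round(predict_proba(model, [1, 40]), 2), round(predict_proba(model, [12, 40]), 2))
```

Each round fits a stump to the residuals y - p, so later stumps focus on the stops the current ensemble predicts worst. Adding several stumps on the same feature is what lets the ensemble represent non-linear patterns, such as arrests clustering at both ends of the day in the toy data.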

Results

The model reduces the number of unwarranted stops by 75% while reducing the number of arrests by less than 20%.
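One way to read this trade-off: the model outputs a predicted probability of arrest for each stop, and a stop is only made when that probability clears a threshold. The sketch below assumes that thresholding scheme (the page does not say how the operating point was chosen). It measures both sides of the trade-off and picks the threshold that avoids the most no-arrest stops while losing fewer than 20% of arrests. Function names and toy numbers are hypothetical.

```python
def stop_reductions(probs, arrested, threshold):
    """Return (share of no-arrest stops avoided, share of arrests lost) when only
    stops with predicted arrest probability >= threshold are made."""
    kept = [p >= threshold for p in probs]
    kept_no_arrest = [k for k, a in zip(kept, arrested) if not a]
    kept_arrest = [k for k, a in zip(kept, arrested) if a]
    avoided = 1 - sum(kept_no_arrest) / len(kept_no_arrest)
    lost = 1 - sum(kept_arrest) / len(kept_arrest)
    return avoided, lost


def best_threshold(probs, arrested, max_arrests_lost=0.20):
    """Threshold that avoids the most no-arrest stops while losing fewer than
    `max_arrests_lost` of the arrests. Returns (threshold, avoided, lost) or None."""
    best = None
    for t in sorted(set(probs)):
        avoided, lost = stop_reductions(probs, arrested, t)
        if lost < max_arrests_lost and (best is None or avoided > best[1]):
            best = (t, avoided, lost)
    return best


# Hypothetical predicted probabilities and outcomes (1 = arrest) for eight stops.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
arrested = [1, 1, 0, 1, 0, 0, 0, 0]
print(best_threshold(probs, arrested))
```

Raising the threshold always avoids more no-arrest stops but eventually starts dropping real arrests; the `max_arrests_lost` cap makes that trade-off explicit.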

What does the data say?

Even after Stop and Frisk ended, most stops do not end in arrests.

Some places are better than others

Although as many as 40% of stops end in arrests in some areas of Manhattan, that proportion is near 10% across all of Brooklyn.
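The area comparison comes down to a grouped proportion: arrests divided by stops within each area. A minimal sketch in plain Python, assuming each stop is a dict with hypothetical keys `borough` and `arrested` (the real NYC files use their own column names):

```python
from collections import defaultdict


def arrest_rate_by(stops, key):
    """Share of stops ending in an arrest, grouped by stop[key]."""
    totals = defaultdict(int)
    arrests = defaultdict(int)
    for stop in stops:
        group = stop[key]
        totals[group] += 1
        arrests[group] += int(bool(stop["arrested"]))
    return {group: arrests[group] / totals[group] for group in totals}


# Hypothetical toy records; real data would have one row per stop.
stops = [
    {"borough": "MANHATTAN", "arrested": 1},
    {"borough": "MANHATTAN", "arrested": 0},
    {"borough": "BROOKLYN", "arrested": 0},
    {"borough": "BROOKLYN", "arrested": 1},
    {"borough": "BROOKLYN", "arrested": 0},
    {"borough": "BROOKLYN", "arrested": 0},
]
print(arrest_rate_by(stops, "borough"))  # {'MANHATTAN': 0.5, 'BROOKLYN': 0.25}
```

Passing a precinct key instead of `borough` gives the finer-grained view needed to compare individual areas within a borough.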

What about race?

Although the proportion of stops that end in arrests is similar across racial groups, about 80% of people who are stopped are Black or Hispanic.
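Two different proportions are in play here: each group's share of all stops, and the arrest rate within each group. A plain-Python sketch that computes both, using a hypothetical `race` key and made-up records, shows how a group can make up most stops while having the same arrest rate as other groups:

```python
from collections import Counter


def stop_share_and_arrest_rate(stops, key="race"):
    """Map each group to (share of all stops, arrest rate within the group)."""
    n = len(stops)
    stop_counts = Counter(s[key] for s in stops)
    arrest_counts = Counter(s[key] for s in stops if s["arrested"])
    return {g: (c / n, arrest_counts[g] / c) for g, c in stop_counts.items()}


# Hypothetical toy records: 25 stops, every group has a 20% arrest rate.
stops = (
    [{"race": "BLACK", "arrested": i < 2} for i in range(10)]
    + [{"race": "HISPANIC", "arrested": i < 2} for i in range(10)]
    + [{"race": "WHITE", "arrested": i < 1} for i in range(5)]
)
print(stop_share_and_arrest_rate(stops))
# {'BLACK': (0.4, 0.2), 'HISPANIC': (0.4, 0.2), 'WHITE': (0.2, 0.2)}
```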

Total stops: 22,563
Unnecessary stops: 18,595
Arrests: 3,968
Stops due only to suspect's clothing: 949
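The headline counts are internally consistent: unnecessary stops plus arrests equal the total number of stops. A quick check in Python, using only these four figures, also turns them into rates:

```python
total_stops = 22_563
unnecessary_stops = 18_595
arrests = 3_968
clothing_only_stops = 949

# Every stop either ends in an arrest or counts as unnecessary.
assert unnecessary_stops + arrests == total_stops

arrest_rate = arrests / total_stops
unnecessary_rate = unnecessary_stops / total_stops
clothing_rate = clothing_only_stops / total_stops
print(f"Arrest rate: {arrest_rate:.1%}")             # Arrest rate: 17.6%
print(f"Unnecessary stops: {unnecessary_rate:.1%}")  # Unnecessary stops: 82.4%
print(f"Clothing-only stops: {clothing_rate:.1%}")   # Clothing-only stops: 4.2%
```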

The Tool

Given some features of the subject and the stop, the tool can accurately predict the likelihood of arrest. The example predictions are for a uniformed officer stopping a suspect outdoors at 10pm in the 106th precinct; these features can be changed in the full model.
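The fixed stop settings behind the example predictions can be kept separate from the suspect features that vary. A minimal sketch, assuming a hypothetical `StopContext` dataclass and any `predict` callable that maps a feature dict to a probability (neither comes from the project's code):

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class StopContext:
    """Stop settings for the example predictions (field names are hypothetical)."""
    officer_in_uniform: bool = True
    indoors: bool = False
    hour: int = 22  # 10pm
    precinct: int = 106


def prediction_grid(predict, context, ages):
    """Predicted arrest probability for each suspect age with the stop context held fixed.

    `predict` is any callable mapping a feature dict to a probability."""
    return [(age, predict({**asdict(context), "suspect_age": age})) for age in ages]


# Changing one setting, e.g. a daytime stop, without touching the defaults.
daytime = replace(StopContext(), hour=14)
```

Freezing the dataclass means every variant is built with `replace`, so the default scenario cannot be changed by accident.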

Evidence It Works

The model significantly reduced the proportion of innocent people being stopped while keeping law enforcement effective. The results were consistent across races: the model did not reduce stops more for white suspects than for other groups.
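Consistency across groups can be checked by computing the reduction separately for each group and comparing the results. A sketch, assuming a probability-threshold stop rule (an assumption; the page does not describe its exact check) and hypothetical group labels:

```python
from collections import defaultdict


def avoided_share_by_group(groups, probs, arrested, threshold):
    """For each group, the share of its no-arrest stops that would not be made
    if officers only stopped when predicted arrest probability >= threshold."""
    avoided = defaultdict(int)
    no_arrest = defaultdict(int)
    for group, p, a in zip(groups, probs, arrested):
        if not a:
            no_arrest[group] += 1
            avoided[group] += int(p < threshold)
    return {g: avoided[g] / no_arrest[g] for g in no_arrest}


def largest_gap(shares):
    """Spread between the groups with the largest and smallest avoided share."""
    return max(shares.values()) - min(shares.values())
```

A small `largest_gap` means the reduction in unnecessary stops is spread evenly rather than concentrated in one group.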

"We can live in a world where the police don't kill people by limiting police interventions, improving community interactions, and ensuring accountability."

Campaign Zero

How It Works

Most Important Features

Location

Time of Stop

Age of Suspect

Precinct

Stop in Transit or Housing Authority

Weight of Suspect

Stop Indoors or Out

Period of Observation

Officer Wearing Uniform
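The page does not say how these importances were computed; XGBoost itself reports split-based importance scores such as gain. As a model-agnostic alternative, here is a sketch of permutation importance: shuffle one feature at a time and measure how much accuracy drops. `predict` stands for any hypothetical function that returns an arrest probability for a feature row.

```python
import random


def accuracy(predict, X, y, threshold=0.5):
    """Share of rows where thresholding the predicted probability matches the outcome."""
    return sum((predict(row) >= threshold) == bool(yi) for row, yi in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature_names, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is randomly shuffled.

    Returns (feature name, importance) pairs sorted from most to least important."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    scores = {}
    for j, name in enumerate(feature_names):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with only column j replaced by its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        scores[name] = sum(drops) / n_repeats
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A feature the model ignores scores exactly zero, because shuffling it cannot change any prediction.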

Captures Non-linear Relationships

The model succeeded because it captured non-linear relationships in the data. For example, the relationship between time of stop and arrest was highly non-linear, and the model captured it effectively. The fits come from cross-validated (held-out) predictions, so the model is not just fitting noise.
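A simple way to see a non-linear time effect without the full model is an out-of-fold arrest rate by hour. Each stop is scored only from folds that did not contain it, which is the same idea behind the cross-validated fits. This binned estimator is a stand-in chosen for illustration, not the project's XGBoost fit, and the function name is hypothetical.

```python
from collections import defaultdict


def out_of_fold_hourly_rates(hours, arrested, k=5):
    """Cross-validated estimate of P(arrest | hour of stop).

    Each stop's prediction is the arrest rate for its hour computed only from the
    other k - 1 folds, so no stop is scored by a fit that saw its own outcome.
    Hours unseen in the training folds fall back to the training folds' overall rate.
    """
    if k < 2 or len(hours) < k:
        raise ValueError("need k >= 2 and at least k stops")
    n = len(hours)
    folds = [i % k for i in range(n)]  # round-robin fold assignment
    preds = [0.0] * n
    for fold in range(k):
        counts = defaultdict(lambda: [0, 0])  # hour -> [arrests, stops] in training folds
        train_arrests = train_stops = 0
        for h, a, f in zip(hours, arrested, folds):
            if f != fold:
                counts[h][0] += a
                counts[h][1] += 1
                train_arrests += a
                train_stops += 1
        fallback = train_arrests / train_stops
        for i in range(n):
            if folds[i] == fold:
                arrests, stops = counts.get(hours[i], (0, 0))
                preds[i] = arrests / stops if stops else fallback
    return preds
```

Plotting these out-of-fold rates against hour shows whatever shape the data has without forcing a straight-line trend.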

Qualitative Insights

When a police officer gives a reason to stop a suspect, the odds of an arrest when that reason is used should be higher than when it isn't used. A classification of reasons by this criterion is provided below.

ongoing investigation

actions of engaging in a violent crime

inappropriate attire for season

wearing clothes commonly used in a crime

acting as a lookout

report by victim

suspicion of weapons

proximity to scene of offense

carrying suspicious object
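The criterion for classifying stop reasons can be written as an odds ratio: the odds of arrest when a reason is cited, divided by the odds when it is not. A ratio above 1 means citing the reason goes with a higher chance of arrest. A sketch in plain Python; the 0.5 added to each cell (the Haldane-Anscombe correction) is an assumption to avoid division by zero, not something the page specifies, and the function names are hypothetical.

```python
def arrest_odds_ratio(reason_cited, arrested):
    """Odds of arrest when a reason is cited divided by the odds when it is not.

    0.5 is added to each cell of the 2x2 table (Haldane-Anscombe correction) so
    that a cell with zero stops does not cause division by zero."""
    pairs = list(zip(reason_cited, arrested))
    a = sum(1 for r, y in pairs if r and y) + 0.5          # cited, arrest
    b = sum(1 for r, y in pairs if r and not y) + 0.5      # cited, no arrest
    c = sum(1 for r, y in pairs if not r and y) + 0.5      # not cited, arrest
    d = sum(1 for r, y in pairs if not r and not y) + 0.5  # not cited, no arrest
    return (a / b) / (c / d)


def classify_reason(reason_cited, arrested):
    """Label a stop reason by whether citing it is associated with higher odds of arrest."""
    if arrest_odds_ratio(reason_cited, arrested) > 1:
        return "raises odds of arrest"
    return "does not raise odds of arrest"
```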

About The Project

My name is Aleksandr Sinayev. I hold a PhD in Quantitative Psychology and have done several years of freelance statistical consulting. I like using math and statistics to find clean solutions to sticky problems.

Key Technologies

R

Python

Shiny

XGBoost

Plotly

data.table

ggplot2