Q&A with the CT Data Collaborative: What to watch out for when analyzing the racial profiling data


Connecticut Data Collaborative

Downloadable traffic stop data by town.

In 2012, the Connecticut legislature revised a law called the Alvin W. Penn Racial Profiling Prohibition Act, compelling law enforcement agencies in Connecticut to electronically submit their traffic-stop data to the Office of Policy and Management.

A deep dive into the data was presented to legislators this week, and analysts concluded that “certain officers in the state are engaged in racial profiling during daylight hours when motorist race and ethnicity is visible.” The report also found that:

  • 13.5 percent of motorists stopped were observed to be black and a comparable 11.7 percent were motorists of Hispanic descent.
  • Minority stops were more likely to have occurred during daylight hours when a driver’s race and ethnicity is visible.
  • Groton Town, Granby, Waterbury, and State Police Troop C (Hartford) and Troop H (Tolland) were found to have significant disparities in their traffic stop data.
  • Wethersfield, Hamden, Manchester, New Britain, Stratford, Waterbury, and East Hartford were found to have consistent disparities that may indicate the presence of racial and ethnic bias.

One part of the revised racial profiling law requires the development of a transparent system that allows the public, policymakers, and law enforcement administrators to view the data — and this week, the CT Data Collaborative unveiled a new portal that does just that.

So TrendCT caught up with Michelle Riordan-Nold, the director of the CT Data Collaborative, and asked her about the ins and outs of the data:

Where did the data come from?
It is fed into the Connecticut Justice Information System by all police departments in the state except Stamford. Going forward, the data will be updated quarterly and Stamford will be included.

What can residents of Connecticut do with your project?
They can explore the traffic stop data by town using our data table view. So a user can see the number of traffic stops by race, the reason for the stop, disposition of the traffic stop, when the stop occurred, and whether the stop involved a Connecticut resident. The data is available statewide and by town covering the time period from October 1, 2013 to September 30, 2014.
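For readers who download the data, a table like the one on the portal can be approximated with a few lines of Python. This is only a minimal sketch: the field names (`town`, `race`, `reason`) and the sample rows are invented for illustration and are not taken from the actual files.

```python
from collections import Counter

# Invented sample rows; the real download may use different column names.
stops = [
    {"town": "Hartford", "race": "Black", "reason": "Speeding"},
    {"town": "Hartford", "race": "White", "reason": "Defective lights"},
    {"town": "Hartford", "race": "Black", "reason": "Speeding"},
    {"town": "Groton", "race": "Hispanic", "reason": "Registration"},
]


def stops_by(records, *fields):
    """Count records grouped by the given fields, e.g. town and race."""
    return Counter(tuple(r[f] for f in fields) for r in records)


by_town_race = stops_by(stops, "town", "race")
by_reason = stops_by(stops, "reason")

for key, count in sorted(by_town_race.items()):
    print(key, count)
```

Counts like these describe the data; they do not by themselves show whether bias is present.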

What trends have been noticed in the data set?
An analysis of the data was conducted by the Institute for Municipal and Regional Policy (IMRP) at CCSU in collaboration with the Connecticut Economic Resource Center.

What’s the history of this sort of traffic stop data?
The collection of traffic stop data across the country is sometimes the result of a Department of Justice investigation and mandate, or of a lawsuit. Typically, it is larger cities or state highway police departments that collect the data, but Connecticut is unique in that it is a statewide effort. Additionally, Connecticut’s data includes many data points not typically collected by other jurisdictions. As a result, Connecticut has a large and robust data set that enables rigorous statistical evaluation.

What are the implications for this sort of data?
By analyzing this data, policymakers can identify police departments where racial disparities are potentially occurring and offer targeted training and outreach to those departments. The next step in the analysis by the IMRP is to look at the departments where racial disparities occurred and look at the data at the individual officer level to determine if racial profiling is occurring with certain officers.

When was the data released and has anything been done with it since then?
A preliminary dataset was released in September 2014 that covered data from October 1, 2013, to May 31, 2014. On April 7, 2015, a final report was released with a full year’s worth of data, including a regression (statistical) analysis of the data that was conducted by the CT Economic Resource Center.

What hurdles did you encounter when working with this data set and how did you overcome them?

  • There is more information in the dataset that could be analyzed, but the data needs to be scrubbed due to input errors.
  • There was no uniform database of infractions. All of the infraction data was contained in PDFs, which required writing code to pull it from the PDFs into Excel and then match it with the police stop data.
  • Police stop data pertaining to infractions is user-entered, so it is prone to human input errors and missing fields.
  • Some police departments didn’t have electronic systems so the IMRP worked with police departments to automate their systems.
  • Not all data fields were required in all of the police data entry systems. This required careful validation of the data before processing and also required careful filtering of the data before generating tables.
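The validation and filtering step described in the list above might look something like the following. It is a sketch under stated assumptions, not the Collaborative's actual code: the required field names are hypothetical, and a record is treated as unusable if any required field is missing or blank.

```python
# Hypothetical list of fields a record must have before it is tabulated.
REQUIRED_FIELDS = ("town", "race", "reason", "stop_time")


def is_complete(record, required=REQUIRED_FIELDS):
    """Return True when every required field is present and not blank."""
    return all(str(record.get(field) or "").strip() for field in required)


def split_valid(records):
    """Separate usable records from those that need review."""
    valid, rejected = [], []
    for record in records:
        (valid if is_complete(record) else rejected).append(record)
    return valid, rejected


# Invented sample rows for illustration only.
sample = [
    {"town": "Hamden", "race": "White", "reason": "Speeding", "stop_time": "14:05"},
    {"town": "Hamden", "race": "", "reason": "Speeding", "stop_time": "14:20"},
    {"town": "Granby", "race": "Black", "reason": "Stop sign"},
]

valid, rejected = split_valid(sample)
print(len(valid), "usable;", len(rejected), "set aside for review")
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to report how much data each department's system lost to missing fields.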

Is there anything that could be improved in the data collection for the next updated set?
A uniform data entry system with more stringent data validation rules would increase the data quality significantly.

Are there any additional ways civic-minded data hackers could approach this data?
You have to be careful with this dataset not to draw conclusions that cannot be supported without a rigorous regression analysis. People may be inclined to rely on descriptive statistics alone, but that will lead to inaccurate conclusions. For example, it is tempting to compare the percentage of stopped drivers of each race with that group's share of the population, but doing so makes many assumptions that are not valid, such as: the driving population is the same as the resident population; all people drive the same amount and therefore have the same probability of being pulled over; there are no seasonal differences in driving patterns; and so on. The best way to analyze the data is with regression analysis, so I would recommend that data-minded people read the report to understand the methodology the Connecticut Economic Resource Center employed (the Veil of Darkness test, the KPT test of post-stop activity, and the Solar-powered model of Stops and Searches).
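To give a feel for the Veil of Darkness idea, here is a deliberately simplified sketch. The test compares stops made at similar clock times when it is light out with stops made when it is dark, on the reasoning that an officer can more easily see a driver's race in daylight. The report does this with regression and additional controls; the code below only computes a raw odds ratio, and every count in it is invented for illustration, not drawn from the Connecticut data.

```python
def odds_ratio(minority_day, other_day, minority_dark, other_dark):
    """Odds that a stopped driver is a minority in daylight,
    relative to the same odds in darkness."""
    return (minority_day / other_day) / (minority_dark / other_dark)


def minority_share(minority, other):
    """Share of stops involving minority drivers."""
    return minority / (minority + other)


# Invented counts of stops in the same evening clock-time window.
day = {"minority": 120, "other": 380}
dark = {"minority": 90, "other": 410}

ratio = odds_ratio(day["minority"], day["other"], dark["minority"], dark["other"])
print(f"daylight share: {minority_share(**day):.1%}")
print(f"darkness share: {minority_share(**dark):.1%}")
print(f"odds ratio (daylight vs. darkness): {ratio:.2f}")
```

A ratio above 1 means minority drivers make up a larger share of stops in daylight than in darkness. Without controls for things like time of day, location, and season, and without a significance test, a number like this is a starting point for questions rather than evidence of profiling.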

What is the next step for the Connecticut Data Collaborative with this information?
We will be working on a data story explaining the regression analysis that was completed by the Connecticut Economic Resource Center. We will also be updating the site quarterly with new data as it is collected. The IMRP is also working with police departments to improve the collection of location data so that we can map police stop data.

What do you think?

  • To get an accurate driver population estimate couldn’t someone FOI the logs of license plate scanners (assuming they scan in a sufficiently random pattern) in communities that have police cars that use those, and join that data to the automobile owner data to then build out samples and estimates for driver populations and even sufficiently account for variability based on things like day of the week and time of day?

    • alvinschang

      The part of your comment that really got me thinking is sufficiently random pattern of scanning. I can’t even make a decent guess as to whether it would be sufficiently random.

      • Well here is an additional thought on that front: do we really care whether stops are random in proportion to the population as a whole, or whether they are random in proportion to the population the cruisers encounter on a day to day basis?

        For example say we have a cruiser that has a patrol route, and the license plate scanner reveals to us the population of the cars on that route. If the stopped cars are reflective of that population overall, then we would not blame the cop for being biased because he’s just reflecting the environment he’s in. If his environment is different from the make-up of the people he tends to stop, then that raises questions.

        It could very well be that the routes the cops travel are what bias the stops, in which case that is also useful information because that should be easier to fix (or not fix if there is justification to bias the routes towards certain parts of town).

  • InfoSkeptic

    How might the following affect the analysis of racial profiling data in CT? The median age of the Hispanic population in CT is 28.1 yrs. The median age of the non-Hispanic white population is 45.8 yrs. Why are younger drivers more costly to insure?