Data problem: How do I deal with the many boundaries in Connecticut?

Editor’s note: Occasionally, we’ll publish problems people have working with data in Connecticut. Our hope is to get these problems out in the open to find a solution.

At one point during the legislative session, lawmakers considered a bill that would set a statewide rate for vehicle property taxes. Citizens (and policymakers) would’ve been wise to ask: How would this proposal change my car tax bill? I set out to answer the question, searching for the varying tax rates — expressed in mills — for Connecticut residents. (A mill is equal to $1 of tax for each $1,000 of assessed property value.)
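
To make the mill arithmetic concrete, here is a minimal sketch in Python. The function name `car_tax`, the $10,000 assessment and the 35-mill town are made up for illustration; the 28-mill figure is the proposed cap discussed later in this piece.

```python
def car_tax(assessed_value: float, mill_rate: float) -> float:
    """Annual tax in dollars: one mill is $1 of tax per $1,000 of assessed value."""
    return assessed_value * mill_rate / 1000


# Hypothetical car assessed at $10,000 in a hypothetical town charging 35 mills.
current_bill = car_tax(10_000, 35.0)

# The same car taxed at 28 mills.
proposed_bill = car_tax(10_000, 28.0)

# Positive means the owner would pay less at 28 mills.
savings = current_bill - proposed_bill
```

The same function answers the reverse case: for a town whose rate is below 28 mills, `savings` comes out negative, meaning the bill would go up.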

But this became a large data problem. Boroughs and districts, some of which levy their own mill rates, draw so many extra borders across the state that I left the project unfinished.

Here’s how far I got.

The Process

The goal was to get the base rate for each of the state’s 169 municipalities. From there, I wanted to figure out all the added taxes from various boroughs and districts, which would allow us to see how many people would’ve seen an increased car tax with the proposed bill.

  • I downloaded the current mill rates for the 2015 fiscal year from the state’s Office of Policy and Management. This gave me 282 different rates. I wanted to get it down to just the 169 base rates for each municipality.
  • In the data, there is a column called “service district codes.” I kept only the rows marked ‘0,’ which eliminates most non-municipal taxing districts and brings the number of rates down to 179, much closer to the 169 towns I’m looking for.
  • I removed the boroughs by filtering out the OPM-assigned town codes that are greater than 200, which brings the mill rates down to 171 results. Three districts still sneak through (two from West Haven and one from Goshen). That leaves 168 towns rather than 169, so I think a town is missing, but let’s proceed.
  • The OPM data doesn’t include tax rates for four towns. For the remaining 165 towns, we can use a formula to calculate whether a town would see an increase in its car taxes by comparing its rate against a mill rate of 28, which was a proposed cap. At first it appears that 80 towns would’ve seen a tax increase on their cars and 84 would’ve seen a decrease.
  • But at least 41 towns have a taxing district or borough within them, which makes it difficult to determine the exact effect on residents. In some towns, only a portion of the town is divided into districts, leaving a portion without any district; in other towns, everything is carved up, so everyone is inside a district.
  • Since districts only add taxes, it is safe to say that at least 84 towns would see a decrease. However, it isn’t easy to measure the size of the decrease in the 20 of those towns that have districts or boroughs.
  • Twenty-one of the towns that appear to have tax increases have additional tax districts to consider.
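
The filtering and comparison steps above can be sketched in Python. This is only an illustration under stated assumptions: the field names (`town`, `town_code`, `service_district_code`, `mill_rate`) and the sample rows are invented, not taken from the OPM file, and the sketch assumes that a town whose base rate is below 28 mills would see an increase under a statewide rate.

```python
from collections import Counter

PROPOSED_RATE = 28.0  # mills, the proposed figure cited above

# Hypothetical rows; names, codes and rates are invented for illustration.
SAMPLE_ROWS = [
    {"town": "Alpha", "town_code": 1, "service_district_code": 0, "mill_rate": 22.5},
    {"town": "Alpha", "town_code": 1, "service_district_code": 3, "mill_rate": 1.2},
    {"town": "Beta", "town_code": 2, "service_district_code": 0, "mill_rate": 31.0},
    {"town": "Gamma", "town_code": 3, "service_district_code": 0, "mill_rate": 28.0},
    {"town": "Gamma", "town_code": 203, "service_district_code": 0, "mill_rate": 2.0},
]


def is_base_rate(row):
    """Keep rows with service district code 0 and a town code of 200 or less."""
    return row["service_district_code"] == 0 and row["town_code"] <= 200


def classify(base_mill_rate, proposed=PROPOSED_RATE):
    """Compare a town's base rate with the proposed rate."""
    if base_mill_rate < proposed:
        return "increase"
    if base_mill_rate > proposed:
        return "decrease"
    return "no change"


def summarize(rows, proposed=PROPOSED_RATE):
    """Classify each town and flag towns that also contain a district or borough."""
    towns_with_districts = {r["town"] for r in rows if not is_base_rate(r)}
    summary = {}
    for r in rows:
        if is_base_rate(r):
            summary[r["town"]] = {
                "effect": classify(r["mill_rate"], proposed),
                "has_district": r["town"] in towns_with_districts,
            }
    return summary


summary = summarize(SAMPLE_ROWS)
counts = Counter(entry["effect"] for entry in summary.values())
```

The `has_district` flag marks the towns where the base rate alone does not settle the question, because some or all residents pay a district or borough rate on top of it. Those are the cases that still need town-by-town checking.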

I was only able to navigate this far. Examining each individual case would probably take hours, while the initial phase of the project took only one or two.

What makes this difficult

Many statistical comparisons of Connecticut cities and towns require a surprising amount of local knowledge. Does anyone outside of Groton know how the city’s 10 taxing districts fit together? Or the seven in Stonington? Where does the Greenwich Sewer District start and end? In Morris, there’s the Deer Island Association. Eight towns have boroughs. About two dozen have fire districts. And those are just the differences that affect property taxes.

There are issues with layering police departments, too. (See posts about racial profiling data here and on the police districts here.)

If you want to include other boundaries — like councils of government and regional educational service centers (both unelected regional bodies) or airport development zones or enterprise zones (areas featuring development incentives) — things get even more complicated.

I already mentioned that some towns have sewer districts (about six), but some sewer systems are bigger than towns, such as the Metropolitan District Commission. Legally a municipality, the MDC provides water and sewer services to eight member towns centered around Hartford, plus water to parts of four non-member towns. To calculate the all-level pension debt of Connecticut residents — a project I’ve long considered — you’d need to take into account the MDC. Are there other multi-town organizations like the MDC that may have pension debt?

Datasets can also refer to counties, census tracts, precincts and zip codes, to name a few other categories. Can someone consolidate all of these layers into one tool? Is that even possible?

I’m pretty handy with Excel, but I can’t find a solution to this particular problem. I imagine others who tried to compare local governments in Connecticut have faced this challenge.

Has anyone cracked the code and will they share the secret?

Zachary Janowski writes for the Yankee Institute, Connecticut’s free-market think tank.

What do you think?

  • I see this as having two parts. Data collection is probably a good intern project, but I’d definitely contact MAGIC at UConn first, since they probably have whatever mapping data is available so far. To the extent you’re looking to consider all these various layers, I think Excel is going to be an underpowered tool. I’d guess that a GUI tool like Tableau or ArcGIS could make this “easy” to do if you have the budget to purchase those programs and only need to do analysis on a desktop. However, if you’re going to do a lot of this mapping-based analysis, you’re going to want to learn SQL and probably use PostGIS with PostgreSQL eventually anyway. Maybe easiest to get your feet wet with this through CartoDB:

    • alvinschang

      A lot of non-technical people have gotten very far with ArcGIS (expensive) or QGIS (free):

      But I find that the barrier to entry to figure out this kind of question is so high, even when people become pretty good with data. It’s one of those problems that lacks a killer tool.