Every spreadsheet on CT.gov — and any other site you want to search


The state uploads a lot of spreadsheets to the internet, but it’s quite difficult to navigate to the thousands of different webpages that link to these files.

But a tool from U.S. Open Data, called Let Me Get That Data For You, can help out. In fact, that’s what we used to sniff out structured data from all ct.gov domains to see what’s out there. (Technical note: It looks for all hosted CSV, XML, JSON or Shapefiles.)

The tool detected nearly 6,000 spreadsheets on ct.gov, which you can see here. But first, a few caveats:

  1. Many of the files are duplicates, so it doesn’t necessarily mean there are 6,000 datasets.
  2. A handful of files are government forms that have no business being in a spreadsheet. They’re not data.
  3. For various technical reasons, some datasets won’t be discovered by this tool.
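For readers who want to check the duplicates caveat on their own list of links, here is a minimal Python sketch. It is not how Let Me Get That Data For You works internally. The extension list is an assumption drawn from the formats named in this article, the URLs are made up, and matching on file name alone is only a rough signal that two files might be copies.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed extension list, based on the formats mentioned in this article.
# It is not the tool's actual configuration.
DATA_EXTENSIONS = {".csv", ".xml", ".json", ".shp", ".xls", ".xlsx"}


def is_data_file(url):
    """Return True if the URL path ends in one of the assumed data extensions."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in DATA_EXTENSIONS


def group_by_filename(urls):
    """Group data-file URLs by lowercased file name to surface likely duplicates."""
    groups = defaultdict(list)
    for url in urls:
        if is_data_file(url):
            name = PurePosixPath(urlparse(url).path).name.lower()
            groups[name].append(url)
    return dict(groups)


# Hypothetical URLs, for illustration only.
sample_urls = [
    "https://example.org/agency/budget.xlsx",
    "https://example.org/agency/archive/Budget.xlsx",
    "https://example.org/agency/roads.csv",
    "https://example.org/agency/about.html",
]

groups = group_by_filename(sample_urls)
total = sum(len(found) for found in groups.values())
print(len(groups), "distinct file names among", total, "data files")
```

Any file name that maps to more than one URL is worth a second look before it gets counted as a separate dataset.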

A quick overview of what’s on the state’s site: The agency with the most data files is the Department of Public Health. There are lots and lots of blank forms, but there’s also a lot of useful data. Next is the Department of Economic and Community Development, followed by the Department of Transportation and the Office of Policy and Management.

State agencies or divisions with the most hosted spreadsheets
We’re using the word “spreadsheet” loosely here. This is any structured data, from forms to databases.
Agency | Structured data
Department of Public Health | 1072
Department of Economic and Community Development | 587
Department of Transportation | 539
Office of Policy and Management | 468
State Department of Education | 379
Office of the State Comptroller | 261
Department of Revenue Services | 252
Department of Emergency Services and Public Protection | 177
Department of Developmental Services | 146
Division of Public Defender Services | 140

Almost all the files are formatted in XLS or XLSX — both Microsoft Excel files. And it appears very few of them are actual comprehensive datasets, although we didn’t dig through each one.

The U.S. Open Data Institute said it built this data crawler to help government officials compile an inventory of all their data files, a first step toward creating an open data portal like those in Boston and Hartford. But the tool can also be used by citizens who are curious about what spreadsheets might have been uploaded and forgotten.

Dig around. If you spot anything of interest, let us know in the comments. Some possible tasks:

  • Could a search on HBO.com tell you whether your favorite character is actually dead? (Spoiler alert: Nope, sorry)
  • Is there anything worth discovering in the FilesToBeDeleted folder at hartfordparking.hartford.gov?
  • Are there any records on the ct.gov domain that should be investigated further?

What do you think?

  • ctyankee22

    It’s a damn shame that the bumbling-bureaucracy doesn’t know about template files for forms! Our tax dollars being wasted yet again.