The state uploads a lot of spreadsheets to the internet, but it’s quite difficult to navigate to the thousands of different webpages that link to these files.
But a tool from U.S. Open Data, called Let Me Get That Data For You, can help out. In fact, that’s what we used to sniff out structured data across ct.gov to see what’s out there. (Technical note: It looks for hosted CSV, XML, JSON, XLS, XLSX and Shapefile files.)
The tool detected nearly 6,000 spreadsheets on ct.gov, which you can see here. But first, a few caveats:
- Many of the files are duplicates, so it doesn’t necessarily mean there are 6,000 datasets.
- A handful of files are government forms that have no business being in a spreadsheet. They’re not data.
- For various technical reasons, some datasets won’t be discovered by this tool.
A quick overview of what’s on the state’s site: The agency with the most data files is the Department of Public Health. There are lots and lots of blank forms, but there’s also a lot of useful data. Next is the Department of Economic and Community Development, followed by the Department of Transportation and the Office of Policy and Management.
|Agency|Data files|
|---|---|
|Department of Public Health|1072|
|Department of Economic and Community Development|587|
|Department of Transportation|539|
|Office of Policy and Management|468|
|State Department of Education|379|
|Office of the State Comptroller|261|
|Department of Revenue Services|252|
|Department of Emergency Services and Public Protection|177|
|Department of Developmental Services|146|
|Division of Public Defender Services|140|
Almost all the files are in XLS or XLSX format, both of which are Microsoft Excel file types. And it appears very few of them are comprehensive datasets, although we didn’t dig through each one.
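For readers who want to run a similar tally on their own list of file links, here is a minimal Python sketch. It is not the code behind the numbers above and not part of the crawler itself. The function name `summarize`, the `example.gov` URLs and the idea that the first folder in a link identifies the agency are all assumptions made for illustration. It removes links that are the same apart from letter case, which catches only exact repeats and not copies of a file posted under different names, and then counts what remains by file extension and by top-level folder.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def summarize(urls):
    """Deduplicate file URLs, then count them by extension and top-level folder."""
    # Assumption: links that differ only by letter case or stray whitespace
    # point to the same file. Some servers are case-sensitive, so this is a
    # rough approximation.
    unique = sorted({u.strip().lower() for u in urls if u.strip()})

    by_extension = Counter()
    by_folder = Counter()
    for url in unique:
        path = PurePosixPath(urlparse(url).path)
        by_extension[path.suffix.lstrip(".") or "(none)"] += 1
        # Assumption: the first folder in the path stands for the agency.
        # parts looks like ('/', 'dph', 'lib', 'report.xls').
        folder = path.parts[1] if len(path.parts) > 2 else "(root)"
        by_folder[folder] += 1
    return len(unique), by_extension, by_folder


# Hypothetical links, for illustration only.
sample = [
    "http://www.example.gov/dph/lib/report.xls",
    "http://www.example.gov/dph/lib/REPORT.xls",
    "http://www.example.gov/dot/lib/traffic.xlsx",
    "http://www.example.gov/opm/lib/budget.csv",
]
total, by_extension, by_folder = summarize(sample)
print(total)
print(by_extension.most_common())
print(by_folder.most_common())
```

Catching true duplicates, such as the same spreadsheet uploaded under two different names, would require downloading the files and comparing their contents, which this sketch does not attempt.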
The U.S. Open Data Institute said it built this data crawler to help government officials compile an inventory of all their data files as a step toward creating an open data portal, like those in Boston and Hartford. But the tool can also be used by citizens who are curious about what spreadsheets might have been uploaded and forgotten.
Dig around. If you spot anything of interest, let us know in the comments. Some possible tasks: