The state of open data in Connecticut

Last year in a Yale University auditorium, a tall, animated man named Waldo Jaquith stood in front of hundreds of people — including several elected officials — and graded Connecticut’s data portal a ‘C.’

There was an audible gasp in the room. But Jaquith, a well-known figure in the open data community, had specific critiques, like how the website didn’t have a list of every school in the state. The portal lacked many of these “foundational” datasets, he said.

But he acknowledged it was only a few months old at the time — a product of Gov. Dannel P. Malloy’s Executive Order No. 39, which ordered state agencies to publish data for public consumption. Despite Jaquith’s critiques, many in the data community considered the order a gesture showing that the administration might actually understand the importance of open data.

As a concept, open data is pretty simple: Government should proactively share data for anyone to use, largely because it allows us to better measure how government is doing. But putting it into practice has been more difficult — and Connecticut is finding out why. Some of the challenges are technical; others are cultural.

In some respects, the project is on track; there are nearly 300 datasets on the portal. This initial activity follows the pattern of data portals that came before it. But now, the state has to figure out if this can be more than a tool — whether it can change culture and be institutionalized.

Back at the conference, another tall man with black-rimmed glasses listened in the audience — the state’s new chief data officer, Tyler Kleykamp, a longtime state employee with a background in geospatial analysis. He was, and still is, the sole caretaker of the state’s data portal. While others upload datasets, his job is to execute the order broadly. Jaquith’s grade could have been perceived as an indictment of his performance.

Instead, after the talk, Kleykamp merely agreed with the grade.

Government infrastructure was not built to do this

The best way to understand how Kleykamp thinks of government data is to hear him compare it to government buildings: They are assets, he says — and the state should understand what it has and who has it.

Unlike its buildings, though, the state doesn’t know what datasets it has.

So Kleykamp is soon starting a project to take inventory — one he probably won’t complete, since some datasets sit in mainframes and others on state employees’ laptops.

“We should really have a better handle on this so we can start to prioritize things that might have better value,” he said, adding that an inventory would help people understand what to ask for.

In one sense, this is part of a larger issue. Yahoo CEO Marissa Mayer describes data as the “world’s nervous system” — and what we can learn from it is growing exponentially, making data more and more valuable. The inventory is a nod to this big idea, as are Malloy’s investments in technology.

On the other hand, the inventory is just a part of solving the challenges of Executive Order 39. It is a natural outgrowth of the order, which assumes that access to data is in the public interest but does not explicitly require an inventory. What the order did require was the creation of a data portal and the position of a chief data officer; it said agencies must post datasets that keep government accountable and efficient, or that enhance “public knowledge of an agency’s operations.”

The executive order even laid out the first steps in this process. It created a position called “Agency Data Officer” as the point person at each agency. And it required them to identify readily available data from their agencies — in other words, data already in spreadsheet format that could be easily uploaded to the data portal.

The executive order didn’t dig into the intricacies of making this system work. It didn’t address what data exists in state government, how it can be unearthed and how the public and its government can get the most value from it.

When Kleykamp was first hired, he assumed state government regularly generates information in formats that could easily be put on the portal. He thought his main challenge would be getting to the person who had the data. But after receiving the initial list of datasets from agencies — what were supposed to be low-hanging fruit — he found the scope of his challenge was larger.

For one, he was getting things that didn’t qualify as data, like a chart — “not necessarily rows and columns in a spreadsheet.” But this was a symptom of a deeper problem, which was that government infrastructure wasn’t always built to share data.

“These systems were developed primarily to capture information so state agencies can do their jobs,” he said.

For example, the Medicaid data system is built for the Department of Social Services to adjudicate claims — but not to spit out custom datasets for public consumption. So when an agency was slow to hand over a dataset, Kleykamp found it wasn’t someone trying to be difficult; it was that the system wasn’t built to make this easy. “You often need some people who really understand a database to extract that,” he said.

In addition, the order said data must be posted in its rawest possible form, but it also asked for easy-to-get datasets. There was friction between those goals, because the easy stuff was often the heavily aggregated data, Kleykamp said.
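
A toy illustration of that friction, in Python with the pandas library (the column names and figures below are entirely hypothetical, not drawn from any state system): an aggregate can always be recomputed from the raw rows, but the raw rows can never be recovered from the aggregate.

```python
import pandas as pd

# Hypothetical raw records -- one row per event, the "rawest possible form."
raw = pd.DataFrame({
    "facility_id": [101, 101, 102, 103, 103, 103],
    "town": ["Hartford", "Hartford", "New Haven", "Hartford", "Hartford", "Hartford"],
    "result": ["pass", "fail", "pass", "pass", "fail", "pass"],
})

# The "easy" dataset an agency might already have on hand: counts rolled up by town.
# This table can always be rebuilt from the raw rows above, but the individual
# facility-level records can never be reconstructed from it.
aggregated = (
    raw.groupby(["town", "result"])
       .size()
       .reset_index(name="inspections")
)

print(aggregated)
```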

But, eventually, agencies began identifying and producing datasets that could be shared. The larger agencies began to upload data to the portal themselves, a relatively simple process. Some of the smaller agencies, without dedicated IT staff, worked through Kleykamp. Nowadays, Kleykamp holds training sessions to help others upload data.

How open data dies — or doesn’t

The greater hope is that the executive order will change the culture of government — to “shift the organizational culture to one of disclosure and transparency,” as Kleykamp’s boss, Office of Policy and Management Secretary Ben Barnes, told the Mirror last year.

But there’s some dissonance when the Malloy administration touts transparency.

The state has to comply with the Freedom of Information Act, which requires government to disclose certain records. But the Malloy administration has been criticized for watering down the law and has been extremely slow to respond to some requests. Last year, FOI advocates gave the governor a ‘D’ on transparency, although they did say the open data portal was a step in the right direction.

But open data and open records laws are not the same thing. With open data, government proactively publishes data — but it’s only the stuff they want you to see. With open records, government reactively hands over data in response to a request. The executive order does not address open records laws, other than to say that if the public requests a dataset enough times, the state should put it on the portal. In other words, the order does not make all data open by default, redacting or aggregating only where necessary. Rather, it’s the other way around: The state picks what to put up.

This means that conversations about what “should” be on the portal are tough to have. The only things that will make it on the portal are what state employees put on it — and state employees are self-interested humans, like the rest of us.

“These are people who have jobs, and they want to look good,” Kleykamp said.

Asking them to voluntarily make themselves look bad is “irrational.” Jaquith said treating open data portals as a way to find government corruption will not work, and this is how the open data movement will “crash and burn.” In fact, he’s terrified someone will write a think piece in The Atlantic declaring the movement dead — that “we tried that and it was a big waste of time, it’s lousy for government, and we should stop it.”

He adds: “We’re at the point where that’s enough to shut down open data on a state and local level, and that would be terrible.”

So the question is: How can incentives be aligned so that state employees want to proactively publish high-value data?

Some state employees say this doesn’t necessarily need to happen because they want to share data, even if it looks bad. But both Kleykamp and Jaquith believe in another approach: Open data has to, first and foremost, make government’s job easier — or help agencies do their jobs better. For example, if it’s a commonly requested dataset, it’s much easier just to make it publicly available than to repeatedly respond to individual requests. Or, as Governor Malloy said at a panel discussion last year, maybe it makes government more efficient by letting the left hand know what the right hand is doing.

Jaquith points to a project he has going in his home state, Virginia, where he pays $150 every three months to get a list of all the businesses in the state — then puts it on the internet. Local governments don’t have this data; he estimates that in his town of Charlottesville, about 1,900 of the 4,200 businesses aren’t registered, which means there’s a large amount of tax revenue the town misses out on. Jaquith said local municipalities have been using his site to fill these holes, but it requires a large amount of his effort to maintain the site. “One of these days,” he said, “I’m going to cut them off.”

Whether he does or not isn’t the issue, though. Rather, it’s his way of explaining the state of open data.

“We have to get to the point where we’re asking: ‘If we stop publishing this data, what bad things will happen?’” Jaquith said.

Cultural shifts

The Agency Data Officers created by the executive order are largely responsible for fulfilling it at each agency, but they are not necessarily the only ones uploading and maintaining datasets.

There is one for each agency, but these employees have other, full-time responsibilities. Sometimes, their role at their agency has nothing to do with data. The only requirement is that they be a manager, not a unionized employee, and that they be assigned by the agency head.

They do not receive additional compensation for this role.

Jaquith said this model causes some problems.

“The only time liaisons for the state’s data repository make time for it is if there’s a deadline looming or if there’s a political reason,” he said. “Everyone is acting rationally here. For those liaisons, it’s not actually in their job description. They’re not evaluated on how well they relate to the chief data officer.”

But Kleykamp doesn’t agree with this critique. Sure, additional resources would be great. But he said, “If it’s between providing some type of service for somebody in need versus spending three or four million dollars on this, what’s going to win out in this argument?”

Plus, Kleykamp doesn’t believe it’s about having people dedicated to data in each department. Rather, the metaphor he gives for it is a community garden: Everyone has to contribute for it to work.

“It’s not very effective if it’s just one person who goes and grows carrots,” he said.

For some agencies, maintaining the portal has largely been just a different way of doing their job; they already collect and publish data on their websites, but they say the portal brings more structure to that process. For others, the additional responsibilities take more manpower than their old way of working did.

But Scott Gaul, director of the Hartford Foundation for Public Giving’s Community Indicators Project, hopes the state will go further and use data the way Maryland does, for “performance management or raising public awareness about key issues.”

“Their commitment has not just been to transparency as an ideal — they have committed to resources and staff to support the portal and to allow a more ambitious set of activities,” he said.

The data community and its expectations

The public is still a large part of this equation. In fact, a year ago, Kleykamp said getting the public to use data would be one of the primary markers of success.

In its first year, page views on the data portal grew steadily until a huge spike in February 2015. (Kleykamp didn’t know what caused it, but he guessed it was a hodgepodge of factors.)

Meanwhile, the number of “views loaded” — indicating people are looking at visualizations of data — has also crept up.

But how people use the portal is just as important as how many people use it. (We did a full Q&A with data users in the state, which we will publish tomorrow.)

In ongoing conversations with data users, most say they check the portal before they start a data project. But because the data hasn’t been uploaded systematically, it’s difficult to predict what will be there and what won’t. The other common complaint is that data is often outdated, and there are more up-to-date versions on an agency’s website.

Jaquith, who has worked with several open data portals, says this is a common problem. In fact, even the federal government struggled with it when it created its portal. The solution? To let agencies upload their own data wherever they wanted, but keep a machine-readable list of what files exist and where they live.
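
As a rough sketch of how such a machine-readable list could work in practice (the field names, URLs and dataset entries below are hypothetical, not the federal government’s or Connecticut’s actual schema), each agency might publish a small catalog file on its own server, and the central portal would simply harvest and index it:

```python
import json
from urllib.request import urlopen

# Hypothetical location of one agency's catalog file; every agency would publish its own.
CATALOG_URL = "https://example-agency.ct.gov/data/catalog.json"

def fetch_catalog(url: str) -> list:
    """Download and parse an agency's machine-readable list of datasets."""
    with urlopen(url) as response:
        return json.load(response)

def index_catalog(catalog: list) -> dict:
    """Build a simple portal-side index: dataset title -> where the one authoritative copy lives."""
    return {entry["title"]: entry["download_url"] for entry in catalog}

if __name__ == "__main__":
    # Sample of the kind of entries such a catalog might contain. The data itself
    # stays on the agency's server; the portal only records what exists and where.
    sample_catalog = [
        {"title": "School directory",
         "download_url": "https://example-agency.ct.gov/data/schools.csv",
         "format": "csv",
         "last_updated": "2015-06-01"},
        {"title": "Business registrations",
         "download_url": "https://example-agency.ct.gov/data/businesses.csv",
         "format": "csv",
         "last_updated": "2015-05-15"},
    ]
    for title, url in index_catalog(sample_catalog).items():
        print(title, "->", url)
```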

“For [an agency] to know there’s just one copy and it’s on their server, that is justifiably comforting,” he said.

This model, called “federated data,” is something the state’s open data policy allows for. In fact, it’s already been done with some transportation data. But there is still quite a bit of information that exists on both an agency’s site and the portal. The duplication causes data users to often wonder if they’re seeing the most recent data.

But Kleykamp doesn’t think it’s a systemic problem. Rather, he said, “I really think it’s just a different way of doing things, thus changing the old habits takes a bit of time.”

The long view

After the conference last year, where Jaquith listed all the datasets the portal was missing, some of them slowly began appearing on the site. A list of schools is now up, as is a list of every address in the state, which is valuable for mapping and geospatial analysis. “There are very few states — maybe five or six — that provide, on a state level, their address file. They are displaying unusual leadership,” Jaquith said.

In addition, data from the City of Hartford’s portal is included in the state portal’s search results, which Jaquith applauded. But he still sees a few things missing, like data on legislation and campaign finance.

Kleykamp is focusing on the next stage of growth: getting more strategic and more institutionalized. Until this point, much of his work has been getting people to use the portal. Now, as he prepares to take inventory of the state’s data, he says, “Quite frankly we’re being a little more thoughtful.”
