A gentle introduction to APIs for data journalists


API data can provide a rich source for journalists and data scientists, and I think I’ve found the simplest API on the web to get you started.

In this article, we’ll take a REST API run by the FAA that provides airport status information and write an API wrapper for it so we can access the data in future code. You should be comfortable writing code, and some experience with Python will help if you want to follow along, but the principles apply in any programming language. Python is pretty readable if you can write code in any language.

What is an API?

An application programming interface, or API, is a way for two programs to communicate. An API might provide a way to change the color of a light bulb or post on Twitter. In the case of data APIs, they offer a way to get some small slice of some large data set that lives on a server.

Do I need APIs?

APIs let you use other people’s databases and coding wizardry to enhance your own applications or analysis scripts. For instance, you might have a spreadsheet of street addresses that you need to turn into GPS coordinates so you can map them; a geocoding API can do that conversion for you.

For a huge list of resources you can access via APIs, check out this list on programmableweb.com. Just like your human contacts, these APIs can be indispensable sources for a journalist. ProPublica has a number of APIs that are especially useful for journalists, and so does the Sunlight Foundation.

Airport status API

The FAA has an API that provides information about airports, including delays and weather. You give it an airport code, like “JFK” or “BDL” for Bradley International Airport, and it tells you what’s up at that airport.

To get data for an airport, you just go to this specially formed web address; it even works in a browser (go ahead, try it):

http://services.faa.gov/airport/status/JFK?format=application/json

The “JFK” part can be replaced with any valid airport code, and the “format” can be set to “application/xml”, but I’m not interested in XML, so the only “variable” part of this web address is the airport code.

The response looks like this, a JSON string:

{"delay":"true","IATA":"JFK","state":"New York","name":"John F Kennedy International","weather":{"visibility":10.00,"weather":"Mostly Cloudy","meta":{"credit":"NOAA's National Weather Service","updated":"9:51 AM Local","url":"http://weather.gov/"},"temp":"42.0 F (5.6 C)","wind":"Northwest at 16.1mph"},"ICAO":"KJFK","city":"New York","status":{"reason":"WX:Wind","closureBegin":"","endTime":"","minDelay":"31 minutes","avgDelay":"","maxDelay":"45 minutes","closureEnd":"","trend":"Increasing","type":"Departure"}}
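
To make the structure of that response concrete, here is a minimal sketch that parses an abbreviated copy of the sample above with Python’s built-in json module. The variable names are illustrative. Note that the API sends “delay” as the string "true" rather than a JSON boolean, so it has to be compared as text.

```python
import json

# Abbreviated copy of the sample response shown above.
sample = (
    '{"delay":"true","IATA":"JFK","name":"John F Kennedy International",'
    '"weather":{"temp":"42.0 F (5.6 C)","wind":"Northwest at 16.1mph"},'
    '"status":{"reason":"WX:Wind","minDelay":"31 minutes",'
    '"maxDelay":"45 minutes","trend":"Increasing","type":"Departure"}}'
)

# json.loads turns the JSON string into nested Python dictionaries.
data = json.loads(sample)

# "delay" arrives as the string "true", not a boolean, so compare as text.
is_delayed = data["delay"] == "true"

print(data["name"], "delayed:", is_delayed)
print(data["status"]["reason"], "up to", data["status"]["maxDelay"])
```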

REST APIs

This FAA API is a specific type of web API called a REST API. REST is probably the most popular type of web API these days, so you’ve probably heard of it.

Describing the constraints that define a REST API, particularly what makes it different from other web APIs, is beyond the scope of this write-up, and you don’t need to fully understand their architecture to use them. I do want to demystify the acronym “REST,” since it’s bad to use words we don’t know the definition of.

REST stands for REpresentational State Transfer, which sounds pretty inscrutable, but let’s break it down:

  • “State” is just a computer science term for information stored in some form or another at a point in time. Let’s think of it as a synonym for data.
  • That state can be “represented” in a bunch of different formats, including XML and JSON.
  • When that representation is transferred to the client (your computer), that’s a representational state transfer.

The “state” terminology makes more sense if you keep in mind that what you’re asking the server for might be something that’s ephemeral.

The “state” you’re requesting might be a file, like a text file of Moby Dick or an MP3 audio file, that lives on disk after your request is completed. But it might be the result of a computation that is performed on the server and sent back to you, like slicing out just a small portion of a database and turning it into a spreadsheet or JSON object. Once that computation is done and the result is sent to the client, that JSON object isn’t stored on the server.

Requests and methods

When your browser loads that FAA web address, it’s making an HTTP “request” to the server for the resource at that address, and it’s using the GET method.

The same thing happens when you browse to a website like TrendCT.org, but instead of getting a JSON string as a response, your browser gets HTML code, which it knows how to render as a web page.

There are other methods. Your browser would usually use the POST method when you fill out a web form and hit the “submit” button. Some APIs use more than just the GET method.

As you’ll see, we don’t need a web browser to make GET and POST requests. We can do it programmatically.
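
As a rough illustration of the two methods, the sketch below uses the requests library (assumed to be installed) to build one GET request and one POST request without sending either, so you can inspect what would go over the wire. The POST address and form field are placeholders, not a real service.

```python
import requests

# Build (but do not send) a GET request for the FAA address used earlier.
get_req = requests.Request(
    "GET",
    "http://services.faa.gov/airport/status/JFK",
    params={"format": "application/json"},
).prepare()

# Build a POST request similar to what a browser sends for a submitted form.
# The address and the field name are placeholders for illustration only.
post_req = requests.Request(
    "POST",
    "https://example.com/signup",
    data={"email": "reader@example.com"},
).prepare()

# The method and address of each request; the POST also carries a body.
print(get_req.method, get_req.url)
print(post_req.method, post_req.url, post_req.body)
```

To actually send a prepared request you could pass it to requests.Session().send(), or skip the preparation step and call requests.get() or requests.post() directly.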

That’s plenty to get us through this article, but for more on HTTP and the request methods, check out this W3C article.

Planning to code

We’re going to write what is sometimes referred to as an API “wrapper,” which is a bunch of functions that take your arguments, form special requests, send them to an API, and return the response, so you don’t have to constantly rewrite that procedure when you want something from the API.

We know that in order to get the data, we need to somehow make a GET request with a specially-formed web address. That implies two steps:

  1. Somehow form the web address based on the variable (the airport code);
  2. Perform the request somehow and return the server response.

Let’s get to the code

Here’s the GitHub repo containing all the code we’ll discuss.

The program that actually interacts with the API is airportstatus.py, so it’s the only one we’ll cover in depth. (The other files in the repo show how you can incorporate the API wrapper into your code, like the getstatus.py command line script.)

This airportstatus.py file has two functions, which correspond to the two steps from the “Planning to code” section.

The first function, status_url(code), takes an airport code, like “JFK”, and returns a string of the specially-formed web address. This is done using Python’s + string concatenation operator.

The second function, get_status(code), calls status_url(code) to get the web address, and then uses the requests library to fetch the JSON object at that specially formed web address.

Note that the line if r.status_code != 200: is there to catch error codes. The server responds with a status code of 200 if the request succeeded, but it can send back other status codes, most of which represent different kinds of errors, if the request fails. If we get one of those, we’ll raise an exception, indicating to the caller of this function that something went wrong.

Here’s the code all together:
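
A minimal sketch of the two functions as described above. It assumes the requests library is installed (pip install requests); the BASE_URL name, the 10-second timeout and the exception type are illustrative choices, and the file in the repo may differ in its details.

```python
import requests

# Fixed part of the web address; only the airport code changes.
BASE_URL = "http://services.faa.gov/airport/status/"


def status_url(code):
    """Return the status web address for an airport code such as "JFK"."""
    return BASE_URL + code + "?format=application/json"


def get_status(code):
    """Fetch the status for an airport and return it as a Python dict."""
    r = requests.get(status_url(code), timeout=10)
    # Anything other than 200 means the request did not succeed.
    if r.status_code != 200:
        raise RuntimeError("Request failed with status code " + str(r.status_code))
    # Decode the JSON body into Python dictionaries and lists.
    return r.json()
```

With this in place, get_status("JFK") would return a dictionary with the same keys as the JSON response shown earlier, assuming the service still answers at that address.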

That’s it for interacting with a REST API in Python. The remaining sections of the write-up will cover a few examples of how to use the data and how to interact with more complicated APIs that require API keys and offer more than just one query.

API keys

Most APIs are a little more complicated than FAA’s airport status API. Some require an API key.

An API key is a secret code given to you by the administrators of the service. It can be used to prevent people from making too many requests and bogging down the server.

One API that requires an API key is the Federal Election Commission API, which can be used to get campaign finance data.

I wrote a minimal wrapper for passing requests to that API.

The module that interacts with the FEC server is available here, and here is the relevant code:

In this case, the API specifies that the API key should be sent as a URL parameter. This makes it fairly trivial to implement, because we just have to insert the API key into the URL, the same way we inserted the airport code into the URL of the FAA airport status API request.
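
The idea can be sketched with Python’s standard library alone. The base address below is a placeholder and the parameter name api_key is an assumption made for illustration; check the service’s documentation for the real endpoint and parameter name.

```python
from urllib.parse import urlencode


def keyed_url(base_url, api_key, **params):
    """Return base_url with the query parameters and the API key appended."""
    # Copy the other parameters, then add the key as one more parameter.
    query = dict(params, api_key=api_key)
    # urlencode escapes the values and joins them as name=value&name=value.
    return base_url + "?" + urlencode(query)


# Placeholder address and key, for illustration only.
url = keyed_url("https://api.example.gov/v1/candidates", "MY_SECRET_KEY", page=1)
print(url)
```

Because the key ends up in the address itself, be careful about pasting such URLs into stories, logs or public repositories.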

Some services require the API key to be sent as a field in the header portion of the HTTP request, rather than in the web address.

One example of a service that allows you to do this is api.data.gov. They expect the header field X-Api-Key to contain your key.

But how do we get the API key in there? The requests library supports custom headers:
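
Here is a small sketch of the mechanism. It prepares a request without sending it so you can see where the key ends up. The NREL address is taken from that service’s public documentation as I understand it and should be treated as an assumption; the names NREL_URL and build_request are illustrative.

```python
import requests

# Assumed address of the NREL alternative fuel stations endpoint.
NREL_URL = "https://developer.nrel.gov/api/alt-fuel-stations/v1.json"


def build_request(api_key):
    """Prepare a GET request that carries the key in the X-Api-Key header."""
    headers = {"X-Api-Key": api_key}
    request = requests.Request(
        "GET", NREL_URL, headers=headers, params={"limit": 1}
    )
    # prepare() assembles the final address and headers without sending.
    return request.prepare()


prepared = build_request("MY_SECRET_KEY")
print(prepared.url)                   # the key is not in the address
print(prepared.headers["X-Api-Key"])  # it travels in the header instead
```

In everyday use you would more likely write requests.get(url, headers=headers), which builds and sends the request in one step.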

Go get an API key from api.data.gov. I’ll wait.

OK, cool. Fire up an interactive Python shell, and we’ll run through the first API key example on api.data.gov, but we’ll use Python instead of the curl command line utility. Peck these commands out, replacing MY_SECRET_KEY with your key:
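
A sketch of what that session might look like, wrapped in a function so nothing is sent until you call it. The address is an assumption based on the api.data.gov key example, and the function name and timeout are illustrative.

```python
import requests

API_KEY = "MY_SECRET_KEY"  # replace with your own key

# Assumed address: ask for a single station from the NREL inventory.
URL = "https://developer.nrel.gov/api/alt-fuel-stations/v1.json?limit=1"


def first_station(api_key=API_KEY):
    """Send the request with the key in the X-Api-Key header."""
    r = requests.get(URL, headers={"X-Api-Key": api_key}, timeout=10)
    # Raises requests.HTTPError if the server answers with a 4xx or 5xx code.
    r.raise_for_status()
    return r
```

In the shell, r = first_station() followed by r.content shows the raw response, and r.json() gives you the same data as Python dictionaries.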

When you examine the response content with r.content, you should see the first alternative fuel station in the National Renewable Energy Laboratory’s inventory.

Requesting more

So far we’ve looked at asking an API for only one thing. In the case of the airports API, there’s only one thing you can ask for: the status of an airport. Most APIs offer you access to more than one piece of information or even let you perform operations. You tell the server what you want to do by forming the web address according to their documentation.

Looking at the documentation for the NREL API we see we can find alternative fuel stations near us by using the “nearest” query, which looks like this:

/api/alt-fuel-stations/v1/nearest.format?parameters

This takes a “location” parameter, described as “A free-form input describing the address of the location.” That’s good since I don’t know my latitude and longitude coordinates. Let’s use “json” for the “format” part and tell the API we want the nearest alternative fuel station to Capitol Avenue in Hartford, Conn. I’ll use the string “capital+avenue+hartford+connecticut” to describe where my car is parked:
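
A minimal sketch of forming that address. The path comes from the “nearest” query above; the host name is an assumption, and nearest_url is an illustrative helper. It also shows where the plus signs come from: urlencode replaces the spaces in a plain-text location with “+”.

```python
from urllib.parse import urlencode

# Host is an assumption; the path follows the "nearest" query shown above.
NEAREST_BASE = "https://developer.nrel.gov/api/alt-fuel-stations/v1/nearest"


def nearest_url(location, fmt="json"):
    """Return the address of a "nearest" query for a free-form location."""
    # urlencode turns spaces into "+" and escapes any special characters.
    query = urlencode({"location": location})
    return NEAREST_BASE + "." + fmt + "?" + query


url = nearest_url("capital avenue hartford connecticut")
print(url)
```

You would then fetch that address with requests.get(url, headers={"X-Api-Key": "MY_SECRET_KEY"}), sending the key in the header as before.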

Looks like it found something close by, with a station_name of “Hartford High School” and there’s a real Hartford address!

Pre-wrapped APIs

Now you know how API wrappers work. Even with more complicated APIs, the concepts are the same. But you don’t have to write a wrapper for every API you find, since a lot of other programmers have already done the same.

Here is a list of Python API wrappers. These can save you a lot of time.
