Every few weeks, we’ll be talking through one of our projects and showing our work. The required level of technical knowledge will vary from week to week. This week we’re going with a tutorial that requires an intermediate level of technical know-how.
Today, I will talk about how we got the data for our story about the most popular Meet-up groups in Connecticut.
We will walk through:
- Querying Meetup.com’s API to get the proper dataset
- Analyzing the data programmatically
Before we begin, you’ll need the following:
- A Meetup.com API key
- A way to run Python scripts. (Mac? Use Terminal. PC? Something a little more complicated.)
- A way to run R scripts. (RStudio is highly recommended and works on any system.)
This process is not the cleanest or most efficient, but it’s the way we did it. We are open to refining it if anyone reading this has suggestions.
We are all at different skill levels, so I hope to learn as much as, or more than, I teach. So let us know if you find a more clever or creative way of solving a problem. This will hold true for every future tutorial we do.
Getting the data
The question we want to answer is: How do the meet-up groups in the five most-populated cities in Connecticut compare in membership totals, category types, and change over time?
To get the data, we have to figure out the best way to access it from Meetup.com’s website. Luckily, they have what is called an API, or application programming interface. An API is a set of programming instructions and standards set by a company to let outsiders access its data. Typically, you send a query to the company’s server and it sends back data structured in some way, usually as JSON.
If Meetup.com didn’t have an API, getting their data would be difficult. We would likely build a “scraper,” which means we would have to parse the human-friendly data on their site rather than the computer-friendly data sent back by the API. (But that’s for another project and tutorial.) Fortunately, Meetup.com has a great API with excellent documentation.
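To make the query-and-response idea concrete, here is a minimal Python sketch. It is not Meetup.com’s actual API: the endpoint URL, the parameter names, the "results" key, and the sample response are all made up for illustration, and Hartford is just an example city. It only shows the general shape: search parameters go into the URL, and the JSON text that comes back is decoded into ordinary Python lists and dictionaries.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only. The real URL and
# parameter names come from the API's own documentation.
BASE_URL = "https://api.example.com/groups"


def build_query(api_key, city, state, radius_miles):
    """Return a request URL with the search parameters in the query string."""
    params = {"key": api_key, "city": city, "state": state, "radius": radius_miles}
    return BASE_URL + "?" + urlencode(params)


def parse_groups(body):
    """Decode a JSON response body and return its list of group records."""
    return json.loads(body).get("results", [])


# A made-up response body, shaped like the JSON an API might send back.
sample_body = '{"results": [{"name": "Example Hikers", "members": 120}]}'

url = build_query("YOUR_API_KEY", "Hartford", "CT", 10)
groups = parse_groups(sample_body)

print(url)
print(groups[0]["name"], groups[0]["members"])
```

In a real script, the URL would be fetched over the network and the response text passed to the parsing step in place of the sample string.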
Below is the Python script I used, called meetup-pages-names-dates.py.
This is what the script does:
- It makes a request to Meetup.com with these parameters: your API key, what city and state to search and within how many miles.
- It takes Meetup.com’s JSON response and filters it, grabbing only the fields in which we are interested — area, when it was started, city, state, category, number of members, and who.
- It repeats the above for each city.
- It structures the data into a CSV, which can be opened in any spreadsheet program.
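The filter-and-write steps in that list can be sketched roughly like this. This is not the original script, just a simplified illustration: the key names (including "area" and the nested "category" record) are assumptions modeled on the fields listed above, and the sample records stand in for real API responses so the example runs without a network connection or an API key.

```python
import csv
import io
import sys

# Assumed column names, modeled on the fields the article lists.
FIELDS = ["area", "created", "city", "state", "category", "members", "who"]


def filter_group(raw):
    """Keep only the fields of interest from one raw group record."""
    row = {field: raw.get(field, "") for field in FIELDS}
    # Assumption: the category may arrive as a nested record with a "name".
    category = raw.get("category") or {}
    row["category"] = category.get("name", "") if isinstance(category, dict) else category
    return row


def write_csv(groups_by_city, out):
    """Write one CSV row per group, looping over every city's results."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for raw_groups in groups_by_city.values():
        for raw in raw_groups:
            writer.writerow(filter_group(raw))


# Made-up records standing in for the JSON returned for each city.
sample_responses = {
    "Hartford": [{"area": "Hartford", "created": 1277942400000, "city": "Hartford",
                  "state": "CT", "category": {"name": "tech"}, "members": 250,
                  "who": "Coders", "description": "dropped by the filter"}],
    "New Haven": [{"area": "New Haven", "created": 1341100800000, "city": "New Haven",
                   "state": "CT", "members": 80, "who": "Hikers"}],
}

buffer = io.StringIO()
write_csv(sample_responses, buffer)
csv_text = buffer.getvalue()

# Writing to standard output lets the shell redirect it into a file.
sys.stdout.write(csv_text)
```

Sending the CSV to standard output instead of a named file is what makes the `>` redirection in the terminal work.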
You can run the script and get a CSV with the following command in the terminal:
python meetup-pages-names-dates.py > meetup_groups.csv
Congrats! You’ve got a data set you can analyze now in a spreadsheet program.
In Excel, you could filter by city, then create a pivot table to figure out the categories and membership totals by town.
But we’re going to do this the hard way: in R.
Analyzing the data
My philosophy is to do what will save you time in the long term. If you’re doing this just once and can pull it off in a spreadsheet app, go for it. But with a script, the analysis is easily repeatable with other cities and with other criteria.
So, what is R? It’s a language and environment for statistical analysis and graphics. With a few lines of code, you can slice up a data set and put it up on a chart. Academics love it. The downside is there’s a steep learning curve. I’ll put together a “Starting with R” tutorial in the future, but in the meantime I’ll show you how to at least run the script I put together to deal with this Meetup.com data.
We’ll use the dplyr library in R to organize and run calculations on the data.
This is what we’re trying to do:
- Find the total number of groups by town.
- For each town, find: 1) the number of members for each category type, like “tech;” 2) the growth by year; and 3) the popularity of each category group.
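If R is new to you, the same kind of grouping can be sketched in Python to show what these calculations amount to. This is not the dplyr script, only a rough analogue built on the standard library. The column names and sample rows are made up, and it assumes the creation date is stored as milliseconds since 1970, which may not match your CSV. "Growth by year" is read here as the number of groups created in each year.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Made-up rows with assumed column names, standing in for meetup_groups.csv.
SAMPLE_CSV = """city,category,members,created
Hartford,tech,250,1277942400000
Hartford,tech,100,1341100800000
Hartford,outdoors,80,1341100800000
New Haven,tech,60,1277942400000
"""


def summarize(rows):
    """Tally groups per city, members per city and category, and new groups per year."""
    groups_by_city = Counter()
    members_by_city_category = Counter()
    new_groups_by_city_year = Counter()
    for row in rows:
        city = row["city"]
        # Assumption: "created" is a timestamp in milliseconds.
        created = datetime.fromtimestamp(int(row["created"]) / 1000, tz=timezone.utc)
        groups_by_city[city] += 1
        members_by_city_category[(city, row["category"])] += int(row["members"])
        new_groups_by_city_year[(city, created.year)] += 1
    return groups_by_city, members_by_city_category, new_groups_by_city_year


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
groups, members, growth = summarize(rows)

print(dict(groups))
print(dict(members))
print(dict(growth))
```

Each tally is keyed by city, so adding another city to the CSV needs no change to the code.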
This is the top third of the script: meetups-analysis.R
Running the script
You’ve installed RStudio, right? OK, go to the menu bar and open the downloaded meetups-analysis.R file.
Now go to the menu bar, and go to:
Session > Set Working Directory > Choose Directory
Select the directory where you saved meetup_groups.csv.
Select all the text in the top left window and click the Run button. (Or just press the Source button above the script.)
You now have seven datasets to play with:
- Categories, by city (1)
- Membership, by city (1)
- Groups over time, one for each city (5)
If you have any trouble, let us know. If you do something cool, please share!