Tutorial: Web scraping and mapping breweries with import.io and R


Getting information on craft breweries can be a difficult process, with data dispersed over multiple websites or in formats unusable for analysis.

Import.io instantly turns webpages into data ready for analysis with minimal or no setup. This previous tutorial highlights the process in detail.

This method works well for extracting metadata and overall beer ratings for craft breweries from the popular site Beer Advocate, an online community that supports beer education, events, and a forum for rating beers.

Getting started

Difficulty level: Intermediate.

This tutorial will make sense if you already know how to use R or have gone through these previous walkthroughs:

You’ll need the import.io app and R installed.

Getting the data from the website

1. Go to BeerAdvocate.com, use the place search to navigate to a state (here, Connecticut), then follow the breweries link.


2. Because Beer Advocate paginates its results, we need to determine the URL structure so we can be sure we’re scraping more than one page of results.


Luckily, this is fairly simple: click each of the results links (i.e. 1-20, 21-40, 41-60) and note how the URL changes.


Here are the necessary links:

3. After setting up an account on import.io (which can be done by linking your GitHub account), navigate to your my data page and paste the previous links into the bulk extractor, located in the “How would you like to use this API?” dropdown for your Magic API. Then press the button to run the queries.


4. The new output page will be a tabular view of all of the extracted link data ready for export in multiple formats such as Spreadsheet (for CSV), HTML, and JSON. We will download the Spreadsheet format for this tutorial.


Now that we have the data we can take it into R to process and visualize it.

Data pre-processing

5. Read the data into R:
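A minimal sketch of this step, using base R’s `read.csv()`. The tutorial doesn’t specify a filename, so in practice you would point `read.csv()` at your downloaded export (e.g. `read.csv("breweries.csv", stringsAsFactors = FALSE)`); here we simulate a tiny export inline, with invented column names and rows, so the code is self-contained.

```r
# Simulated import.io spreadsheet export; real column names will differ.
raw <- "name,city,rating
Brewery A,Hartford,4.1
Brewery B,New Haven,4.3"

# stringsAsFactors = FALSE keeps text columns as character vectors
breweries <- read.csv(textConnection(raw), stringsAsFactors = FALSE)

# Inspect the structure of what we loaded
str(breweries)
```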

6. The export contains some links and columns holding more than one piece of information. That’s easy to fix: remove duplicate columns, create coherent column headers, specify and unify missing data, and extract the necessary information from combined columns.
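The clean-up steps above can be sketched as follows. The column names and sample rows are invented for illustration (import.io’s actual export headers will differ), so adjust them to match your own data:

```r
# Toy stand-in for the raw export: a duplicate column, an empty cell,
# and an address column that also contains the state.
df <- data.frame(
  name_text  = c("Brewery A", "Brewery B"),
  name_text2 = c("Brewery A", "Brewery B"),   # exact duplicate column
  address    = c("1 Main St, Hartford, CT 06103", ""),
  stringsAsFactors = FALSE
)

# 1. Remove duplicate columns
df <- df[, !duplicated(as.list(df))]

# 2. Create coherent column headers
names(df) <- c("name", "address")

# 3. Unify missing data: empty strings become NA
df[df == ""] <- NA

# 4. Extract information from a combined column (here, the state code)
df$state <- sub(".*,\\s*([A-Z]{2})\\s+\\d{5}$", "\\1", df$address)
```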

Geolocation of addresses for plotting on a map

7. Grab some latitudes and longitudes for those craft brewery addresses.
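One way to do this is with the `ggmap` package’s `geocode()` function. Note that current versions of `ggmap` require a Google Maps API key via `register_google()`; the key, the data frame, and its `address` column here are placeholders, not part of the original tutorial:

```r
library(ggmap)

# Placeholder key: supply your own Google Maps API key
register_google(key = "YOUR_API_KEY")

# Example input; in the tutorial this would be the cleaned data frame
df <- data.frame(address = "1 Main St, Hartford, CT 06103",
                 stringsAsFactors = FALSE)

# Look up latitude/longitude for each address
coords <- geocode(df$address)
df$lon <- coords$lon
df$lat <- coords$lat
```

Geocoding services rate-limit requests, so for a large brewery list you may want to loop with a short pause between calls.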

Data visualization using Leaflet

8. Plot the brewery coordinates on a JavaScript-based map using Leaflet.
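A sketch using the `leaflet` R package, which wraps the Leaflet JavaScript library. The coordinates below are illustrative stand-ins; in the tutorial they would come from the geocoding step:

```r
library(leaflet)

# Illustrative brewery coordinates (roughly Hartford and New Haven, CT)
df <- data.frame(name = c("Brewery A", "Brewery B"),
                 lat  = c(41.76, 41.31),
                 lon  = c(-72.67, -72.92))

# OpenStreetMap tiles with a clickable marker per brewery
leaflet(df) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, popup = ~name)
```

Running this in RStudio renders an interactive map in the viewer pane; it can also be embedded in an R Markdown document.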
