We faced a challenge recently with a post — a contributor had energy data from each census tract in Connecticut but did not have the town names that the tracts existed within. Off the top of their heads, people don’t know that tract 9009167201 is in New Haven.
Usually, two datasets are joined based on a column of mirroring information using programs like Access or Google Fusion Tables. But for this problem, the data did not exist on two spreadsheets; it existed on two shapefiles from The University of Connecticut Libraries’ MAGIC (Map and Geographic Information Center).
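For comparison, here is a minimal sketch of that ordinary column-based join in plain Python. The rows, tract IDs, and column names are hypothetical, made up only to show the idea of matching on a shared key.

```python
# Hypothetical sample rows; the IDs and column names are assumptions for illustration.
energy_rows = [
    {"tract": "T1", "kwh": 120},
    {"tract": "T2", "kwh": 95},
]
town_rows = [
    {"tract": "T1", "town": "New Haven"},
    {"tract": "T2", "town": "Stamford"},
]


def join_on_key(left, right, key):
    """Attach the fields of each right-hand row to the left-hand row sharing the same key."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})  # rows without a match are kept unchanged
        joined.append({**row, **match})
    return joined


joined = join_on_key(energy_rows, town_rows, "tract")
```

This only works when both tables carry the same identifier column, which is exactly what the two shapefiles lacked.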
Easy. If you have not installed QGIS yet, go ahead and do so now.
This tutorial will go over two methods of spatial joins in QGIS. In this example, one worked better than the other but it’s still good to know how to do both. If you want, you can skip ahead to the way that works better.
The two spatial datasets
We’re going to use QGIS, a free and open source geographic information system, to combine the two: attach a town name from the towns layer to every census tract in the tracts layer.
Importing the shapefiles
Bring in the two shapefiles (ending in .shp) either through the menu Vector > Add Layer… or by dragging them into the QGIS layers window.
Right click a layer and select Open Attribute Table to take a peek at the data inside.
The tracts dataset in the map below has no town names.
The towns dataset in the map below has no census tract identifiers.
Spatial merging (version 1)
Select Menu Vector > Data Management Tools > Join Attributes by Location
Target vector layer is the layer you want the data added to (the tracts). Join vector layer is the layer you want to take the data from (the towns).
Select Take attributes of first located feature and not Take summary of intersecting features. The second option is useful if one of the maps was a list of latitude and longitude locations and you needed to find the total or average number of dots in a given polygon or town shape.
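To make the summary option concrete, here is a rough sketch of counting points per polygon in plain Python. It uses a standard ray-casting test and is not how QGIS implements the tool; the town shapes and point coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) is inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(polygons, points):
    """Return {polygon name: number of points falling inside it}."""
    return {
        name: sum(point_in_polygon(x, y, poly) for x, y in points)
        for name, poly in polygons.items()
    }


# Hypothetical square "towns" and point locations.
towns = {
    "West": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "East": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
points = [(1, 1), (2, 3), (5, 2), (9, 9)]

counts = count_points_per_polygon(towns, points)
```

Here two points land in "West", one in "East", and the last falls outside both shapes.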
Browse to where you want to save the output shapefile and click OK. When it asks if you want to add it to the current display, select Yes.
Right click and select Open Attribute Table on the new layer and you can see on the far right that the names column has successfully merged. There is a town name associated with every census tract.
But wait. After looking at the data, it looks like a handful of them got named incorrectly. In the map above you can see that this part of Stamford is incorrectly labeled as Greenwich. That’s because the option to join attributes was from first located feature. That means each tract took its name from the first town polygon it touched instead of the one that contains the majority of the tract.
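The difference between the two behaviors can be sketched with simple boxes. This is a simplified illustration of the problem described above, not QGIS's actual code; the coordinates are hypothetical and the "towns" are plain rectangles given as (xmin, ymin, xmax, ymax).

```python
def overlap_area(a, b):
    """Area shared by two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def first_intersecting(tract, towns):
    """Name of the first town in list order that overlaps the tract at all."""
    for name, box in towns:
        if overlap_area(tract, box) > 0:
            return name
    return None


def largest_overlap(tract, towns):
    """Name of the town that covers the biggest share of the tract."""
    best_name, best_area = None, 0
    for name, box in towns:
        area = overlap_area(tract, box)
        if area > best_area:
            best_name, best_area = name, area
    return best_name


# Hypothetical boxes: two neighboring towns and a tract that sits almost
# entirely in Stamford but pokes a sliver across the Greenwich line.
towns = [
    ("Greenwich", (0, 0, 5, 10)),
    ("Stamford", (5, 0, 10, 10)),
]
tract = (4.9, 2, 7, 4)

first_name = first_intersecting(tract, towns)
majority_name = largest_overlap(tract, towns)
```

With these boxes the first-match rule returns Greenwich while the largest-overlap rule returns Stamford, the same kind of mislabeling seen in the joined layer.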
Spatial merging (version 2)
Select Menu Plugins > Manage and Install Plugins.
Search for and install two plugins: refFunctions and spatialJoin.
Select Menu Plugins > spatialJoin.
Target Layer is the shapefile you want to add data to (we’re adding the town names to the tracts). Spatial join type should be within, but there are many other options to explore later, like overlaps or nearest.
Then select the columns you want brought over from the Layer to Join, select Dynamic join and then click OK.
This time no new layer is created. Instead, the data is added to the target layer. Right click that layer and select Open Attribute Table.
Looking at the same census tract from before, it looks like it got the right town name this time.
Save as CSV for the future
Our contributor wasn’t interested in the map shapes, only in which town matched each census tract. If you also want to keep a copy as a spreadsheet for future use, the data can be saved as a .CSV. Right click the layer and select Save As…
Select Comma Separated Value [CSV] as the format, choose the location, and click OK.
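Once the CSV exists, it can serve as a simple tract-to-town lookup outside QGIS. The sketch below writes and reads such a file in memory with Python's csv module; the column names "tract" and "town" and the sample rows are assumptions, since the real export will use whatever field names the shapefiles had.

```python
import csv
import io

# Hypothetical joined rows; a real export would have the shapefile's own field names.
rows = [
    {"tract": "T1", "town": "New Haven"},
    {"tract": "T2", "town": "Stamford"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def town_lookup(csv_text):
    """Build a {tract: town} dictionary from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["tract"]: row["town"] for row in reader}


csv_text = rows_to_csv(rows, ["tract", "town"])
lookup = town_lookup(csv_text)
```

To work with a file on disk instead, the same reader can be pointed at a file opened with `open(path, newline="")`.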
It worked. See how we applied it in our story about how low-income residents aren’t benefitting from free weatherization programs.