Getting rid of duplicate rows using Google Sheets

Print More

Google Sheets is a great free tool for working with data online. Previously known as Google Spreadsheets, users can import CSV or Excel files and access it on any computer with an internet connection.

The sharing options are also very powerful as multiple users can work in the same spreadsheet at once. Plus, Google’s search is great for tracking down that obscure data file created years ago.

However, it lacked the advanced functionality of Excel, such as pivot-tables.

Last year, Google introduced add-ons which allowed programmers to expand on its functionality. At the moment, there are some pretty useful add-ons, like random number generators, styling options, or string splitters.

We’ll be working with a data set of nail salon listings from Yelp that we collected for a recent story. The list was compiled by gathering up all salons within a 10-mile radius of every city in Connecticut. Since most towns are less than 10 miles wide, many stores were added to the data frame more than once.

There are less than 900 nail salons across the state but this scraper picked up more than 42,000 listings. That’s a lot of duplicate stores we’ll have to get rid of.

This tutorial will quickly go over how to remove duplicate rows from a spreadsheet.

Difficulty level


The whole point of this add-on is to do some advanced data cleaning with a few clicks and avoid using more advanced programs like Open Refine or R.

Setting up
Importing the data

File > Import… > Upload

Screen Shot 2015-08-27 at 12.10.58 PM
Installing the add-on
Screen Shot 2015-08-27 at 12.14.24 PM

Add-ons > Get add-ons
Then search for Remove Duplicates.

Screen Shot 2015-08-27 at 12.15.10 PM

Click free to install.

Screen Shot 2015-08-27 at 12.15.21 PMClick Accept.

Removing duplicate rows

Select the column to search for duplicates in— url for this example.

Add-ons > Remove Duplicates > Find duplicates or uniques

Screen Shot 2015-08-27 at 12.19.52 PM

Range has already been selected or reselect it here.

Screen Shot 2015-08-27 at 12.20.04 PM

Many options to choose from here. For this example, we just want to focus on the duplicate instance and ignore the first occurrence.

Or you could isolate the non-duplicates if you wanted to find the nail salons that did not overlap within 10-miles of another town. Then select the Uniques option.

Screen Shot 2015-08-27 at 12.20.23 PM

Some more advanced options on the target column.

Screen Shot 2015-08-27 at 12.20.40 PM

What do you want to do with the identified duplicate rows? You could just change the background color or copy it into another spreadsheet entirely.

For this exercise, we want to delete it. Click Finish

Screen Shot 2015-08-27 at 12.20.57 PM

Now, we wait. For this example, it will take a very long time for the browser to churn through 42,000 rows. Step outside for a break.

Screen Shot 2015-08-27 at 12.28.34 PM

It took me about a couple of hours to churn through this data set, but now we’ve gone from 42,000 rows to more than 900. Not bad.

Screen Shot 2015-08-27 at 5.57.15 PM

Oh, also you can run pivot tables in Google Sheets now. We’ll try that in the next walkthrough.

What do you think?