R for beginners: How to transition from Excel to R

Print More

Google Trends

Switching from Excel to R for data analysis can seem daunting. Over time, the open-source statistical programming language has consistently grown in popularity among those who work with numbers, with thousands of user-created libraries to expand on its power.

Though it was first created primarily to make it easier to create statistical models and output very basic visuals to explore data, it’s expanded to the point that people can use R to do a multitude of advanced processes such as scrape websites, communicate with APIs, and publish beautiful interactive charts and maps.

All with just a few lines of code.

The practice of “Reproducible Research” has been spreading outside the world of academia to other areas, like non-profits and journalism. It’s the idea that analyses should be published with the original data, as well as the methodology or software code so that others can verify or build on them.

Others can reproduce your practice, but the big draw is that for other projects, so can you.

In Excel, a user might run a formula, do some sorting, create a couple of pivot tables— but the work you did on that data cannot be replicated quickly on another spreadsheet with a similar data structure. The whole process has to be repeated step by step.

The point of doing data analysis in R is that a user can write a script to slice up and analyze a spreadsheet, then it can be saved, and then brought back to be used on another spreadsheet with just a few tweaks in the code.

Difficulty level


This tutorial is based on a presentation I gave at the Boston University Storytelling with Data Workshop. It’s meant to ease seasoned Excel users into the world of R. With GIFs.

Setting up R
Navigating RStudio
Screen Shot 2015-06-11 at 7.54.34 PM

We will be working out of RStudio, which is a powerful shell that runs on top of R.

Data structures and variables in R
Screen Shot 2015-06-11 at 7.56.46 PM

Get acquainted with the different types of data structures within R.

Spreadsheets in Excel are usually recognized as Matrices and Dataframes. When working in R, it’s important to understand how they’re made up by individual vectors and lists.

Start by creating some arrays and assigning them to variables using the arrow.
The c() command stands for combine.

Type in sports in the console and this is what you get: A dataframe made up of different types of arrays (String, Number, Boolean).

Screen Shot 2015-06-11 at 8.45.23 PM
Opening a file

Note: You can type out or copy the code in this tutorial into RStudio’s console (bottom left window) line by line but the point is to reproduce this in the future. That means you should put all your code in a script in the top left window. File > New File > R Script


First, you have to set the directory to where your files (like csvs) are.

Or, you can set it via the RStudio menu up top: Session > Set Working Directory > Choose Directory

Looking up number of observations

Focus on first five rows

Checking structure of the data

If you look closely at this column in Excel, you see the $ symbol disappears when you click into it. The raw data has no $ sign but only puts it there for cosmetic purposes. In R, it doesn’t intuitively know that a $ means currency. That will cause errors when you try to run math on it because R won’t let you perform calculations on strings. Here’s how to check and see what kind of variables are in your dataframe.

Focusing on Columns or Rows or Cells

Sometimes you want to look at a single cell, column, or row. To do so, add brackets to the dataframe in the console, like earnings[1,3].

Think “Data RoCks.” As in, Dataframe[Rownumber,Columnnumber]

Screen Shot 2015-06-11 at 9.13.41 PM

Most of the time, though, it’s difficult to reference columns by number — especially when there’s many, many columns. Here’s an alternative: dataframe$NameOfColumn

Screen Shot 2015-06-11 at 9.18.45 PM
Changing format of a column

Sort a data frame based on a column

Creating a formula

Filtering data by column content

Calculations on columns

Data to Columns in Excel

What if you want to split up a name into separate columns? It’s pretty easy to do in Excel.


It’s a bit more complicated in R, but here’s how. The gist of it is: Creating new columns by copying the original name column and deleting everything before or after a comma.

Pivot table for simple count

How many employees are there per department?


Advanced pivot table

What’s the sum of income for all employees in each department? Here’s how to do it in R.

Other options other than “sum” include “median” and “mean.”

Check it and save file as a spreadsheet (CSV)

You could also save as a text file, if you want.

All set for now

Hopefully, this gets you on your way to working with R. If you wanted to run the same analysis on the second spreadsheet included in the files, you’d only have to adjust some of the column names, but you could run the entire thing and it would export you a new CSV in seconds.

Some of the steps in R were more complicated to pull off than in Excel. There are actually some great libraries that are more efficient, but I wanted to show you how to execute the basics before we get into the shortcuts that people have put together in packages.

The next R tutorial will expand on translating advanced Excel techniques to R.

What do you think?

  • Awesome reference. Keeping this bookmarked.

  • Sujee

    Awesome article. This made me start learning r. Thank you.

    • Andrew Ba Tran

      Thanks for reading! If you have any questions, let me know.

      • Amit Bose

        sir, My name is Amit i have been recently learning Excel, Sql and R but i am not getting any project to implement it….Please guide as how i should start this journey as a data analyst as i am not able to get a proper guidance for this domain.PLEASE HELP SIR….

  • Limbu Malongo Limbu

    Loving it, very helpful!

  • Kelum Perera

    Its so helpful to understand how to use r for excel users, thank you very much.

    I have a large data data set in excel where i need to do a calculation as follows. Can i do this calculation in R?

    Excel calculation

    Cell A2 contains = Code:1+Name:Book+Size:200pg+Price:100+Vendor:Abooks
    To cell B1 contains = Code

    So want to get “1” to cell B2 & i use this fourmular =IF($A2″”,IFERROR(MID($A2,(FIND((B$1&”:”),$A2)+LEN(B$1)+1),IFERROR((FIND((“+”),$A2,(FIND((B$1&”:”),$A2)+LEN(B$1)+1))-(FIND((B$1&”:”),$A2)+LEN(B$1)+1)),LEN($A2))),””),”

    More details
    In my data set , column A i have similar data to A2, which has the details of products.(each parameter & its answer separated by a “:”, different parameters separated by “+”.
    A3 = Code:2+Name:Pen+Size:Standard+Price:10+Vendor:Abooks
    A4 = Code:3+Name:Bag+Size:Standard+Price:250+Vendor:TopBags

    List of parameters are stated in the Row no 1 from cell no B1 onwards.
    C1= Name , D1=Size , E1=Price, D1=Vendor

    Sample data set attached https://drive.google.com/open?id=1ubkVd_tquAE6gmW9uumC8CQ8irthHRiyBG-EnahRBT0