How to Import Data and Export Results in R

Shaumik Daityari

With the craze for “big” data, analytics tools have gained popularity. One of these tools is the programming language R. In this guide, I’ll show how to extract data from text files, CSV files, and databases. Then I’ll show how to send that data to a web server.

You may be wondering, Do I need to learn a new language all over again? The answer is no! All you need to know is a few commands.

Programmers from diverse backgrounds, building web applications in a variety of languages, can import their data into R and, after processing it, export it in whatever format they require.

Note: If you’re not familiar with R, I recommend SitePoint’s article on how to install R and RStudio. It provides basic commands in R and a general introduction to the language. This post covers commands that can be run on the R terminal without the use of the RStudio IDE. However, handling large datasets on a terminal could turn out to be difficult for beginners, so I’d suggest using RStudio for an enriched experience. In RStudio, you can run the same commands in the Console box.

Handling Text Files

To read a file line by line as plain strings, use the readLines command. A whitespace-delimited text file on your local machine can be read into a table using the read.table command. Setting the separator to an empty string ("", which is also the default) splits each line into fields on any run of whitespace:

file_contents = read.table("<path_to_file>", sep = "")

Note: where you see angled brackets such as in <path_to_file>, insert the necessary number, identifier, etc. without the brackets.

The path may be absolute or relative to R's current working directory. If your rows have unequal lengths, set fill = TRUE as well. The output of this command is a data frame.
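As a minimal sketch of the options above, the following writes a small sample file to a temporary location and reads it back both ways. The file contents and the variable names (sample_path, people, raw_lines) are made up for illustration.

```r
# Write a small whitespace-delimited sample file to a temporary location
sample_path <- tempfile(fileext = ".txt")
writeLines(c("alice 30 london",
             "bob 25",
             "carol 41 paris"), sample_path)

# sep = "" splits on whitespace; fill = TRUE lets the short second row
# be read by adding a blank field at its end
people <- read.table(sample_path, sep = "", fill = TRUE)

# readLines returns one string per line instead of a data frame
raw_lines <- readLines(sample_path)

print(people)
print(raw_lines)
```

Without fill = TRUE, read.table would stop with an error on the second row, because it has fewer fields than the others.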

If your file is too large to be read in one go, you can try reading it in steps using the skip and nrows options. For instance, to read lines 6 to 10 of your file, run the following commands:

connection <- file("<path_to_file>")
lines6_10 = read.table(connection, skip=5, nrows=5) # lines 6-10
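The same idea can be extended to walk through a whole file piece by piece. This is only a sketch: read_in_chunks is a hypothetical helper, not part of R, and it assumes a file with no header row, no blank lines, and no comment lines.

```r
# Create a 12-line sample file: each line holds an id and a value
sample_path <- tempfile(fileext = ".txt")
writeLines(paste(1:12, (1:12) * 10), sample_path)

# Skip the first 5 lines, then read the next 5 (lines 6 to 10)
lines6_10 <- read.table(sample_path, skip = 5, nrows = 5)
print(lines6_10)

# Hypothetical helper: read a headerless file in chunks of chunk_size lines
read_in_chunks <- function(path, chunk_size) {
  chunks <- list()
  skipped <- 0
  repeat {
    chunk <- tryCatch(
      read.table(path, skip = skipped, nrows = chunk_size),
      error = function(e) NULL  # read.table raises an error when no lines are left
    )
    if (is.null(chunk) || nrow(chunk) == 0) break
    chunks[[length(chunks) + 1]] <- chunk
    skipped <- skipped + nrow(chunk)
    if (nrow(chunk) < chunk_size) break  # a short chunk means the file has ended
  }
  chunks
}

chunks <- read_in_chunks(sample_path, 5)
print(length(chunks))
```

Each call re-scans the file from the start to skip the lines already read, so this is simple rather than fast. In practice you would process each chunk as it arrives instead of collecting them all in a list.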

Handling CSV Files

A CSV (comma-separated values) file is a file that, quite literally, contains values separated by commas. You can read a CSV file using the read.csv command:

file_contents = read.csv("<path_to_file>")

The header option states whether the first line of the CSV file contains column names. It's set to TRUE by default for read.csv. (It can also be specified when reading text files with read.table, where it defaults to FALSE.) Rows with fewer fields than the others are padded automatically, because read.csv sets fill to TRUE by default.

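For large files, you can skip rows in a similar manner. Set header = FALSE here; otherwise the first line after the skipped ones is consumed as column names and the five rows after it are read instead:

connection <- file("<path_to_file>")
lines6_10 = read.csv(connection, skip=5, nrows=5, header=FALSE) # lines 6-10 of the file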
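When the CSV file does have a header row, one way to keep the column names while skipping ahead is to read the header separately and pass it back in. The sketch below assumes a made-up file with columns id and score; the variable names are illustrative.

```r
# Create a sample CSV: one header line followed by 12 data rows
sample_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", paste(1:12, (1:12) * 10, sep = ",")), sample_path)

# Read just the header and one row to capture the column names
col_names <- names(read.csv(sample_path, nrows = 1))

# Skip the header line plus the first 5 data rows, then read data rows 6 to 10.
# header = FALSE stops read.csv from treating the next line as column names.
rows6_10 <- read.csv(sample_path, skip = 6, nrows = 5,
                     header = FALSE, col.names = col_names)
print(rows6_10)
```

Note that skip counts lines of the file, so the header line has to be included in the count.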

Using MySQL Databases

To make database connections, you need the separate RMySQL package. It can be installed using the following command:

install.packages('RMySQL')

Once it's installed, load it by running the following:

library('RMySQL')

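Assuming that your database is running, you can now establish a connection, after which you can make MySQL queries. (Port 8889 below is MAMP's default; a standard MySQL installation listens on port 3306.)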

con <- dbConnect(MySQL(),
  user="root", password="root",
  dbname="nsso", host="localhost", port=8889)

If you’re running MySQL through MAMP on a Mac, you need to specify a unix.socket:

con <- dbConnect(..., unix.socket = "/Applications/MAMP/tmp/mysql/mysql.sock")

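To make a MySQL query, you first send the query, then fetch the results into a data frame, and finally release the result set:

rs <- dbSendQuery(con, "SELECT * FROM my_table;") # Add a LIMIT clause if the table is large
data <- dbFetch(rs, n = -1) # n = -1 fetches all remaining rows
dbClearResult(rs) # Free the resources held by the result set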

Once you’re done with your queries, you can disconnect your connection through the dbDisconnect command:

dbDisconnect(con)
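The connect, query, and disconnect steps can be wrapped together so the connection is always closed, even if the query fails. This is a sketch only: run_query is a hypothetical helper, its default credentials simply repeat the example values used above, and it relies on dbGetQuery from the DBI package (installed alongside RMySQL), which sends a query, fetches every row, and clears the result in one call.

```r
# Hypothetical helper: open a connection, run one query, and always disconnect.
# The default arguments mirror the example connection settings in this article.
run_query <- function(sql, user = "root", password = "root",
                      dbname = "nsso", host = "localhost", port = 8889) {
  con <- DBI::dbConnect(RMySQL::MySQL(),
                        user = user, password = password,
                        dbname = dbname, host = host, port = port)
  # on.exit runs when the function returns or stops with an error
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, sql)
}

# Example call (requires a running MySQL server with these settings):
# first_rows <- run_query("SELECT * FROM my_table LIMIT 10;")
```

Defining the function does not touch the database; a connection is only opened when run_query is called.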

Read Data on the Web

What if your data source is on the Web? In R, you only need to change the file path that you give to the read command: wrap the file's URL in the url command and pass the result to read.csv. For instance:

file_contents = read.csv(url("<file_URL>"))

To extract data from a database on a web server, set the host argument of dbConnect to that server's address instead of localhost.

Export Data

Just as read.csv and read.table import data, the corresponding write commands export a data frame to a text or a CSV file:

write.csv(data_frame, file = "data.csv")

To export as a text file using a different delimiter (say, a tab), you can use the write.table command:

write.table(data_frame, file = "data.txt", sep = "\t")
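By default, both write commands also write the data frame's row names as an extra first column, which is often unwanted when another program consumes the file. The sketch below, using a made-up data frame and temporary file paths, turns that off and reads the files back to check the round trip.

```r
# A small example data frame (contents are made up for illustration)
data_frame <- data.frame(id = 1:3, city = c("London", "Paris", "Rome"))

# Export as CSV without the row-name column
csv_path <- file.path(tempdir(), "data.csv")
write.csv(data_frame, file = csv_path, row.names = FALSE)

# Export as tab-separated text, without row names or quotes around strings
txt_path <- file.path(tempdir(), "data.txt")
write.table(data_frame, file = txt_path, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read both files back to confirm that the contents survived
csv_back <- read.csv(csv_path)
txt_back <- read.table(txt_path, sep = "\t", header = TRUE)
print(csv_back)
print(txt_back)
```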

Updating a database is just as easy: send UPDATE and INSERT statements through the same connection that you use for SELECT queries.

Export Graphs

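Once you've processed your data in R, you can export your plots too! The png or jpeg command does that for you. It opens an image file as the active graphics device, so every plotting command that follows draws into that file until you close it with dev.off:

# Open a PNG file as the graphics device
png(filename="sample.png")
# Draw the plot into the file
plot(c(1,2,3,4,5), c(4,5,6,7,8))
# Close the device, which writes the file to disk
dev.off()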

Replace the plot command in the middle with whatever plotting commands you need to save.
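As a slightly fuller sketch, the image size can be set when opening the device, and several plotting commands can draw into the same file before it's closed. The output path, the dimensions, and the labels here are arbitrary choices for illustration.

```r
# Save into the session's temporary directory (path chosen for illustration)
out_file <- file.path(tempdir(), "sample_plot.png")

x <- c(1, 2, 3, 4, 5)
y <- c(4, 5, 6, 7, 8)

# Open the device with an explicit size in pixels
png(filename = out_file, width = 800, height = 600)
# Everything drawn between png() and dev.off() ends up in the file
plot(x, y, main = "Sample plot", xlab = "x", ylab = "y")
lines(x, y)
# Close the device, which writes the file to disk
dev.off()

print(file.exists(out_file))
```

If dev.off is never called, the file may be left empty or incomplete, so it's worth pairing every png or jpeg call with one.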

Export Data to the Web

Uploading data to the Web directly can be tricky, but you can do it in two steps: save a file locally, then upload it. The upload is a POST request, which you can send from R using the httr package. The form field names (name and filedata here) depend on what the receiving server expects:

library(httr)
POST("<upload_URL>", body = list(name = "<path_to_local_file>", filedata = upload_file("<path_to_local_file>", "text/csv")))

For more details, here’s a quickstart guide on the httr package.

Conclusion

R has gained a lot of popularity in recent years among people working with statistics, and now’s a good time to learn this wonderful language. It’s flexible enough to sync with various types of data sources, and working with R is very easy too, irrespective of your background. Let’s hope this post got you started with R!