With the craze for “big” data, analytics tools have gained popularity. One of these tools is the programming language R. In this post, I’ll show how to extract data from text files, CSV files, and databases. Then I’ll show how to send that data to a web server.
You may be wondering, Do I need to learn a new language all over again? The answer is no! All you need to know is a few commands.
Programmers from diverse backgrounds who work on web applications in a variety of programming languages can import the data into R and, after processing, export it in the format they require.
Note: If you’re not familiar with R, I recommend SitePoint’s article on how to install R and RStudio. It provides basic commands in R and a general introduction to the language. This post covers commands that can be run on the R terminal without the use of the RStudio IDE. However, handling large datasets on a terminal could turn out to be difficult for beginners, so I’d suggest using RStudio for an enriched experience. In RStudio, you can run the same commands in the Console box.
Handling Text Files
A text file present on your local machine can be read using a slightly modified read.table command. Because it’s designed for reading tables, it splits every line into columns. Setting the separator to an empty string ("") makes it split on any run of whitespace:
file_contents = read.table("<path_to_file>", sep = "")
Note: where you see angled brackets, such as in <path_to_file>, insert the necessary number, identifier, etc. without the brackets.
The path may be absolute or relative to your working directory. If your rows have unequal length, you have to set fill = TRUE as well. The output of this command is a data frame in R.
If your file is too large to be read in one go, you can try reading it in steps using the skip and nrows options. For instance, to read lines 6–10 in your file, run the following commands:
connection <- file("<path_to_file>")
lines6_10 = read.table(connection, skip = 5, nrows = 5) # lines 6-10
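To see the skip and nrows options in action without touching a real dataset, here is a minimal sketch that writes a small throwaway file and reads it back. The file contents and the variable names (path, all_rows) are made up for illustration.

```r
# Write a small whitespace-separated sample file (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(sprintf("row%d %d", 1:12, (1:12) * 10L), path)

# Read the whole file; sep = "" splits fields on any whitespace.
all_rows <- read.table(path, sep = "", stringsAsFactors = FALSE)

# Read only lines 6-10: skip the first 5 lines, then read 5 rows.
lines6_10 <- read.table(path, sep = "", skip = 5, nrows = 5,
                        stringsAsFactors = FALSE)

print(dim(all_rows))
print(lines6_10)
```

Since the file has no header line, R names the columns V1 and V2 automatically.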
Handling CSV Files
A CSV (comma-separated values) file is a file that, quite literally, contains values separated by commas. You can read a CSV file using the read.csv command:
file_contents = read.csv("<path_to_file>")
The header option states whether the CSV file contains column headers. It is set to TRUE by default. (This can also be specified when reading text files, where it defaults to FALSE.) In case you have unequal columns in different rows, the fill option pads the short rows. Unlike read.table, read.csv already sets fill = TRUE by default.
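For large files, you can skip rows in a similar manner. Set header = FALSE when you do, because otherwise the first line read after the skip is treated as the column headers:
connection <- file("<path_to_file>")
lines6_10 = read.csv(connection, header = FALSE, skip = 5, nrows = 5) # lines 6-10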
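The interaction between header and skip is easy to get wrong, so here is a small sketch using a throwaway CSV file. The data and the names scores and tail_rows are invented for the example. It reads the column names once, then reuses them for a later chunk.

```r
# Write a tiny CSV file with a header line (made-up data).
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,ann,90",
             "2,bob,85",
             "3,cy,70",
             "4,dee,60"), path)

# header = TRUE is the default, so the first line becomes the column names.
scores <- read.csv(path, stringsAsFactors = FALSE)

# Skip the header line plus the first two data rows, then read two rows.
# The header is among the skipped lines, so pass header = FALSE and
# supply the column names explicitly.
tail_rows <- read.csv(path, header = FALSE, skip = 3, nrows = 2,
                      col.names = names(scores), stringsAsFactors = FALSE)

print(scores)
print(tail_rows)
```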
Using MySQL Databases
To make database connections, you need the separate RMySQL library. It can be installed using the following command:
install.packages("RMySQL")
Once installed, you need to activate it by running the following:
library(RMySQL)
Assuming that your database is running, you can now make MySQL queries after establishing a connection:
con <- dbConnect(MySQL(), user="root", password="root", dbname="nsso", host="localhost", port=8889)
If you’re running MySQL through MAMP on a Mac, you need to specify a unix.socket as well:
con <- dbConnect(..., unix.socket = "/Applications/MAMP/tmp/mysql/mysql.sock")
To make a MySQL query, you first need to execute the query and then store the data in a data frame:
# Add a LIMIT clause if the result set could be large
rs <- dbSendQuery(con, "SELECT * FROM my_table;")
data <- fetch(rs, n = -1) # n = -1 fetches all remaining rows
dbClearResult(rs) # free the result set once the rows are fetched
Once you’re done with your queries, you can disconnect your connection through the dbDisconnect command:
dbDisconnect(con)
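If you run many queries, it can help to wrap the connect, query, and disconnect steps in one function. The following is only a sketch: build_query and fetch_rows are hypothetical helper names, and it assumes the DBI and RMySQL packages are installed and a MySQL server is reachable with the credentials you pass in. Nothing connects to a database until fetch_rows is actually called.

```r
# Hypothetical helper: builds a SELECT with a LIMIT so large tables are
# not pulled in full. The table name is pasted in as-is, so only use it
# with names you trust.
build_query <- function(table, limit = 100) {
  sprintf("SELECT * FROM %s LIMIT %d;", table, as.integer(limit))
}

# Hypothetical helper: opens a connection, runs one query, and always
# disconnects afterwards, even if the query fails.
# Assumes the DBI and RMySQL packages are installed.
fetch_rows <- function(query, ...) {
  con <- DBI::dbConnect(RMySQL::MySQL(), ...)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, query)  # sends the query and fetches all rows
}

print(build_query("my_table", 10))

# Example call (not run here; needs a live database):
# data <- fetch_rows(build_query("my_table", 10),
#                    user = "root", password = "root",
#                    dbname = "nsso", host = "localhost", port = 8889)
```

Using on.exit means the connection is closed in every case, which saves you from leaking connections when a query throws an error.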
Read Data on the Web
What if your data source is on the web? How do you read online files? In R, it can be done simply by changing the file path that you specify in the read command. You need to use the url command and specify the URL in the read.csv command. For instance:
file_contents = read.csv(url("<file_URL>"))
For a database, the host may be changed to extract data from a database on a web server.
Mirroring read.csv and read.table, a data frame can be exported into a text or a CSV file. For a CSV file, use the write.csv command:
write.csv(data_frame, file = "data.csv")
To export as a text file using a different delimiter (say, a tab), you can use the write.table command:
write.table(data_frame, file = "data.txt", sep = "\t")
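A quick way to check an export is to read the file straight back in. Here is a minimal round-trip sketch with a made-up data frame and temporary file paths. The row.names = FALSE and quote = FALSE arguments are optional extras that aren't in the commands above.

```r
# A small made-up data frame to export.
data_frame <- data.frame(id = 1:3, name = c("ann", "bob", "cy"),
                         stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")

# row.names = FALSE leaves out the 1, 2, 3 row labels, which would
# otherwise be written as an extra first column.
write.csv(data_frame, file = csv_path, row.names = FALSE)
write.table(data_frame, file = txt_path, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read both files back to confirm the export worked.
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
from_txt <- read.table(txt_path, sep = "\t", header = TRUE,
                       stringsAsFactors = FALSE)

print(from_csv)
print(from_txt)
```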
Updating databases is just as easy, and can be done by executing INSERT MySQL commands.
Once you’ve processed and plotted your data in R, you can export the plot too! The png command does that for you (the jpeg command works the same way for JPEG files). It opens an image file as the active graphics device, and whatever you plot next is written to that file once you close the device:
# Initiate image
png(filename = "sample.png")
# Make a plot
plot(c(1,2,3,4,5), c(4,5,6,7,8))
# Save the plot
dev.off()
You can replace the second command with any plotting code to save the plot you need.
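As a slightly fuller sketch, the snippet below saves a plot to a temporary folder with an explicit size and then checks that the file was written. The path, dimensions, and title are arbitrary choices for the example, and it assumes your R installation can open a PNG device.

```r
# Save into the session's temporary folder (arbitrary location).
plot_path <- file.path(tempdir(), "sample.png")

# Open the PNG device; width and height are in pixels by default.
png(filename = plot_path, width = 800, height = 600)

# Everything drawn now goes to the file instead of the screen.
plot(c(1, 2, 3, 4, 5), c(4, 5, 6, 7, 8), main = "Sample plot")

# Closing the device writes the file to disk.
dev.off()

print(file.exists(plot_path))
```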
Export Data to the Web
Uploading files to the web directly might be a bit tricky, but you can export data to the web using two steps: save a file locally, then upload it to the web. You can upload a file to the web using a POST request through R, which can be emulated using the httr library:
library(httr)
filename <- "<path_to_local_file>"
POST("<upload_URL>", body = list(name = filename, filedata = upload_file(filename, "text/csv")))
For more details, here’s a quickstart guide on the httr library.
R has gained a lot of popularity in recent years among people working with statistics, and now’s a good time to learn this wonderful language. It’s flexible enough to sync with various types of data sources, and working with R is very easy too, irrespective of your background. Let’s hope this post got you started with R!