Ruby on Medicine: Scrolling Through Large Files
In my tutorial Handling Large Files, we saw how to use Ruby to extract some portion of text from a large text file. The file from that post is VERY large (3.3 GB!), and now it’s time to improve the approach we used in the past tutorial a bit.
Today, rather than extracting a portion of the large text file, we want to navigate through it. In other words, we want to scroll through that large text file smoothly, the Ruby way!
So as not to reinvent the wheel regarding terminology and the file we will be working with, please see the sections “Terminology” and “Obtaining the file” from Handling Large Files. The terminology section will walk you through some concepts that are useful for this tutorial. The latter section shows you where to get the large text file, since this the toy we will be playing with in this tutorial.
Let’s get started.
Read, Read, Until You Get Tired…
We have a very large text file under our belts now. You’re probably used to text files being the smallest files. This, however, is not the case when it comes to Genomes. When you open a text file related to Genomes, expect that you will read a lot, and I mean a lot!
As mentioned above, rather than extracting some portion of this large text file, we want to navigate (scroll) smoothly through the text file. The idea is that, instead of obtaining portion by portion as was shown in the last tutorial, you may decide that you just want to keep scrolling until you get tired of the process.
The previous article demonstrated how the text editors I used for opening the text file just went crazy. We need Ruby at this point.
Let’s write a script that the user can run at the command line to specify the file and handling our smooth scrolling. The first thing we ask is the user to give us the file name, which will be stored in a variable. Thus, open a blank file and add the following:
puts "Enter the file name you want to scroll through"
file_name = gets.chomp
gets
is a method that gets the user input as a string. chomp
is used to remove \n
, which you obtain when pressing the enter/return key.
Great! We have read the file name. Now, it’s a simple mattter of opening that file. This is done as follows:
input_file = File.open(file_name,'r')
The file is opened in read
mode, which is specifed by the r
.
After opening the file, let’s go through that file line-by-line. Since we want to navigate (scroll) through the text file, it would be a good idea to display the output chunk by chunk. In other words, display a specific amount of text, and then ask the user to press any key to continue scrolling or type EXIT
to terminate the program.
A Ruby method that comes in handy in this step is each_line
, which reads each line from the text file. We can do the following:
input_file.each_line do |line|
In order to keep navigating (scrolling) through the text file, we’ll use a while true
loop, which is an infinite loop. Howerver, we’ll add a conditional (i.e. if) statement to check the user’s response and exit as needed.
I mentioned above that we would output the text in chunks. Let’s say that the chunks are 100 lines each. In this case, after displaying those 100 lines, ask the user to press any key if to continue reading, or enter EXIT
to, well, exit.
Thus, we can add the following if-statement
for this case:
if response == 'EXIT'
exit
end
As for continuing to scroll, the user can press any key. We’ll need to save some state so we know when to stop and prompt the user to continue. Each time we stop and ask the user to take action, we can reset a counter to trace the number of lines displayed. When the counter reaches the value 100
, this means that 100 lines have been displayed, and it is time to prompt the user to take action. At the same time, we reverse the counter to 0
to keep trace of the lines displayed.
I will show the entire script in the next section.
Putting It Altogether
Here’s our fancy, new Ruby script:
puts "Enter the file name you want to scroll through"
file_name = gets.chomp
input_file = File.open(file_name,'r')
counter = 0 # used to keep track of the number of lines displayed
user_input = ' ' # stores input from user
while true
input_file.each_line do |line|
print line
counter = counter + 1
if counter == 100
counter = 0
puts 'To continue scrolling, press any key...'
puts 'To terminate, type EXIT and press enter'
user_input = gets.chomp
if user_input == 'EXIT'
exit
end
end
end
end
Running the Program
In order to run the above script, type the following at your command line (assuming the file name is scroll.rb):
ruby scroll.rb
You will be prompted to enter the file name, which is in our case hg38.txt
:
When you run the program, the first 100-lines will be displayed and you will get a prompt asking you to continue or to terminate. The figure below shows the third page (scroll) displayed:
If you type EXIT
instead of any key, the program will terminate and you’ll be free to go on your merry way.
EXIT
As we saw in this tutorial, Ruby enables us to scroll through very large text files smoothly and easily. Now, we are not constrained by the weaknesses of text editors when dealing with large files.
Do you think that this idea could lead to building a Ruby based text editor? What benefits do you think such an editor could provide? Do you think it would be high performance? What other issues would the editor need to handle?
In the next article in the Ruby on Medicine series, we will go a-hunting for the elusive Gene Sequence. Stay tuned!