Ruby on Medicine: Handling Large Files

Originally published at: http://www.sitepoint.com/ruby-medicine-handling-large-files/

This entry is part 2 of 2 in the series Ruby on Medicine

There I was, visiting the Sequence and Annotation Downloads page on the UCSC Genome Bioinformatics website. That page links to sequence and annotation data downloads for the genome assemblies featured in the UCSC Genome Browser. There were many files to choose from, but I was interested in downloading the following file from the human genome assembly data set:

hg38.fa.gz – “Soft-masked” assembly sequence in one file. Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case.

Guess what? That file is greater than 3GB in size! No worries, you may say. Text editors today can handle massive files, right? I am using Windows, so we're talking about Notepad, WordPad, and Microsoft Office Word, just to name a few.

Well, it seems we have overestimated the abilities of these editors. When I tried the text editors mentioned above, they screamed in agony. Check it out:

Notepad

WordPad

Microsoft Office Word

Yikes.
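One way around editors that try to load everything at once is to read the file as a stream, one line at a time. The following is only a minimal Ruby sketch of that idea, not necessarily the approach the rest of the article takes. It tallies lower-case (soft-masked) and upper-case bases, following the convention in the file description above. The method name `count_masking` and the tiny sample sequence are made up for illustration.

```ruby
require "stringio"

# Tally soft-masked (lower-case) and unmasked (upper-case) characters from
# any IO-like object, one line at a time, so the whole file never has to
# sit in memory. `count_masking` is a hypothetical helper name.
def count_masking(io)
  counts = { masked: 0, unmasked: 0 }
  io.each_line do |line|
    next if line.start_with?(">") # FASTA header line, not sequence data
    counts[:masked]   += line.count("a-z")
    counts[:unmasked] += line.count("A-Z") # includes N placeholders
  end
  counts
end

# Made-up sample standing in for the real download.
sample = StringIO.new(">chrDemo\nACGTacgtNN\nttttGGGG\n")
p count_masking(sample)
```

Because the method only relies on `each_line`, the same code should work on a real file handle. Assuming the download sits next to the script as `hg38.fa.gz`, something like `Zlib::GzipReader.open("hg38.fa.gz") { |gz| p count_masking(gz) }` (after `require "zlib"`) would read the compressed file directly without unpacking it first.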

Continue reading this article on SitePoint
