Zip It! Zip It Good with Rails and Rubyzip
In our day-to-day activities we often interact with archives. When you want to send your friend a bunch of documents, you’d probably archive them first. When you download a book from the web, it will probably be archived alongside accompanying materials. So, how can we interact with archives in Ruby?
Today we will discuss a popular gem called rubyzip that is used to manage zip archives. With its help, you can easily read and create archives or generate them on the fly. In this article I will show you how to create database records from the zip file sent by the user and how to send an archive containing all records from a table.
Source code is available at GitHub.
Before getting started, I want to remind you that different kinds of files compress at different ratios. As such, even if you archive a file, its size might remain more or less the same:
- Text files compress very nicely. Depending on their contents, the ratio is about 3:1.
- Some images can benefit from compression, but when using a format like .jpg that already has native compression, it won’t change much.
- Binary files may shrink to as little as half of their original size.
- Audio and video are generally poor candidates for compression.
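To get a feel for these differences, here is a minimal sketch that uses Ruby's standard Zlib library (zip archives normally rely on the same Deflate algorithm). The sample text and the pseudo-random "noise" standing in for already-compressed media are made up for illustration, so the exact percentages will differ for real files.

```ruby
require 'zlib'

# Highly repetitive text, similar to what a log file might contain.
text = "The quick brown fox jumps over the lazy dog.\n" * 500

# Pseudo-random bytes stand in for data that is already compressed (.jpg, .mp3, ...).
noise = Random.new(42).bytes(text.bytesize)

# Returns the compressed size divided by the original size (lower is better).
def deflate_ratio(data)
  Zlib::Deflate.deflate(data).bytesize.to_f / data.bytesize
end

text_ratio  = deflate_ratio(text)
noise_ratio = deflate_ratio(noise)

puts format('text:  %.1f%% of original size', text_ratio * 100)
puts format('noise: %.1f%% of original size', noise_ratio * 100)
```

The repetitive text shrinks dramatically, while the random bytes stay at roughly their original size.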
Getting Started
Create a new Rails app:
$ rails new Zipper -T
I am using Rails 5 beta 3 and Ruby 2.2.3 for this demo, but rubyzip works with Ruby 1.9.2 or higher.
In our scenario today, the demo app keeps track of animals. Each animal has the following attributes:
- name (string)
- age (integer) – of course, you can use decimal instead
- species (string)
We want to list all the animals, add new ones, and download data about them in some format.
Create and apply the corresponding migration:
$ rails g model Animal name:string age:integer species:string
$ rake db:migrate
Now let’s prepare the default page for our app:
animals_controller.rb
class AnimalsController < ApplicationController
def index
@animals = Animal.order('created_at DESC')
end
end
views/animals/index.html.erb
<h1>My animals</h1>
<ul>
<% @animals.each do |animal| %>
<li>
<strong>Name:</strong> <%= animal.name %><br>
<strong>Age:</strong> <%= animal.age %><br>
<strong>Species:</strong> <%= animal.species %>
</li>
<% end %>
</ul>
config/routes.rb
[...]
resources :animals, only: [:index, :new, :create]
root to: 'animals#index'
[...]
Nice! Proceed to the next section and let’s take care of creation first.
Creating Animals from the Archive
Introduce the new action:
animals_controller.rb
[...]
def new
end
[...]
views/animals/index.html.erb
<h1>My animals</h1>
<%= link_to 'Add!', new_animal_path %>
[...]
Of course, we could craft a basic Rails form to add animals one by one, but instead let’s allow users to upload archives with JSON files. Each file will then contain attributes for a specific animal. The file structure looks like this:
- animals.zip
- animal-1.json
- animal-2.json
Each JSON file will have the following structure:
{
"name": "My name",
"age": 5,
"species": "Dog"
}
Of course, you may use another format, like XML, for example.
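Before building the upload, it may help to check how such a file parses. The sketch below uses only Ruby's standard json library; the sample values are made up. Note that JSON requires double-quoted keys and strings, so Ruby-style hash syntax is rejected by the parser.

```ruby
require 'json'

# Made-up sample contents of one animal file.
raw = '{"name": "Rex", "age": 5, "species": "Dog"}'

attributes = JSON.parse(raw)

# Keys come back as strings, not symbols.
puts attributes['name']
puts attributes.keys.inspect

# Unquoted keys and single-quoted strings are not valid JSON.
rejected = false
begin
  JSON.parse("{name: 'Rex'}")
rescue JSON::ParserError
  rejected = true
end
puts "Ruby-style hash rejected: #{rejected}"
```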
Our job is to receive an archive, open it, read each file, and create records based on the input. Start with the form:
views/animals/new.html.erb
<h1>Add animals</h1>
<p>
Upload a zip archive with JSON files in the following format:<br>
<code>{"name": "name", "age": 1, "species": "species"}</code>
</p>
<%= form_tag animals_path, method: :post, multipart: true do %>
<%= label_tag 'archive', 'Select archive' %>
<%= file_field_tag 'archive' %>
<%= submit_tag 'Add!' %>
<% end %>
This is a basic form allowing the user to select a file (don’t forget the multipart: true option).
Now the controller’s action:
animals_controller.rb
def create
if params[:archive].present?
# params[:archive].tempfile ...
end
redirect_to root_path
end
The only parameter that we are interested in is :archive. As long as it contains a file, it responds to the tempfile method, which returns a Tempfile object holding the uploaded data (call path on it if you need the file’s location on disk).
To read an archive we will use the Zip::File.open(file) method that accepts a block. Inside this block you can fetch each archived file and either extract it somewhere by using extract or read it into memory with the help of get_input_stream.read. We don’t really need to extract our archive anywhere, so let’s store the contents in memory instead.
animals_controller.rb
require 'zip'
[...]
def create
if params[:archive].present?
Zip::File.open(params[:archive].tempfile) do |zip_file|
zip_file.each do |entry|
Animal.create!(JSON.load(entry.get_input_stream.read))
end
end
end
redirect_to root_path
end
[...]
Pretty simple, isn’t it? entry.get_input_stream.read reads the file’s contents and JSON.load parses it. We are only interested in .json files though, so let’s limit the scope using the glob method:
animals_controller.rb
[...]
def create
if params[:archive].present?
Zip::File.open(params[:archive].tempfile) do |zip_file|
zip_file.glob('*.json').each do |entry|
Animal.create!(JSON.load(entry.get_input_stream.read))
end
end
end
redirect_to root_path
end
[...]
You can also extract part of the code to the model and introduce basic error handling:
animals_controller.rb
[...]
def create
if params[:archive].present?
Zip::File.open(params[:archive].tempfile) do |zip_file|
zip_file.glob('*.json').each { |entry| Animal.from_json(entry) }
end
end
redirect_to root_path
end
[...]
animal.rb
[...]
class << self
def from_json(entry)
begin
Animal.create!(JSON.load(entry.get_input_stream.read))
rescue => e
warn e.message
end
end
end
[...]
I also want to whitelist the attributes that users can assign, preventing them from overriding the id or created_at fields:
animal.rb
[...]
WHITELIST = ['age', 'name', 'species']
class << self
def from_json(entry)
begin
Animal.create!(JSON.load(entry.get_input_stream.read).select {|k,v| WHITELIST.include?(k)})
rescue => e
warn e.message
end
end
end
[...]
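You may use a blacklist approach instead by replacing the select call with except('id', 'created_at') (except takes the keys to drop as arguments and comes from ActiveSupport), but whitelisting is more secure.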
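The filtering idea can be tried outside of Rails as well. The following is a minimal sketch using only the standard json library; permitted_attributes and its allowed parameter are hypothetical names chosen for this illustration, and JSON.parse is used here in place of JSON.load.

```ruby
require 'json'

# Hypothetical helper: parses a raw JSON string and keeps only allowed keys.
# Returns nil when the input is not valid JSON or is not a JSON object.
def permitted_attributes(raw_json, allowed = %w[age name species])
  data = JSON.parse(raw_json)
  return nil unless data.is_a?(Hash)

  data.select { |key, _value| allowed.include?(key) }
rescue JSON::ParserError
  nil
end

# The id and created_at keys are silently dropped.
p permitted_attributes('{"name": "Rex", "age": 5, "id": 999, "created_at": "2000-01-01"}')

# Broken input yields nil instead of raising.
p permitted_attributes('not json at all')
```

Returning nil for bad input is just one possible design; logging the problem, as the model code does with warn, is equally reasonable.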
Great! Now go ahead, create a zip archive and try to upload it!
Generating and Downloading an Archive
Let’s perform the opposite operation, allowing the user to download an archive containing JSON files representing animals.
Add a new link to the root page:
views/animals/index.html.erb
[...]
<%= link_to 'Download archive', animals_path(format: :zip) %>
We’ll use the same index action and equip it with the respond_to method:
animals_controller.rb
[...]
def index
@animals = Animal.order('created_at DESC')
respond_to do |format|
format.html
format.zip do
end
end
end
[...]
To send an archive to the user, you may either create it somewhere on the disk or generate it on the fly. Creating the archive on disk involves the following steps:
- Create an array of files that have to be placed inside the archive:
files << File.open("path/name.ext", 'wb') { |file| file << 'content' }
- Create an archive:
Zip::File.open('path/archive.zip', Zip::File::CREATE) do |z|
- Add your files to the archive:
Zip::File.open('path/archive.zip', Zip::File::CREATE) do |z|
files.each do |f|
z.add('file_name', f.path)
end
end
The add method accepts two arguments: the file name as it should appear in the archive and the original file’s path and name.
- Send the archive:
send_file 'path/archive.zip', type: 'application/zip', disposition: 'attachment', filename: "my_archive.zip"
This, however, means that all these files and the archive itself will persist on disk. Of course, you may remove them manually and even try to create a temporary zip file as described here but that involves too much unnecessary complexity.
What I’d like to do instead is generate our archive on the fly and use the send_data method to send the response as an attachment. This is a bit trickier, but it is nothing we can’t manage.
In order to accomplish this task, we’ll need a method called Zip::OutputStream.write_buffer that accepts a block:
animals_controller.rb
[...]
def index
@animals = Animal.order('created_at DESC')
respond_to do |format|
format.html
format.zip do
compressed_filestream = Zip::OutputStream.write_buffer do |zos|
end
end
end
end
[...]
To add a new file to the archive, use zos.put_next_entry while providing a file name. You can even specify a directory to nest your file by saying zos.put_next_entry('nested_dir/my_file.txt'). To write something to the file, use print:
animals_controller.rb
compressed_filestream = Zip::OutputStream.write_buffer do |zos|
@animals.each do |animal|
zos.put_next_entry "#{animal.name}-#{animal.id}.json"
zos.print animal.to_json(only: [:name, :age, :species])
end
end
We don’t want fields like id or created_at to be present in the file, so by saying :only we limit them to name, age and species.
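Because a slash in an entry name creates a nested directory, an animal whose name contains one (say, "cats/Tom") would end up in a subfolder of the archive. If that is not desired, the name can be cleaned up first. The helper below, entry_name_for, is a hypothetical sketch and not part of rubyzip; it simply replaces every run of characters other than ASCII letters, digits, underscores and hyphens with a single underscore.

```ruby
# Hypothetical helper: builds a flat archive entry name from a record's name and id.
def entry_name_for(name, id)
  # Replace each run of "unsafe" characters (slashes, dots, spaces, ...) with "_".
  safe = name.to_s.gsub(/[^A-Za-z0-9_-]+/, '_')
  # Fall back to a fixed label when nothing is left.
  safe = 'animal' if safe.empty?
  "#{safe}-#{id}.json"
end

puts entry_name_for('Rex', 1)
puts entry_name_for('cats/Tom', 2)
puts entry_name_for('../secret', 3)
```

Inside the loop this could be used as zos.put_next_entry entry_name_for(animal.name, animal.id). Note that this simple pattern also strips non-ASCII letters, so it would need adjusting for names in other alphabets.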
Now rewind the stream:
compressed_filestream.rewind
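Why rewind? By default write_buffer hands back a StringIO, and a stream that has just been written to may still have its position at the end, in which case read would return nothing. The stand-alone sketch below illustrates this with a plain StringIO from the standard library; the written string is only a placeholder for real zip data.

```ruby
require 'stringio'

io = StringIO.new
io.write('zip bytes would go here')

# The position now sits at the end of the buffer, so read finds nothing.
at_end = io.read

# After rewinding, read returns everything from the start.
io.rewind
from_start = io.read

# StringIO#string ignores the position and always returns the whole buffer.
whole = io.string

puts at_end.inspect
puts from_start.inspect
puts whole.inspect
```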
And send it:
send_data compressed_filestream.read, filename: "animals.zip"
Here is the resulting code:
animals_controller.rb
[...]
def index
@animals = Animal.order('created_at DESC')
respond_to do |format|
format.html
format.zip do
compressed_filestream = Zip::OutputStream.write_buffer do |zos|
@animals.each do |animal|
zos.put_next_entry "#{animal.name}-#{animal.id}.json"
zos.print animal.to_json(only: [:name, :age, :species])
end
end
compressed_filestream.rewind
send_data compressed_filestream.read, filename: "animals.zip"
end
end
end
[...]
Go ahead and try the “Download archive” link!
You can even protect the archive with a password. This feature of rubyzip is experimental and may change in the future, but it seems to be working currently:
animals_controller.rb
[...]
compressed_filestream = Zip::OutputStream.write_buffer(::StringIO.new(''), Zip::TraditionalEncrypter.new('password')) do |zos|
[...]
Customizing Rubyzip
Rubyzip provides a bunch of configuration options that can be set either in a block:
Zip.setup do |c|
end
or one-by-one:
Zip.option = value
Here are the available options:
- on_exists_proc – Should the existing files be overwritten during extraction? Default is false.
- continue_on_exists_proc – Should the existing files be overwritten while creating an archive? Default is false.
- unicode_names – Set this if you want to store non-English file names and open the archive on Windows Vista and earlier. Default is false.
- warn_invalid_date – Should a warning be displayed if an archive has an incorrect date format? Default is true.
- default_compression – Default compression level to use. Initially set to Zlib::DEFAULT_COMPRESSION, other possible values are Zlib::BEST_COMPRESSION and Zlib::NO_COMPRESSION.
- write_zip64_support – Should Zip64 support be enabled for writing? Default is false.
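In a Rails app these settings would typically live in an initializer. Here is a small sketch of the block form; the file name and the chosen values are only an example, not a recommendation.

```ruby
# config/initializers/rubyzip.rb (hypothetical file name)
require 'zip'

Zip.setup do |c|
  c.on_exists_proc = true                          # overwrite files during extraction
  c.continue_on_exists_proc = true                 # overwrite entries while creating an archive
  c.unicode_names = true                           # support non-English file names
  c.default_compression = Zlib::BEST_COMPRESSION   # favor smaller archives over speed
end
```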
Conclusion
In this article we had a look at the rubyzip library. We wrote an app that reads users’ archives, creates records based on them, and generates archives on the fly as a response. Hopefully, the provided code snippets will come in handy in one of your projects.
As always, thanks for staying with me and see you soon!