Zip It! Zip It Good with Rails and Rubyzip

By Ilya Bodrov-Krukowski

We work with archives every day. When you want to send a friend a bunch of documents, you probably archive them first. When you download a book from the web, it often arrives in an archive alongside accompanying materials. So, how can we interact with archives in Ruby?

Today we will discuss a popular gem called rubyzip that is used to manage zip archives. With its help, you can easily read and create archives or generate them on the fly. In this article I will show you how to create database records from the zip file sent by the user and how to send an archive containing all records from a table.

Source code is available at GitHub.

Before getting started, I want to remind you that different kinds of files compress at different ratios. As such, even if you archive a file, its size might remain more or less the same:

  • Text files compress very nicely. Depending on their contents, the ratio is about 3:1.
  • Some images can benefit from compression, but when using a format like .jpg that already has native compression, it won’t change much.
  • Binary files may be compressed to as little as half of their original size.
  • Audio and video are generally poor candidates for compression.
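
To get a feel for these differences, here is a rough sketch that uses Ruby’s standard Zlib library (the same DEFLATE algorithm zip archives typically rely on). The sample data is made up: the repeated sentence compresses far better than real-world text would, and the random bytes merely stand in for media that is already compressed.

```ruby
require 'zlib'
require 'securerandom'

# Returns original_size / compressed_size for the given string.
def compression_ratio(data)
  data.bytesize.to_f / Zlib::Deflate.deflate(data).bytesize
end

# Highly repetitive text (an exaggerated best case for compression).
text = "The quick brown fox jumps over the lazy dog.\n" * 200
# Random bytes behave much like already-compressed audio, video, or JPEG data.
random = SecureRandom.random_bytes(text.bytesize)

puts format('text:   %.1f:1', compression_ratio(text))
puts format('random: %.1f:1', compression_ratio(random))
```

The exact numbers depend on your data, so treat them as an illustration rather than a benchmark.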

Getting Started

Create a new Rails app:

$ rails new Zipper -T

I am using Rails 5 beta 3 and Ruby 2.2.3 for this demo, but rubyzip works with Ruby 1.9.2 or higher.

In our scenario today, the demo app keeps track of animals. Each animal has the following attributes:

  • name (string)
  • age (integer) – of course, you can use decimal instead
  • species (string)

We want to list all the animals, add new ones, and download data about them in some format.

Create and apply the corresponding migration:

$ rails g model Animal name:string age:integer species:string
$ rake db:migrate

Now let’s prepare the default page for our app:

animals_controller.rb

class AnimalsController < ApplicationController
  def index
    @animals = Animal.order('created_at DESC')
  end
end

views/animals/index.html.erb

<h1>My animals</h1>

<ul>
  <% @animals.each do |animal| %>
    <li>
      <strong>Name:</strong> <%= animal.name %><br>
      <strong>Age:</strong> <%= animal.age %><br>
      <strong>Species:</strong> <%= animal.species %>
    </li>
  <% end %>
</ul>

config/routes.rb

[...]
resources :animals, only: [:index, :new, :create]
root to: 'animals#index'
[...]

Nice! Proceed to the next section and let’s take care of creation first.

Creating Animals from the Archive

Introduce the new action:

animals_controller.rb

[...]
def new
end
[...]

views/animals/index.html.erb

<h1>My animals</h1>

<%= link_to 'Add!', new_animal_path %>
[...]

Of course, we could craft a basic Rails form to add animals one by one, but instead let’s allow users to upload archives with JSON files. Each file will then contain attributes for a specific animal. The file structure looks like this:

  • animals.zip
    • animal-1.json
    • animal-2.json

Each JSON file will have the following structure:

{
  "name": "My name",
  "age": 5,
  "species": "Dog"
}

Of course, you may use another format, such as XML.
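
Keep in mind that strict JSON requires double quotes around both keys and string values. The following sketch (plain Ruby, standard json library only, with made-up sample data) shows what a parser accepts and what it rejects; valid_json? is a hypothetical helper, not part of the demo app.

```ruby
require 'json'

valid   = '{"name": "Rex", "age": 5, "species": "Dog"}'
# A Ruby/JavaScript-style literal: readable, but not valid JSON.
invalid = "{name: 'Rex', age: 5, species: 'Dog'}"

# Hypothetical helper: true when the string parses as JSON.
def valid_json?(raw)
  JSON.parse(raw)
  true
rescue JSON::ParserError
  false
end

attributes = JSON.parse(valid) # a Hash with string keys
puts attributes['name']
puts valid_json?(valid)
puts valid_json?(invalid)
```

Because the keys come back as strings, they line up nicely with the attribute names of the Animal model.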

Our job is to receive an archive, open it, read each file, and create records based on the input. Start with the form:

views/animals/new.html.erb

<h1>Add animals</h1>

<p>
  Upload a zip archive with JSON files in the following format:<br>
  <code>{"name": "name", "age": 1, "species": "species"}</code>
</p>

<%= form_tag animals_path, method: :post, multipart: true do %>
  <%= label_tag 'archive', 'Select archive' %>
  <%= file_field_tag 'archive' %>

  <%= submit_tag 'Add!' %>
<% end %>

This is a basic form allowing the user to select a file (don’t forget the multipart: true option).

Now the controller’s action:

animals_controller.rb

def create
  if params[:archive].present?
    # params[:archive].tempfile ...
  end
  redirect_to root_path
end

The only parameter we are interested in is :archive. When it contains an uploaded file, it responds to the tempfile method, which returns a Tempfile object holding the upload on disk.

To read an archive we will use the Zip::File.open(file) method, which accepts a block. Inside this block you can fetch each archived file and either extract it somewhere with extract or read it into memory with get_input_stream.read. We don’t need to extract our archive anywhere, so let’s keep the contents in memory instead.

animals_controller.rb

require 'zip'

[...]

def create
  if params[:archive].present?
    Zip::File.open(params[:archive].tempfile) do |zip_file|
      zip_file.each do |entry|
        Animal.create!(JSON.load(entry.get_input_stream.read))
      end
    end
  end
  redirect_to root_path
end
[...]

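Pretty simple, isn’t it? entry.get_input_stream.read reads the file’s contents and JSON.load parses it. Keep in mind that JSON.load is intended for trusted input; for arbitrary user uploads, JSON.parse is the more conservative choice. We are only interested in .json files though, so let’s limit the scope using the glob method: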

animals_controller.rb

[...]
def create
  if params[:archive].present?
    Zip::File.open(params[:archive].tempfile) do |zip_file|
      zip_file.glob('*.json').each do |entry|
        Animal.create!(JSON.load(entry.get_input_stream.read))
      end
    end
  end
  redirect_to root_path
end
[...]
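
One thing to watch out for is how the pattern treats nested files. As far as I can tell, rubyzip’s glob relies on File.fnmatch with path-aware flags, so the sketch below uses File.fnmatch directly to illustrate the matching rules. The flag combination is my assumption about rubyzip’s defaults, and the entry names are made up, so double check against your version of the gem.

```ruby
# Assumed to mirror rubyzip's default glob flags (an assumption, not verified here).
FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH | File::FNM_EXTGLOB

entry_names = ['animal-1.json', 'notes.txt', 'nested/animal-2.json']

# With FNM_PATHNAME, "*" does not cross directory separators,
# so only top-level JSON files match this pattern.
top_level = entry_names.select { |name| File.fnmatch('*.json', name, FLAGS) }

# "**/" descends into directories, so nested JSON files match as well.
recursive = entry_names.select { |name| File.fnmatch('**/*.json', name, FLAGS) }

puts top_level.inspect
puts recursive.inspect
```

If your users tend to zip a whole folder rather than individual files, a pattern like '**/*.json' is probably the safer bet.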

You can also extract part of the code to the model and introduce basic error handling:

animals_controller.rb

[...]
def create
  if params[:archive].present?
    Zip::File.open(params[:archive].tempfile) do |zip_file|
      zip_file.glob('*.json').each { |entry| Animal.from_json(entry) }
    end
  end
  redirect_to root_path
end
[...]

animal.rb

[...]
class << self
  def from_json(entry)
    Animal.create!(JSON.load(entry.get_input_stream.read))
  rescue => e
    # Log the problem and carry on with the next entry
    warn e.message
  end
end
[...]
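
If you prefer to separate parsing problems from database problems, the parsing step can be isolated. Here is a minimal sketch in plain Ruby; parse_animal is a hypothetical helper (it is not part of the demo app), and it uses JSON.parse rather than JSON.load.

```ruby
require 'json'

# Hypothetical helper: returns the parsed attributes as a Hash,
# or nil when the payload is not a usable JSON object.
def parse_animal(raw)
  data = JSON.parse(raw)
  data.is_a?(Hash) ? data : nil
rescue JSON::ParserError => e
  warn "Skipping entry: #{e.message}"
  nil
end

payloads = [
  '{"name": "Rex", "age": 5, "species": "Dog"}', # fine
  'not json at all',                             # parser error
  '[1, 2, 3]'                                    # valid JSON, but not an object
]

parsed = payloads.map { |raw| parse_animal(raw) }.compact
puts parsed.inspect
```

Rescuing only JSON::ParserError here means unexpected errors still surface instead of being silently swallowed.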

I also want to whitelist the attributes that users can assign, preventing them from overriding fields like id or created_at:

animal.rb

[...]
WHITELIST = ['age', 'name', 'species'].freeze

class << self
  def from_json(entry)
    attributes = JSON.load(entry.get_input_stream.read)
    # Keep only the attributes users are allowed to set
    Animal.create!(attributes.select { |key, _| WHITELIST.include?(key) })
  rescue => e
    warn e.message
  end
end
[...]

You may use a blacklist approach instead by replacing select with reject (or with ActiveSupport’s except, which takes the forbidden keys as arguments), but whitelisting is more secure.
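
To see why, compare the two approaches on the same input. This is a plain Ruby sketch with made-up data; the BLACKLIST constant is hypothetical and exists only for the comparison.

```ruby
WHITELIST = %w[age name species].freeze
BLACKLIST = %w[id created_at].freeze # hypothetical, for comparison only

input = {
  'name' => 'Rex', 'age' => 5, 'species' => 'Dog',
  'id' => 1, 'updated_at' => '2016-01-01'
}

# Whitelist: only explicitly allowed keys survive.
whitelisted = input.select { |key, _| WHITELIST.include?(key) }

# Blacklist: everything survives unless someone remembered to forbid it.
blacklisted = input.reject { |key, _| BLACKLIST.include?(key) }

puts whitelisted.keys.inspect
puts blacklisted.keys.inspect
```

The blacklist lets updated_at slip through because nobody thought of listing it, which is exactly the kind of gap a whitelist closes.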

Great! Now go ahead, create a zip archive and try to upload it!

Generating and Downloading an Archive

Let’s perform the opposite operation, allowing the user to download an archive containing JSON files representing animals.

Add a new link to the root page:

views/animals/index.html.erb

[...]
<%= link_to 'Download archive', animals_path(format: :zip) %>

We’ll use the same index action and equip it with the respond_to method:

animals_controller.rb

[...]
def index
  @animals = Animal.order('created_at DESC')

  respond_to do |format|
    format.html
    format.zip do
    end
  end
end
[...]

To send an archive to the user, you may either create it somewhere on the disk or generate it on the fly. Creating the archive on disk involves the following steps:

  • Create an array of files that have to be placed inside the archive:

files = []
files << File.open("path/name.ext", 'wb') { |file| file << 'content' }

  • Create an archive and add your files to it:

Zip::File.open('path/archive.zip', Zip::File::CREATE) do |z|
  files.each do |f|
    # Use each file's own name; reusing one name for every entry would clash
    z.add(File.basename(f.path), f.path)
  end
end

The add method accepts two arguments: the file name as it should appear in the archive and the original file’s path and name.

  • Send the archive:

send_file 'path/archive.zip', type: 'application/zip',
                              disposition: 'attachment',
                              filename: 'my_archive.zip'

This, however, means that all these files and the archive itself will persist on disk. Of course, you may remove them manually or even create a temporary zip file as described here, but that adds unnecessary complexity.
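
For the curious, here is roughly what the cleanup could look like for the intermediate files. It is only a sketch built on Ruby’s standard Dir.mktmpdir; with_temp_animal_files is a hypothetical helper, the sample data is made up, and the archive step itself is left out.

```ruby
require 'tmpdir'
require 'json'

# Hypothetical helper: writes one JSON file per animal into a temporary
# directory and returns the directory path plus the names that were written.
def with_temp_animal_files(animals)
  Dir.mktmpdir('animals') do |dir|
    names = animals.each_with_index.map do |animal, index|
      path = File.join(dir, "animal-#{index + 1}.json")
      File.write(path, JSON.generate(animal))
      File.basename(path)
    end
    # The archive would have to be built and read here, while the files exist.
    [dir, names]
  end
end

dir, names = with_temp_animal_files([{ name: 'Rex', age: 5, species: 'Dog' }])
puts names.inspect
puts Dir.exist?(dir) # the directory is removed once the block returns
```

Note that anything living inside the temporary directory has to be read into memory before the block ends, because nothing is left on disk afterwards.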

What I’d like to do instead is generate our archive on the fly and use the send_data method to deliver the response as an attachment. This is a bit trickier, but nothing we can’t manage.

To accomplish this, we’ll need a method called Zip::OutputStream.write_buffer, which accepts a block:

animals_controller.rb

[...]
def index
  @animals = Animal.order('created_at DESC')

  respond_to do |format|
    format.html
    format.zip do
      compressed_filestream = Zip::OutputStream.write_buffer do |zos|
      end
    end
  end
end
[...]

To add a new file to the archive, use zos.put_next_entry while providing a file name. You can even specify a directory to nest your file by saying zos.put_next_entry('nested_dir/my_file.txt'). To write something to the file, use print:

animals_controller.rb

compressed_filestream = Zip::OutputStream.write_buffer do |zos|
  @animals.each do |animal|
    zos.put_next_entry "#{animal.name}-#{animal.id}.json"
    zos.print animal.to_json(only: [:name, :age, :species])
  end
end

We don’t want fields like id or created_at to be present in the file, so we pass the :only option to limit the output to name, age, and species.
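
Since a slash in the entry name creates a nested directory, an animal called "Tom/Jerry" would end up in a folder of its own. Below is a small sketch of how the entry name and payload could be built more defensively. It is plain Ruby: SampleAnimal, entry_name, and entry_payload are hypothetical stand-ins for the model and for to_json(only: ...), not code from the demo app.

```ruby
require 'json'

# Hypothetical stand-in for the Animal model.
SampleAnimal = Struct.new(:id, :name, :age, :species, :created_at)

EXPORTED_FIELDS = %i[name age species].freeze

# Replaces anything that is not an ASCII letter, digit, underscore,
# dot, or hyphen, so the name cannot introduce directories.
def entry_name(animal)
  safe_name = animal.name.to_s.gsub(/[^\w.-]+/, '_')
  "#{safe_name}-#{animal.id}.json"
end

# Serializes only the exported fields, similar in spirit to to_json(only: ...).
def entry_payload(animal)
  JSON.generate(animal.to_h.select { |key, _| EXPORTED_FIELDS.include?(key) })
end

animal = SampleAnimal.new(7, 'Tom/Jerry', 3, 'Cat', Time.now)
puts entry_name(animal)    # Tom_Jerry-7.json
puts entry_payload(animal)
```

The regular expression is deliberately strict, so non-ASCII names are replaced as well; loosen it if that is too aggressive for your data.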

Now rewind the stream:

compressed_filestream.rewind
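
Why is this needed? By default, write_buffer hands back an in-memory StringIO whose cursor sits at the end of what was just written. The following sketch reproduces the effect with a bare StringIO and no rubyzip involved; write_then_read is a hypothetical helper for the demonstration.

```ruby
require 'stringio'

# Hypothetical helper: writes content to a fresh in-memory stream
# and reads it back, optionally rewinding first.
def write_then_read(content, rewind:)
  stream = StringIO.new
  stream.print content
  stream.rewind if rewind
  stream.read
end

puts write_then_read('zip bytes', rewind: false).inspect # => ""
puts write_then_read('zip bytes', rewind: true).inspect  # => "zip bytes"
```

Without rewind there is nothing left to read, which is why the download would come out empty. As an alternative, StringIO#string returns the whole buffer regardless of the cursor position.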

And send it:

send_data compressed_filestream.read, filename: "animals.zip"

Here is the resulting code:

animals_controller.rb

[...]
def index
  @animals = Animal.order('created_at DESC')

  respond_to do |format|
    format.html
    format.zip do
      compressed_filestream = Zip::OutputStream.write_buffer do |zos|
        @animals.each do |animal|
          zos.put_next_entry "#{animal.name}-#{animal.id}.json"
          zos.print animal.to_json(only: [:name, :age, :species])
        end
      end
      compressed_filestream.rewind
      send_data compressed_filestream.read, filename: "animals.zip"
    end
  end
end
[...]

Go ahead and try the “Download archive” link!

You can even protect the archive with a password. This feature of rubyzip is experimental and may change in the future, but it seems to be working currently:

animals_controller.rb

[...]
compressed_filestream = Zip::OutputStream.write_buffer(::StringIO.new(''), Zip::TraditionalEncrypter.new('password')) do |zos|
[...]

Customizing Rubyzip

Rubyzip provides a number of configuration options that can be set either in a block:

Zip.setup do |c|
end

or one-by-one:

Zip.option = value

Here are the available options:

  • on_exists_proc – Should the existing files be overwritten during extraction? Default is false.
  • continue_on_exists_proc – Should the existing files be overwritten while creating an archive? Default is false.
  • unicode_names – Set this if you want to store non-unicode file names on Windows Vista and earlier. Default is false.
  • warn_invalid_date – Should a warning be displayed if an archive has incorrect date format? Default is true.
  • default_compression – Default compression level to use. Initially set to Zlib::DEFAULT_COMPRESSION, other possible values are Zlib::BEST_COMPRESSION and Zlib::NO_COMPRESSION.
  • write_zip64_support – Should Zip64 support be enabled for writing? Default is false.
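
Put together, a configuration block might look like the sketch below. The values are only examples, and placing it in an initializer such as config/initializers/rubyzip.rb is my assumption about where it would live in a Rails app.

```ruby
require 'zip'
require 'zlib'

Zip.setup do |c|
  c.on_exists_proc = true          # overwrite existing files during extraction
  c.continue_on_exists_proc = true # overwrite existing entries while creating an archive
  c.unicode_names = true
  c.default_compression = Zlib::BEST_COMPRESSION # smaller archives, slower compression
end
```

Stick with the defaults unless you have a concrete reason to change them; overwriting files silently, in particular, is easy to regret.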

Conclusion

In this article we had a look at the rubyzip library. We wrote an app that reads users’ archives, creates records based on them, and generates archives on the fly as a response. Hopefully, the provided code snippets will come in handy in one of your projects.

As always, thanks for staying with me and see you soon!

  • DoctorMcZ

    The best thing about Rails is that nothing useful is bundled with the core distribution. It is like buying a car with no A/C, no seats, no tires and no windows. It is the best design practices.

  • http://say8425.github.io/ 펭귄

    I am really impressed with your post, and it saved me time. Thank you so much. I have one question: what does `rewind` do? When I delete it, the zip file is zero KB. Why does that happen?

    • Ilya Bodrov

      That’s a good question. When we finish writing the stream, we need to send it. Imagine a cassette: we have reached its end and want to watch it again. What would we do? Obviously, we would rewind it :) The same happens here: after we finish writing to the stream, we return to its very beginning and then start sending it to the user. Otherwise the stream would be at its end and we would have nothing to send.
