Ruby
Article

Speed up with Materialized Views on PostgreSQL and Rails

By Vinoth

Today I would like to talk about how to implement materialized views in PostgreSQL, utilizing them in a Ruby on Rails application.

PostgreSQL, known commonly as Postgres, is an multi-platform, open source SQL database system. It is one of the most popular database systems and second-most used open source database in the world. The development for Postgres is being done under the PostgreSQL Global Development Group. If you’ve never used Postgres I’d recommend trying it on your next project, it’s really cool.

The tutorial is split into three parts as follows:

  1. What is a Database View.
  2. What is a Materialized View.
  3. How to add a Materialized View to a Rails application.

What is a Database View?

A view is basically a stored set of queries against the underlying tables. Whenever a view is invoked, it’ll execute a series of pre-stored queries against the underlying data, returning the resultant data. Unlike a table, it doesn’t take physical space, but the schema of a view is stored and accessed from memory instead.

When the underlying data on which the view is constructed changes, it is reflected in subsequent invocations the view.

Some of the advantages a view provides are:
1. Data abstraction – A view can be used to abstract a set of data from the underlying tables.
2. Optimization – A view, just like any query, goes through an optimizer which improves it as much as possible based on the operation performed on the result of the view.
3. Simplicity – As mentioned above, a view can abstract away the data while at the same time simplifying querying multiple tables. A view is often comprised of one or more joins between tables, making the data of these joins look like a simple table.

Almost all the major database providers have a view feature, and it is the single way of querying certain NoSQL databases.

What is a Materialized View?

A materialized view, also referred to as a “matview”, is a form of database view that was first introduced in Oracle. It has since been adapted by many major databases.

From Wikipedia,

A materialized view is a database object that contains the results of a query.

Basically, a materialized view is similar to a database view except it is stored physically on disk and updated manually. A matview stores the results of a query into a table-like structure of its own, from which the data can be queried. It is not possible to add or delete rows, but the rest of the time it behaves just like an actual table.

Also, when the data behind a materialize view changes, the matview must be manually refreshed before the new data is included in the stored matview. This is both a positive and a negative. Storing the results of the query that makes the materialized view allows it to work like a table. Queries are faster, and the matview itself can be indexed. However, since the data that comprises the results is basically a snapshot, it can be stale and you have to know when to refresh the data.

At the time of this writing, matviews are natively available in Oracle DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL doesn’t provide native support for matviews, unfortunately, but there are open source alternatives to it.

Matviews in PostgreSQL

Materialized views were introduced in Postgres version 9.3. In version 9.4 an option to refresh the matview concurrently (meaning, without locking the view) was introduced.

The syntax for creating a matview in Postgres in similar to creating a table:

CREATE MATERIALIZED VIEW ex_matview AS SELECT col1, col2 FROM mytable WHERE col3 = condition;

Adding a Materialized View to a Rails app

For our simple example, I’ve created a Rails app and generated 1M rows in the database used by the application. The code is available on Github. The app has three tables called channel, item, and sales that mimic the sales data for a retailer. The schema for the sample app is:

schema

Let’s say we have a dashboard page that provides sales information for the past year, along with the following details:

1) Amount of sales a channel has done on a per-item basis, we can query it like so:

Sale.joins(:channel).joins(:item).where('channels.name = ?', 'web').group('items.name').sum(:amount)

2) Amount of sales each channel has done overall:

Sale.joins(:channel).group('channels.name').sum(:amount)

3) Amount of sales on per day basis:

Sale.group("DATE_TRUNC('day', sold_date)").sum(:amount)

These queries aren’t complicated, but they are expensive, even after many optimizations. Also, the data will likely grow more and more. This is a good candidate for a materialized view, since the data needn’t be real time and we’re abstracting only a subset of data.

To get started, let’s create a migration:

rails g migration CreateSalesMatview

Add the following lines to the migration file:

I am just picking up the necessary data that are needed for the reports.

After saving the file, run bin/rake db:migrate and voila! the matview is created. The changes for this migration won’t be available in the schema file, by the way, since it’s not a table.

Creating a Model

One awesome thing by using a materialized view with Rails/ActiveRecord is we can create models for it, just like any other table in the schema. The model in our case will be in a filed called sales_matview.rb. As such, the model name is SalesMatview. You’re welcome to use any name you want, but it’s advised to follow the naming conventions of Rails.

Let’s create a file called app/models/sales_matview.rb, and add the following lines to it:

You’ll notice that the model is bound to the underlying matview using self.table_name. Also, we’ve created two methods:

  • readonly? which locks the model from writing data making, it a read only table.
  • self.refresh which will be called to refresh the materialized view when needed.

We’re all set to use the materialized view we’ve just created. You can query it exactly how you would in a normal ActiveRecord query.

Now, let’s look at our previous queries:

  1. Amount of sales a channel has done on per item basis:

    SalesMatview.where('channel = ?', 'web').group(:item).sum(:amount)
  2. Amount of sales each channel has done overall:

    SalesMatview.group(:channel).sum(:amount)
  3. Amount of sales on a per day basis:

    SalesMatview.group(:date).sum(:amount)

The queries are similar, except without those joins, but you’ll notice a big difference on how these perform:

Channel per item
       user     system      total        real
AR   (251.5ms)  0.040000   0.070000   0.110000 (  0.412950)
MV   (2.3ms)    0.000000   0.010000   0.010000 (  0.012010)

Channel overall
       user     system      total        real
AR   (374.4ms)  0.000000   0.000000   0.000000 (  0.376352)
MV   (3.3ms)    0.000000   0.000000   0.000000 (  0.006813)

Sales per day
       user     system      total        real
AR   (976.4ms)  0.020000   0.000000   0.020000 (  0.990258)
MV   (6.2ms)    0.020000   0.000000   0.020000 (  0.026783)

That is a HUGE savings, to say the least! Over 100 times improved! Give it a try and see for yourself.

Refreshing the Materialized Views

Now we need to tackle refreshing the materialized view. We can create a rake task to refresh the matview to update the data regularly. Add the below line to the lib/tasks/sales.rb file and it’s done.

desc 'Update the Sales materialized view'
task :update_sales_view => :environment do
  SalesMatview.refresh
end

For larger datasets where the refreshes may take too much time, there is an option in Postgres to perform the refresh concurrently. Just change the statement in the refresh method to:

REFRESH MATERIALIZED VIEW CONCURRENTLY sales_matview

Keep in mind, a unique key will be required in the matview to perform concurrent refreshes.

Also, since the materialized view is read only if you want to change the schema it is constructed upon you have to drop the matview and recreate it. Just add the following line to your migration before everything else to drop the matview:

DROP MATERIALIZED VIEW IF EXISTS sales_matview

Conclusion

All the code that is used in the examples are available here –Github. It also has a seed file for generating 1 million rows for the mentioned schema.

I hope this has been helpful and informative for you and I thank you for reading. Let me know your suggestions in comments.

Credits to:
http://www.postgresql.org/docs/9.4/static/rules-materializedviews.html
http://en.wikipedia.org/wiki/Materialized_view

More:
  • http://careersreport.com mai.morrison

    This is how it is possible to get eighty-five dollars /hour… After being unemployed for six-months , I started making cash over this website and today I possibly can not be more happy. After 3 months doing this my income is around $5000month -Check internet-website Learn more by visiting my profile

  • http://wrburgess.com/ Randy Burgess

    So, is there a good way to test Material Views with rspec? I’ve tried a couple of times, but the testing environment doesn’t seem to know what a Material View is.

    • https://github.com/Fire-Dragon-DoL Francesco Belladonna

      Unfortunately, testing anything related to a database is “complicated”, means the only way is to build an actual dataset and compare it to an expected resulting dataset. Couldn’t find a different way.

      • http://wrburgess.com/ Randy Burgess

        I’m used to needing to do that…my issue is that I cannot seem to get the Mat Views to be recognized as existing, even if I populate data in the test db. I think it has to do with the aspect of views that don’t represent actual tables.

        • https://github.com/Fire-Dragon-DoL Francesco Belladonna

          Are you using schema.sql instead of schema.rb?

          • http://wrburgess.com/ Randy Burgess

            nope, using schema.rb

          • Court Simas

            For our project, we pulled the sql out into a separate .sql file, and the migration runs that – but what that allowed us to do, was run those sql files individually. So we have some methods that load up the sql and create the views before the test suite is ran (rspec).

          • https://github.com/Fire-Dragon-DoL Francesco Belladonna

            You should try using an sql schema instead (the rb don’t support views, can’t work at all with them!)

        • http://about.me/tamouse Tamara Temple

          I have a thought, completely untested or tried (maybe when I get some time), to put the view table creation and initial load into an initializer.

          • vtamara@pasosdeJesus.org

            Yes I do that where the underlying tables don’t change often.

  • http://nruth.tumblr.com Nick Rutherford

    I’d like to see more discussion of the schema dumping problem. It’s prevented me from using this so far (after seeing a few other blogs on it, and implementing a feature using them in a topic branch).

    “The changes for this migration won’t be available in the schema file, by the way, since it’s not a table.”
    This is a bigger problem than it’s presented as. Your schema is no longer the canonical representation of the database your application requires. How are you going to run tests against this now? How do you fix the database after db:schema:load, which will leave these view-backed-models complaining that their table does not exist?

    Relying on running migrations to recreate the database may work, but isn’t how the tests are run, and leads to debugging migrations if you have done any data manipulation in your migrations through your application models (which of course change later in the software’s life). It’s not the recommended usage of migrations.

    Switching to SQL schema dumping seems to be the suggested solution, and this is what I’d like to see discussed more. It warns that you will no longer be able to switch to other databases, but there seems to be more to it than that, as the rails/rake commands change, and there seems to be less automation and more room for error. That and nobody seems to be doing it.

    SchemaPlus is another option, but it hasn’t implemented Materialized Views yet, only normal ones.

    • http://philipefatio.com/ Philipe Fatio

      I’ve always been using db/structure.sql over db/schema.rb because I usually depend on PostgreSQL specific features such as PostGIS, arrays, full text search indexes, JSON, etc.

      > The changes for this migration won’t be available in the schema file, by the way, since it’s not a table.

      I agree that this is bad. Solution: just use structure.sql instead.

      > Switching to SQL schema dumping seems to be the suggested solution, and this is what I’d like to see discussed more.

      Not sure what needs to be discussed. It just works. Yes, you are no longer able to “easily” switch databases, but is this a limiting factor? If you have a need to use structure.sql, then you’re most likely already using a database feature only implemented by the current one your using. My point is that which DB to use is a decision you take once and stick to it. Any migration to a different DB will come with significant overhead, regardless of which dump format you’re using.

      • http://nruth.tumblr.com Nick Rutherford

        Yes that’s fine (unfortunate but necessary) and the docs cover it. My concerns are more with the rails tooling.

        I’d appreciate comments on any of the following.

        – The rails commands needed to work with a sql schema are different, which seems strange since it’s a config setting, and is confusing upon switching
        – Does rake db:migrate still update the schema file
        – Do the tests make the switch to the sql schema file
        – How does this behave in practice when shared across developers with different minor version numbers of e.g. postgresql, do the dumped sql schema files get lots of false-positive changes?

        Thanks!

        • http://philipefatio.com/ Philipe Fatio

          > The rails commands needed to work with a sql schema are different, which seems strange since it’s a config setting, and is confusing upon switching

          Are they? I commonly use db:setup, db:seed, db:create, db:structure:load, db:migrate, db:rollback etc.

          > Does rake db:migrate still update the schema file

          Yes (the db/structure.sql file, the db/schema.rb is gone).

          > Do the tests make the switch to the sql schema file

          Yes. AFAIK when you set up the test db (which nowadays should be handled automatically by ActiveRecord), it simply loads the db/structure.sql file. You could even do it yourself with one short and quick command: `dropdb –if-exists app_test && createdb app_test && psql –quiet How does this behave in practice when shared across developers with different minor version numbers of e.g. postgresql, do the dumped sql schema files get lots of false-positive changes?

          Not at all. At least the PostgreSQL dump format is very stable.

          • http://nruth.tumblr.com Nick Rutherford

            Great, thanks for the reassurance! Will try this out again soon, as want the materialised views & triggers for MTI and so on. Sounds like the only difference then is using db:structure:load rather than db:schema:load. I’m sure I’ll type the wrong one in frequently so will just override that task in the app to raise an exception.

          • http://nruth.tumblr.com Nick Rutherford

            Have now been using this for some months and am glad we made the switch. Everything works as it should. The schema is a bit harder to read, but not having to think about whether AR supports database feature x anymore is a great help.

Recommended

Learn Coding Online
Learn Web Development

Start learning web development and design for free with SitePoint Premium!

Get the latest in Ruby, once a week, for free.