Speed up with Materialized Views on PostgreSQL and Rails

Key Takeaways

Materialized views in PostgreSQL can significantly speed up complex queries by storing the results for later use, which is particularly useful for queries that involve multiple joins or aggregations on large tables.
Materialized views, unlike regular views, are physically stored on disk and need to be manually updated when the data changes. They can’t be used to add or delete rows, but can be queried like a regular table and can be indexed for faster queries.
Materialized views can be implemented in a Rails application through creating a migration and a model for the view. The model can be made read-only and a method can be created to refresh the view when needed.
For larger datasets where refreshes may take a lot of time, PostgreSQL provides an option to perform the refresh concurrently, meaning without locking the view. However, a unique key will be required in the view to perform concurrent refreshes.

Today I would like to talk about how to implement materialized views in PostgreSQL, utilizing them in a Ruby on Rails application.

PostgreSQL, known commonly as Postgres, is an multi-platform, open source SQL database system. It is one of the most popular database systems and second-most used open source database in the world. The development for Postgres is being done under the PostgreSQL Global Development Group. If you’ve never used Postgres I’d recommend trying it on your next project, it’s really cool.

The tutorial is split into three parts as follows:

What is a Database View.
What is a Materialized View.
How to add a Materialized View to a Rails application.

What is a Database View?

A view is basically a stored set of queries against the underlying tables. Whenever a view is invoked, it’ll execute a series of pre-stored queries against the underlying data, returning the resultant data. Unlike a table, it doesn’t take physical space, but the schema of a view is stored and accessed from memory instead.

When the underlying data on which the view is constructed changes, it is reflected in subsequent invocations the view.

Some of the advantages a view provides are:
1. Data abstraction – A view can be used to abstract a set of data from the underlying tables.
2. Optimization – A view, just like any query, goes through an optimizer which improves it as much as possible based on the operation performed on the result of the view.
3. Simplicity – As mentioned above, a view can abstract away the data while at the same time simplifying querying multiple tables. A view is often comprised of one or more joins between tables, making the data of these joins look like a simple table.

Almost all the major database providers have a view feature, and it is the single way of querying certain NoSQL databases.

What is a Materialized View?

A materialized view, also referred to as a “matview”, is a form of database view that was first introduced in Oracle. It has since been adapted by many major databases.

From Wikipedia,

A materialized view is a database object that contains the results of a query.

Basically, a materialized view is similar to a database view except it is stored physically on disk and updated manually. A matview stores the results of a query into a table-like structure of its own, from which the data can be queried. It is not possible to add or delete rows, but the rest of the time it behaves just like an actual table.

Also, when the data behind a materialize view changes, the matview must be manually refreshed before the new data is included in the stored matview. This is both a positive and a negative. Storing the results of the query that makes the materialized view allows it to work like a table. Queries are faster, and the matview itself can be indexed. However, since the data that comprises the results is basically a snapshot, it can be stale and you have to know when to refresh the data.

At the time of this writing, matviews are natively available in Oracle DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL doesn’t provide native support for matviews, unfortunately, but there are open source alternatives to it.

Matviews in PostgreSQL

Materialized views were introduced in Postgres version 9.3. In version 9.4 an option to refresh the matview concurrently (meaning, without locking the view) was introduced.

The syntax for creating a matview in Postgres in similar to creating a table:

CREATE MATERIALIZED VIEW ex_matview AS SELECT col1, col2 FROM mytable WHERE col3 = condition;

Adding a Materialized View to a Rails app

For our simple example, I’ve created a Rails app and generated 1M rows in the database used by the application. The code is available on Github. The app has three tables called channel, item, and sales that mimic the sales data for a retailer. The schema for the sample app is:

schema

Let’s say we have a dashboard page that provides sales information for the past year, along with the following details:

1) Amount of sales a channel has done on a per-item basis, we can query it like so:

Sale.joins(:channel).joins(:item).where('channels.name = ?', 'web').group('items.name').sum(:amount)

2) Amount of sales each channel has done overall:

Sale.joins(:channel).group('channels.name').sum(:amount)

3) Amount of sales on per day basis:

Sale.group("DATE_TRUNC('day', sold_date)").sum(:amount)

These queries aren’t complicated, but they are expensive, even after many optimizations. Also, the data will likely grow more and more. This is a good candidate for a materialized view, since the data needn’t be real time and we’re abstracting only a subset of data.

To get started, let’s create a migration:

rails g migration CreateSalesMatview

Add the following lines to the migration file:

I am just picking up the necessary data that are needed for the reports.

After saving the file, run bin/rake db:migrate and voila! the matview is created. The changes for this migration won’t be available in the schema file, by the way, since it’s not a table.

Creating a Model

One awesome thing by using a materialized view with Rails/ActiveRecord is we can create models for it, just like any other table in the schema. The model in our case will be in a filed called sales_matview.rb. As such, the model name is SalesMatview. You’re welcome to use any name you want, but it’s advised to follow the naming conventions of Rails.

Let’s create a file called app/models/sales_matview.rb, and add the following lines to it:

You’ll notice that the model is bound to the underlying matview using self.table_name. Also, we’ve created two methods:

readonly? which locks the model from writing data making, it a read only table.
self.refresh which will be called to refresh the materialized view when needed.

We’re all set to use the materialized view we’ve just created. You can query it exactly how you would in a normal ActiveRecord query.

Now, let’s look at our previous queries:

Amount of sales a channel has done on per item basis:

SalesMatview.where('channel = ?', 'web').group(:item).sum(:amount)

Amount of sales each channel has done overall:
```
SalesMatview.group(:channel).sum(:amount)
```
Amount of sales on a per day basis:
```
SalesMatview.group(:date).sum(:amount)
```

The queries are similar, except without those joins, but you’ll notice a big difference on how these perform:

Channel per item
       user     system      total        real
AR   (251.5ms)  0.040000   0.070000   0.110000 (  0.412950)
MV   (2.3ms)    0.000000   0.010000   0.010000 (  0.012010)

Channel overall
       user     system      total        real
AR   (374.4ms)  0.000000   0.000000   0.000000 (  0.376352)
MV   (3.3ms)    0.000000   0.000000   0.000000 (  0.006813)

Sales per day
       user     system      total        real
AR   (976.4ms)  0.020000   0.000000   0.020000 (  0.990258)
MV   (6.2ms)    0.020000   0.000000   0.020000 (  0.026783)

That is a HUGE savings, to say the least! Over 100 times improved! Give it a try and see for yourself.

Refreshing the Materialized Views

Now we need to tackle refreshing the materialized view. We can create a rake task to refresh the matview to update the data regularly. Add the below line to the lib/tasks/sales.rb file and it’s done.

desc 'Update the Sales materialized view'
task :update_sales_view => :environment do
  SalesMatview.refresh
end

For larger datasets where the refreshes may take too much time, there is an option in Postgres to perform the refresh concurrently. Just change the statement in the refresh method to:

REFRESH MATERIALIZED VIEW CONCURRENTLY sales_matview

Keep in mind, a unique key will be required in the matview to perform concurrent refreshes.

Also, since the materialized view is read only if you want to change the schema it is constructed upon you have to drop the matview and recreate it. Just add the following line to your migration before everything else to drop the matview:

DROP MATERIALIZED VIEW IF EXISTS sales_matview

Conclusion

All the code that is used in the examples are available here –Github. It also has a seed file for generating 1 million rows for the mentioned schema.

I hope this has been helpful and informative for you and I thank you for reading. Let me know your suggestions in comments.

Credits to:
http://www.postgresql.org/docs/9.4/static/rules-materializedviews.html
http://en.wikipedia.org/wiki/Materialized_view

Frequently Asked Questions (FAQs) about Materialized Views on PostgreSQL and Rails

What are the key differences between views and materialized views in PostgreSQL?

In PostgreSQL, a view is a virtual table based on the result-set of an SQL statement. It contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. However, a materialized view is a physical copy, snapshot or a picture of the original base tables. The result of the query, which could be a combination of one or more tables, is stored in the hard disk as a separate table. This can be used to obtain complex queries’ results faster by storing the result of the query.

How do I create a materialized view in PostgreSQL?

To create a materialized view in PostgreSQL, you use the CREATE MATERIALIZED VIEW statement. This statement is similar to the CREATE TABLE AS statement, but it also stores the query result for later use. Here is an example of how to create a materialized view:
CREATE MATERIALIZED VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;

How can I refresh a materialized view in PostgreSQL?

To refresh a materialized view in PostgreSQL, you use the REFRESH MATERIALIZED VIEW statement. This statement updates the stored data in the materialized view to reflect changes in the underlying tables. Here is an example of how to refresh a materialized view:
REFRESH MATERIALIZED VIEW view_name;

What are the benefits of using materialized views in PostgreSQL?

Materialized views in PostgreSQL can significantly boost the performance of complex queries by storing the results of these queries for later use. This is particularly useful for queries that involve multiple joins or aggregations on large tables, which can be computationally expensive and slow to execute. By using a materialized view, you can avoid the need to repeatedly execute these complex queries, resulting in faster query performance.

How do I use materialized views with Rails?

Rails does not natively support materialized views, but you can use the Scenic gem to add this functionality. Scenic provides a set of conventions and helpers for creating, updating, and managing database views in Rails. Once you’ve added the Scenic gem to your Gemfile and run bundle install, you can create a new materialized view with the following command:
rails generate scenic:view view_name
This will create a new SQL file in the db/views directory where you can define your view.

Can I index a materialized view in PostgreSQL?

Yes, one of the advantages of materialized views in PostgreSQL is that you can index them. Indexing a materialized view can significantly improve query performance, especially for queries that filter or sort the data in the view. To create an index on a materialized view, you use the CREATE INDEX statement, just like you would with a regular table.

Are there any limitations or drawbacks to using materialized views in PostgreSQL?

While materialized views can significantly improve query performance, they do have some limitations. One of the main limitations is that they take up more storage space than regular views, as they store a physical copy of the query result. Additionally, materialized views need to be refreshed manually to reflect changes in the underlying tables, which can be a drawback in scenarios where the data changes frequently.

How do I drop a materialized view in PostgreSQL?

To drop a materialized view in PostgreSQL, you use the DROP MATERIALIZED VIEW statement. This statement removes a materialized view from the database. Here is an example of how to drop a materialized view:
DROP MATERIALIZED VIEW view_name;

Can I use materialized views in a Rails migration?

Yes, you can use materialized views in a Rails migration. The Scenic gem provides a DSL for managing views in Rails migrations. You can create, update, or drop a view using the create_view, update_view, and drop_view methods respectively.

How do materialized views improve the performance of my Rails application?

Materialized views can improve the performance of your Rails application by reducing the computational load on your database. By storing the results of complex queries, materialized views can serve these results much faster than if the database had to execute the query each time. This can lead to faster response times and a smoother user experience, especially for data-intensive applications.