Speed up with Materialized Views on PostgreSQL and Rails
Today I would like to talk about how to implement materialized views in PostgreSQL, utilizing them in a Ruby on Rails application.
PostgreSQL, known commonly as Postgres, is an multi-platform, open source SQL database system. It is one of the most popular database systems and second-most used open source database in the world. The development for Postgres is being done under the PostgreSQL Global Development Group. If you’ve never used Postgres I’d recommend trying it on your next project, it’s really cool.
The tutorial is split into three parts as follows:
- What is a Database View.
- What is a Materialized View.
- How to add a Materialized View to a Rails application.
What is a Database View?
A view is basically a stored set of queries against the underlying tables. Whenever a view is invoked, it’ll execute a series of pre-stored queries against the underlying data, returning the resultant data. Unlike a table, it doesn’t take physical space, but the schema of a view is stored and accessed from memory instead.
When the underlying data on which the view is constructed changes, it is reflected in subsequent invocations the view.
Some of the advantages a view provides are:
1. Data abstraction – A view can be used to abstract a set of data from the underlying tables.
2. Optimization – A view, just like any query, goes through an optimizer which improves it as much as possible based on the operation performed on the result of the view.
3. Simplicity – As mentioned above, a view can abstract away the data while at the same time simplifying querying multiple tables. A view is often comprised of one or more joins between tables, making the data of these joins look like a simple table.
Almost all the major database providers have a view feature, and it is the single way of querying certain NoSQL databases.
What is a Materialized View?
A materialized view, also referred to as a “matview”, is a form of database view that was first introduced in Oracle. It has since been adapted by many major databases.
A materialized view is a database object that contains the results of a query.
Basically, a materialized view is similar to a database view except it is stored physically on disk and updated manually. A matview stores the results of a query into a table-like structure of its own, from which the data can be queried. It is not possible to add or delete rows, but the rest of the time it behaves just like an actual table.
Also, when the data behind a materialize view changes, the matview must be manually refreshed before the new data is included in the stored matview. This is both a positive and a negative. Storing the results of the query that makes the materialized view allows it to work like a table. Queries are faster, and the matview itself can be indexed. However, since the data that comprises the results is basically a snapshot, it can be stale and you have to know when to refresh the data.
At the time of this writing, matviews are natively available in Oracle DB, PostgreSQL, Sybase, IBM DB2, and Microsoft SQL Server. MySQL doesn’t provide native support for matviews, unfortunately, but there are open source alternatives to it.
Matviews in PostgreSQL
Materialized views were introduced in Postgres version 9.3. In version 9.4 an option to refresh the matview concurrently (meaning, without locking the view) was introduced.
The syntax for creating a matview in Postgres in similar to creating a table:
CREATE MATERIALIZED VIEW ex_matview AS SELECT col1, col2 FROM mytable WHERE col3 = condition;
Adding a Materialized View to a Rails app
For our simple example, I’ve created a Rails app and generated 1M rows in the database used by the application. The code is available on Github. The app has three tables called
sales that mimic the sales data for a retailer. The schema for the sample app is:
Let’s say we have a dashboard page that provides sales information for the past year, along with the following details:
1) Amount of sales a channel has done on a per-item basis, we can query it like so:
Sale.joins(:channel).joins(:item).where('channels.name = ?', 'web').group('items.name').sum(:amount)
2) Amount of sales each channel has done overall:
3) Amount of sales on per day basis:
These queries aren’t complicated, but they are expensive, even after many optimizations. Also, the data will likely grow more and more. This is a good candidate for a materialized view, since the data needn’t be real time and we’re abstracting only a subset of data.
To get started, let’s create a migration:
rails g migration CreateSalesMatview
Add the following lines to the migration file:
I am just picking up the necessary data that are needed for the reports.
After saving the file, run
bin/rake db:migrate and voila! the matview is created. The changes for this migration won’t be available in the schema file, by the way, since it’s not a table.
Creating a Model
One awesome thing by using a materialized view with Rails/ActiveRecord is we can create models for it, just like any other table in the schema. The model in our case will be in a filed called sales_matview.rb. As such, the model name is
SalesMatview. You’re welcome to use any name you want, but it’s advised to follow the naming conventions of Rails.
Let’s create a file called app/models/sales_matview.rb, and add the following lines to it:
You’ll notice that the model is bound to the underlying matview using
self.table_name. Also, we’ve created two methods:
readonly?which locks the model from writing data making, it a read only table.
self.refreshwhich will be called to refresh the materialized view when needed.
We’re all set to use the materialized view we’ve just created. You can query it exactly how you would in a normal ActiveRecord query.
Now, let’s look at our previous queries:
Amount of sales a channel has done on per item basis:
SalesMatview.where('channel = ?', 'web').group(:item).sum(:amount)
Amount of sales each channel has done overall:
Amount of sales on a per day basis:
The queries are similar, except without those joins, but you’ll notice a big difference on how these perform:
Channel per item user system total real AR (251.5ms) 0.040000 0.070000 0.110000 ( 0.412950) MV (2.3ms) 0.000000 0.010000 0.010000 ( 0.012010) Channel overall user system total real AR (374.4ms) 0.000000 0.000000 0.000000 ( 0.376352) MV (3.3ms) 0.000000 0.000000 0.000000 ( 0.006813) Sales per day user system total real AR (976.4ms) 0.020000 0.000000 0.020000 ( 0.990258) MV (6.2ms) 0.020000 0.000000 0.020000 ( 0.026783)
That is a HUGE savings, to say the least! Over 100 times improved! Give it a try and see for yourself.
Refreshing the Materialized Views
Now we need to tackle refreshing the materialized view. We can create a rake task to refresh the matview to update the data regularly. Add the below line to the lib/tasks/sales.rb file and it’s done.
desc 'Update the Sales materialized view' task :update_sales_view => :environment do SalesMatview.refresh end
For larger datasets where the refreshes may take too much time, there is an option in Postgres to perform the refresh concurrently. Just change the statement in the
refresh method to:
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_matview
Keep in mind, a unique key will be required in the matview to perform concurrent refreshes.
Also, since the materialized view is read only if you want to change the schema it is constructed upon you have to drop the matview and recreate it. Just add the following line to your migration before everything else to drop the matview:
DROP MATERIALIZED VIEW IF EXISTS sales_matview
All the code that is used in the examples are available here –Github. It also has a seed file for generating 1 million rows for the mentioned schema.
I hope this has been helpful and informative for you and I thank you for reading. Let me know your suggestions in comments.