N + 1: When More Queries Is a Good Thing

Parth Modi
Share

Over the last week I have been trying to understand how eager loading works in Rails to eliminate the infamous N+1 query problem by reducing the number of queries fired. My initial hypothesis was that reducing the number of queries as much as possible was the goal. However, I was surprised by what I discovered.

Using includes to Reduce Queries

Most posts that you read about the infamous N + 1 Query Problem cite the includes method to address the issue. includes is used to eager load associations related to the model by using the minimum number of queries possible. For this, under the hood, it uses a preload or left outer join, depending on the situation. I will explain both situations in subsequent sections.

This is nicely explained with examples in Active Record Query Methods documentation.

When and How to Use includes?

Suppose that our user can have many posts and can comment on any post. Each post can have many comments. The basic structure is shown in snippet shown below:

# models/users.rb
class User < ApplicationRecord
  has_many :posts
  has_many :comments
end

# models/posts.rb
class Post < ApplicationRecord
  has_many :comments
  belongs_to :user
end

# models/comments.rb
class Comment < ApplicationRecord
  belongs_to :user
  belongs_to :post
end

Now, if we want a user’s information with posts made by the user along with their comments, simply calling User.all will first load users, then it will fetch the posts by each user. After fetching the posts, it will fetch the comments made by the user for those posts. If we have 10 users each having 5 posts, and on average 2 comments on each post, one User.all will end up doing around 1 + 5 + 10 queries.

# users_controller.rb
def index
  @users = User.all
  render json: @users
end

A simple solution is to use includes to tell Active Record that we want to fetch users and all related posts:

@users = User.all.includes(:posts)
posts are preloaded, comments are not preloaded

posts are preloaded, comments are not preloaded

This improvises performance a bit, as it fetches users first, and then in the subsequent query it fetches posts related to those users. Now the previous 1 + 5 + 10 queries are reduced to 1 + 1 + 10 queries. But this will be a lot better if comments related to posts are loaded in advance, as well. This will reduce it all down to 1 + 1 + 1, totaling 3 queries to fetch all the data. Look at the snippet shown below to understand:

# users_controller.rb
def index
  @users = User.all.includes(:posts => [:comments])
  render json: @users
end
All data is loaded in just 3 queries, one for users, one for posts, and one for comments related to posts

All data is loaded in just 3 queries, one for users, one for posts, and one for comments related to posts

All data is loaded in just 3 queries, one for users, one for posts, and one for comments related to posts

Passing comments in an array tells active record to preload comments related to the posts as well. If some relationship of comments needs to preloaded, we can change arguments passed to the includes methods, like so:

User.all.includes(:posts => [:comments => [:another_relationship]])

This way, any number of nested relationships can be preloaded. For all of the above queries includes uses preload.

Fetching Posts with a Specific Title

User.all.includes(:posts => [:comments]).where('posts.title = ?', some_title)

This will raise an error. Whereas,

User.all.includes(:posts => [:comments]).where(posts: {    title: some_title })

will give us the expected result. This happens because when hash conditions are passed, a left outer join of users and posts is performed to fetch users with posts having the specific title.

Specifying hash conditions on  included relations

Specifying hash conditions on included relations

But, what if we want to use a pure string or array conditions instead of hash conditions to specify conditions on included relations? Look at the following example:

User.all.includes(:posts => [:comments]).where('posts.title = ?', some_title).references(:posts)

Notice the references(:posts) part? references tells includes to force join the posts relation with a left outer join. To understand this, see the example query generated by the above line of code:

Example query when posts relation is force joined with includes through references

Example query when posts relation is force joined with includes through references

We reduced the number of queries from 1 + 5 + 10 to 1 query. That’s great!

But, Less is NOT Always More

Look at the last two example queries. Both are 3 to 4 lines long, and are have substrings like t0_r1, t0_r2, … , t2_r5. This seems unusual. I am not an SQL expert and didn’t know what this means. These are known as a CROSS JOIN or CARTESIAN join.

So, using references or hash conditions to specify conditions for included relations can cause very long queries and unnecessary outer joins, which may affect performance and memory adversely. Instead, splitting a large single query into a few queries would be more beneficial.

The Active Record documentation clearly states that when you need to query associations, you should use join queries with includes instead of references.

Even though Active Record lets you specify conditions on the eager loaded associations, just like joins, the recommended way is to use joins instead.

A better way to specify conditions on eager loaded associations:

User.all.joins(:posts).where('posts.title = ? ', some_title).includes(:posts => [:comments])

This generates 1 + 1 + 1 queries and loads only the users having posts matching the given conditions, such as a specific title, etc.

Conclusion

Eager loading associations can be very useful and improve performance to a great extent, but can also cause serious harm when many nested associations are loaded. I for one was a bit surprised that reducing trips to the database can actually make things worse.

This post at Engine Yard explains issues related to eager loading associations very well. For example, do not forget to add pagination and limit to records being fetched.

I hope you enjoyed this quick post on N + 1 queries.

CSS Master, 3rd Edition