N + 1: When More Queries Is a Good Thing
Over the last week I have been trying to understand how eager loading works in Rails to eliminate the infamous N+1 query problem by reducing the number of queries fired. My initial hypothesis was that reducing the number of queries as much as possible was the goal. However, I was surprised by what I discovered.
Using includes
to Reduce Queries
Most posts that you read about the infamous N + 1 Query Problem cite the includes
method to address the issue. includes
is used to eager load associations related to the model by using the minimum number of queries possible. For this, under the hood, it uses a preload or left outer join, depending on the situation. I will explain both situations in subsequent sections.
This is nicely explained with examples in Active Record Query Methods documentation.
When and How to Use includes
?
Suppose that our user can have many posts and can comment on any post. Each post can have many comments. The basic structure is shown in snippet shown below:
# models/users.rb
class User < ApplicationRecord
has_many :posts
has_many :comments
end
# models/posts.rb
class Post < ApplicationRecord
has_many :comments
belongs_to :user
end
# models/comments.rb
class Comment < ApplicationRecord
belongs_to :user
belongs_to :post
end
Now, if we want a user’s information with posts made by the user along with their comments, simply calling User.all
will first load users, then it will fetch the posts by each user. After fetching the posts, it will fetch the comments made by the user for those posts. If we have 10 users each having 5 posts, and on average 2 comments on each post, one User.all
will end up doing around 1 + 5 + 10 queries.
# users_controller.rb
def index
@users = User.all
render json: @users
end
A simple solution is to use includes
to tell Active Record that we want to fetch users and all related posts:
@users = User.all.includes(:posts)

posts are preloaded, comments are not preloaded
This improvises performance a bit, as it fetches users first, and then in the subsequent query it fetches posts related to those users. Now the previous 1 + 5 + 10 queries are reduced to 1 + 1 + 10 queries. But this will be a lot better if comments related to posts are loaded in advance, as well. This will reduce it all down to 1 + 1 + 1, totaling 3 queries to fetch all the data. Look at the snippet shown below to understand:
# users_controller.rb
def index
@users = User.all.includes(:posts => [:comments])
render json: @users
end

All data is loaded in just 3 queries, one for users, one for posts, and one for comments related to posts
All data is loaded in just 3 queries, one for users, one for posts, and one for comments related to posts
Passing comments in an array tells active record to preload comments related to the posts as well. If some relationship of comments needs to preloaded, we can change arguments passed to the includes
methods, like so:
User.all.includes(:posts => [:comments => [:another_relationship]])
This way, any number of nested relationships can be preloaded. For all of the above queries includes
uses preload.
Fetching Posts with a Specific Title
User.all.includes(:posts => [:comments]).where('posts.title = ?', some_title)
This will raise an error. Whereas,
User.all.includes(:posts => [:comments]).where(posts: { title: some_title })
will give us the expected result. This happens because when hash conditions are passed, a left outer join of users and posts is performed to fetch users with posts having the specific title.

Specifying hash conditions on included relations
But, what if we want to use a pure string or array conditions instead of hash conditions to specify conditions on included relations? Look at the following example:
User.all.includes(:posts => [:comments]).where('posts.title = ?', some_title).references(:posts)
Notice the references(:posts)
part? references
tells includes
to force join the posts
relation with a left outer join. To understand this, see the example query generated by the above line of code:

Example query when posts relation is force joined with includes through references
We reduced the number of queries from 1 + 5 + 10 to 1 query. That’s great!
But, Less is NOT Always More
Look at the last two example queries. Both are 3 to 4 lines long, and are have substrings like t0_r1
, t0_r2
, … , t2_r5
. This seems unusual. I am not an SQL expert and didn’t know what this means. These are known as a CROSS JOIN or CARTESIAN join.
So, using references or hash conditions to specify conditions for included relations can cause very long queries and unnecessary outer joins, which may affect performance and memory adversely. Instead, splitting a large single query into a few queries would be more beneficial.
The Active Record documentation clearly states that when you need to query associations, you should use join queries with includes
instead of references
.
Even though Active Record lets you specify conditions on the eager loaded associations, just like
joins
, the recommended way is to use joins instead.
A better way to specify conditions on eager loaded associations:
User.all.joins(:posts).where('posts.title = ? ', some_title).includes(:posts => [:comments])
This generates 1 + 1 + 1 queries and loads only the users having posts matching the given conditions, such as a specific title, etc.
Conclusion
Eager loading associations can be very useful and improve performance to a great extent, but can also cause serious harm when many nested associations are loaded. I for one was a bit surprised that reducing trips to the database can actually make things worse.
This post at Engine Yard explains issues related to eager loading associations very well. For example, do not forget to add pagination and limit to records being fetched.
I hope you enjoyed this quick post on N + 1 queries.