ActiveRecord: When does includes do a join, and when does it do a second query?

Wait, what’s the question?

The other day, we were working our Rails application for Interport, and we ran into an issue where an ActiveRecord query with an includes clause was performing a join, when we felt it would have been more efficient and improve performance to perform a second query. We had an idea that ActiveRecord would do one or the other, and was presumably making a choice intelligently.

Let’s dig in to see when using includes performs a second query, when it performs a join, and why we were silly to think we were smarter than ActiveRecord.

Example Architecture

We’ll be using these two models, with a simple has_many and belongs_to association pair:

class User < ActiveRecord::Base
  has_many :cards
end

class Card < ActiveRecord::Base
  belongs_to :user

  def self.with_phrase(phrase)
    where(phrase: phrase)
  end
end

Exploration

If we simply includes(:cards) from a User relation, we get two queries. Here’s the base case:

User.where(id: 1).includes(:cards)
SELECT "users".*
FROM "users"
WHERE "users"."id" = 1

SELECT "cards".*
FROM "cards"
WHERE "cards"."user_id" IN (1)

If we joins the table, then it’s one query with an INNER JOIN:

User.where(id: 1).joins(:cards).includes(:cards)
SELECT
  "users"."id" AS t0_r0,
  "users"."name" AS t0_r1,
  "users"."created_at" AS t0_r2,
  "users"."updated_at" AS t0_r3,
  "cards"."id" AS t1_r0,
  "cards"."user_id" AS t1_r1,
  "cards"."phrase" AS t1_r2,
  "cards"."created_at" AS t1_r3,
  "cards"."updated_at" AS t1_r4
FROM "users"
  INNER JOIN "cards"
    ON "cards"."user_id" = "users"."id"
WHERE "users"."id" = 1

This makes sense - we’re no longer just eager loading the cards, but now we’re only loading users that have cards (and we’re eager loading the cards as well). ActiveRecord is giving us the scoping of users and the eager loading through one SQL query, and that’s pretty neat.

If we references the table, then we get one query - with a LEFT OUTER JOIN instead of an INNER JOIN. This tells ActiveRecord that we’ll be referencing that table in our query. If we don’t tell ActiveRecord and try to refer to that table later anyway, bad things happen:

:001 > User.includes(:cards).merge(Card.with_phrase("banana"))
PG::UndefinedTable: ERROR:  missing FROM-clause entry for table "cards"
LINE 1: SELECT "users".* FROM "users" WHERE "cards"."phrase" = $1
                                           ^

Here’s an example of using references correctly. Now the generated SQL can scope to data in the cards table properly.

User.includes(:cards).references(:cards).merge(Card.with_phrase("hammocks"))
SELECT
  "users"."id" AS t0_r0,
  "users"."name" AS t0_r1,
  "users"."created_at" AS t0_r2,
  "users"."updated_at" AS t0_r3,
  "cards"."id" AS t1_r0,
  "cards"."user_id" AS t1_r1,
  "cards"."phrase" AS t1_r2,
  "cards"."created_at" AS t1_r3,
  "cards"."updated_at" AS t1_r4
FROM "users"
  LEFT OUTER JOIN "cards"
    ON "cards"."user_id" = "users"."id"
WHERE "cards"."phrase" = "hammocks"

So what if I don’t want to do the join?

So back to the problem we were experiencing. We were joining from one table, through several, to a table that had a huge number of records. This query was unacceptably slow. We looked and failed to find database-optimization, so we went with just breaking the query up into pieces.

For example, let’s modify our example architecture and introduce a join table:

class User < ActiveRecord::Base
  has_many :hands
  has_many :cards, through: :hands
end

class Hand < ActiveRecord::Base
  belongs_to :user
  belongs_to :card

  def self.with_cards(cards)
    where(card_id: cards)
  end
end

class Card < ActiveRecord::Base
  def self.with_phrase(phrase)
    where(phrase: phrase)
  end
end

Now if we only have five users, but we have 10 billion cards, we may want to scope to the relevant cards first, and then get the users and only join with the Hand table - instead of all the way through to the Card table.

We can do this with two queries:

cards = Card.with_phrase('hammocks')
users = User.joins(:hands).merge(Hand.with_cards(cards))

This simplifies the SQL for each query and may be faster.

YMMV. Try the standard joins and includes method first, but if that doesn’t work, this might.

Further Reading

Documentation

The Rails docs on ActiveRecord::QueryMethods provide a great overview of includes, join, and so on.

Implementation

If we want to peak under the covers, as always, the source is available to us.

To answer the original question of when ActiveRecord perform one vs. two queries, we need to look at ActiveRecord::Relation as well.

The eager_loading? method is used by exec_queries to determine when to perform one query or when to perform multiple (which is done with the preloader). As we’ve seen, eager_loading? returns true - meaning we’ll perform only one query - when we’ve used includes and we’re either joining that table explicitly as well, or if we’ve indicated that the query references it.

Tweet at Jon

Share this post!