@coralieco
Last active November 21, 2018 14:22
ActiveRecord advanced query: What happens in DB with has_many & distinct queries

ActiveRecord advanced query

☀️ postgresql: has_many & distinct.

What happens in the DB?

Rails app

I created a Rails app with models like so:

class User < ApplicationRecord
  has_many :comments
end

class Article < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :article
  belongs_to :user
end

Seed

I then seeded with some records:

art1 = Article.create(title: 'Ruby')
art2 = Article.create(title: 'Rails')

user1 = User.create(name: 'Ola')
user2 = User.create(name: 'Hello')
user3 = User.create(name: 'Wow')

comment1 = Comment.create(article: art1, user: user1, rating: 1)
comment2 = Comment.create(article: art1, user: user2, rating: 2)
comment3 = Comment.create(article: art2, user: user2, rating: 3)
comment4 = Comment.create(article: art2, user: user3, rating: 5)
comment5 = Comment.create(article: art2, user: user1, rating: 2)

SQL views

SELECT * FROM users

 id | name
----+-------
  1 | Ola
  2 | Hello
  3 | Wow

SELECT * FROM articles

 id | title
----+-------
  1 | Ruby
  2 | Rails

Implementation of queries in User model

What I wanted to do is:

  1. create a method that ranks users by comment rating: order_comment_rating
  2. create a method that selects users who have comments: with_comments
  3. chain the two methods above: with_comments_order_comment_rating

The problem is mainly in writing the third method, because of the distinct used in the second method.

Let's dig into that a little later.


order_comment_rating

Ordering records by comment rating

This is the method to order the users by comment rating

  def self.order_comment_rating
    joins(:comments).merge(Comment.order(:rating))
  end

Let's split the method to understand what it does.

Join comments

First we need to connect the users and comments tables

>> User.joins(:comments)
>> SELECT "users".* FROM "users" INNER JOIN "comments" ON "comments"."user_id" = "users"."id" LIMIT

This is an interesting joins, as it returns 5 records while we only have 3 users. What happens here?

In SQL, the join conceptually builds these rows (ids assume a freshly seeded database):

 id | name  | comment_id
----+-------+------------
  1 | Ola   |          1
  2 | Hello |          2
  2 | Hello |          3
  3 | Wow   |          4
  1 | Ola   |          5

It returns all the users, but some of them appear twice. Imagine a virtual table on the right, the comments table, where each comment is connected to its user. Only the users columns are selected, so the comment_id column above is there for illustration.

Users Hello and Ola appear twice as they both wrote two comments.
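
To make the duplication concrete, here is a minimal plain-Ruby sketch of what the INNER JOIN does. The users and comments arrays are in-memory stand-ins for the seeded tables (the ids are an assumption: they match a freshly seeded database), not ActiveRecord objects.

```ruby
# In-memory stand-ins for the seeded rows (ids assume a fresh database).
users = [
  { id: 1, name: 'Ola' },
  { id: 2, name: 'Hello' },
  { id: 3, name: 'Wow' }
]
comments = [
  { id: 1, user_id: 1, rating: 1 },
  { id: 2, user_id: 2, rating: 2 },
  { id: 3, user_id: 2, rating: 3 },
  { id: 4, user_id: 3, rating: 5 },
  { id: 5, user_id: 1, rating: 2 }
]

# INNER JOIN: one output row for every (user, comment) pair
# where comments.user_id = users.id.
joined = comments.flat_map do |comment|
  users.select { |user| user[:id] == comment[:user_id] }
       .map { |user| { id: user[:id], name: user[:name], comment_id: comment[:id] } }
end

joined.each do |row|
  puts format('%2d | %-5s | %d', row[:id], row[:name], row[:comment_id])
end
```

The sketch yields 5 rows because there are 5 comments, each matched to exactly one user. The row order here follows the comments array; without an ORDER BY, the database is free to return the rows in any order.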

Order comments by rating

>> Comment.order(:rating)
>> SELECT "comments".* FROM "comments" ORDER BY "comments"."rating" ASC

With this query we get all the comments from the database, ordered by rating, so we have 5 records in output.

Chain both

Chaining the two is then straightforward: merge adds the ORDER BY clause from Comment.order(:rating) to the joined query, so the 5 joined rows come back sorted by rating. This gives us our order_comment_rating method to list users.

def self.order_comment_rating
  joins(:comments).merge(Comment.order(:rating))
end
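
As a rough illustration of the rows this method produces, the plain-Ruby sketch below sorts the joined rows by rating. The arrays are in-memory stand-ins for the seeded tables, and the tie-breaker on the comment id is an assumption added only to make the sketch deterministic.

```ruby
# In-memory stand-ins for the seeded rows (ids assume a fresh database).
users = [
  { id: 1, name: 'Ola' },
  { id: 2, name: 'Hello' },
  { id: 3, name: 'Wow' }
]
comments = [
  { id: 1, user_id: 1, rating: 1 },
  { id: 2, user_id: 2, rating: 2 },
  { id: 3, user_id: 2, rating: 3 },
  { id: 4, user_id: 3, rating: 5 },
  { id: 5, user_id: 1, rating: 2 }
]

# ORDER BY comments.rating on the joined rows.
# The id tie-breaker is not part of the original query: SQL leaves the
# order of rows with equal ratings unspecified.
ordered = comments.sort_by { |c| [c[:rating], c[:id]] }.flat_map do |comment|
  users.select { |user| user[:id] == comment[:user_id] }
       .map { |user| { name: user[:name], rating: comment[:rating] } }
end

ordered.each { |row| puts "#{row[:name]} (rating #{row[:rating]})" }
```

Each user still appears once per comment: the ordering applies to the joined rows, not to the users themselves.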

with_comments

List users with comments

I implemented this method to return the users who wrote comments.

def self.with_comments
  joins(:comments).distinct
end

As we saw earlier, joins(:comments) returns each user once for every comment they wrote, so some users appear several times.

I applied a distinct there so I don't get duplicate users, because I just want to know which users have comments.

>> User.joins(:comments).distinct
>> SELECT DISTINCT "users".* FROM "users" INNER JOIN "comments" ON "comments"."user_id" = "users"."id"

So here the result would be:

 id | name
----+-------
  1 | Ola
  2 | Hello
  3 | Wow

Same as the table above, we have to imagine a virtual table on the right, the comments table.
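
A minimal plain-Ruby sketch of the DISTINCT step, again using in-memory stand-ins for the seeded tables (an assumption, not ActiveRecord objects): the join produces duplicate user rows, and removing duplicates leaves one row per user.

```ruby
# In-memory stand-ins for the seeded rows (ids assume a fresh database).
users = [
  { id: 1, name: 'Ola' },
  { id: 2, name: 'Hello' },
  { id: 3, name: 'Wow' }
]
comments = [
  { id: 1, user_id: 1, rating: 1 },
  { id: 2, user_id: 2, rating: 2 },
  { id: 3, user_id: 2, rating: 3 },
  { id: 4, user_id: 3, rating: 5 },
  { id: 5, user_id: 1, rating: 2 }
]

# SELECT "users".* ... INNER JOIN comments: only user columns, one row per comment.
joined_users = comments.flat_map do |comment|
  users.select { |user| user[:id] == comment[:user_id] }
end

# DISTINCT: identical rows collapse into one.
distinct_users = joined_users.uniq

puts "joined rows:   #{joined_users.size}"
puts "distinct rows: #{distinct_users.size}"
```

Note that once the rows are collapsed, the link between a user row and one specific comment is gone.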

with_comments_order_comment_rating

List users with_comments and order_comment_rating

The first reflex would be to chain the two methods above, like so:

>> User.with_comments.order_comment_rating

Let's break it down to explain what happens. Reading the definition of the methods in the User model, the query above is the same as

--  User.with_comments.order_comment_rating
>> User.joins(:comments).distinct.merge(Comment.order(:rating))

This fails. User.joins(:comments) returns five records:

  • 2 records for the User id = 1, who wrote two comments
  • 2 records for the User id = 2, who wrote two comments
  • 1 record for the User id = 3, who wrote one comment

When we apply .distinct, the query returns three records:

  • 1 record for the User id = 1
  • 1 record for the User id = 2
  • 1 record for the User id = 3

Then we try to merge Comment.order(:rating), which sorts by the rating of each comment, and there are 5 comments.

So we ask for three distinct user rows to be sorted by a column that has five values, up to two per user.

Traceback (most recent call last):
ActiveRecord::StatementInvalid (PG::InvalidColumnReference: ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list)
LINE 1: ..." ON "comments"."user_id" = "users"."id" ORDER BY "comments"...
                                                             ^
: SELECT DISTINCT "users".* FROM "users" INNER JOIN "comments" ON "comments"."user_id" = "users"."id" ORDER BY "comments"."rating" ASC LIMIT $1

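The error comes from PostgreSQL rather than ActiveRecord: once the duplicate rows are collapsed into one, it cannot tell which of a user's ratings to sort by, so it rejects the query.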
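
The ambiguity can be seen by listing the ratings attached to each distinct user. This is a plain-Ruby sketch over in-memory stand-ins for the seeded tables (an assumption, not the real query).

```ruby
# In-memory stand-ins for the seeded rows (ids assume a fresh database).
users = [
  { id: 1, name: 'Ola' },
  { id: 2, name: 'Hello' },
  { id: 3, name: 'Wow' }
]
comments = [
  { id: 1, user_id: 1, rating: 1 },
  { id: 2, user_id: 2, rating: 2 },
  { id: 3, user_id: 2, rating: 3 },
  { id: 4, user_id: 3, rating: 5 },
  { id: 5, user_id: 1, rating: 2 }
]

# For each distinct user, collect every rating that could serve as its sort key.
ratings_by_user = {}
users.each do |user|
  ratings_by_user[user[:name]] =
    comments.select { |c| c[:user_id] == user[:id] }.map { |c| c[:rating] }
end

# Users with more than one candidate sort key make the ORDER BY ambiguous.
ambiguous = ratings_by_user.select { |_name, ratings| ratings.size > 1 }.keys

ratings_by_user.each { |name, ratings| puts "#{name}: #{ratings.inspect}" }
puts "ambiguous sort key for: #{ambiguous.join(', ')}"
```

Ola and Hello each have two ratings, so a single row for either of them has no single rating to be ordered by.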

There is a way to handle this!

Subquery

We can handle this by doing a subquery.

# In Active Record
>> User.from(User.with_comments, :users)
>> SELECT "users".* FROM (SELECT DISTINCT "users".* FROM "users" INNER JOIN "comments" ON "comments"."user_id" = "users"."id") users

So basically, this query takes the with_comments relation and wraps it in a subquery, then selects all users from that subquery.

It returns exactly 3 users, as the User.with_comments query does.

But the difference is in the SQL. Compare with what User.with_comments generates on its own:

>> SELECT DISTINCT "users".* FROM "users" INNER JOIN "comments" ON "comments"."user_id" = "users"."id"

With the subquery, the SELECT DISTINCT runs on its own, inside the subquery, before anything else is applied to the outer query.

Now we can apply the order_comment_rating on the subquery, like so:

>> User.from(User.with_comments, :users).order_comment_rating
>> SELECT "users".* FROM (SELECT DISTINCT "users".* FROM "users" INNER JOIN "comments" ON "comments"."user_id" = "users"."id") users INNER JOIN "comments" ON "comments"."user_id" = "users"."id" ORDER BY "comments"."rating" ASC

It returns 5 records listing the users ordered by comment rating.
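
The two stages of this query can be sketched in plain Ruby, with in-memory stand-ins for the seeded tables (an assumption, not ActiveRecord objects): the inner query deduplicates users, then the outer query joins those users to comments again and orders by rating. The tie-breaker on the comment id is added only to make the sketch deterministic.

```ruby
# In-memory stand-ins for the seeded rows (ids assume a fresh database).
users = [
  { id: 1, name: 'Ola' },
  { id: 2, name: 'Hello' },
  { id: 3, name: 'Wow' }
]
comments = [
  { id: 1, user_id: 1, rating: 1 },
  { id: 2, user_id: 2, rating: 2 },
  { id: 3, user_id: 2, rating: 3 },
  { id: 4, user_id: 3, rating: 5 },
  { id: 5, user_id: 1, rating: 2 }
]

# Inner query (with_comments): distinct users having at least one comment.
subquery = comments.flat_map do |comment|
  users.select { |user| user[:id] == comment[:user_id] }
end.uniq

# Outer query (order_comment_rating): join the subquery rows to comments
# again, then ORDER BY rating. The id tie-breaker is not in the original SQL.
result = comments.sort_by { |c| [c[:rating], c[:id]] }.flat_map do |comment|
  subquery.select { |user| user[:id] == comment[:user_id] }
          .map { |user| { name: user[:name], rating: comment[:rating] } }
end

puts "subquery rows: #{subquery.size}"
result.each { |row| puts "#{row[:name]} (rating #{row[:rating]})" }
```

Because the outer join happens after the DISTINCT, each user is matched to all of their comments again, which is why 5 rows come back instead of 3. With this seed data every user has a comment, so these are the same rows as order_comment_rating on its own.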

These queries were run against a PostgreSQL database configuration.


☀️ sqlite3: has_many & distinct.

I changed the database configuration to use the sqlite3 adapter.

The app remained the same: I kept the users and comments relationships and did not change the ActiveRecord queries in the User model.

I opened a Rails console and ran the exact same query as before.

>> User.with_comments.order_comment_rating

With PostgreSQL, this raised an ActiveRecord::StatementInvalid error and I had to use a subquery to perform the query.

To my surprise, instead of the error I expected, I got 3 records back.

I looked at the SQL generated:

SELECT DISTINCT "users".* FROM "users" INNER JOIN "comments" ON "comments"."user_id" = "users"."id" ORDER BY "comments"."rating" ASC

sqlite3 accepts a SELECT DISTINCT query whose ORDER BY column is not in the select list, where PostgreSQL rejects it.

Conclusion and further research to do

  • Be careful with the database configuration: it has an impact on which SQL queries are accepted.
  • When switching from one database to another, queries can start to fail.
  • What is the difference between sqlite3 and PostgreSQL that explains this behaviour?