group_concat as an alternative for array_agg

For quite some time I have been a hardcore PostgreSQL user. I like to leverage database capabilities when possible to get faster response times in my Ruby on Rails applications. Even though PostgreSQL is still by far my favourite open source RDBMS, I'm currently working with MySQL on a daily basis and I often find myself searching for the MySQL equivalent of some PostgreSQL feature.

For the sake of this post, let's assume the following table schema:

  CREATE TABLE orders(
    id SERIAL,
    user_id integer,
    total_amount numeric,
    CONSTRAINT orders_pkey PRIMARY KEY (id)
  );

The requirement is to return all users with the total amount of money they spent purchasing items. The implementation is pretty straightforward and the query looks the same in both database engines:

SELECT user_id, sum(total_amount) FROM orders GROUP BY user_id

With that done, we are presented with a new requirement:

  • for users that made purchases for more than $1000 we want to get their order ids for further processing.

The first part of the task is easily solvable in both databases using HAVING:

SELECT user_id, sum(total_amount) FROM orders GROUP BY user_id HAVING sum(total_amount) > 1000

The next part we can satisfy from application code in a couple of different ways, for example by triggering a separate query for each user:

Order.group(:user_id).having('sum(total_amount) > 1000').pluck('user_id, sum(total_amount)').each do |user_id, _|
  # separate query to get order ids for each qualified user
  order_ids = Order.where(user_id: user_id).pluck(:id)
end

This represents the classical N+1 queries issue that will most likely have a negative impact on performance. Luckily for us, it can be solved using built-in features of both MySQL and PostgreSQL.

array_agg in PostgreSQL

In PostgreSQL we can get aggregated values for the group using the array_agg function:

-- PostgreSQL
SELECT user_id, sum(total_amount), array_agg(id) as order_ids FROM orders GROUP BY user_id HAVING sum(total_amount) > 1000

We got all the information we want with just 1 database round-trip, awesome!
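On the Ruby side there is nothing extra to do, because the PostgreSQL adapter decodes the integer[] column straight into a Ruby Array. The rows below are hypothetical sample data, standing in for what a pluck of that query would return:

```ruby
# Hypothetical rows, shaped like the result of:
#   Order.group(:user_id)
#        .having('sum(total_amount) > 1000')
#        .pluck('user_id, sum(total_amount), array_agg(id)')
rows = [
  [7, 1500.0, [1, 4, 9]],
  [8, 2100.0, [2, 3]]
]

rows.each do |user_id, total, order_ids|
  # order_ids is already an Array of Integers - no parsing needed
  puts "user #{user_id} spent #{total} across orders #{order_ids.inspect}"
end
```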

group_concat in MySQL

When working with MySQL, there is also a possibility to achieve similar results using group_concat. There are two main differences, though:

  • in Ruby code we don't get an array but a string of ids separated by some character - by default a comma
  • there is a length limit; after reaching it the value will be quietly truncated (I admit that I did not read MySQL's documentation carefully enough and was surprised by this)
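The first difference is a one-liner to deal with; the raw value below is a hypothetical example of what group_concat would return for one user:

```ruby
# Hypothetical raw value, as group_concat would return it for one user
raw = '101,205,309'

order_ids = raw.split(',').map(&:to_i)
# order_ids == [101, 205, 309]
```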

The first bullet is easily solvable, but how do we get around the second one? It turns out that we can increase the maximum length allowed for group_concat results! Consider this code:

with_increased_group_concat_max_len do
  Order.group(:user_id).having('sum(total_amount) > 1000').pluck('user_id, sum(total_amount), group_concat(id)').each do |user_id, _, order_ids|
    # ...
  end
end

def with_increased_group_concat_max_len(value = 1_048_576) # 1 MiB
  original = get_group_concat_max_len
  set_group_concat_max_len(value)
  yield
ensure
  set_group_concat_max_len(original)
end

def get_group_concat_max_len
  row = ActiveRecord::Base.connection.select_one <<-SQL
    SHOW VARIABLES LIKE 'group_concat_max_len'
  SQL

  row['Value'].to_i
end

def set_group_concat_max_len(value)
  ActiveRecord::Base.connection.execute <<-SQL
    SET SESSION group_concat_max_len = #{value};
  SQL
end

We wrapped the query in a block that remembers the original value of group_concat_max_len, increases it for the duration of the query, and sets it back to the previous value once the query has executed.
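The remember/override/restore shape can be sketched in plain Ruby, independent of the database; the ensure clause guarantees the restore happens even if the block raises. The helper name and the hash standing in for session variables are illustrative:

```ruby
# Generic version of the pattern: override a value for the duration of a
# block and restore the original even if the block raises.
def with_overridden(store, key, value)
  original = store[key]
  store[key] = value
  yield
ensure
  store[key] = original
end

settings = { 'group_concat_max_len' => 1024 }

with_overridden(settings, 'group_concat_max_len', 1_048_576) do
  settings['group_concat_max_len'] # 1048576 while the block runs
end

settings['group_concat_max_len']   # back to 1024 afterwards
```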

Just as with PostgreSQL, we got all the information we want with just 1 database round-trip, awesome!

WARNING

The maximum value that can be set for group_concat_max_len depends on another server variable - max_allowed_packet. I strongly recommend reading the documentation and verifying what the maximum value is for your installation.
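One defensive sketch of that advice: clamp the requested length to whatever max_allowed_packet allows. The helper name is mine, and it assumes you have already fetched max_allowed_packet from the server:

```ruby
# Hypothetical helper: never ask for more than max_allowed_packet allows
def safe_group_concat_max_len(requested, max_allowed_packet)
  [requested, max_allowed_packet].min
end

safe_group_concat_max_len(1_048_576, 4_194_304) # fits within the packet limit
safe_group_concat_max_len(1_048_576, 65_536)    # clamped down to 65536
```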

Adrian Serafin


