Home > column_queries

column_queries

Column_queries is a project mainly written in Ruby, based on the MIT license.

faster select_values implementation for ActiveRecord with PostgreSQL pg gem

Description

PostgreSQL pg gem has one particularly cool method which is PGresult#column_values. It's fast function implemented in C on top of native libpq facilities. This gem is an attempt to put some thin convenience layer around it, to be used with ActiveRecord in a way that doesn't break ORM but rather augments it.

When working with Rails back-end applications driven by rich business domains, and trying to squeeze out every bit of performance, there's often a very simple need to pull nothing more than ids for objects that match certain criteria. When certain dataset can be reduced only on later stages of processing, having all rows of it as instantiated ActiveRecord objects comes at big cost. Sometimes arrays, hashes and "grouped_by" hashes is all that has to be fetched form DB in order to end up with meaningful response for some domain specific search API call. See the full story by this link.

ActiveRecord connection adapter methods select_values, select_all, etc are the usual answer to such need. However, if you happen to find yourself in an environment with PostgreSQL, Rails 3, and pg gem, there's better alternative. And you don't have to rewrite your scopes into raw SQL in order to use it.

Here are some benchmarks obtained by spec/support/benchmarks.rb:

Test sample contains 20000 records
Benchmarking find
Time: 34.165s 0.6833s per run
Benchmarking select_values
Time: 3.686s 0.0737s per run
Benchmarking to_int_array
Time: 2.426s 0.0485s per run

Usage

The gem adds several methods to ActiveRecord::Relation, which can be easily used with scopes:

book_ids = Book.available.scoped_by_author_id(author.id).to_int_array(:id)

Just like to_a method on relations, all column_queries methods should be called last in scope chains. So putting, say scoped_by_author_id, after to_int_array won't work. The same is true when defining scopes:

class Book < ActiveRecord::Base
  # this won't work:
  scope :availabile_ids, lambda { where(:available => true).to_int_array(:id) }

  # use this instead:
  scope :available, where(:available => true)

  def self.available_ids
    available.to_int_array(:id)
  end
end

Currently, provided methods are: to_int_array, to_columns_as_int_arrays, to_int_hash and to_int_groups. They all accept column names for arguments.

to_int_array(column) returns result of query SELECT column FROM ... as an array of integers. column argument can be omitted, the first column of query result would be taken then.

to_columns_as_int_arrays(*columns) is similar to to_int_array but takes multiple column names. It returns int arrays for all requested columns:

comment_ids, book_ids = Comment.scoped.to_int_arrays(:id, book_id)

to_int_hash(keys_column, values_column) can be used to quickly pull hash mapping between two integer columns:

books_by_comments = Comment.scoped.to_int_hash(:id, :book_id)

will return a hash which maps comment ids to book ids: books_by_comments[comment_id] => book_id.

And to_int_groups(keys_column, values_column) returns a hash which is more like group_by result. It combines values from the second column that correspond to the same key into an array:

books_comments = Comment.scoped.to_int_groups(:book_id, :id)

books_comments[book_id] in this case will contain an array of all ids of comments for given book_id.

NULLs treatment may be surprising: all db NULLs are converted into 0s. The reason behind that is performance. It was very tempting to skip nil? check inside of intense loops given the fact that database sequences usually start with 1 and PKs never contain NULL values. Though I'll try to provide better support (and some flexibility) for casting in next versions of the gem. Some methods like to_int_hash would benefit a lot from more flexible type casting code. Being able to do something like:

titles = Book.pricy.to_hash(:id, :title)

would be a nice option to have in certain optimization scenarios.

Requirements

This gem requires ActiveRecord 3 and pg 0.11.0. Backport to latest versions of Rails 2 is coming soon.

Install

gem install column_queries

or with bundler:

gem "column_queries", "~> <version>"

License

Copyright Serge Balyuk for Avenue100 Media Solutions Inc.

column_queries gem is released under the MIT license. Please refer to MIT-LICENSE for details.

Previous:app1