Indexing ActiveRecord Objects in an Ordered Collection
Ever wanted to find the index of an ActiveRecord object in a collection? By collection, I’m referring to both associations and named scopes, the two types of lazy-loaded object collections available in ActiveRecord.
Suprisingly, Rails offers no easy way of doing this without loading the entire collection. This is fine for small collections, but once your data grows the operation will become more expensive.
The Problem
Let’s take an example. Say we have an Author model which has_many :articles. We have a given @author, and our collection is @author.articles.
Author.has_many :articles, :order => "title ASC" @author = Author.first # => #<Author id: 1, last_name: "Hollingworth", first_name: "Matthew"> @articles = @author.articles # => [#<Article id: 2 ...>, #<Article id: 3 ...>, #<Article id: 6 ...>, #<Article id: 7 ...>, #<Article id: 10 ...>] @article = @articles.find(7) # => #<Article id: 7, title: "Secret URLs in Rails", author_id: 1, created_at: "2009-06-15">
To find the index of @article in this collection, we could just try:
@articles.index(@article) # => 3
Which correctly tells us that @article is the fourth article in the @articles collection. But let’s look at the log for that query:
Article Load (0.5ms) SELECT * FROM articles WHERE (articles.author_id = 1)
Not so good. The index method is delegated to the collection array and the entire collection is loaded (as indicated by SELECT *). The same thing will happen with a named scope.
A Solution
How can we do better? Rather than load the entire collection and use array methods to find the article, we’d like instead to query the database to make this calculation. Conceptually this is not too hard to do. We need to determine how the collection is ordered, then count the number of records which come before the article under that ordering criterion.
How would such a query look? For the simple case above, a query like this would do:
SELECT count(articles.id) FROM articles WHERE (articles.title < 'Secret URLs in Rails') AND (articles.author_id = 1)
That makes sense – we’re adding a articles.title < 'Secret URLs in Rails' condition to the existing articles.author_id = 1 of the collection, restricting the count to those articles with a title before that of the article we want. (Remember, our association is ordered by title.)
Most collections won’t be that simple. What if we order our collection based first on the author name and then on the publication date?
@articles = Article.scoped(:joins => :author, :order => "authors.last_name ASC, articles.created_at DESC")
Our count query will be a bit more more complicated. This will do the trick:
SELECT count(articles.id) FROM articles INNER JOIN authors ON authors.id = articles.author_id WHERE (authors.last_name < 'Hollingworth' OR (authors.last_name = 'Hollingworth' AND (articles.created_at > '2009-06-15'))
You can see how this works – we’re checking each ordering attribute in turn, according to its priority, falling through to the next attribute only on equality. But what if we add another ordering column? It’s going to get hard to construct by hand. But this is ripe for programmatic construction! Let’s add a class method, index_of, to the ActiveRecord::Base class to do just that.
def index_of(object) # return the count of records before the object in the current scope end
Implementation
The first thing we need to do is find the ordering columns and directions (ASC or DESC) for our collection. We can do this using scope(method, key), a class method on ActiveRecord::Base. Calling scope(:find, :order) will return the :order option for the collection, which will either be a SQL ORDER BY snippet or a symbol representing a column name (or nil, if there is no ordering). The following will do:
columns = scope(:find, :order).to_s.split(',').map(&:strip) operators = columns.map do |column| column.slice!(/\s+(asc|ASC)$/) column.slice!(/\s+(desc|DESC)$/) ? ">" : "<" end
For our example collection, the result will be:
# scope(:find, :order) "authors.last_name ASC, articles.created_at DESC" # columns: ["authors.last_name", "articles.created_at"] # operators: ["<", ">"]
We need to get the values of those columns for the object in question. To do that we’ll reload the object with some additional attributes that we’ll name as follows:
attributes = (1..columns.size).map { |n| "a#{n}" }
Using these attribute names, we construct a SQL SELECT snippet to pull the values from the database:
selects = [ columns, attributes ].transpose.map do |column, attribute| "#{column} AS #{attribute}" end.unshift("#{table_name}.*").join(', ') object_with_attributes = find(object.id, :select => selects) values = attributes.map { |attribute| object_with_attributes.send(attribute) }
In our example, the :select option will be:
articles.*, authors.last_name AS a1, articles.created_at AS a2
Now we’re ready to construct the :conditions option for the count. These conditions will select all records which appear before our object in the ordered collection. We’ll build up the conditions using the columns, operators, attribute names and values that we’ve already collected.
string, hash = "", {} [ columns, operators, attributes, values ].transpose.reverse.each do |column, operator, attribute, value| string = string.blank? ? "#{column} #{operator} :#{attribute}" : "#{column} #{operator} :#{attribute} OR (#{column} = :#{attribute} AND (#{string}))" hash.merge!(attribute.to_sym => value) end
Notice how the lowest-priority ordering attributes are processed first. It’s helpful to see how these conditions develop for our example:
# (first iteration) # string: "articles.id < :a3" # hash: {:attribute_3=>"7"} # (second iteration) # string: "articles.created_at > :a2 OR (articles.created_at = :a2 AND (articles.id < :a3))" # hash: {:a3=>"7", :a2=>"2009-06-15"}
Finally, we can use these conditions to perform the actual count:
count("#{table_name}.#{primary_key}", :conditions => [ string, hash ], :distinct => true)
Here we’re just counting distinct primary keys. The count returns the number of records before the object in the ordered collection, which corresponds to the object’s index. Just return the value and we’re done!
Code
Here is our final index_of method in its entirity:
def index_of(object) columns = scope(:find, :order).to_s.split(',').map(&:strip) operators = columns.map do |column| column.slice!(/\s+(asc|ASC)$/) column.slice!(/\s+(desc|DESC)$/) ? ">" : "<" end attributes = (1..columns.size).map { |n| "a#{n}" } selects = [ columns, attributes ].transpose.map do |column, attribute| "#{column} AS #{attribute}" end.unshift("#{table_name}.*").join(', ') object_with_attributes = find(object.id, :select => selects) values = attributes.map { |attribute| object_with_attributes.send(attribute) } string, hash = "", {} [ columns, operators, attributes, values ].transpose.reverse.each do |column, operator, attribute, value| string = string.blank? ? "#{column} #{operator} :#{attribute}" : "#{column} #{operator} :#{attribute} OR (#{column} = :#{attribute} AND (#{string}))" hash.merge!(attribute.to_sym => value) end count("#{table_name}.#{primary_key}", :conditions => [ string, hash ], :distinct => true) end
Put it inside a module, extend ActiveRecord::Base with that module and you’re ready to go.
Limitations
This method will fall down in a few situations.
Degenerate Ordering
What happens if your collection contains more than one object with the same values for the ordering attributes? Any of these objects will have the same index_of result, despite the fact that (obviously) they will appear at consecutive locations in the collection array.
This may seem to be a shortcoming of our code, but in fact it’s more a result of the underlying database. In SQL, for an unordered query, there’s no guarantee that the rows returned by a query will be returned in any given order. Likewise, with an ordered query containing degenerately ordered records, those records are not guaranteed to be returned in any given order.
Nonetheless, in MySQL and SQLite (at least), we’re used to seeing unordered collections returned in primary key order. This seems to be the usual behaviour of these DBMSs. Hence a pragmatic approach to the problem is to append the primary key as the lowest-priority, default ordering column in the above code:
columns << "#{table_name}.#{primary_key}"Since the primary key is unique, we’re removing any chance of a degenerate ordering.
Collections Using Limits and Offsets
Scopes and associations including :limit and :offset options behave a little confusingly in Rails. Ask for the entire collection using all, and the results will be limited and offset correctly. Ask for the size of the collection using count and you’ll get the count of the entire collection, without the limit or offset applied.
You are best to leave :limit and :offset options to pagination, where they are really intended to be used. Nonetheless, the above method can be made to work with these options using a little bit of tweaking, left as an exercise for the reader. :)
Applications
I was motivated to develop this code for a pagination library that I’ve created. In a Rails app, have you ever wanted to link to the page in your paginated index view containing a specific object? This is the way to do it. (Expect an article on my pagination gem at a future date.)
The other obvious use is to find the next or previous item for an object in an ordered collection. (Think “next post” and “previous post” links in a blog.) This is easily achieved using the index_of method.