adventures in making stuff with Daniel Higginbotham

Chunky Iterator: So You Don't Have to Load All Your AR Objects at Once

12 May 2008

The following code lets you iterate over large collections of Active Record objects without having to load them all at once, thus reducing memory usage. It's allowed me to run cron jobs which iterate over thousands of records without getting the cron'd process killed for using too much of a system's resources.

class ChunkyIterator
  include Enumerable
  def initialize(model_class, chunk_size, options = {})
    @model_class = model_class
    @chunk_size = chunk_size
    @options = options
  end

  def each
    rows = @model_class.find(:all, merged_options(0))

    until rows.empty?
      rows.each { |record| yield record }
      # Fetch the next chunk, starting just past the last id already seen
      rows = @model_class.find(:all, merged_options(rows.last.id))
    end
  end

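  def merged_options(id)
    @options.merge(
      :conditions => merge_conditions("#{@model_class.table_name}.id > #{id}"),
      # Order by id so that rows.last.id is the highest id in the chunk
      :order => "#{@model_class.table_name}.id ASC",
      :limit => @chunk_size
    )
  end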

  def merge_conditions(added_condition)
    existing_condition = @options[:conditions]
    case existing_condition
    when nil then added_condition
    when String then "(#{existing_condition}) AND (#{added_condition})"
    when Array
      # Keep the bind values; only the SQL fragment at index 0 changes
      ["(#{existing_condition[0]}) AND (#{added_condition})"] +
        existing_condition[1..-1]
    end
  end
end

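# Example
# find_all_in_chunks is not defined in the listing above; this assumes a class
# method that returns ChunkyIterator.new(self, chunk_size, options).
Bacon.find_all_in_chunks(500, :conditions => "fresh = TRUE").each do |bacon|
  bacon.feed_to_cat
end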
  • Update: altered code to use ID rather than offset, like Jamis Buck does.
  • Update 2: Fixed merge_conditions per Frank's observation
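
To illustrate the first update: paging by ID means each query asks for rows whose id is greater than the last one seen, rather than asking the database to skip past an ever-growing offset. Here's a minimal sketch of that loop against an in-memory stand-in, so it runs without Rails. FakeModel, rows_after, and each_in_chunks are hypothetical names used only for illustration; they are not part of Active Record or of the class above.

```ruby
# Hypothetical in-memory stand-in for a model; not a Rails API.
Record = Struct.new(:id, :name)

class FakeModel
  ROWS = (1..10).map { |i| Record.new(i, "row #{i}") }

  class << self
    attr_accessor :query_count
  end
  self.query_count = 0

  # Roughly "WHERE id > after_id ORDER BY id LIMIT limit"
  def self.rows_after(after_id, limit)
    self.query_count += 1
    ROWS.select { |r| r.id > after_id }.sort_by(&:id).first(limit)
  end
end

# Yields every record, holding at most chunk_size of them at a time.
def each_in_chunks(model, chunk_size)
  last_id = 0
  loop do
    rows = model.rows_after(last_id, chunk_size)
    break if rows.empty?
    rows.each { |record| yield record }
    # Remember where this chunk ended so the next query starts after it
    last_id = rows.last.id
  end
end

seen = []
each_in_chunks(FakeModel, 4) { |record| seen << record.id }
```

With 10 rows and a chunk size of 4, this issues four queries: three that return rows and a final one that comes back empty and ends the loop. Because each query is keyed on the last id rather than a position, rows deleted mid-run don't cause later rows to be skipped, which is one of the usual arguments for this approach over offsets.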
