Rails acts_as_ferret without DRb

I wanted to add a search feature to my Ruby on Rails application, BirdSite. There is a plugin called “acts_as_ferret” that lets Rails applications use the Ferret search engine (based on Lucene).

Using this tutorial, I was able to get search up and running, and it worked great on my development system. But there was a warning that the indexer should run in a DRb server instead of directly inside the Rails app, because the index cannot handle multiple processes writing to it simultaneously. (I ran into the same problem with another search engine, Xapian, when using PHP.)

I had hoped my new site would be low-traffic enough to avoid problems, and since I’m hosting it on a shared server, I can’t run a DRb server there anyway. But I got my first indexing exceptions less than 24 hours after deploying to the live server.

So I decided to try periodic indexing with a cron job. This lets me update the index once an hour from a single process. The downside is that search results can be up to an hour out of date, but I decided I could live with that.

The first step is telling the Rails app not to index content when a record is saved. My model already had a before_save method, so I added this code:

In the model:

def before_save
  # other stuff goes here

  # disable automatic ferret indexing; the cron job updates the index instead
  self.disable_ferret(:always)
end


Then I had to create a rake task which would build the index:

lib/tasks/ferret_index.rake

desc "Updates the ferret index for the application."
task :ferretindex => [ :environment ] do |t|
  MyModel.rebuild_index
  # here I could add other model index rebuilds
  puts "Completed Ferret Index Rebuild"
end


This task is deliberately simple: it rebuilds the entire index every hour. Once my dataset grows large enough, I expect this to get really slow. At that point I’ll need to track which model instances changed in the past hour and re-index only those.
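
The incremental approach could start from selection logic like the following minimal sketch: keep only the records whose updated_at falls inside the last hour and hand each one to an indexing callback. It is plain Ruby so it runs without Rails; the Record struct and the changed_since and index_recent helpers are hypothetical names, and the block stands in for whatever per-record index update the plugin provides (an assumption; this post only uses rebuild_index).

```ruby
# Hypothetical stand-in for a model row; in the Rails app this would be an
# ActiveRecord object with an updated_at timestamp.
Record = Struct.new(:id, :updated_at)

# Returns the records modified at or after +cutoff+.
def changed_since(records, cutoff)
  records.select { |r| r.updated_at >= cutoff }
end

# Hands each record changed within the last +window+ seconds to the block
# (a stand-in for a per-record index update, which is an assumption) and
# returns how many records were passed along.
def index_recent(records, now: Time.now, window: 60 * 60)
  recent = changed_since(records, now - window)
  recent.each { |r| yield r }
  recent.size
end

now = Time.now
records = [
  Record.new(1, now - 30 * 60),     # saved 30 minutes ago: re-indexed
  Record.new(2, now - 3 * 60 * 60), # saved 3 hours ago: skipped
  Record.new(3, now - 5)            # saved 5 seconds ago: re-indexed
]

indexed_ids = []
count = index_recent(records, now: now) { |r| indexed_ids << r.id }
puts "Re-indexed #{count} record(s): #{indexed_ids.inspect}"
```

In the app itself the filtering would be done by the database with a condition on updated_at rather than in Ruby. Two caveats apply: a fixed one-hour window can miss records saved while the previous run was still in progress, so storing the time of the last successful run and passing it to changed_since as the cutoff is more robust; and deleted records never show up in an updated_at query, so removals need separate handling.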

Finally, I needed a cron job to run the rake task every hour, making sure to set the environment to “production”:

cd /railsapp && rake ferretindex RAILS_ENV=production
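
Because this setup depends on only one process writing to the index, it may be worth guarding against two cron runs overlapping, for example if a full rebuild ever takes longer than an hour. Here is a minimal sketch using Ruby’s File#flock with a non-blocking exclusive lock. The helper name with_single_writer and the lock-file path are hypothetical, and wrapping the rebuild this way is an assumption rather than part of the setup described in this post.

```ruby
require "tmpdir"

# Runs the block only if no other process holds an exclusive lock on
# +lock_path+; returns true if the block ran, false if it was skipped.
# with_single_writer is a hypothetical helper, not part of acts_as_ferret.
def with_single_writer(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # LOCK_NB makes flock return false immediately instead of waiting
    # when another open file already holds the exclusive lock.
    unless f.flock(File::LOCK_EX | File::LOCK_NB)
      warn "Another index run holds #{lock_path}; skipping this run."
      return false
    end
    yield
    true
  end # closing the file (or the process exiting) releases the lock
end

# Usage: a nested attempt simulates a second cron run starting while the
# first is still rebuilding. The lock path is a hypothetical example.
lock_path = File.join(Dir.tmpdir, "ferret_index.lock")
overlap_result = nil
first_result = with_single_writer(lock_path) do
  overlap_result = with_single_writer(lock_path) { :never_runs }
end
puts "first run: #{first_result}, overlapping run: #{overlap_result}"
```

Inside the rake task, the rebuild line would then become something like with_single_writer("/tmp/ferret_index.lock") { MyModel.rebuild_index }, with the path chosen to suit the host. Unlike a PID file, a flock lock cannot go stale: the operating system releases it when the process exits, even after a crash.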

So far this is working well, and I haven’t received any indexing exceptions since.