If you've ever implemented search in your Rails app, you've probably noticed that it's a major hassle (unless, of course, you already know about Ultrasphinx). Ferret is probably the most commonly used indexing solution, but it isn't anywhere near production ready. acts_as_solr, another option, is so full of show-stopper bugs that it really isn't worth bothering with. The worst part about both of those solutions is that they usually work great in your development environment. Try to put them into production, though, and you're in for big problems.
Enter Ultrasphinx, by Evan Weaver. Sphinx is a super high-performance search daemon, designed to suck data out of a MySQL or PostgreSQL database (though it isn't limited to those). Normally, you would configure Sphinx with the SQL queries it uses to fetch the data. Apparently, the configuration file can be a major hassle to work with; I wouldn't know. Ultrasphinx provides a declarative API that it uses to generate the Sphinx configuration file for you, making the process quick and painless. A few lines of code, a couple of rake commands, and you're searching.
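To make "a declarative API that generates the config" concrete, here is a hypothetical Ruby sketch of the idea: a field list goes in, and a Sphinx `source` block containing the SQL the indexer will run comes out. The `sphinx_source` helper and its output are illustrative assumptions, not Ultrasphinx's actual generator; a real Sphinx source also needs connection settings such as `sql_host`, `sql_user`, and `sql_db`.

```ruby
# Hypothetical sketch: NOT Ultrasphinx's real generator or output format.
# It only shows the idea of turning a declarative field list into a
# Sphinx `source` block whose SQL query the indexer will run.
def sphinx_source(table, fields)
  columns = ["id"] + fields # Sphinx expects the document id as the first column
  <<~CONF
    source #{table}
    {
      type      = mysql
      sql_query = SELECT #{columns.join(', ')} FROM #{table}
    }
  CONF
end

# Roughly what `is_indexed :fields => ['title', 'body']` on Post implies:
puts sphinx_source("posts", ["title", "body"])
```

The point of the declarative layer is that you only maintain the field list; the SQL and the config syntax are derived from it.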
The Sphinx approach has several advantages over Ferret and acts_as_solr. First of all, Sphinx is extremely stable. They're nearing a 1.0 release, and its stability certainly merits that. You don't even really need to worry about monitoring the search daemon, because it just isn't going to crash. Also, if you're running any Rails apps in production, you know that long-running ActiveRecord callbacks can make your app perform very poorly. The Ferret and Solr solutions both rely on ActiveRecord callbacks to inform the indexer of new data. Moreover, if the search daemon goes down for any period of time, or, say, the index becomes corrupted (Ferret, I'm looking at you), your entire app is going to be down for the count, since every save that fires those callbacks depends on the indexer. With Sphinx, you set the indexer to run every half hour on a cron job, and since it works with the database directly, its performance and stability characteristics have absolutely zero impact on your app.
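As a sketch of the cron side, the hypothetical helper below builds a crontab entry that re-runs the same `rake ultrasphinx:index` task used in the walkthrough every half hour. The app path and environment are placeholders, and `reindex_cron_entry` is an illustrative name; whether your setup needs any extra options on the indexer run is not covered here.

```ruby
require "shellwords"

# Build a crontab line that re-runs the Ultrasphinx indexer every
# `minutes` minutes. The app path passed in below is a placeholder.
def reindex_cron_entry(app_root, minutes: 30, env: "production")
  # "*/N" in the minute field only fires evenly when N divides 60.
  unless (1..59).cover?(minutes) && (60 % minutes).zero?
    raise ArgumentError, "minutes must evenly divide 60"
  end
  "*/#{minutes} * * * * cd #{Shellwords.escape(app_root)} && " \
    "RAILS_ENV=#{env} rake ultrasphinx:index"
end

puts reindex_cron_entry("/var/www/blog/current")
```

Because the indexer runs out of band like this, a slow or failed indexing pass never sits inside a web request.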
Trying it Out
So, let's implement a search for our blog. First, edit the paths to the index and log files in the default config file. Note that Ultrasphinx uses two types of config files: one that you edit by hand, and one that it generates. Both are required on every machine that accesses the index (seriously: not having the generated config on a slave machine caused me some trouble). Then, declare the Post model as indexed (note that you must declare the fields as strings, not symbols):
class Post < ActiveRecord::Base
  is_indexed :fields => ['title', 'body']
end
Next, we'll need to ask Ultrasphinx to generate a configuration file. You have to re-run this rake task any time you change one of your models' is_indexed declarations. Make sure to .gitignore (svn:ignore for people still stuck in svn land) the generated file in development. Evan recommends checking the production file into version control, but I have it generated automatically on deploy with a Capistrano recipe. That way, if I change the indexing, I can't forget to regenerate the config file. All you have to do is run:
$ rake ultrasphinx:configure
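Since the generated config must exist on every machine that touches the index, and its absence only surfaces later as a confusing error (see the gotchas below), a fail-fast check can help. This is a minimal sketch under an assumption: that the generated file lands at `config/ultrasphinx/<env>.conf`, so adjust the path to wherever `rake ultrasphinx:configure` writes on your setup. `check_generated_config!` is a hypothetical name you might call from an initializer or a deploy task.

```ruby
require "fileutils"
require "tmpdir"

# Fail fast if the generated Sphinx config is missing on this machine.
# ASSUMPTION: the generated file lives at config/ultrasphinx/<env>.conf.
def check_generated_config!(rails_root, env)
  path = File.join(rails_root, "config", "ultrasphinx", "#{env}.conf")
  return path if File.file?(path)
  raise "Generated Sphinx config not found at #{path}; run `rake ultrasphinx:configure`"
end

# Demo against a throwaway directory instead of a real app:
Dir.mktmpdir do |root|
  begin
    check_generated_config!(root, "production")
  rescue RuntimeError => e
    puts e.message # reports the missing file
  end
  FileUtils.mkdir_p(File.join(root, "config", "ultrasphinx"))
  FileUtils.touch(File.join(root, "config", "ultrasphinx", "production.conf"))
  puts check_generated_config!(root, "production") # now returns the path
end
```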
Then, since it's our first time, we've got to run the indexer:
$ rake ultrasphinx:index
Finally, we'll need to start the search daemon:
$ rake ultrasphinx:daemon:start
Now, we can start searching.
@search = Ultrasphinx::Search.new(:query => params[:query])
@search.run
@search.results
So, it's really easy to build a basic search engine using Ultrasphinx. There are some gotchas, though.
Gotchas & Notes
- More complex indexing with Ultrasphinx can be slightly more verbose and SQL-focused than it would be with a solution that relies on AR callbacks for indexing.
- Transforming data with Ruby is impossible; the data you index must be in your database (unless you use a stored procedure, which can actually be written in Ruby if you're using PostgreSQL, but I digress).
- Ultrasphinx preloads your indexed models when it is initialized. So, if your models depend on any monkey patches that live in your app's lib directory, those patches must be loaded before Ultrasphinx (in my experience, this has meant pluginizing my monkey patches). Because of the way exceptions are caught in the preloading routine, you won't see the actual error that is stopping your model from loading. Instead, you'll just get a constant loading error, a NameError, or something similar. If you see something like that, and your models load fine without Ultrasphinx installed, look for dependency issues.
- Ultrasphinx attempts to preload your indexed models using a regex that (at the time of writing) doesn't respect comments. If you're struggling with the issues mentioned in the last gotcha, you'll probably try commenting out the is_indexed call to see whether that's what's causing the problem. That won't work. You can either delete the is_indexed call entirely while debugging, or pull from my git repo, where I've modified the regex to respect #-style comments (but not =begin/=end blocks).
- If you see ZeroDivisionErrors in production, it's probably because you're missing the generated configuration file on one or more of your app server machines.
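The comment-aware check described in the regex gotcha can be sketched as below. This is a hypothetical helper, not the regex in Ultrasphinx or in the fork mentioned above: it works line by line, ignores lines whose first non-blank character is `#`, and, unlike the fork, also skips anything between =begin and =end.

```ruby
# Hypothetical sketch: report whether a model file contains a *live*
# is_indexed call, ignoring #-comments and =begin/=end blocks.
def indexed_model_source?(source)
  in_block_comment = false
  source.each_line.any? do |line|
    if line.start_with?("=begin")   # =begin must start at column 0 in Ruby
      in_block_comment = true
      next false
    elsif line.start_with?("=end")
      in_block_comment = false
      next false
    end
    # A "# is_indexed ..." line fails this match because "#" comes first.
    !in_block_comment && line.match?(/\A\s*is_indexed\b/)
  end
end

puts indexed_model_source?("class Post < ActiveRecord::Base\n  # is_indexed :fields => ['title']\nend\n")
```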
Check it Out
You'll need to grab Sphinx first. Then, the plugin:
Get it from svn:
$ svn export svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk vendor/plugins/ultrasphinx
Or pull from my git repo (for the change described in the gotchas section):
$ git clone git://github.com/giraffesoft/ultrasphinx.git vendor/plugins/ultrasphinx
To Sum Up
Ultrasphinx is by far the most effective Rails search solution I've come across. Unlike most of the other options, its search daemon is incredibly stable, and the index never seems to become corrupted (I'm running it in a relatively high-load production environment with absolutely zero trouble so far). Also, since Ultrasphinx doesn't rely on AR callbacks for indexing, your application isn't as tightly coupled to your search daemon; if the daemon dies, search will break, but the rest of your app will still function. It's not without problems, and complex indexing can be trickier, but Ultrasphinx's stability and performance make the choice a no-brainer.
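The claim that a dead daemon only breaks search holds at the page level only if the search call itself can't take down the request. A minimal sketch, assuming the `Search.new(:query => ...)` / `run` / `results` lifecycle from the walkthrough; `safe_search`, `DownSearch`, and `WorkingSearch` are hypothetical names, and the two stub classes stand in for `Ultrasphinx::Search`. It rescues `StandardError` broadly; in a real app you would narrow that to whatever your search client raises when the daemon is unreachable.

```ruby
# Run a search, but degrade to an empty result set on failure.
def safe_search(search_class, query, logger: nil)
  return [] if query.to_s.strip.empty? # don't hit the daemon for blank input
  search = search_class.new(:query => query)
  search.run
  search.results
rescue StandardError => e
  logger&.warn("search unavailable: #{e.class}: #{e.message}")
  []
end

# Stand-ins for demonstration only:
class DownSearch
  def initialize(_options); end

  def run
    raise Errno::ECONNREFUSED, "searchd not reachable"
  end

  def results
    []
  end
end

class WorkingSearch
  def initialize(options)
    @query = options[:query]
  end

  def run; end

  def results
    ["Post matching #{@query}"]
  end
end

p safe_search(WorkingSearch, "sphinx")
p safe_search(DownSearch, "sphinx")
```

A controller can then render the rest of the page normally and simply show "no results" while the daemon is down.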


Another gotcha: Foxy Fixtures (sparse ids in general) can make for very slow indexing. See this thread: http://rubyforge.org/forum/forum.php?threadid=21928&forumid=14244
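Why sparse ids hurt can be shown with a little arithmetic. Assuming the indexer pulls rows through Sphinx ranged queries (`sql_query_range` plus `sql_range_step`), it issues one query per id window from the minimum id to the maximum id, whether or not any rows fall inside a given window. Foxy Fixtures derive ids from a hash of each fixture's label, so even a small table can span an enormous id range. The step size of 5,000 and the id ranges below are illustrative, and `range_query_count` is a hypothetical helper.

```ruby
# Back-of-the-envelope sketch: count the id windows a ranged indexer
# would query between min_id and max_id with a given step.
def range_query_count(min_id, max_id, step)
  return 0 if max_id < min_id
  (max_id - min_id) / step + 1 # integer division, plus the first window
end

step = 5_000 # illustrative step size, not necessarily your config's value

# 10,000 rows with sequential ids:
puts range_query_count(1, 10_000, step)        # 2 windows
# 10,000 rows whose ids are scattered up to ~1 billion:
puts range_query_count(1, 1_000_000_000, step) # 200,000 windows
```

The row count is the same in both cases; only the id spread changes, and the number of queries grows with it.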
I just wanted to reinforce one point you made:
"Transforming data with Ruby is impossible; the data you index must be in your database"
This is really, really important to note. I love Ultrasphinx, but I've had several applications where I couldn't use it because we needed to search on the result of a method that wasn't appropriate for database storage.
Simple transformations, such as text replacement, can be done with the :function_sql setting; otherwise, you can use a stored procedure, but that is a pain.
You can, of course, also store the results of Ruby code in the database using, e.g., a before_save callback, but I suppose that wouldn't help in Ben's case.
Hey James, thanks for this post; it hits on something I've been needing to solve soon. A question, though, that might be obvious to everybody but me ;-). If I want to search across all kinds of models (e.g., site search), how is this handled? Are the results instantiated ActiveRecord objects of whatever model class each result belongs to?
Thanks!
@Cameron - The Ultrasphinx index isn't model-specific; it runs the query on all of the models that you have indexed. I'm not sure whether it's possible to search through only one model at a time, since I haven't needed that functionality.
The results are returned as AR objects.
James, thanks for the post and for illustrating the issues with implementing search functionality in Rails. I've been looking for a stable search implementation, and this is the second major recommendation of Sphinx I've seen, so I will have to give it a try!
Cameron: For site search, you could also look at using Google's site search. It may not be as tailored, but it could work for you.
Michael @ SEOG.net
It may be worth looking at PostgreSQL's built-in text search in 8.3, as it's come a long way from its 8.2 tsearch module. It's pretty customizable as well, with two different index types and the ability to add your own parsers, templates, and rankers.
Your article is timely, as I'm about to run some figures on Ultrasphinx as well, so I'll keep you posted. Thanks, James!
Hi James,
Just wondering if you might be able to help with this issue: while trying to install sphinx-0.9.8-rc2, make fails on Leopard 10.5.2. I am also running MySQL 5.1.23-rc-osx10.5-x86_64.
ld: warning in indexer.o, file is not of required architecture
ld: warning in libsphinx.a, file is not of required architecture
ld: warning in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
Undefined symbols:
  "_main", referenced from:
      start in crt1.10.5.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[2]: *** [indexer] Error 1
make[1]: *** [all] Error 2
make: *** [all-recursive] Error 1
Thanks for the info! I've been having all kinds of issues with AAF, and it's nice to find a viable alternative :)
Anyone know how to use excerpts?
In case anybody runs into the same problem I ran into while installing Sphinx on Leopard on the 64-bit architecture, you need to configure Sphinx like this (at least for the sphinx-0.9.8-rc2 release):
$ LDFLAGS="-arch x86_64" ./configure
Hi, I'm new to Rails and have Ultrasphinx running on Windows, but I'm not sure how to work with the @search AR object. For example, I can't display <%= @search.results.username %> in my view for a search that returns one 'user', even though running 'raise @search.results.inspect' in the controller shows that @search.results is just a hash of the user info.
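For the question above, a minimal sketch, assuming `@search.results` returns an Array of records (the plural name and the earlier note that results come back as AR objects point that way): the Array itself has no `username` method, so you call it on an element. A Struct stands in for the `User` model here.

```ruby
# ASSUMPTION: search results are an Array of model objects.
User = Struct.new(:id, :username) # stand-in for the real User model

results = [User.new(1, "james")] # what a one-hit search might return

begin
  results.username # the Array itself has no #username method
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}"
end

puts results.first.username                # the single hit
results.each { |user| puts user.username } # or loop, as a view would
```

In an ERB view, the equivalent would be looping with `<% @search.results.each do |user| %>` and printing `<%= user.username %>` inside the loop.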
Anyone trying to get Ultrasphinx running on Windows might need to look at these fixes (I needed them all):
http://hillemania.wordpress.com/2006/09/21/rails-mysql-50-on-windows-setup-problemfix-libmysqldll/
https://rubyforge.org/forum/message.php?msg_id=49341
http://www.cordinc.com/blog/2008/03/installing-sphinx-ultrasphinx.html
I've been using Sphincter to work with Sphinx. It's similar to Ultrasphinx, but (intentionally) has fewer features. Support from the author seems non-existent, so I've had to dig through it a couple of times to fix bugs. But at least the simple, clear code means you can do this.
I was wondering whether you have tried Sphincter and, if so, what you made of it?
I went through all this hassle, followed every link, and finally managed to install Ultrasphinx on my Windows development machine. Config and indexing appear to go fine, but now when I try to start the daemon, it says "Failed to start" with no further explanation. Any ideas? I'm using Rails 2.0 and PostgreSQL.
@James - you can use the :class_names parameter to specify the models that you want to search.
I had the same error as Mohammad Abed:
ld: warning in indexer.o, file is not of required architecture
ld: warning in libsphinx.a, file is not of required architecture
ld: warning in /opt/local/lib/mysql5/mysql/libmysqlclient.dylib, file is not of required architecture
Undefined symbols:
  "_main", referenced from:
      start in crt1.10.5.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[2]: *** [indexer] Error 1
make[1]: *** [all] Error 2
make: *** [all-recursive] Error 1
His solution didn't work quite right for me, but it got me going on the right track. I installed MySQL via MacPorts, and running mysql --version gave me this:
Ver 14.12 Distrib 5.0.51a, for apple-darwin9.2.0 (i686) using EditLine wrapper
For me, this was the solution instead:
$ LDFLAGS="-arch i386" ./configure
Seems apparent in retrospect, but I hope this helps someone.
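The two configure fixes above boil down to matching Sphinx's build architecture to that of the MySQL client library it links against. A hypothetical Ruby helper that reads the architecture out of `mysql --version` output and picks the LDFLAGS value might look like this; it only knows about the two cases reported in these comments, and `sphinx_ldflags` is an illustrative name.

```ruby
# Hypothetical helper: choose LDFLAGS for Sphinx's ./configure from the
# architecture in `mysql --version` output. Only the i686/i386 and
# x86_64 cases from the comments are handled; anything else is nil.
def sphinx_ldflags(mysql_version_output)
  arch = mysql_version_output[/\(([^)]+)\)/, 1] # first "(...)" group, e.g. "i686"
  case arch
  when "x86_64" then "-arch x86_64"
  when "i386", "i686" then "-arch i386" # macOS names 32-bit Intel "i386"
  end
end

flags = sphinx_ldflags("Ver 14.12 Distrib 5.0.51a, for apple-darwin9.2.0 (i686) using EditLine wrapper")
puts flags
# From the Sphinx source directory you could then run, for example:
#   system({ "LDFLAGS" => flags }, "./configure") if flags
```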
Anyone know how to use excerpts?
I have a problem with Ultrasphinx. I have a Product model and a Tag model, and I want to search products by tag. Does anyone know how I can index and search them?
@Mohammad Abed - I ran into the same problem installing Sphinx on a brand-new iMac. Thanks for the help!!