Ruby on Rails
HowToIntegrateFerretWithRails

howto initially provided by: JanPrill .at. blauton .dot. de

Important Notes:

INTRODUCTION

I’m new to Ruby as well as to Rails. As always, I’m learning things alongside a little project that I’m trying to realise with RoR. Java developers are spoiled by the great open source search engine Lucene when it comes to search capabilities in web apps. IMHO search is a feature on websites that can’t be overrated: seen from a usability standpoint, a content-rich dynamic website without fulltext search is worth nothing. Until now you could achieve fulltext search in your RoR webapp in different ways. The ones I’ve thought about for my little project are:

Among the different approaches to porting Lucene to Ruby, the new and upcoming project “Ferret” (http://ferret.davebalmain.com) from David Balmain seems to have come a pretty long way towards providing a native Ruby port with optional C modules to increase performance. In the end I decided to give ‘Ferret’ a try. Since I’m a Ruby newbie, I’m not the right one to judge performance matters, so I will leave them aside for this howto.

SOURCES

I’ve set up a little side project just for the purpose of testing ‘Ferret’, since I’m not bold enough to publish my whole project right now. Naturally I keep things simple, since I don’t yet have the skills to make them complicated. You can download the zipped rails_root from http://www.blauton.de/ferret/ferret_test.zip . After deploying the little project (

The ddl comes with some sample data, so why don’t you try a search for ‘ruby’?

INTEGRATING FERRET

I assume that you have already downloaded Rails and its dependencies. In the following I will describe how I’ve integrated ‘Ferret’ to get an ‘in-near-future-high-performance’ fulltext search facility. Following these steps should lead to a project similar to the one at http://www.blauton.de/ferret/ferret_test.zip

1. Create a new rails project:

> rails ferret_test

2. Change into the project directory:

> cd ferret_test

3. Start WEBrick:

> ruby script/server

and have a look at http://localhost:3000 to check that you’ve put Ruby on Rails.

4. Terminate WEBrick for now

5. Install Ferret:

> gem install ferret

With ferret-0.1.3 I’ve had no problems installing this on Linux as well as on win32. You’ll find detailed information on http://ferret.davebalmain.com

6. Create a database:

I’m using MySQL and keep my definitions in the db directory of the project:


create_ferret_test_development.ddl:

DROP DATABASE IF EXISTS `ferret_test_development`;
CREATE DATABASE `ferret_test_development`;
USE `ferret_test_development`;

DROP TABLE IF EXISTS `favourites`;
CREATE TABLE `favourites` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `title` varchar(200) NOT NULL default '',
  `teaser` text NOT NULL,
  `link` varchar(200) NOT NULL,
  `created_at` datetime NOT NULL default '0000-00-00 00:00:00',
  `updated_at` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now it’s just something like

rails/db > mysql -uroot -p < create_ferret_test_development.ddl
to create the database… The predefined script in the ‘db’ directory of http://www.blauton.de/ferret/ferret_test.zip has some example records already dumped.

7. Since we are only testing right now, we’ll be fine with configuring just the development section of

config/database.yml
for our database connection:


development:
  adapter: mysql
  database: ferret_test_development
  host: localhost
  username: [your_username]
  password: [your_password]

8. Let’s do some scaffolding magic to have a little interface for our newly created database:

> ruby script/generate scaffold Favourite favourites

9. Fire up ruby script/server again and check if you get something on http://localhost:3000/favourites/list. You should get eight records that were already in the ddl.

10. So now that we have a few records in our favourites table, let’s put Ferret in the game… You have the library already installed, but now we have to make it possible to use its functionality in Rails.

Ferret::Index::Index gives us a high-level API for working with a Lucene index. For detailed information on the Ferret API just check out http://ferret.davebalmain.com/api/ . Ferret::Index::Index seems to let us ask the object for a .reader() or a .writer(), so that we have everything at hand from this one object in a singleton fashion. We could retrieve a writer(), write an index with it and then get ourselves a reader() to retrieve search results from this index. So let’s look for a place for this helpful INDEX:

a. Create a subfolder ‘index’ in your project

b. Create a file: config/environments/ferret_environment.rb with the following content:


require 'ferret'
module FerretConfig
  include Ferret

  # Adjust this path so that it points to the 'index' folder of your own project
  INDEX = Index::Index.new(:path => 'f:/home/jan/workspace/ferret_test/index')

end 
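The hard-coded path above only fits the machine it was written on. A minimal sketch of deriving the path from the project root instead; index_path is a hypothetical helper (not part of Ferret or Rails), and inside Rails you would presumably pass RAILS_ROOT as the root:

```ruby
# Hypothetical helper (not part of Ferret or Rails): builds the absolute
# path of the 'index' subfolder below a given project root.
def index_path(project_root)
  File.expand_path(File.join(project_root, 'index'))
end

# Example root taken from the howto; use your own project directory.
puts index_path('/home/jan/workspace/ferret_test')
```

The result could then be handed to the :path option in place of the literal string.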

c. Finally we need to make our new environment known to Rails. Put the following in your

config/environment.rb:

  # Ferret-Configuration
  require 'environments/ferret_environment.rb'
  require 'ferret'

Don’t forget to restart WEBrick after configuration changes.

11. Right now we’ve got some records in the database but no Lucene (aka Ferret) index in our ‘index’ directory. (If your tests have left files in the index directory, delete them before you start up WEBrick again; later you would use the Index.delete method to keep index and database in sync.) Let’s change that and build a function for the initial index creation:

a. Generate a controller:

> ruby script/generate controller Search

b. Define an action create_index in your search_controller.rb. It should look like:


  # initial creation of a lucene index
  def create_index
    # our central INDEX
    index = FerretConfig::INDEX
    
    # get all Favorites, iterate over and index them
    favourites = Favourite.find(:all)
    for fav in favourites
      index << {:key => fav.id, :title => fav.title, :teaser => fav.teaser, :url => fav.link}
    end
    # don't close the index, but be sure to write it to the fs
    index.optimize()
    redirect_to(:action => 'search_form')
  end 
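The hash handed to the index is the part worth getting right, because the field names chosen here (:title, :teaser, :url) are the ones the search and the result view rely on later. A minimal sketch of that mapping on its own, using a Struct as a stand-in for the ActiveRecord model; FavouriteRow and favourite_to_doc are hypothetical names for this illustration only:

```ruby
# Stand-in for the Favourite ActiveRecord model, only for this sketch.
FavouriteRow = Struct.new(:id, :title, :teaser, :link)

# Hypothetical helper: maps one record to the hash of fields that the
# howto adds to the index with `index << {...}`.
def favourite_to_doc(fav)
  { :key => fav.id, :title => fav.title, :teaser => fav.teaser, :url => fav.link }
end

fav = FavouriteRow.new(1, 'Ruby', 'A dynamic language', 'http://www.ruby-lang.org')
doc = favourite_to_doc(fav)
puts doc[:url]
```

Note that the database column link ends up in the index under the name url.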

c. The corresponding view:


search/new_index.rhtml:

<h2>Build up a new index</h2>
<small>Please delete the files that are already in your index dir</small>
<br />
<br />
<%= start_form_tag(:action => 'create_index') %>
  <%= submit_tag "Create New Index" %>
<%= end_form_tag %> 

This view simply starts the action on the controller. If you try it, your index directory should get populated with Lucene index files. If you want to, you can inspect these files with a text editor; you should find words and fragments of words from our test records.

The action in our controller will redirect us to a search_form whose view looks like:

d. search/search_form.rhtml:


<h2>Search</h2>
<br />
<%= start_form_tag(:action => 'get_results') %>
  <%= text_field 'search', 'search_term' %>
  <%= submit_tag "Search" %>
<%= end_form_tag %> 

This view simply outputs a Google-like single search field and a submit button. It will invoke the

e. get_results action in our search_controller:


  def get_results
    # we'll hand the search_term over to the view for subsequent paginator searches
    if (params[:search])
      @search_term = params[:search][:search_term]
    else # is it a paginator search?
      @search_term = params[:search_term]
    end
    condition = 'teaser:"' + @search_term + '"'
    # we'll use the standard rails paginator
    @result_pages, @results = paginate(:result, :per_page => 5, :conditions => condition)
    return @result_pages, @results, @search_term
  end 
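The condition above is built by plain string concatenation, so an empty search or a double quote typed by the user would produce a malformed phrase query. A minimal sketch of a more defensive builder, assuming that simply dropping double quotes from the input is acceptable; teaser_query is a hypothetical helper, not something Ferret or Rails provides:

```ruby
# Hypothetical helper: builds the phrase query used in get_results.
# Assumption: dropping double quotes from the input is acceptable, so a
# stray quote cannot end the phrase early. Returns nil for blank input.
def teaser_query(search_term)
  term = search_term.to_s.delete('"').strip
  return nil if term.empty?
  'teaser:"' + term + '"'
end

puts teaser_query('ruby on rails')
```

The action could then skip the search and redisplay the form whenever the helper returns nil.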

As you may have realised we haven’t yet modelled a result object that we may use in our paginator, let’s change that:

f. app/models/result.rb


class Result

  def self.find(*args)
    options = extract_options_from_args!(args)
    conditions = options[:conditions]
    per_page = options[:limit]
    offset = options[:offset]
    
    search_index(conditions)
    # return only the records for the current page
    @records[offset...(offset+per_page)]
  end

  def self.count(conditions = nil, joins = nil)
    search_index(conditions)
    @conditions = conditions
    # return the size of the whole recordset
    @records.size()
  end
  
  # Helper methods 
  def self.search_index(conditions)
    if (@conditions != conditions)
      @records = Array.new
      index = FerretConfig::INDEX
      # we want 1000 docs returned at max
      index.search_each(conditions, {:num_docs => 1000}) do |doc, score|
        @records << index[doc]
      end
    end
  end
  
  def self.extract_options_from_args!(args)
    options = args.last.is_a?(Hash) ? args.pop : {}
    options
  end

end 
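The slice in find is what turns the full hit list into one page. A minimal sketch of just that step in plain Ruby, with a fallback to an empty array because a range that starts beyond the end of an array yields nil; page_slice is a hypothetical name for this illustration:

```ruby
# Returns the records for one page, mirroring the slice in Result.find.
# `offset` and `per_page` correspond to the :offset and :limit options.
def page_slice(records, offset, per_page)
  records[offset...(offset + per_page)] || []
end

hits = %w[a b c d e f g]
p page_slice(hits, 0, 5)   # first page
p page_slice(hits, 5, 5)   # second, shorter page
p page_slice(hits, 10, 5)  # past the end
```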

Since we want to use the paginator object of Rails in our controller, we need the methods

self.count(conditions = nil, joins = nil) and self.find(*args)
in our model, because the paginator uses them to initialize itself. From a Ferret point of view the search_index(conditions) method is the most interesting one. We need to

g. make the new result model known in app/controllers/application.rb:


  model   :result

and finally build a little view for the results:

h. search/get_results.rhtml:


<h2>Favourites</h2>

<% for result in @results -%>
	<p>
		<strong><a href="<%= result.field('url').string_value %>">
			<%= result.field('title').string_value %></a></strong><br />
		<small><%= highlight(result.field('teaser').string_value, @search_term) %></small><br/>
		<small><a href="<%= url = result.field('url').string_value %>">
			<%= url %></a></small>
	</p>
<% end -%>

<%= pagination_links(@result_pages, {:params => {:search_term => @search_term}}) %>
<br />
<br />
<a href="search_form">New Search</a> | <%= link_to "New Favourite", :controller=>"/favourites", :action=>"new" %>

What we have achieved so far is that we can build up an index from existing database records and then query against it. With a little more effort we would be able to use the full power of Lucene’s and Ferret’s query language. What we haven’t achieved yet is keeping our index consistent with our ActiveRecord model. Let’s go a little in that direction before finishing this howto:

12. Keeping the index in sync with the database

We need a way to update the index when insertions, updates and deletes happen on our ActiveRecord model. Here I will only react to the after_save callback; implementing deletes and updates is left to the reader. Let’s change our app/models/favourite.rb:


class Favourite < ActiveRecord::Base

  def after_save()
    
    index = FerretConfig::INDEX
    index << {:key => self.id, :title => self.title, :teaser => self.teaser, :url => self.link}
    index.optimize()
    
  end

end

This is all you need for keeping the index in sync after a new insertion of a favourite. Just try it…
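Since after_save also fires when an existing record is updated, appending alone would presumably leave the old copy of the document in the index. A minimal sketch of the usual remedy, delete first and then add, shown with an in-memory stand-in; FakeIndex, delete_by_key and reindex are hypothetical names and not Ferret’s API, so the real delete call has to be looked up in the Ferret documentation:

```ruby
# In-memory stand-in for the index, used only to show the idea.
# It is NOT Ferret's API: a real index would need its own delete call.
class FakeIndex
  def initialize
    @docs = []
  end

  # Append a document (a hash of fields), like `index << {...}`.
  def <<(doc)
    @docs << doc
    self
  end

  # Remove every document stored under the given :key.
  def delete_by_key(key)
    @docs.reject! { |doc| doc[:key] == key }
    self
  end

  def size
    @docs.size
  end

  def titles
    @docs.map { |doc| doc[:title] }
  end
end

# Replace-on-save: drop any stale copy first, then add the fresh one.
def reindex(index, doc)
  index.delete_by_key(doc[:key])
  index << doc
end

index = FakeIndex.new
reindex(index, { :key => 1, :title => 'Ruby' })
reindex(index, { :key => 1, :title => 'Ruby on Rails' })  # an update
puts index.size
p index.titles
```

The same delete step, without the add, would cover the case where a favourite is destroyed.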

WHAT TO WONDER ABOUT

As I’ve already said I’m far, far away from mastering ruby and RoR. This should be just a little contribution to this great community that even a RoR newbie could handle. Regarding Ferret I wonder:

1. Where to put the INDEX singleton. Is it the right way to put things like this into the environment? I’ve already realised that there are issues with the write lock on the index folder if you do things like I did in this howto…

2. Java Lucene is so fast and powerful that it is no problem to run the search again every time you need a paginated object. As you may have realised, I’ve put some caching into the Result class with its @records object. I don’t know if you would do things like this in Ruby and RoR. Please, please correct me, and correct this howto as a whole, if you have better ways to do things.

MY WISHLIST

I’m pretty delighted by the RoR way of doing things. I think it is great that there is a strong Ruby web framework. In my opinion the main advantage over frameworks like JSF, Struts etc. is the zero turnaround time that only a dynamic language can provide. Another thing IMHO is the strong community and its work on a single framework. In Java you have the choice. This is sometimes great, but on the other hand the community divides its effort among so many different frameworks that choice sometimes seems to become a disadvantage. Anyway: I think a web framework like RoR should provide a way to integrate one (or many) fulltext search frameworks. IMHO David Balmain has come a pretty long way. Lucene has shown in many projects that it is powerful enough to handle millions of documents. Nutch, for example, is an enterprise-strength fulltext engine that, given the necessary hardware, could be a competitor of Google et al. I’m looking forward to the integration of such great search capabilities in Rails!

What is this error?

undefined method `weight' for #<Hash:0x3ad4d28>

g:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.0/lib/ferret/search/index_searcher.rb:107:in `search'
g:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.0/lib/ferret/index/index.rb:622:in `do_search'
g:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.0/lib/ferret/index/index.rb:317:in `search_each'
g:/ruby/lib/ruby/1.8/monitor.rb:229:in `synchronize'
g:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.0/lib/ferret/index/index.rb:316:in `search_each'
#{RAILS_ROOT}/app/models/result.rb:27:in `search_index'
#{RAILS_ROOT}/app/models/result.rb:15:in `count'
#{RAILS_ROOT}/app/controllers/search_controller.rb:27:in `get_results'

Parameters: {"search"=>{"search_term"=>"a"}, "commit"=>"Search"}

Resolution to the error above



The “undefined method `weight' for #<Hash:0x3ad4d28>” error is not due to Ferret, but to the new way Rails 1.1 handles the model count method: the conditions now arrive as part of an options hash rather than as a plain string. To resolve the error, change the following code:



class Result

...


  def self.count(conditions = nil, joins = nil)
    search_index(conditions)
    @conditions = conditions
    # return the size of the whole recordset
    @records.size()
  end

...

end 



To something like the following (this code was obtained and modified from the new calculation count method source in ActiveRecord):

class Result

...

  def self.count(*args)
    options = {}
    if args.size >= 0 and args.size <= 2
      if args.first.is_a?(Hash)
        options = args.first
      elsif args[1].is_a?(Hash)
        column_name = args.first
        options = args[1]
      else
        options.merge!(:conditions => args[0]) if args.length > 0
        options.merge!(:joins => args[1]) if args.length > 1
      end
    else
      raise(ArgumentError, "Unexpected parameters passed to count(*args): expected either count(conditions=nil, joins=nil) or count(options={})")
    end
    conditions = options[:conditions]
    search_index(conditions)
    @conditions = conditions
    @records.size
  end

...

end
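To see what the argument handling does without a running Rails app, here is a simplified stand-alone sketch of it. count_options is a hypothetical name, and the version is deliberately shorter than the one above; it only shows that both calling styles end up with the same :conditions entry:

```ruby
# Simplified stand-alone version of the argument handling above:
# accepts count(conditions, joins) as well as count(options_hash).
def count_options(*args)
  raise ArgumentError, 'expected at most two arguments' if args.size > 2
  return args.first if args.first.is_a?(Hash)
  return args[1] if args[1].is_a?(Hash)
  options = {}
  options[:conditions] = args[0] if args.length > 0
  options[:joins] = args[1] if args.length > 1
  options
end

# Old style: conditions passed as a plain string.
p count_options('teaser:"ruby"')
# New style: conditions passed inside an options hash.
p count_options({ :conditions => 'teaser:"ruby"' })
```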


undefined method `string_value’ for #<Ferret::Document::Field



If you are using Ferret 0.9.3, you will need to replace the .string_value calls with .data, or you will get this error. See http://rubyforge.org/pipermail/ferret-talk/2006-April/000364.html
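If the views have to work with both kinds of field objects, a small helper can pick whichever accessor is present. A minimal sketch, using Structs as stand-ins for the field objects so that it runs without Ferret installed; OldField, NewField and field_text are hypothetical names for this illustration:

```ruby
# Stand-ins for a field object from an older and a newer Ferret release,
# defined here only so the sketch runs without Ferret installed.
OldField = Struct.new(:string_value)
NewField = Struct.new(:data)

# Hypothetical helper: reads the text of a field with whichever
# accessor the object offers.
def field_text(field)
  field.respond_to?(:data) ? field.data : field.string_value
end

puts field_text(OldField.new('old style'))
puts field_text(NewField.new('new style'))
```

In the result view, a call such as field_text(result.field('title')) would then replace the direct accessor.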