Ruby on Rails
FullTextSearch

If you use Postgres, the most straightforward option is acts-as-tsearch . Tsearch2 is a Postgres contrib module that lets you offload the heavy lifting to low-level C code.
——

Here’s proposed solution. This is work in progress for now. Please comment. Will produce a proper patch, when done.
GlebArshinov


# this goes into base.rb

      # Works like find_all, but takes a plain text search string
      # rather than conditions.  Current implementation splits the
      # text into keywords and applies +find_by_keywords+, which then
      # uses SQL LIKE operator too look for matches.
      # +find_by_keywords+ sanitizes keywords to avoid SQL injection.
      #
      # +orderings+ and +limit+ arguments are supported as in +find_all+.
      #
      # * TODO: current tokenizer just splits the text on spaces.  I'd
      #   like to be able to consider quoted strings as single token
      #   (standard search engine behavior).
      # Example:
      #   User.find_by_text "artist David"
      def find_by_text(text, orderings = nil, limit = nil, joins = nil)
        keywords = text.split(' ')
        find_by_keywords(keywords)
      end

      # Works like find_all, but takes a collection of keywords rather
      # than conditions.  Returns an array of objects where all the
      # keywords are found in the attributes.  Keywords are sanitized
      # to avoid SQL injection.
      #
      # +orderings+ and +limit+ arguments are supported as in +find_all+.
      #
      # Implementation details:
      # * Current implementation builds a set of conditions using SQL
      #   LIKE '%keyword%' operator.  For each keyword a match is
      #   attempted on all the columns.  If any column matches, then
      #   keyword matches.  A record matches when all the keywords
      #   match.  Example:
      #     Given User(name, description) search for ["artist" "David"]
      #     will produce condition:
      #       (name LIKE '%artist%' OR description LIKE '%artist%') AND
      #       (name LIKE '%David%' OR description LIKE '%David%')
      # * TODO: need to deal with specifics of LIKE operator.  Case
      #   sensitivity is different for different databases.  Also need
      #   to escape special LIKE character (e.g. '%') in provided
      #   strings.  Also, not sure if applying LIKE to non-character
      #   columns is generally supported, though it does works in
      #   <a href="http://wiki.rubyonrails.org/rails/pages/MySQL" class="existingWikiWord">MySQL</a>.
      # * Additional method (this do not exist right now) integrating
      #   with <a href="http://wiki.rubyonrails.org/rails/pages/MySQL" class="existingWikiWord">MySQL</a> full text search would be useful:
      #   <a href="http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html">http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html</a>
      # * Additional method (this do not exist right now) integrating
      #   with "PostgreSQL":http://www.moonation.com/3.html full text search would be useful:
      #   <a href="http://www.sai.msu.su/">http://www.sai.msu.su/</a>~megera/postgres/gist/tsearch/V2/
      #
      # Limitations:
      # * +conditions+ argument is not supported.  Supporting its
      #   array form would require some refactoring of existing code,
      #   since we need to avoid further sanitizing of conditions we
      #   create, as they use '%' characters.
      #
      # Example:
      #   User.find_by_keywords ["artist" "David"]
      def find_by_keywords(keywords, orderings = nil, limit = nil, joins = nil)
        # TODO: should do less conservative sanitizing, since we know
        # that these will go inside SQL string, so characters like ':'
        # or ';' are OK.  Also, need to escape \%_ characters (these
        # have special meaning to LIKE)
        #
        # LIKE documentation:
        #   <a href="http://www.postgresql.org/docs/7.1/interactive/functions-matching.html">http://www.postgresql.org/docs/7.1/interactive/functions-matching.html</a>
        #   <a href="http://dev.mysql.com/doc/mysql/en/Pattern_matching.html">http://dev.mysql.com/doc/mysql/en/Pattern_matching.html</a>
        #   <a href="http://dev.mysql.com/doc/mysql/en/Case_sensitivity.html">http://dev.mysql.com/doc/mysql/en/Case_sensitivity.html</a>
        keywords.collect! { |keyword| sanitize(keyword) }

        # build condition in the form of
        #   (name LIKE '%artist%' OR description LIKE '%artist%')
        #   AND (name LIKE '%David%' OR description LIKE '%David%')
        condition = keywords.collect { |keyword|
          '(' + columns.collect { |column|
            "#{column.name} LIKE '%#{keyword}%'"
          }.join(' OR ') + ')'
        }.join(' AND ')

        find_all(condition, orderings, limit, joins)
      end

# this goes into finder_test.rb

  def test_find_by_text
    # simple match
    topics = Topic.find_by_text "Mary"
    assert_equal(1, topics.size)
    assert_equal(@topic_fixtures["second"]["title"], topics.first.title)

    # case insensitive match
    topics = Topic.find_by_text "mary"
    assert_equal(1, topics.size)
    assert_equal(@topic_fixtures["second"]["title"], topics.first.title)

    # substring
    topics = Topic.find_by_text "ary"
    assert_equal(1, topics.size)
    assert_equal(@topic_fixtures["second"]["title"], topics.first.title)

    # simple negative test
    topics = Topic.find_by_text "does_not_exist"
    assert_equal(0, topics.size)

    # negative test across columns
    topics = Topic.find_by_text "Mary does_not_exist"
    assert_equal(0, topics.size)

    # different order in the same column
    topics = Topic.find_by_text "day have"
    assert_equal(2, topics.size)

    # match across 2 columns
    topics = Topic.find_by_text "day David have"
    assert_equal(@topic_fixtures["first"]["title"], topics.first.title)
    assert_equal(1, topics.size)
  end



See Also

—-

The sanitize line:

keywords.collect! { |keyword| sanitize(keyword) }
is putting single quotes around each keyword so the search ends up like
LIKE ‘%’key’%’…
and fails. Is there another way to escape the keywords?

—-

I believe that you can use sanitize("%#{term}%") instead, unless sanitize escapes %.

—-
Instead

keywords = text.split(' ')

you can use

<code/>keywords = text.scan(/[^"() ]*["(][^")]*[")]|[^"() ]+/) 

and you will get Google like tokenizer