Fast search engine by text

I have a table of songs and I want to be able to quickly search for songs by artist, title or lyrics fragment. The title and the lyrics fragment must keep their original punctuation, diacritics and capitalization, but the user will ignore them during the search.

I want to use the same search field because the user may not know the title or lyrics. Furthermore, the user may make mistakes. The pluralization may be wrong, a particle may be forgotten (show [an] example), a letter may be different (sing/song), ignore a missing letter (ma’am/madam), useless distinction (it’s/it is), orthographic difference (color/colour).

So far, I have created the column search_index and an index for it. It is automatically filled out by a trigger that concatenates the title and the lyrics fragment fragment, convert to lower case and remove diacritics and punctuation. The app was very slow before the index column.

The search term that the user inputs is converted to lower case without diacritics and punctuation. The comparison in the query is:

search_index REGEXP CONCAT('(.+\\s)?', :search_term, '(\\s.+)?')

The next steps is to ignore pluralization, missing articles and alternate orthography, but it may be very complex and slow.

I need to improve my search engine. Then I have to know what the search engines of most web sites do.

I don’t know what “most sites” do, but Discourse (what this forum runs) uses PostgreSQL

There is extensive documentation and I have not read much of it let alone understand it all, but I think before you get too deep into writing your own text search code you should give it a read

Even if you don’t use PostgreSQL reading it will give you familiarity with terminology (lexemes, dictionary, stemmer, stop words, etc.) and how the pieces work so you’ll be in a better position to find what you need in what ever database you end up using.

Try this:

Generally for extensive search people tend to use a specialised database such as ElasticSearch which does everything you’ve described and more.

1 Like

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.