Searching on 38 million records. How can I speed up the return of results?

Hello guys,

I really do hope you guys can help out here. I have a MySQL database which has 38 million records in it.

Didn’t Facebook, MySpace and Digg start out on MySQL databases? So I would imagine MySQL can handle 38 million records OK.
(Please note that I am not attempting to build anything like FB, MySpace or Digg - there is too many of them in the world already :slight_smile: )

The first thing I did was split down these records into sub tables

tbl_a to tbl_z tbl_1 to tbl_0

When a record is inserted the insert checks the first character of a string and assigns it to the correct table.

Now the problem… Searching on those tables and the 38 million records.

A typical search query could be ‘ARC’
Please note I am INDEXING the Name (which is what I’m searching on)

In trial and error testing and in depth research (books, web) I have the following facts (which might prove useful to other people):

FULLTEXT

Speed advantage is great. Fast due to the indexing.
but the disadvantages are that you can’t use Wildcards as MySQL Fulltext only indexes full words.

Returns results in 500 ms

LIKE (With Wildcard)

Much slower. (Down to the Wildcard). Produces more accurate results

Returns results in 9 secs

REGEX search

Slower than LIKE. Not an option for this case.

Returns in 11 secs

LIKE BINARY (With Wildcard)

The binary result compares the exact string (Case sensitive). I thought this might help performance. I was wrong.

Returns results in 12 secs

STRIP BACK ALL UN-NEEDED FIELDS (ON LIKE WITH WILDCARD)

Example I only return ID, NAME, PRICE instead of ID, NAME, PRICE, SOLD, LIVE, etc…

Returns results slightly faster by reducing 1/2 secs

THE HOSTING

MySQL is currently running on a Windows Dedicated Server. So one option is to move the database onto a dedicated UNIX platform for slightly better results (not sure if it would make much difference?)

So as you can see I have tried, tested and exhausted the options.

The question:

Am I missing something, is there something else I can do in regards to indexing or speeding up my search results getting returned?

Ideal world would be getting the 9 sec LIKE WILDCARD search down to a 2/3 second search at most.

ANY help, ANY knowledge is very, very gratefully received.

Thanks

fulltext searching can use a wildcard, but apparently only at the end of a word (haven’t tested this myself)[indent]‘apple*’

Find rows that contain words such as "apple’‘, "apples’‘, "applesauce’‘, or "applet’'. [/indent]what exactly are you searching for?

something like ‘ARC’ might be too short

I would like ARC can be found on its own with spaces or as part of a word ARCH.

I must confess I have seen the wildcard example you have highlighted though for my purpose I need the wildcard functionality at the start the string also.

If it helps, there is a setting you can set in the mysql configuration that controls how many characters are full-text indexed. I recall changing this for a local mediawiki search once, but I don’t recall the setting name.

I can create a new table with “reverse” values of the column that I want to search and then UNION the tables in the select query.
May not be the right solution, but a hack worth a look.

Search for stan using Full Text.

mysql> select * from countries where match(Country_official_Name) against(‘afghan*’ in boolean mode);
±----------------±---------------±----------------±----------------------±---------+
| Country_Numeric | Country_Apha-2 | Country_Alpha-3 | Country_Official_Name | batch_id |
±----------------±---------------±----------------±----------------------±---------+
| 4 | AF | AFG | Afghanistan | 0 |
±----------------±---------------±----------------±----------------------±---------+
1 row in set (0.00 sec)

mysql> select * from countries where match(Country_official_Name) against(‘stan’ in boolean mode);
Empty set (0.00 sec)

mysql> select * from countries where match(Country_official_Name) against(‘*stan’ in boolean mode);
Empty set (0.00 sec)

mysql> create table countries_reverse select Country_Numeric, Country_Apha-2, Country_Alpha-3, reverse(Country_Official_Name) as Country_Official_Name_Rev, batch_id from countries;
Query OK, 233 rows affected (0.04 sec)
Records: 233 Duplicates: 0 Warnings: 0

mysql> alter table countries_reverse add fulltext(Country_official_Name_Rev);
Query OK, 233 rows affected (0.05 sec)
Records: 233 Duplicates: 0 Warnings: 0

mysql> select Country_Numeric, Country_Apha-2, Country_Alpha-3, Country_Official_Name, batch_id from countries where match(Country_official_Name) against(‘stan*’ in boolean mode) UNION select Country_Numeric, Country_Apha-2, Country_Alpha-3, reverse(Country_Official_Name_Rev), batch_id from countries_reverse where match(Country_official_Name_Rev) against((concat(“'”,reverse(‘stan’),“*”,“'”)) in boolean mode);
±----------------±---------------±----------------±----------------------±---------+
| Country_Numeric | Country_Apha-2 | Country_Alpha-3 | Country_Official_Name | batch_id |
±----------------±---------------±----------------±----------------------±---------+
| 398 | KZ | KAZ | Kazakhstan | 0 |
| 586 | PK | PAK | Pakistan | 0 |
| 860 | UZ | UZB | Uzbekistan | 0 |
| 762 | TJ | TJK | Tajikistan | 0 |
| 4 | AF | AFG | Afghanistan | 0 |
| 795 | TM | TKM | Turkmenistan | 0 |
| 417 | KG | KGZ | Kyrgyzstan | 0 |
±----------------±---------------±----------------±----------------------±---------+
7 rows in set (0.00 sec)

that’s creative, nice job :slight_smile:

although i wonder if doubling a 38-million-row database is feasible…