Need help with many "AND" conditions and indexes


I’m having a tough time with some of my larger queries, which sometimes take 90 seconds to run on a table with only 60,000 rows. Here is my table with indexes:

CREATE TABLE `items` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `seller` varchar(30) NOT NULL,
 `listtype` tinyint(1) unsigned NOT NULL,
 `live` tinyint(1) unsigned NOT NULL,
 `currentprice` decimal(10,2) unsigned NOT NULL,
 `buyprice` decimal(10,2) unsigned NOT NULL,
 `created` datetime NOT NULL,
 `endtime` datetime NOT NULL,
 `storelevel1` smallint(5) unsigned NOT NULL,
 `storelevel2` smallint(5) unsigned NOT NULL,
 `storelevel3` smallint(5) unsigned NOT NULL,
 `storelevel4` smallint(5) unsigned NOT NULL,
 `storelevel5` smallint(5) unsigned NOT NULL,
 `storelevel6` smallint(5) unsigned NOT NULL,
 PRIMARY KEY (`id`),
 KEY `seller` (`seller`),
 KEY `listtype` (`listtype`),
 KEY `live` (`live`),
 KEY `currentprice` (`currentprice`),
 KEY `buyprice` (`buyprice`),
 KEY `created` (`created`),
 KEY `endtime` (`endtime`),
 KEY `storelevel1` (`storelevel1`),
 KEY `storelevel2` (`storelevel2`),
 KEY `storelevel3` (`storelevel3`),
 KEY `storelevel4` (`storelevel4`),
 KEY `storelevel5` (`storelevel5`),
 KEY `storelevel6` (`storelevel6`)
);

Here is my general query:

SELECT * FROM items WHERE seller = 'johnson' AND storelevel1 = 2653 AND listtype <= 2 AND live = 1 ORDER BY created DESC LIMIT 25

Here is information about my columns - all of these columns have indexes on them:

“seller” is the username of the seller
“storelevel1” is a category id number - store category that item is located in.
“listtype” is a number between 1 and 4 - 1 signifies an auction-only listing and 2 a fixed-price-only listing (50% of items are listtype = 1, 10% are listtype = 2 and 40% are something else)
“live” is a number between 0 and 4 - live = 1 means item is live, live = 2 means it’s sold out, etc (90% of items are listed as live = 1)

Results can be sorted by “created” (newest items), “endtime” (ending soonest) or “currentprice” ASC or DESC to order items by price.
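To make the sort variants concrete, here are the three shapes the query takes (same WHERE clause as the example above, only the sort column and direction change):

```sql
-- Same filters as above; only the sort changes.
SELECT * FROM items
WHERE seller = 'johnson' AND storelevel1 = 2653 AND listtype <= 2 AND live = 1
ORDER BY created DESC LIMIT 25;       -- newest items first

SELECT * FROM items
WHERE seller = 'johnson' AND storelevel1 = 2653 AND listtype <= 2 AND live = 1
ORDER BY endtime ASC LIMIT 25;        -- ending soonest

SELECT * FROM items
WHERE seller = 'johnson' AND storelevel1 = 2653 AND listtype <= 2 AND live = 1
ORDER BY currentprice ASC LIMIT 25;   -- cheapest first (or DESC for most expensive)
```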

How do I optimize this query for scalability and so it can load in milliseconds (if possible)? Am I supposed to merge these indexes somehow? Use sub-queries or joins? I’m lost here, please help!

Kind regards

Which of those fields are not used as join conditions (in the ON clause) in any query?

There aren’t any JOIN conditions in these types of queries. It’s just one simple SELECT WHERE query with a bunch of AND conditions and a single ORDER BY condition.

I came up with a query today using multiple JOIN statements:

SELECT f.* FROM items AS f
JOIN items AS f2 ON f2.id = f.id AND f2.seller = 'johnson'
JOIN items AS f3 ON f3.id = f.id AND f3.storelevel1 = 2653
JOIN items AS f4 ON f4.id = f.id AND f4.listtype <= 2
JOIN items AS f5 ON f5.id = f.id AND f5.live = 1
ORDER BY f.created DESC LIMIT 25

It seems like this query is helping as it reduces the load time from 60 seconds down to between 1 and 5 seconds. Although this isn’t the magical Millisecond-speed one that I was hoping to accomplish.

Here is the EXPLAIN information on that query:

id  select_type  table  type    possible_keys        key      key_len  ref    rows  Extra
1   SIMPLE       f2     ref     PRIMARY,seller       seller   92       const  1836  Using where
1   SIMPLE       f3     eq_ref  PRIMARY,storelevel1  PRIMARY  4               1     Using where
1   SIMPLE       f      eq_ref  PRIMARY              PRIMARY  4               1     Using where
1   SIMPLE       f4     eq_ref  PRIMARY,listtype     PRIMARY  4               1     Using where
1   SIMPLE       f5     eq_ref  PRIMARY,live         PRIMARY  4               1     Using where

Please let me know if I’m doing this wrong and if there are any improvements that could be done to help optimize this query.

Kind regards

ADD INDEX ( seller, storelevel1, listtype, live )

then try your original query again

your indexes on listtype alone and live alone are unlikely ever to be useful

Is there any other solution? I would need to create a composite index to also include storelevel2, 3, 4, 5 and storelevel6 as well. This is just the “store” part of my site for individual sellers… I also have to make similar composite indexes for 6 levels of our general categories as well.

so the index i suggested worked well? your original query ran faster?

yes, you would want to consider additional composite indexes for your more popular/frequent queries

you can, obviously, go overboard, so use prudence
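For instance, one common pattern (index names here are purely illustrative, assuming the columns described above) is to put the equality columns first and the sort column last, so each popular sort order gets an index MySQL can read rows from in sorted order. listtype is left out of these keys because listtype <= 2 is a range predicate, and a range column in the middle of the key would stop the index from satisfying the ORDER BY:

```sql
-- Illustrative only: one composite index per popular sort order.
-- Equality columns first, sort column last; the range condition
-- (listtype <= 2) is filtered after the index narrows the rows.
ALTER TABLE items
  ADD INDEX store_created (seller, storelevel1, live, created),
  ADD INDEX store_endtime (seller, storelevel1, live, endtime),
  ADD INDEX store_price   (seller, storelevel1, live, currentprice);
```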

This works, but unfortunately I would need to create about 20 composite indexes to cover everything on my site (even more if I have to order results by time and price combinations). My larger site with a half million products is set up similarly and it takes a looooong time to set up all these composite indexes.

Is there any other way to do this besides composite indexes? I’m tempted to just use Sphinx search, which works perfectly for everything on my site, but using Sphinx is “cheating”. I should be able to do it the MySQL way, especially if I hope to get a job that requires MySQL knowledge.


Just a random thought here, have you thought about doing this in Solr? Seems to me Solr would be perfect for your needs.

storelevel1 through storelevel6 seem like a perfect candidate for partitioning your table. If your data allows it, you could replace the storelevelx columns with just two columns: store_level and store, with store_level (ranging from 1 through 6 or higher) used for partitioning and the store column keeping the original data. It would be like having six different tables fused together, but with the added bonus that they are treated as separate tables whenever a query includes something like where store_level = 1, reducing the working data set considerably and resulting in faster query response.
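A rough sketch of what that restructuring could look like with MySQL LIST partitioning - the table, column and partition names are hypothetical, and note that MySQL requires the partitioning column to be part of every unique key, hence the composite primary key:

```sql
-- Hypothetical restructured layout: one row per (item, store level)
-- instead of six storelevelX columns on the items table.
CREATE TABLE item_store_levels (
  id          int unsigned      NOT NULL,  -- references items.id
  store_level tinyint unsigned  NOT NULL,  -- 1 through 6 (or higher)
  store       smallint unsigned NOT NULL,  -- the original storelevelX value
  PRIMARY KEY (id, store_level)            -- partition column must be in every unique key
)
PARTITION BY LIST (store_level) (
  PARTITION p1 VALUES IN (1),
  PARTITION p2 VALUES IN (2),
  PARTITION p3 VALUES IN (3),
  PARTITION p4 VALUES IN (4),
  PARTITION p5 VALUES IN (5),
  PARTITION p6 VALUES IN (6)
);
```

With this layout, a query filtering on store_level = 1 would only touch partition p1.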

Regarding the single column vs. composite index, if you do this:

ADD INDEX ( seller, listtype, live )

then the following conditions should be resolved faster by using the index:

  1. where seller = 'johnson'

  2. where seller = 'johnson' AND listtype <= 2

  3. where seller = 'johnson' AND listtype <= 2 AND live = 1

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
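In other words, with the suggested (seller, listtype, live) index:

```sql
-- Can use the composite index (leftmost prefix matches):
SELECT * FROM items WHERE seller = 'johnson';
SELECT * FROM items WHERE seller = 'johnson' AND listtype <= 2;

-- Cannot seek via that index: live alone is not a leftmost prefix,
-- so this falls back to a table scan (or some other index).
SELECT * FROM items WHERE live = 1;
```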

Finally, having a table where all the columns are indexed individually, that’s an oddity.

If Sphinx were “cheating”, then so would SOLR : P

Hm, yes, I suppose if you view Sphinx as cheating then Solr would be cheating as well. Although technically Sphinx is nothing more than a sugar-coated rich text search engine bolted onto a DB (MySQL/PG/etc) connection, whereas Solr is a DB in its own right.

It could be argued that MySQL simply isn’t the best tool for the job here (faceted search), whereas Solr is built for the sole purpose of faceted search and is blazingly fast at it. Just stay away from replication, as that is a royal PITA. At least it was when I used it +/- 3 years ago. Things might have (should have?) changed since then.

Replication, sharding, distribution… I hear it’s all pretty automatic with ElasticSearch; surely SOLR has stolen some of that by now.

ES and SOLR can function as document stores, but I figure there’s a reason besides separation of storage/CRUDdy stuff and search that people continue to bolt search engines on next to DBs. I’d love to know actually, maybe a question to be separately asked in the DB forum.

Sphinx is also blazingly fast. It seems to sorta be the benchmark the Postgres guys like to use as a goal for their internal FTS.

All that aside, I agree with the OP that it’s probably a good idea to learn SQL well enough to write excellent and fast queries, something I’d like to learn as well, eventually. It would help you determine if/when an external search is the better choice, and engines like Sphinx themselves use SQL queries, so the knowledge is helpful there anyway.