Need to optimize MySQL queries

Hi all. A few months ago, I coded a PHP/MySQL script that determined the “most popular” searches on my site. Apparently the way I did it is quite resource intensive —even though the site overall is pretty simple and efficient — and my database access has been temporarily restricted at my host. My skills in this area may not be strong enough to resolve the problem. Can you please assist?

Here is the thread where I developed the solution. And below that is the email I received from my host.

[quote]Unfortunately, we have been forced to temporarily restrict access to MySQL for username on gator4173.hostgator.com. Please take a moment to review this email in full as it contains important information and resources to assist you in resolving this issue. Please note that while this restriction is temporary, you should take actions to resolve the issue(s) which may have caused it in order to avoid further down-time/permanent restrictions.

Please reply to this email once you have determined a resolution to this issue.

CPU_TIME:195 table_rows_read:356971695 SELECTS:8342 ROWS_UPDATED:14 ROWS_FETCHED:240842 BUSY_TIME:200 ONNECTED_TIME:252 BYTES_SENT:0 BYTES_RECEIVED:2526383 WAIT_TIME:5

Top table row reads:
DB_USER: username – TOTAL_CONNECTIONS: 1689 – CONNECTED_TIME: 252 – CPU_TIME: 195 – TABLE_ROW_READS: 356971695 – SELECT_COMMANDS: 8342 – UPDATE_COMMANDS: – BUSY_TIME: 200 – BYTES_SENT: 0 – BYTES_RECEIVED: 2526383 – WAIT_TIME (IO): 5

Top WAIT (IO) TIME:
DB_USER: username – TOTAL_CONNECTIONS: 1689 – CONNECTED_TIME: 252 – CPU_TIME: 195 – TABLE_ROW_READS: 356971695 – SELECT_COMMANDS: 8342 – UPDATE_COMMANDS: – BUSY_TIME: 200 – BYTES_SENT: 0 – BYTES_RECEIVED: 2526383 – WAIT_TIME (IO): 5

username 54673 0.0 0.0 300932 16000 ? RN 19:56 0:00 /usr/bin/php /home3/username/comparewear/comparison.php

Wed Nov 9 19:56:29 CST 2016
Running Processes:
username 54673 0.0 0.0 301192 16832 ? SN 19:56 0:00 /usr/bin/php /home3/username/comparewear/comparison.php

Running Queries:
*************************** 1. row ***************************
USER: username
DB: dbname
STATE: Copying to tmp table
TIME: 1
COMMAND: Query
INFO: SELECT comparisons_created.pid, comparisons_created.cid, categories.category, products.hide, products.url_id, products.shortname, products.brand, products.product, products.category_singular
FROM comparisons_created
INNER JOIN products on products.pid = comparisons_created.pid
INNER JOIN categories on categories.cid = comparisons_created.cid
WHERE comparison_new_datetime > ‘2016-11-06 19:56:29’
AND comparisons_created.cid = ‘1’
GROUP BY comparisons_created.pid
ORDER BY count(comparisons_created.pid) DESC
LIMIT 20[/quote]

Thanks!

That’s the query which is hitting you? Something seems off - table_rows_read: 356,971,695. Really?

Do you have indexes on all the appropriate fields?

  • products pid
  • categories.cid
  • comparisons_created.cid
  • comparisons_created.pid
  • comparison_new_datetime

I emptied the table as a temporary band-aid. I really only need to retain the prior few weeks or so, so I could automate that pruning, which would certainly help.

My education on MySQL was pretty self-taught and scattered, so it’s possible I did not structure the table correctly. I am having login problems with phpMySQL at the moment (interesting), but here are the structures, from my backups:

[code]-- Table structure for table categories

CREATE TABLE IF NOT EXISTS categories (
cid int(11) NOT NULL,
category text COLLATE utf8_unicode_ci NOT NULL,
cat_singular text COLLATE utf8_unicode_ci NOT NULL,
csort int(11) NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=6 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;



– Table structure for table comparisons_created

CREATE TABLE IF NOT EXISTS comparisons_created (
sid int(11) NOT NULL,
pid int(11) NOT NULL,
cid int(11) NOT NULL,
comparison_new_datetime datetime NOT NULL,
comparison_new_products text COLLATE utf8_unicode_ci NOT NULL,
comparison_new_products_count int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;



– Table structure for table products

CREATE TABLE IF NOT EXISTS products (
pid int(11) NOT NULL,
cid int(11) NOT NULL,
brand tinytext COLLATE utf8_unicode_ci NOT NULL,
product tinytext COLLATE utf8_unicode_ci NOT NULL,
shortname tinytext COLLATE utf8_unicode_ci NOT NULL,
url_id tinytext COLLATE utf8_unicode_ci NOT NULL,
category tinytext COLLATE utf8_unicode_ci,
category_singular tinytext COLLATE utf8_unicode_ci NOT NULL,
hide int(11) NOT NULL,
price float DEFAULT NULL,
year year(4) NOT NULL,
calories tinytext COLLATE utf8_unicode_ci NOT NULL,
cycling tinytext COLLATE utf8_unicode_ci NOT NULL,
distance tinytext COLLATE utf8_unicode_ci NOT NULL,
heart tinytext COLLATE utf8_unicode_ci NOT NULL,
oxygen tinytext COLLATE utf8_unicode_ci NOT NULL,
sleep tinytext COLLATE utf8_unicode_ci NOT NULL,
steps tinytext COLLATE utf8_unicode_ci NOT NULL,
swim text COLLATE utf8_unicode_ci NOT NULL,
battery_duration tinytext COLLATE utf8_unicode_ci NOT NULL,
battery_rechargeable tinytext COLLATE utf8_unicode_ci NOT NULL,
Android tinytext COLLATE utf8_unicode_ci NOT NULL,
iOS tinytext COLLATE utf8_unicode_ci NOT NULL,
Windows_Phone tinytext COLLATE utf8_unicode_ci NOT NULL,
Mac tinytext COLLATE utf8_unicode_ci NOT NULL,
Windows tinytext COLLATE utf8_unicode_ci NOT NULL,
Bluetooth tinytext COLLATE utf8_unicode_ci NOT NULL,
cellular tinytext COLLATE utf8_unicode_ci NOT NULL,
GPS tinytext COLLATE utf8_unicode_ci NOT NULL,
NFC tinytext COLLATE utf8_unicode_ci NOT NULL,
WiFi tinytext COLLATE utf8_unicode_ci NOT NULL,
display_color tinytext COLLATE utf8_unicode_ci NOT NULL,
display_resolution tinytext COLLATE utf8_unicode_ci NOT NULL,
display_size tinytext COLLATE utf8_unicode_ci,
display_touch tinytext COLLATE utf8_unicode_ci NOT NULL,
display_type tinytext COLLATE utf8_unicode_ci NOT NULL,
speaker tinytext COLLATE utf8_unicode_ci,
clock tinytext COLLATE utf8_unicode_ci,
phone tinytext COLLATE utf8_unicode_ci,
vibration tinytext COLLATE utf8_unicode_ci,
voice tinytext COLLATE utf8_unicode_ci,
accelerometer tinytext COLLATE utf8_unicode_ci,
barometer tinytext COLLATE utf8_unicode_ci NOT NULL,
ultraviolet tinytext COLLATE utf8_unicode_ci,
water_resistance tinytext COLLATE utf8_unicode_ci,
notif_calls tinytext COLLATE utf8_unicode_ci,
notif_email tinytext COLLATE utf8_unicode_ci NOT NULL,
notif_texts tinytext COLLATE utf8_unicode_ci NOT NULL,
operating_system tinytext COLLATE utf8_unicode_ci NOT NULL,
apps tinytext COLLATE utf8_unicode_ci,
music tinytext COLLATE utf8_unicode_ci,
link1 text COLLATE utf8_unicode_ci,
link2 text COLLATE utf8_unicode_ci
) ENGINE=MyISAM AUTO_INCREMENT=110 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;[/code]

There’s not a single index on any of those tables if that’s the entire script. That would explain greatly why you’re getting buried there.

At the very least, I would have expected to see

  • PRIMARY KEY (cid) on categories
  • PRIMARY KEY (pid) on products

I would also expect to see

  • FOREIGN KEY (cid) REFERENCES categories(cid) on products
  • FOREIGN KEY (pid) REFERENCES products(pid) on comparisons_created
  • FOREIGN KEY (cid) REFERENCES categories(cid) on comparisons_created

Yes. Although you should not have every field as an index you will definitely want them for the more “costly” queries.

http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.

Pro tip
Try using EXPLAIN to optimize your queries.

Thank you.

I was able to set Primary Keys via PHPmyAdmin. I’m not sure how to set foreign keys as suggested. Would you mind helping with the MySQL queries?

Anyone?

Indexing is generally straight-forward for the obvious stuff - primary keys, fields containing relations to other tables and so forth.

However, optimising indexes requires a little more thought. To properly to support foreign indexes ensure that any relations between the tables are covered by an index (if you join 2 or more tables in a query, think about how the data needs to be accessed in that join for the non-primary tables - typically that involves looking at your “on” and “using” clauses, and to some extend the “where” clause as well. Also tke into account the type of relation as well: many-to-many, one-to-many, one-to-one and so forth. We might be able to make a guess or two given the schema as above, but beyond that you really do need to think about how the data gets retrieved from your query and that is going to be your ultimate guide.

As someone said above, don’t index everything. I will re-state that differently - only index stuff that must be indexed based on your queries as each index is in effect a performance penalty you have to pay (or trade off - against read performance). Some MySQL tools you might use include:

  • The “explain” statement, which does an analysis of your queries and tells you what indexes it would use or whether it would use sequential searches (i.e. pretty much read the entire table on each joined record).

  • Enable the slow-query log in your configuration. This will warn you when things are taking too long; either because of database locks preventing inserts and updates and even other non-DMS queries as well. Don’t enable this too soon - try the above re optimisation first, else you’ll find you have to examine too many slow queries to handle all at once.

  • Examine the process list, often. If you’re finding lots of clients are waiting for locks, check to see what is blocking them and look at the query being executed.

As you have some very large tables, I have some recommendations:

  • MyISAM is the worst possible engine to be using if you are doing heavy queries on large data, especially if the content is often edited because MySQL will often lock MyISAM tables on SELECT. This is because indexes are not versioned as they are in INNODB, and in order to preserve the integrity of the query the data needs to be frozen. (MariaDB - see below - removes many of these problems in its improved implementation).

  • Minimise the number of indexes as noted above: each index has a cost to update.

  • Partition the data into convenient chunks - month based table names is often suitable, and triggers can assist with determining which table data is inserted into, or you can use various tricks to achieve the same thing (MERGE tables for example).

  • Extract smaller tables regularly to grab just the data you want (to index), with a pointer back to the pkey of the primary table you are reporting on if you need data from there. Consider a nightly batch job - since you are now reporting on the much smaller table and not the one being continuously used, it is a) faster, b) doesn’t matter if it gets a lock on select as it won’t affect data being used by other clients.

  • Use a more robust database ENGINE. I’m not a big fan of INNODB, but that’s because i’ve tended to use more write intensive apps - in reads and complex queries it is way better than MyISAM though, and even fully supports proper foreign key constraints (which you don’t get with MyISAM).

  • Consider MariaDB (mysql drop-in replacement) which will provide even better options for performance and choice of database backends. The feasibility of this will be determined largely based on use of recent MySQL specific features (which MariaDB won’t have).

HTH,

/d

1 Like

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.