SQL query taking longer as data grows

Hi,

I have an SQL query which gets clients’ info from the DB, by joining some tables. It returns the desired data fine.

My problem is that the query started to take considerably longer to load as the number of DB entries grow (around 2000 clients now). How long it takes varies, but it sometimes takes up to 10 or 15 seconds, which is not acceptable.

Before considering removing some tables, hence simplfying the query, I am looking for ideas to improve the query speed in whatever ways possible.

I guess my tables could use some key indexes? I am not much experienced with using indexes properly, so any recommendation for which kind of index should be used for which column would be great.

My query:

SELECT clients.id, fname, lname, email, clients.phone, city, state, zip, address, notes, added, started, abbr, statuses.name AS status, filters FROM clients
LEFT JOIN states ON clients.state = states.id
LEFT JOIN client_statuses ON clients.id = client_statuses.client_id
LEFT JOIN statuses ON statuses.id = client_statuses.status_id
LEFT JOIN (SELECT client_filters.client_id, GROUP_CONCAT(filters.name,',',filters.type SEPARATOR ':') AS filters FROM client_filters
INNER JOIN filters ON filters.id = client_filters.filter_id
GROUP BY client_filters.client_id) SQ ON SQ.client_id = clients.id
LEFT JOIN client_filters ON clients.id = client_filters.client_id
WHERE active = 1 GROUP BY clients.id ORDER BY added DESC LIMIT 0, 20

My tables:

CREATE TABLE `clients` (
  `id` int(7) UNSIGNED NOT NULL AUTO_INCREMENT,
  `fname` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `lname` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `email` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `phone` bigint(10) UNSIGNED DEFAULT NULL,
  `city` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `state` int(2) DEFAULT NULL,
  `zip` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL,
  `address` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `notes` text COLLATE utf8_unicode_ci,
  `added` datetime NOT NULL,
  `started` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`),
  UNIQUE KEY `phone` (`phone`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `client_filters` (
  `id` int(7) UNSIGNED NOT NULL AUTO_INCREMENT,
  `client_id` int(7) UNSIGNED NOT NULL,
  `filter_id` int(5) UNSIGNED NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `client_statuses` (
  `id` int(7) UNSIGNED NOT NULL AUTO_INCREMENT,
  `client_id` int(7) UNSIGNED NOT NULL,
  `status_id` int(2) UNSIGNED NOT NULL,
  `user_id` int(7) UNSIGNED NOT NULL,
  `active` int(1) UNSIGNED NOT NULL DEFAULT '1',
  `date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `filters` (
  `id` int(5) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `type` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=DYNAMIC;

CREATE TABLE `states` (
  `id` int(2) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `abbr` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `phone` bigint(10) DEFAULT NULL,
  `google` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `yelp` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `served` int(1) UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `statuses` (
  `id` int(2) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Thanks for your opinions.

First off, the left joins are going to be, by their very nature, slower because it’s going to have to do a full table scan for each one. So if you’re looking for only the matches, use an INNER JOIN where ever possibe.

Second, as soon as you include a value from a joined table into the where clause, the table is essentially an inner join, so the Client_Status table should just be an INNER JOIN

1 Like

Anytime you have a FK table that has only a single column in it, I question the value of the table, unless you’re storing the database on a potato.

client_statuses.status_id is NOT NULL, so when you pull client_status into an inner join (as per Dave’s suggestion above), statuses goes with it into inner join by definition (since there must be exactly 1 status in that field.)

1 Like

Thank you both for your suggestions.

Just so I understand you correctly, you mean that having tables with just one column, e.g. statuses, are really not ideal in cases like mine -. So, removing the statuses table, and for example, changing status_id in client_status table to status to hold statuses.name info is better?

Shall I add key indexes to columns where I perform the joins between the tables? For example in

INNER JOIN states ON clients.state = states.id

states.id is primary key. Shall I add a key index to clients.state? And similarly in other columns used for joining?

Ostensibly, all it does when you have a extra table with a single column (of a short variety) is that it conserves a little bit of disk space, because you write an INT into your main table, rather than the string.

It depends a bit on your use cases for the table, of course. If you query the statuses table by itself, or more than 1 table references it, then it may serve a good value.

That said, if statuses was simply a column in client_statuses, your query turns from

SELECT clients.id, fname, lname, email, clients.phone, city, state, zip, address, notes, added, started, abbr, statuses.name AS status, filters FROM clients
LEFT JOIN states ON clients.state = states.id
LEFT JOIN client_statuses ON clients.id = client_statuses.client_id
LEFT JOIN statuses ON statuses.id = client_statuses.status_id
...

into

SELECT clients.id, fname, lname, email, clients.phone, city, state, zip, address, notes, added, started, abbr, status, filters FROM clients
LEFT JOIN states ON clients.state = states.id
LEFT JOIN client_statuses ON clients.id = client_statuses.client_id
...

One less join for the engine to make every time you do a query.

1 Like

UPDATE: Issue solved.

Adding my notes here, in case anyone else may have a similar issue in the future.

Firstly, I read a bit more about SQL JOINs and using indexes properly, now I understand them better. Especially the examples on this page were helpful:

https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

Then, I went ahead to do changes and test my query time at each change. I started with changing all LEFT joins to INNER joins. As far as I can tell, this had around a -10% reduction in query time, so even though it was good, considering the loading time was around 10-15 seconds, it was certainly not enough. I would think for a lot larger data, it may have a bigger impact. Nevertheless, I will make sure to use INNER joins where it makes more sense, going forward.

I continued with adding KEY indexes to those columns where I join the tables, if they are not already marked as PRIMARY index. Adding KEY indexes at client_statuses table to client_id and status_id columns was the real gain.

Before the changes, my query times were varying between 8 to 16 seconds, now they are between 0.06 to 0.2 seconds.

Removing unnecessary tables is another step for sure, but for the time being I will leave it to another time.

Thanks again for your insights.

2 Likes

WHAT??!!!

relational integrity… ever heard of it?

If it’s a single column, accessed only through a link to a single other table, then what relational integrity would be lost in just… having the column in the other table?

not sure i understand what this latest statement is trying to propose…

could you please give an example using two tables, dummy tables if necessary

to quote the OP:

CREATE TABLE `client_statuses` (
  `id` int(7) UNSIGNED NOT NULL AUTO_INCREMENT,
  `client_id` int(7) UNSIGNED NOT NULL,
  `status_id` int(2) UNSIGNED NOT NULL,
  `user_id` int(7) UNSIGNED NOT NULL,
  `active` int(1) UNSIGNED NOT NULL DEFAULT '1',
  `date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `statuses` (
  `id` int(2) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

What relational integrity is lost by changing this to be

CREATE TABLE `client_statuses` (
  `id` int(7) UNSIGNED NOT NULL AUTO_INCREMENT,
  `client_id` int(7) UNSIGNED NOT NULL,
  `status` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `user_id` int(7) UNSIGNED NOT NULL,
  `active` int(1) UNSIGNED NOT NULL DEFAULT '1',
  `date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

quite possibly you do not understand relational integrity

OP obviously forgot the actual FK, but you ~did~ call it an “FK table” so let’s pretend that there actually was a FK relationship

in your one-table scheme, how do you get the database to automatically prevent invalid statuses from being entered?

You don’t. The same way you don’t prevent invalid statuses from being added to the external table.

that’s priceless

you win the database internet for today

If you want to enforce an enum on the field, you’d… enforce an enum on the field.

Adding an additional join for the sake of trying to fake an enum just seems wasteful.

Putting a database restriction on something finite (i.e. not something like a date) is usually not a good idea. Having to alter the table structure each time you want to add/remove a valid value can get very hairy real quick.

It’s much easier, extendable and for the most part safer to use FK constraints which allows you to add/edit/remove table records at will, not to mention cascading deletes to maintain referential integrity

In what way that is not also present when doing it with an external table? Clearly I’m wrong, but I don’t see the reasoning yet.

Debatable.

How is it more extendable than an enum? Again, we’re talking specifically about a single-column table here.

In what way is an external table safer than an in-table enum?

I will grant you, it saves you having to type in DELETE FROM client_statuses WHERE status = "wark". But… I still don’t see there being any referential integrity required.

Sorry to interrupt your discussion :slight_smile: but if I remember correctly, I had my reasons for putting statuses into a separate table. I believe, I needed to track the order of statuses for a flow of certain actions on some other scripts, and then it is possible that status names or order could change in time etc. Anyway, with all the discussion, I better understand that if using such a separate one-column table is not totally necessary, then it can be better to merge it in the main table, to reduce the number of joins. Thanks again.

Because when you change a referential constraint, all you’re doing is altering a value in a database. So, let’s take the example here - status. Say you’ve got statuses like this:

  • New
  • Active
  • On Hold
  • Closed

Now, suppose your boss comes to you and said “I don’t like the term On Hold. Make it Suspended instead.” To do that with a referential integrity table, all you would have to do would be this:

UPDATE TABLE SET Name = 'Suspended' WHERE Name = 'On Hold'

If you’re doing it with CHECK CONSTRAINTS, you’d have to do multiple steps

-- Add Suspended as a valid status
ALTER TABLE client_statuses DROP CONSTRAINT CK_VALID_STATUS;
ALTER TABLE client_statuses ADD CONSTRAINT CK_VALID_STATUS
            CHECK (status IN ('New', 'Active', 'On Hold', 'Closed', 'Suspended'));
-- Change the status
UPDATE TABLE client_statuses SET status = 'Suspended' WHERE status = 'On Hold'
-- Remove the old value from the valid status list
ALTER TABLE client_statuses DROP CONSTRAINT CK_VALID_STATUS;
ALTER TABLE client_statuses ADD CONSTRAINT CK_VALID_STATUS
            CHECK (status IN ('New', 'Active', 'Suspended', 'Closed'));

The ADD CONSTRAINT lines get much more expensive the more records you have because every record has to be checked against the constraint every time it’s changed.

Uh… no? Though i’ll grant you it will take 3 steps, I’m not sure how this is any ‘hairier’… I wasnt referring to constraints, I was using the field type ENUM. Yaknow. That thing, that they included. For this purpose.

ALTER TABLE client_statuses MODIFY status ENUM('New','Active','On Hold','Closed','Suspended')
UPDATE table SET status = "Suspended" WHERE status = "On Hold";
ALTER TABLE client_statuses MODIFY status ENUM('New','Active','Closed','Suspended')