PHP and large mysql database

Hi I am currently planning to run PHP and MySql to store large amounts of personal data.

I am doing tests on my local machine, with a database with these fields

id age sex postcode mortgage married prop_val employed car_own head_house children

I currently have 2025150 rows in my database to execute a query to return all these rows takes 17.76 seconds

In the final app there will be 30million plus records and many more fields. I would expect people pulling results of around 10000 to 150000, with up to 20 simultaneous conections. Ideally I would like this as an ajax style app where the results are filtered on the fly altering the count.

Does anyone have any suggestions to maximise the speed.

I have already considered having the post code limiter on a separate page non ajax to limit the results.

For your queries are you using the evil SELECT * or are you specifying just the fields that you want (where you don’t need all the fields returned)?

Yes ha ha this morning I thought of that, realised I only need to select id, I now gat 2208226 in 7.69 seconds. Instead of 18 seconds. I had always been told to only select the fields you need but in most applications the time saving is minimal in this it is clear to see a huge performance boost.

I had always been told to only select the fields you need

Better you’ve been told to only select the rows you need…

Slightly off-topic, but I wonder which is more performant, SELECT * or SELECT fields when you actually do need all the fields in the table…

I guess what I’m pondering is whether or not there is any overhead incurred by the RDBMS when asked to pluck out the fields rather than just getting 'em all.

I notice when I am in php myadmin, it does the queries much faster, is there any reason for this? Or any way to get my code working as fast?

Oh you already made greatest improving to your database architect.
Nothing can help you more.

Save for book or two on database basics may be.

this doesn’t make a lot of sense

why do you want a query to return 2 million ids?

what kind of web page lists 2 million ids?

surely there is something else you’re trying to accomplish

totally disagree

Like r937 I also disagree. How about indices, are they setup correctly? Do you use the correct data types? How about using shards? (etc)

I only pray for some administrator would think the same.
Cause I am usually get punished for my sarcasm.

your sarcasm isn’t obvious, that’s the problem

R927 the reason for returning all the ids is that I really only want a count of the data, as the idea is your create a count of your required data, limiting on various data types like age, sex, etc, etc. You then pay for your list and you will receive a link to download the data.

All I am doing is refining the search to determine the final count of data.

ScallioXTX this is my test database, I have noticed there is no index, I thought I setup an index on id with auto increment obviously not.

CREATE TABLE IF NOT EXISTS data (
id int(10) NOT NULL,
age int(3) NOT NULL,
sex int(1) NOT NULL,
postcode varchar(25) NOT NULL,
mortgage int(10) NOT NULL,
married int(1) NOT NULL,
prop_val int(10) NOT NULL,
employed int(1) NOT NULL,
car_own int(1) NOT NULL,
head_house int(1) NOT NULL,
children int(2) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Can you tell me more about Shards?

I also think that it is PHP slowing the process down, as on php my admin the query is always a small amount but the page refresh is very slow.

in that case you should use COUNT(*) and return only one row from the database

This is how bad habits learned when working with tiny data sets (SELECT *, returning a long result set and counting in PHP, not designing indexes, using bad data types, returning too many rows, etc.) come back to bite you.
None of that stuff makes a difference with 100 rows, but needs to be paid attention to now.

Using EXPLAIN to ensure your queries use indexes, and caching where appropriate will probably give you the biggest performance gains.

@AnthonySterling - it’s my understanding (never tested it) that even when you do need to select all fields it is faster to specify them all in the query.