SitePoint Sponsor |
|
User Tag List
Results 1 to 24 of 24
Thread: ip2country optimize
-
Dec 20, 2005, 20:06 #1
- Join Date
- Sep 2005
- Posts
- 6
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
ip2country optimize
Hello!
i want to install ip2country db on hosting
and aprox 50 000 queries per day, each 100 bytes
naturally it will consume shared hosting CPU
is there a way to optimize this DB to lessen CPU load?
postgres is better than mysql ?
any experiences?
help! need advice
-
Jan 18, 2006, 08:23 #2
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
hi,
i am trying to achieve the same goal.
i currently use the free ip2country database imported in mysql.
On a daily basis, I run about a million queries against this table.
Due to the format of the table (each ip block identified by lower and higher ip), one cannot query the table for multiple IPs at a time, neither use indexes (because of the use of < and > in the where clause). I think this also prevents from using in-memory HEAP tables.
This results in the process taking quite some time even on a speedy machine.
I'd be curious as to any performance improvement possible as well. The way I see it, better performance could only be achieved by using a non sql database, and use the known formatting of the table and good algorithms to avoid full table scans.
-
Jan 18, 2006, 09:10 #3
Just an idea, create a table with the first byte of the ip, and foreign key to the possible ranges?
-
Jan 18, 2006, 10:25 #4
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
try using inet_aton() and inet_ntoa() in your tables. these twon functions convert IP's to and from integers so your range comparisons can successfully use an index.
-
Jan 18, 2006, 12:45 #5
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
actually, that's the way it is already. I was under the impression that mysql did not use indexes when having a comparison in where clauses, are you sure it does ?
If yes, what I don't get, is why i get the following:
Code:EXPLAIN SELECT * FROM `iptocountry` WHERE `l` <= 3558292872 AND `h` >= 3558292872 table type possible_keys key key_len ref rows Extra iptocountry ALL l,h NULL NULL NULL 16929 Using where
If it used indexes correcty, shouldn't it be able to find the right row right away ?
-
Jan 18, 2006, 13:46 #6
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
after making a few tests, it seems that the number of rows looked at depend on the number one compares to. Probably is there some sort of optimization algorithm or something.
Anyway, it doesn't seem optimized enough, because I have written a small php script that searches through a text file, using a simple dichotomy algorithm and I'm having much better results.
On a 60000+ lines database (from webhosting.info ), a SQL query like the above takes (on my machine) between 0.0006 s (when the number is such that only a few lines are looked through) and 0.02s (when full table scan).
My dichotomic script returns systematically results in around 0.0003s, having performed 13 to 18 iterations.
Unless there's something i've missed on the index usage, i'm probably going to switch my method of operation, hopefully dividing at least in two the time taken by my batch.
Let me know if anyone's interested in reading the script (it's fairly simple) and i'll post it.
-
Jan 18, 2006, 14:08 #7
Yes, the maxmind geoip php code does something similar I believe.
-
Jan 18, 2006, 14:14 #8
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
this index might actually get used:
Code:alter table add index `l_h` (`l`, `h`);
Code:SELECT * FROM `iptocountry` WHERE 3558292872 BETWEEN `l` AND `h`
-
Jan 18, 2006, 14:25 #9
Seems MySQL refuses to use the index. Even when forced.
CREATE TABLE `iptocountry` (
`l` bigint(20) unsigned NOT NULL default '0',
`h` bigint(20) unsigned NOT NULL default '0',
`isoCode` char(2) NOT NULL default '',
PRIMARY KEY (`l`,`h`)
) ENGINE=MyISAM;
With 37879 rows.
EXPLAIN SELECT * FROM iptocountry force index(primary) WHERE 3558292872 BETWEEN l AND h
1, 'SIMPLE', 'iptocountry', 'ALL', '', '', '', '', 37879, 'Using where'
[Using data from http://www.ip2country.net/download/cr.zip ]
-
Jan 18, 2006, 15:09 #10
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Ren, don't you experiment that changing the number you compare to produces different results?
IP tables have a somewhat strange repartition scheme, which may be the reason for disparities.
This is also the reason why I don't get better results with other algorithms than dichotomy. I've tried the false position and tangent algorithms, which are supposed to converge faster than dichotomy, but the repartition of the IP blocks seems to produce weird results with those methods.
Anyway, I've modified my script a bit to put the whole file in memory and search php strings/arrays rather than the file, and I'm now getting results around 0.0002s per query (vs 0.0003s searching the file and 0.0006s-to 0.003s with sql).
Getting closer. Might experiment with a binary version and then i should be done.
-
Jan 18, 2006, 15:41 #11
Originally Posted by quenting
http://cvs.sourceforge.net/viewcvs.p...25&view=markup
_geoip_seek_country() does the search.
But as I need to do lookups on batchs of IPs, guessing that SQL version will perform better.
-
Jan 18, 2006, 15:46 #12
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
Code:alter table iptocountry add index full (`l`, `h`, iso_code);
-
Jan 18, 2006, 16:42 #13
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
alter table iptocountry add index full (`l`, `h`, iso_code);.
I've been using the MaxMind geoip, as well that was the first IP to country I've found. Just wondering how both data sets compare for accuracy etc.
But as I need to do lookups on batchs of IPs, guessing that SQL version will perform better.
I'd like to try out maxmind and see how it compares to webhosting.info's, but it seems it's not free either. Does anyone know a freely available databases including ip to city correspondance ?
-
Jan 18, 2006, 17:44 #14
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
ah, forgot to mention that the columns have to appear in the order of the index. so your query has to be this:
Code:SELECT iso_code FROM `iptocountry` WHERE 3558292872 BETWEEN `l` AND `h`
-
Jan 18, 2006, 18:31 #15
Originally Posted by quenting
Unfortunately cant get to their website atm to find the download link.
I'm importing web log files, so each line gets imported, atm I'm doing the country lookup in PHP, as each line gets imported. But plan to remove it from the loop, and have a single INSERT when the raw data has been imported.
Something along the lines of...
INSERT table SELECT l.logId, i.countryId FROM logs l INNER JOIN l.clientIp BETWEEN i.l AND i.h
-
Jan 19, 2006, 02:39 #16
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
ah, forgot to mention that the columns have to appear in the order of the index. so your query has to be this:
Code:EXPLAIN SELECT code FROM `iptocountry` WHERE 3558292872 BETWEEN `l` AND `h` table type possible_keys key key_len ref rows Extra iptocountry index NULL FULL 18 NULL 63370 Using where; Using index
-
Jan 19, 2006, 05:48 #17
Just had a play with creating a intermediary table, determining country for 1,438 ips.
SELECT l.logId, l.clientIp, i.isoCode
FROM tracker.logs l
INNER JOIN octet o ON l.clientIp >> 24 = o.classA
INNER JOIN iptocountry i ON o.rangeId = i.rangeId AND l.clientIp BETWEEN i.l AND i.h
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, 'SIMPLE', 'l', 'ALL', 'Index_2', '', '', '', 1438, ''
1, 'SIMPLE', 'o', 'ref', 'PRIMARY', 'PRIMARY', '1', 'func', 1514, 'Using where; Using index'
1, 'SIMPLE', 'i', 'eq_ref', 'PRIMARY', 'PRIMARY', '2', 'test.o.rangeId', 1, 'Using where'
Which appears alot (25x ) better than
SELECT l.logId, l.clientIp, i.isoCode
FROM tracker.logs l
INNER JOIN iptocountry i ON l.clientIp BETWEEN i.l AND i.h
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, 'SIMPLE', 'l', 'ALL', 'Index_2', '', '', '', 1438, ''
1, 'SIMPLE', 'i', 'index', '', 'Index_Full', '18', '', 37879, 'Using where; Using index'
Where Index_Full is an index on (l, h, isoCode)
-
Jan 19, 2006, 09:07 #18
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
Originally Posted by quenting
-
Jan 19, 2006, 09:29 #19
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
it says using index!
-
Jan 19, 2006, 09:44 #20
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
the 63370 in the explain is NOT an exact number. that is an ESTIMATE of the number of rows that the index will return based on the key distribution. it has nothing to do with the actual number of rows looked at, although some times it just happens to be the same as the number of rows returned from that table, but that is coincidence, not a mathematical or procedural certainty.
-
Jan 19, 2006, 10:49 #21
Originally Posted by longneck
932 rows in set (32.99 sec)
vs
932 rows in set (9.16 sec)
Using the pair of queries post above. Latter timing using the extra table.
-
Jan 19, 2006, 12:07 #22
- Join Date
- Dec 2002
- Location
- Switzerland
- Posts
- 735
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
Originally Posted by longneck
Still, if it ends up returning one row, and the estimate was 63370, it means the index doesn't work as it should. And if it estimates to get 63370 in the result set, it will go through a full table scan (because it will think it has to) won't it?
What's weird is the query times, I now get 0.002s for the same query that takes 0.0006s without the index.
-
Jan 20, 2006, 09:46 #23
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
i'm stumped. if anyone else want's to take a crack at it, i've attached a SQL file that creates a table of random ip addresses. you can download the ip2country sql file from http://www.ip2country.net/download/cr.zip (look for a file called anp_ip2country.sql)
-
Jan 20, 2006, 09:47 #24
- Join Date
- Feb 2004
- Location
- Tampa, FL (US)
- Posts
- 9,854
- Mentioned
- 1 Post(s)
- Tagged
- 0 Thread(s)
i guess i need to actually attach the file....
Bookmarks