  1. #1
    SitePoint Member
    Join Date
    May 2006
    Posts
    7

    Weird PHP / MySQL Problem

    I'm trying to load a CSV text (containing IP Location information) file into a MySQL database table.

    The file contains about 73,000 records, but my load program stops at about record 8,488 every time.

    On investigation, it looks as though the program is attempting to insert a duplicate key, even though there are no duplicates in the text file.

    After adding further debugging lines, I've found that one insert command, which is immediately prior to the record it's stopping at, is actually adding different values to the ones in the insert command:

    insert into iplocation set ip_from=2147483648, ip_to=2147549183, registry='IANA', assigned='414892800', country2='ZZ', country3='ZZZ', country='RESERVED'

    Instead of using the values above, it actually stores ip_from and ip_to values (both key columns) that are both set to 2147483647. This value, 2147483647, is actually the ip_to value from the previously-inserted row. The other columns on the row with the corrupted ip_from and ip_to values contain the correct data (i.e. it's not been copied from the previous row).

    I'm guessing that it's some form of memory issue in MySQL, but I've no idea what's going on or, more importantly, how to fix it. (I've loaded much larger tables using similar code without any problems.)

    Any help anybody can offer would be much appreciated.

    The data file is available at:

    http://www.wealthyteddy.co.uk/geoip/IpToCountry.csv

    However, as this is about 5MB in size, I've created a copy containing all the records up to the point of error, plus one or two more, to save anybody having to download the whole file. This shortened copy, renamed as a .txt file so it can be opened in the browser, is available at:

    http://www.wealthyteddy.co.uk/geoip/...pToCountry.txt

    The database table is defined as:

    CREATE TABLE `iplocation` (
    `ip_from` int(10) NOT NULL default '0',
    `ip_to` int(10) NOT NULL default '0',
    `registry` varchar(20) NOT NULL default '',
    `assigned` varchar(10) NOT NULL default '',
    `country2` char(2) NOT NULL default '',
    `country3` char(3) NOT NULL default '',
    `country` varchar(255) NOT NULL default '',
    UNIQUE KEY `ip_from` (`ip_from`,`ip_to`)
    ) TYPE=MyISAM;

    My program code is available at:

    http://www.wealthyteddy.co.uk/geoip/loadipdata.txt

    (The live code is actually called loadipdata.php, of course.)

    I've just had a thought that maybe the unique key of the database table is not defined correctly. Both the ip_from and ip_to columns should be unique in their own right, but I wanted the primary key of each row to be these two columns together. I'm not sure why this would cause the problem I'm getting, especially as it's added the first 8,374 rows without any issues.

    Many thanks!


    Mark.

  2. #2
    Dan Grossman
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,578
    2147483647 is as large a number as a signed integer column can store. Here's your reference:

    http://dev.mysql.com/doc/refman/5.0/...-overview.html

    Looks like BIGINT may be what you'll need.
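    Something like this should do it, as a sketch (assuming you can alter the table in place on your server):

    ```sql
    -- In non-strict mode MySQL clamps an out-of-range value to the column
    -- maximum (2147483647 for signed INT) instead of raising an error,
    -- which would explain why both key columns ended up with the same value.
    ALTER TABLE iplocation
        MODIFY ip_from BIGINT NOT NULL DEFAULT 0,
        MODIFY ip_to   BIGINT NOT NULL DEFAULT 0;
    ```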

  3. #3
    SitePoint Zealot Serberus
    Join Date
    Oct 2005
    Location
    Herts, UK
    Posts
    113
    The insert command with the issue has ip_to with a value of 2147549183, which is larger than the maximum value a signed INT column can store (2^31 - 1 = 2147483647).

    Try changing the column specification to INT(10) UNSIGNED which will double the positive range.
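    A minimal sketch of that change (assuming the table can be altered in place; IPv4 addresses never exceed 4294967295, so an unsigned INT is wide enough):

    ```sql
    -- Signed INT range:   -2147483648 .. 2147483647
    -- Unsigned INT range:           0 .. 4294967295
    ALTER TABLE iplocation
        MODIFY ip_from INT(10) UNSIGNED NOT NULL DEFAULT 0,
        MODIFY ip_to   INT(10) UNSIGNED NOT NULL DEFAULT 0;
    ```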

  4. #4
    SitePoint Zealot Serberus
    Join Date
    Oct 2005
    Location
    Herts, UK
    Posts
    113
    Dang! Beaten

  5. #5
    SitePoint Member
    Join Date
    May 2006
    Posts
    7

    Thanks!

    Dan (and others),

    Many thanks for putting me out of my misery.

    I changed the two columns to BIGINT, and it seemed to solve the problem.

    However, for some reason, the program stopped loading records somewhere around record # 33700, with no error messages - it just said "Done".

    I've no idea why it stopped at this point, but rather than do further debugging, I simply copied the records beyond that point into a separate file and ran the program again, using this new file, and everything's now loaded into my database OK.

    Once again, thanks!



    Mark.

  6. #6
    Dan Grossman
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,578
    You might've hit the script execution time limit.

    http://us2.php.net/function.set-time-limit

  7. #7
    SitePoint Member
    Join Date
    May 2006
    Posts
    7

    Thanks Again!

    Dan,

    Thanks for that.

    I did wonder about that, but I've run into that limit before, and it actually gave me an error message telling me that the maximum execution time had been exceeded.

    Having said that, when I did get that error it was on a different server, so maybe the configuration was different?

    Anyway, thanks, and best wishes,



    Mark.

  8. #8
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Once your 30 seconds are up...

  9. #9
    Resident Java Hater
    Join Date
    Jul 2004
    Location
    Gerodieville Central, UK
    Posts
    446
    Let me give you a word of warning: I worked on a property website last year, and we started using PHP, and it was utter poop for handling big text files. We were dealing with a 1.7 million record CSV file, and it used to barf around the ~20K-line mark due to RAM usage. I found this was because PHP would not collect any garbage. I ended up having to split the file into 20K-line segments and process each one.

    When we switched to Ruby we never had this problem, surprisingly enough :P. This is one problem with PHP being too web-orientated, imo.
    http://virtualfunction.net - Rails Web Development
    http://squaremove.co.uk - Rails powered Property Listings

  10. #10
    SitePoint Member
    Join Date
    May 2006
    Posts
    7

    Thanks!

    Thanks for that tip-off!

    I'd sort of come to the same conclusion myself about garbage collection.

    Best wishes,

