SitePoint Sponsor

User Tag List

Results 1 to 11 of 11
  1. #1
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question Cleansing table data for import

    Hi guys!

    I've been tasked with importing table data into a new web application. However, the data is chock full of "bad" characters that are generating tons of errors when I try to import.

    There's lots of "Ian\\\\\\\\\\\\\\\\\\\\\\'s home page" and: "That's not the idea; certainly not my idea!" which are wreaking havoc.

    There's also instances of double apostrophes and even some HTML thrown in for good measure.

    As you can imagine, picking through over 13,000 entries by hand really isn't going to happen. I've written a small PHP script to stripslashes(), but it's not getting rid of all of them, just some.

    Does anyone have any tips for cleansing table data?

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    i can see the problem with the backslashes, but what's wrong with that other example?

    your only hope is to do lots of searches... for example,
    Code:
    WHERE copy LIKE '%\\%'
    followed by UPDATE statements

    make sure your boss/client approves the expense of data cleansing (it will be rather substantial)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The ";" character is interpreted as a line termination, which brings the whole import process to a halt.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    what format is the imported data in? csv?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    No, it's in SQL with insert statements.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    that's interesting, i would expect that the file would run, i.e. would load the bad data, seeing as how the SQL statements were likely created from a mysqldump

    can you give an example of an INSERT statement that fails?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I don't have the data to hand. For obvious reasons, I had to abandon the last attempt.

    I'm using phpMyAdmin for exporting and importing the data. The inserts all seem perfectly fine, other than the data inside them.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    okay, now we're getting someplace

    if the data has been loaded already, then see post #2, you will have to do lots of searches, followed by individual UPDATE statements which fix the data
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    So it's going to be a manual process?

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    not completely, no
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    It's been real... Forbes's Avatar
    Join Date
    Dec 2004
    Location
    Yorkshire, England
    Posts
    676
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK, thanks for the help!


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •