  1. #1
    SitePoint Zealot Coastal Web's Avatar
    Join Date
    Jan 2006
    Location
    Oregon, U.S.
    Posts
    131
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    fixing strange characters in files

    Hello everyone,

    I work with a publishing company that takes daily content from our magazines and newspapers and uploads it to our websites. I'm working on a parser that takes exported text files and converts them into an XML format that can easily be imported into our backend. This should save time and money, as uploading content currently requires massive amounts of cut and paste.

    In any event, the text files are exported from NewEditPro (an older version) on Macs. They look fine when opened with TextEdit on a Mac; however, when I upload the files via FTP to our Linux-based server (or open the text files on a Windows PC), certain characters are all screwed up, and I'm wondering how I can fix this with PHP.

    I'd rather not create a str_replace() array to fix the characters because I don't believe they are always consistent. Here is an example:
    Code:
    A Pennsylvania man used a backhoe to break into a museum owned by his father Ń the pioneering fantasy artist Frank Frazetta Ń in an attempt to steal 90 paintings valued at $20 million, police said Thursday.
    State police charged Alfonso Frank Frazetta, 52, of Marshalls Creek, with theft, burglary and trespass after they say he was caught loading the artwork into his trailer and SUV.
    The elder Frazetta, 81, is renowned for his work on characters including Conan the Barbarian, Tarzan and Vampirella. He was in Florida at the time of the theft.
    His sonŐs motive may stem from a family feud over the master illustratorŐs assets, according to a law enforcement source who spoke on condition of anonymity because it is still early in the investigation.
    Frazetta was arraigned and sent to the Monroe County jail. Bail was set at $500,000. Officials didnŐt know whether he had a lawyer yet.
    Police said that Frazetta and another man used the backhoe to enter the Frazetta Art Museum in the Pocono Mountains region on Wednesday afternoon, tripping a burglar alarm.
    A trooper who responded said Frazetta claimed he had been instructed by his father Ňto enter the museum by any means necessary to move all the paintings to a storage facility,Ó according a police affidavit.
    The elder Frazetta told police that his son did not have permission to enter the museum or to remove any artwork. Frank FrazettaŐs attorney, Gerard Geiger, said the stolen paintings were insured for $20 million, according to court documents.
    Geiger did not immediately return a phone message Thursday.
    Police say charges are pending against a second suspect.
    As you can see, Ň should be double quotes, Ő should be single quotes, Ń should be em dashes (or --), etc.

    Any help would be greatly appreciated!

  2. #2
    SitePoint Wizard rguy84's Avatar
    Join Date
    Sep 2005
    Location
    Durham, NC
    Posts
    1,659
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)
    I am going to take a stab in the dark: either your character encoding is off somewhere or magic quotes are on...
    Ryan B | My Blog | Twitter

  3. #3
    SitePoint Zealot Coastal Web's Avatar
    Quote: Originally posted by rguy84
    I am going to take a stab in the dark: either your character encoding is off somewhere or magic quotes are on...
    Hi rguy84,

    Well, this happens when the files are taken from the Macs over to the PCs, so it's not a magic quotes thing for sure. It is some type of character encoding issue. The painful part is that the version of NewEditPro these files are exported from doesn't have any way to change or set the encoding preferences. So if I want to do this, I'll need to fix the bad characters.

  4. #4
    SitePoint Wizard rguy84's Avatar
    Sorry, I cannot help you further. I had never heard of the program before this post. A quick search keeps asking if I mean NewsEditPro.

  5. #5
    SitePoint Enthusiast
    Join Date
    Feb 2004
    Location
    France
    Posts
    58
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sounds like a charset issue. Most likely your source file is in something like UTF-8 and you read it as ISO-8859-1. PHP has a charset conversion function called iconv().
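
    To make the iconv suggestion concrete, here is a minimal sketch. It assumes the exported files are in Mac OS Roman, which is only a guess based on the sample in the first post: in Mac OS Roman the bytes 0xD1, 0xD5, 0xD2 and 0xD3 are the em dash, the right single quote and the two curly double quotes, and those same bytes display as Ń, Ő, Ň and Ó in a Central European code page such as Windows-1250. The function name macRomanToUtf8 is made up for illustration, and 'MACINTOSH' is the charset name understood by glibc and GNU libiconv; other iconv builds may not recognise it.

    ```php
    <?php
    // Hypothetical helper: converts bytes ASSUMED to be Mac OS Roman into UTF-8.
    function macRomanToUtf8(string $bytes): string
    {
        // 'MACINTOSH' is the Mac OS Roman name in glibc / GNU libiconv
        // (an assumption about the iconv build on your server).
        $utf8 = iconv('MACINTOSH', 'UTF-8', $bytes);
        if ($utf8 === false) {
            throw new RuntimeException('iconv could not convert the input');
        }
        return $utf8;
    }

    // Raw bytes as they would sit in the exported file; \xD1, \xD5, \xD2 and \xD3
    // are the bytes that show up as the "strange" characters.
    $raw = "his father \xD1 the artist. His son\xD5s motive. \xD2to enter,\xD3 he said.";
    echo macRomanToUtf8($raw), PHP_EOL;
    ```

    For a real file, the same function could be applied to the result of file_get_contents() before the XML is built. If the output still looks wrong, the Mac OS Roman guess is probably off and the source charset needs to be identified some other way.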

