  1. #1
    Zaggs (SitePoint Wizard)
    Join Date
    Feb 2005
    Posts
    1,045

    Replacing strange characters to display on XML feed

    Hi Guys!

    I'm using PHP to generate an XML feed on the fly. All of my data is stored in a MySQL database. From time to time, I'm getting some strange characters showing up in the XML output, for example:

    Code:
    HR Advisor role – 9m contract – London
    Part-time HR Manager Role – Digital Marketing
    Here's a snippet of my PHP script that parses the data from the database and TRIES to get it in a readable format...

    PHP Code:
    foreach ($jobs as $key => $array) {
        $rss_title = htmlspecialchars($jobs[$key]['job_title']); // Job title
        $rss_title = html_entity_decode($rss_title, ENT_COMPAT, 'UTF-8');

        $rss_description = strip_tags($jobs[$key]['job_description']); // Description
        $rss_description = html_entity_decode($rss_description, ENT_COMPAT, 'UTF-8');
        if (strlen($rss_description) > 400) {
            $rss_description = substr($rss_description, 0, 400).'...'; // Shorten description
        }

        $rss_date = $jobs[$key]['date_posted']; // Date posted
        $rss_link = SITEURL.'/'.$this->settings['company_directory'].'/'.$jobs[$key]['company_url'].'/'.$jobs[$key]['job_url']; // Link

        $date = date("D, d M Y G:i:s", strtotime($rss_date));
        $date = $date.' +0000';

        $result .= '<item>';
        $result .= '<title><![CDATA['.$rss_title.']]></title>';
        $result .= '<description><![CDATA['.$rss_description.']]></description>';
        $result .= '<link><![CDATA['.$rss_link.']]></link>';
        $result .= '<guid>'.$rss_link.'</guid>';
        $result .= '<pubDate>'.$date.'</pubDate>';
        $result .= '</item>';
    }
    Any ideas what's wrong?

    Thanks in advance :-)
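
    One thing worth noting about the snippet above: strlen() and substr() count bytes, so cutting a UTF-8 description at byte 400 can split a multi-byte character and leave an invalid sequence in the feed. Below is a minimal sketch of a character-aware alternative, assuming the mbstring extension is available. The name shorten_utf8() is hypothetical and not part of the original script.

    ```php
    <?php
    // Hypothetical helper: shortens UTF-8 text by counting characters, not bytes.
    function shorten_utf8(string $text, int $limit = 400): string
    {
        if (mb_strlen($text, 'UTF-8') <= $limit) {
            return $text; // Short enough already, nothing to cut.
        }
        // mb_substr() never cuts through the middle of a multi-byte character.
        return mb_substr($text, 0, $limit, 'UTF-8') . '...';
    }

    // Five en dashes: 5 characters, but 15 bytes in UTF-8.
    $sample = str_repeat("\u{2013}", 5);
    var_dump(mb_check_encoding(substr($sample, 0, 4), 'UTF-8'));       // byte-based cut
    var_dump(mb_check_encoding(shorten_utf8($sample, 4), 'UTF-8'));    // character-based cut
    ```

    This only guards against damage introduced by the truncation step. It does not repair text that is already stored with the wrong bytes.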

  2. #2
    Cups (SitePoint Wizard)
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Could it be an encoding problem before it gets to your database?

  3. #3
    Zaggs (SitePoint Wizard)
    Join Date
    Feb 2005
    Posts
    1,045
    Originally posted by Cups:
    Could it be an encoding problem before it gets to your database?
    How would I check that out? The characters causing the problem are also stored in the database like this: €“

  4. #4
    Cups (SitePoint Wizard)
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    I ask because, in my experience, encoding anomalies mostly come from the text files the data originates from.

    I have no idea how the entries get into your db: pasted in, scraped from somewhere, etc.

    If you have someone typing the data in, then perhaps this does not apply to you.
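
    One way to check this, sketched under assumptions: "–" is the pattern an en dash (–) typically produces when its UTF-8 bytes are read as Windows-1252 and then saved as UTF-8 again. The helper names looks_double_encoded() and undo_double_encoding() are hypothetical, and the sketch assumes the mbstring extension is available and that Windows-1252 was the wrong charset involved.

    ```php
    <?php
    // Hypothetical helper: true when $text looks like UTF-8 that was misread as
    // Windows-1252 and then stored as UTF-8 again (an assumption about the cause).
    function looks_double_encoded(string $text): bool
    {
        if (!mb_check_encoding($text, 'UTF-8')) {
            return false; // Not valid UTF-8 at all: a different problem.
        }
        // Map every character back to the single byte it has in Windows-1252.
        $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
        // The mapping must change something, be lossless, and yield valid UTF-8.
        return $bytes !== $text
            && mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') === $text
            && mb_check_encoding($bytes, 'UTF-8');
    }

    // Hypothetical helper: reverses one round of double encoding,
    // otherwise returns the input unchanged.
    function undo_double_encoding(string $text): string
    {
        return looks_double_encoded($text)
            ? mb_convert_encoding($text, 'Windows-1252', 'UTF-8')
            : $text;
    }

    // "\u{00E2}\u{20AC}\u{201C}" spells out the three characters seen in the feed.
    $stored = "HR Advisor role \u{00E2}\u{20AC}\u{201C} 9m contract";
    echo bin2hex("\u{00E2}\u{20AC}\u{201C}"), "\n"; // bytes of the damaged form
    echo bin2hex("\u{2013}"), "\n";                 // bytes of a real en dash
    echo undo_double_encoding($stored), "\n";
    ```

    Running bin2hex() on a value fetched straight from the database shows what is really stored: a correctly stored en dash is e28093, while the damaged form is c3a2e282ace2809c. If the database already holds the damaged bytes, the corruption happened on the way in, for example through a connection charset that does not match the data, and repairing the output alone would only hide it.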

  5. #5
    SitePoint Member
    Join Date
    Nov 2011
    Posts
    7
    I'm not 100% sure of what you mean by 'strange characters', but if you mean 'bad' ASCII characters, then I've treated those before in one of my projects.

    Code:
    // Removing all ASCII characters below ASCII 32 (except 9, 10 and 13 (tab, newline and carriage return)).
    $bad_characters = array_diff(range(chr(0), chr(31)), array(chr(9), chr(10), chr(13)));
    $text = str_replace($bad_characters, '', $text);

    I hope that is useful for you.
    Thanks.
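
    The same filter can also be sketched as a single regular expression. This is only an illustration: strip_control_chars() is a hypothetical name, and the character class is an assumption meant to match the list above (ASCII 0 to 31, minus tab, newline and carriage return).

    ```php
    <?php
    // Hypothetical helper: removes ASCII control characters 0-31,
    // keeping tab (9), newline (10) and carriage return (13).
    function strip_control_chars(string $text): string
    {
        // The ranges skip \x09, \x0A and \x0D so those three survive.
        // preg_replace() returns null on failure, so fall back to the input.
        return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text) ?? $text;
    }

    echo bin2hex(strip_control_chars("a\x00b\tc\x1Fd\r\n")), "\n";
    ```

    Note that this only touches bytes below 32. Every byte of a multi-byte UTF-8 character is 128 or higher, so sequences such as "–" pass through unchanged.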

