#1
SitePoint Guru
Join Date: Jul 2003
Location: Newcastle upon Tyne
Posts: 930
Making my RSS feed work with special characters
Hi there.
I'm using the following code to create the RSS feed on my site (please note I've edited it in places). The feed validates, but it breaks as soon as I put in pound signs (£), dollar signs ($) or, indeed, any special character (e.g. %). How do I get my PHP-powered RSS feed to cope with special characters? I appreciate any help you can give on this. Here is the code: Code:
<?php
$pubDate = date("r");
$year = date("Y");

// Convert a date string from the database to ISO 8601 with a +HH:MM offset
function iso_8601($txt_date) {
    $fDate = strtotime($txt_date);
    $main_date = date("Y-m-d\TH:i:s", $fDate);
    $tz = date("O", $fDate);               // offset for this date, e.g. +0100 ($timestamp was undefined here)
    $tz = substr_replace($tz, ':', 3, 0);  // +0100 -> +01:00
    return $main_date . $tz;
} // end function

header("Content-type: text/xml");
echo ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
?>
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:hr="http://www.w3.org/2000/08/w3c-synd/#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://www.somesite.com/index.php">
<title>TITLE</title>
<description>DESCRIPTION</description>
<link>LINK</link>
<language>en-gb</language>
<?php
echo ("<pubDate>$pubDate</pubDate>");
echo ("<copyright>Copyright $year. somesite.com</copyright>");
echo ("<webMaster>EMAIL_ADDRESS</webMaster>");
?>
</channel>
<?php
while ($row = mysql_fetch_array($result)) {
    $postid = $row['post_id'];
    $txt_date = $row['post_date'];
    $txt_title = $row['post_headline'];
    $txt_article = $row['post_article'];
    $txt_title = stripslashes($txt_title);
    $txt_title = fixDisplay($txt_title);      // fixDisplay() is defined elsewhere (not shown)
    $txt_article = fixDisplay($txt_article);
    $txt_article = strip_tags($txt_article);
    $formatted = iso_8601($txt_date);
    $articleLink = 'http://www.somesite.com/archives/' . $postid;
    // DO RSS DISPLAY (each item's rdf:about should be that item's own URL)
    echo ("<item rdf:about=\"$articleLink\">
<title>$txt_title</title>
<description>$txt_article</description>
<link>$articleLink</link>
<dc:date>$formatted</dc:date>
</item>\n\n");
}
mysql_free_result($result);
mysql_close($connection); // must be inside the PHP block, otherwise it is sent to the client as text
?></rdf:RDF>
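As an aside, PHP's date() has a built-in 'c' format character (PHP 5 and later) that already produces an ISO 8601 timestamp with a +HH:MM offset, so the iso_8601() helper in the code above could be shortened. A minimal sketch, assuming post_date holds a string that strtotime() understands, such as a MySQL DATETIME; the name iso_8601_simple() is hypothetical, and the timezone is pinned to UTC only so the example output is predictable.

```php
<?php
// Pin the timezone so the example output is predictable; a real feed
// would use the server's configured timezone instead.
date_default_timezone_set('UTC');

// Hypothetical replacement for iso_8601(): the 'c' format character of
// date() emits a timestamp such as 2004-05-01T12:00:00+00:00
function iso_8601_simple(string $txt_date): string
{
    return date('c', strtotime($txt_date));
}

echo iso_8601_simple('2004-05-01 12:00:00'), "\n"; // 2004-05-01T12:00:00+00:00
```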

#2
SitePoint Zealot
Join Date: May 2003
Location: United States
Posts: 108
Is it PHP that's giving you the error, or the browser when you try to view the RSS?
You can wrap content that isn't valid XML in a CDATA section: Code:
<element><![CDATA[Non Xml Content Here#$%@ ]]></element>
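A CDATA section tells the parser not to interpret markup characters such as & and < inside it, but two caveats apply: the section cannot itself contain the sequence ]]>, and it does not fix encoding problems (a £ inside CDATA must still be valid in the encoding the feed declares). A minimal PHP sketch of a wrapper that handles the ]]> case by splitting it across two adjacent sections; the function name cdata() is hypothetical.

```php
<?php
// Wrap arbitrary text in a CDATA section. "]]>" would end the section early,
// so each occurrence is split: "]]" closes the first section and ">" starts
// the next, which the parser joins back into "]]>".
function cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<title>' . cdata('Fish & Chips <today>') . '</title>', "\n";
// <title><![CDATA[Fish & Chips <today>]]></title>
```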

#3
SitePoint Guru
Join Date: Jul 2003
Location: Newcastle upon Tyne
Posts: 930
Sorry, I wasn't very clear in my last post about where the error was coming from.
The error comes from the special characters themselves, such as the pound symbol. Example: Quote:
Replace $txt_title with the title example I've used above, and $txt_article with the article example I've used above. When you run the PHP code, it reports an error at line XX. It's to do with the £ symbol: it cannot be handled in the XML. Is there something I can do that will help?

#4
SitePoint Guru
Join Date: Jul 2003
Location: Newcastle upon Tyne
Posts: 930
Perhaps someone has different PHP-to-RSS code that (a) produces valid RSS and (b) can display £ and other special characters and symbols without breaking?
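One alternative to building the feed with echo is PHP's XMLWriter extension, whose writeElement() and writeAttribute() methods escape special characters for you. A minimal sketch, assuming XMLWriter is available (it is enabled by default in most PHP builds) and that the text is already UTF-8; the $items array stands in for the database rows and is hypothetical, and only the item part of an RSS 1.0 feed is written.

```php
<?php
// Hypothetical sample data standing in for the rows fetched from MySQL
$items = [
    ['title' => 'Fish & Chips for £5', 'link' => 'http://www.somesite.com/archives/1'],
];

$w = new XMLWriter();
$w->openMemory();
$w->startDocument('1.0', 'UTF-8');
$w->startElement('rdf:RDF');
$w->writeAttribute('xmlns:rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
$w->writeAttribute('xmlns', 'http://purl.org/rss/1.0/');
// (channel element omitted for brevity)
foreach ($items as $item) {
    $w->startElement('item');
    $w->writeAttribute('rdf:about', $item['link']);
    // writeElement() escapes &, < and > in the text content
    $w->writeElement('title', $item['title']);
    $w->writeElement('link', $item['link']);
    $w->endElement(); // </item>
}
$w->endElement(); // </rdf:RDF>
$w->endDocument();

$xml = $w->outputMemory();
echo $xml;
```

The writer tracks open elements itself, so there is no hand-written closing tag to forget, and every value passes through the same escaping.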

#5
SitePoint Addict
Join Date: Nov 2003
Location: England
Posts: 293
Assuming you are only interested in displaying the RSS feed results through a browser, how about:
PHP Code:
__________________
Your mind is like a parachute. It works best when open. (HH The Dalai Lama)

#6
Ceci n'est pas Zoef
Join Date: Nov 2002
Location: Malta
Posts: 1,112
Quote:
Rik
__________________
English tea - Italian coffee - Maltese wine - Belgian beer - French Cognac

#7
SitePoint Guru
Join Date: Jul 2003
Location: Newcastle upon Tyne
Posts: 930
Ah, I never thought of that!
--- a few minutes later --- Unfortunately, it keeps coming up with an error relating to the £ symbol too. The actual error message: Quote:
Is there a way around this? Thanks for helping.

#8
Test cases complete. 0 fails.
Join Date: Feb 2001
Location: Melbourne Australia
Posts: 6,569
To answer the original question:
You have specified a character set of UTF-8. You have to be careful when doing this, because if the document contains characters that aren't valid UTF-8, some XML parsers will die with an error. An XML parser must either ignore an invalid character, replace it with another (such as a question mark), or halt and display an error. You should make sure the document is free of invalid characters before anyone needs to parse it.
Was the original text UTF-8? If it was ISO-8859-1, change the charset of this feed to match. If you have absolutely no idea what character set you were using, it may well have been ISO-8859-1 (though it may also contain some characters that aren't valid in ISO-8859-1). If you have no idea, and you have characters that are valid in neither UTF-8 nor ISO-8859-1, you will have to bite the bullet and filter out all non-ASCII characters (note that this pattern also strips tabs and line breaks). Do this: $output = preg_replace('/[^\x20-\x7F]+/', '', $output);
For more information about character sets, I recommend reading the Unicode FAQ (search for it). Quote:
Quote:
&amp; &lt; &gt; &quot; &apos;
Other HTML entities will NOT work in an XML document unless the reader is non-compliant (broken) or the XML file is in a format that declares them (RSS is not such a format). HTML entities other than these should ONLY be used in HTML documents, not in XML, RSS, plain text, or anything else.
----------
General notes about character sets:
Anybody who builds online applications should be aware of character sets. You should pick one character set and make everything stick to it, because translating between them is a hassle. I use UTF-8 for all data in the application I'm building.
If you don't specify a character set for your output, you are relying on the browser (or whatever is reading your output) happening to have the same default character set as your application, which it might not. For instance, when you POST a form, the data is sent in the same character set as the page. If the page doesn't declare one, the server has no way of knowing what character set the data it receives is in.
UTF-8 is better than ISO-8859-1 in most ways, because it can represent many thousands more characters, covering virtually every language in existence, in the one character set.
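In PHP, htmlspecialchars() with the ENT_XML1 flag (PHP 5.4 and later) converts exactly the characters XML treats specially and leaves everything else, including £, untouched. A minimal sketch; the wrapper name xml_escape() is hypothetical, and it assumes the feed is declared as UTF-8. ENT_SUBSTITUTE makes invalid UTF-8 come out as the replacement character U+FFFD instead of an empty string.

```php
<?php
// Escape text for an XML element or attribute: & < > " ' become
// &amp; &lt; &gt; &quot; &apos;. Non-ASCII text such as £ stays as UTF-8.
function xml_escape(string $text): string
{
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo xml_escape("Fish & Chips <\"£5\"> 'today'"), "\n";
// Fish &amp; Chips &lt;&quot;£5&quot;&gt; &apos;today&apos;
```

In the original feed code, this would be applied to $txt_title and $txt_article before they are echoed.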
__________________
[mmj] My momentous journey

#9
SitePoint Guru
Join Date: Jul 2003
Location: Newcastle upon Tyne
Posts: 930
Thanks, mmj; that tutorial was useful and enlightening. I'm not that hot on RSS feeds, and I'm only doing this because everybody else seems to have one, but I don't know anyone who actually uses one day to day.
Thanks again, mmj. I'll investigate UTF-8 vs ISO-8859-1 some more and experiment.

#10
SitePoint Guru
Join Date: Jul 2003
Location: Newcastle upon Tyne
Posts: 930
Hey, you were right: it was the UTF-8 vs ISO-8859-1 thing. I've changed it so that instead of saying:
Code:
echo ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
it now says:
Code:
echo ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n");
Yay!
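Declaring ISO-8859-1 works when the stored text really is Latin-1. If you would rather keep the UTF-8 declaration, the alternative is to check and convert the text before output. A minimal pure-PHP sketch of both steps, assuming the database returns ISO-8859-1 bytes; the function names are hypothetical. If the data is actually Windows-1252, bytes 0x80 to 0x9F (curly quotes, €) will not map correctly with this approach. Where the extensions are installed, mb_convert_encoding() or iconv() do the conversion too.

```php
<?php
// True if $s is well-formed UTF-8: preg_match() with the /u modifier
// returns false when the subject is not valid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Convert ISO-8859-1 (Latin-1) bytes to UTF-8. Each Latin-1 byte equals the
// Unicode code point it represents, so bytes >= 0x80 become two UTF-8 bytes.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                                   // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$latin1 = "Price: \xA3" . "5";          // "£5" as stored in ISO-8859-1 (0xA3)
var_dump(is_valid_utf8($latin1));       // bool(false)
$utf8 = latin1_to_utf8($latin1);
var_dump(is_valid_utf8($utf8));         // bool(true)
echo bin2hex($utf8), "\n";              // 50726963653a20c2a335
```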

#11
SitePoint Zealot
Join Date: Oct 2001
Location: Estonia
Posts: 141
I use this function, taken from the user comments on the htmlentities() manual page:
Code:
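function xmlentities($string, $quote_style=ENT_COMPAT)
{
    // Table of HTML entities for single-byte ISO-8859-1 characters. The charset
    // is passed explicitly because since PHP 5.4 the default is UTF-8, whose
    // multi-byte keys would break the ord() call below. Expects Latin-1 input.
    $trans = get_html_translation_table(HTML_ENTITIES, $quote_style, 'ISO-8859-1');
    // Swap each named entity (e.g. &pound;) for a numeric one (&#163;),
    // which any XML parser understands without a DTD
    foreach ($trans as $key => $value)
        $trans[$key] = '&#'.ord($key).';';
    return strtr($string, $trans);
}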
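The function above works byte by byte, so it only suits single-byte ISO-8859-1 text. For UTF-8 text, a sketch along the same lines decodes each non-ASCII character and emits its code point as a numeric character reference, giving pure-ASCII output that parses under any ASCII-compatible declared encoding. The name xml_ncr() is hypothetical and it assumes UTF-8 input; mb_encode_numericentity() can do the conversion step if mbstring is installed.

```php
<?php
// Escape & < > " ' and replace every non-ASCII character with &#N;.
function xml_ncr(string $text): string
{
    // ENT_SUBSTITUTE turns invalid UTF-8 into U+FFFD, so the /u regex below
    // always receives valid UTF-8
    $text = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $s = $m[0];
        $b0 = ord($s[0]);
        // Decode the 2-, 3- or 4-byte UTF-8 sequence into its code point
        if ($b0 >= 0xF0) {
            $cp = (($b0 & 0x07) << 18) | ((ord($s[1]) & 0x3F) << 12)
                | ((ord($s[2]) & 0x3F) << 6) | (ord($s[3]) & 0x3F);
        } elseif ($b0 >= 0xE0) {
            $cp = (($b0 & 0x0F) << 12) | ((ord($s[1]) & 0x3F) << 6) | (ord($s[2]) & 0x3F);
        } else {
            $cp = (($b0 & 0x1F) << 6) | (ord($s[1]) & 0x3F);
        }
        return '&#' . $cp . ';';
    }, $text);
}

echo xml_ncr("£5 & €3"), "\n"; // &#163;5 &amp; &#8364;3
```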
__________________
(2B) or (not 2B) = FF

#12
Ceci n'est pas Zoef
Join Date: Nov 2002
Location: Malta
Posts: 1,112
Quote:
I'm looking into XML and RSS with the idea of writing a decent reader/aggregator, and I must say it can all be rather confusing. I'm finding it hard to gather the information I need. Introductory articles are a dime a dozen, but there are few 'practical guidelines' or 'best practice' articles out there that go a bit further than the simplest stuff. Even the specs are ambiguous at best. These are a few of the questions I'm struggling with:
Rik
__________________
English tea - Italian coffee - Maltese wine - Belgian beer - French Cognac

#13
SitePoint Member
Join Date: May 2004
Location: Belgium
Posts: 1
Quote:
This might come in handy when you want to include characters that are not allowed in XML, for example a URL to a specific forum post in an RSS feed: "http://localhost/forum/index.php?showtopic=100&#entry504". The bare ampersand (&) in front of the hash (#) will cause an error unless you escape it as &amp; or put the URL in a CDATA section. This renders the URL correctly in XML: Code:
<link><![CDATA[http://localhost/forum/index.php?showtopic=100&#entry504]]></link>
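If the feed is built with DOMDocument instead of echo, text nodes are escaped when the document is saved, so a URL such as the one above needs neither CDATA nor manual escaping. A minimal sketch, assuming the DOM extension is available. The text is added with createTextNode() because, per the PHP manual, the value argument of createElement() does not escape &.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$item = $doc->createElement('item');
$link = $doc->createElement('link');
// createTextNode() stores the raw text; saveXML() writes & out as &amp;
$link->appendChild($doc->createTextNode('http://localhost/forum/index.php?showtopic=100&#entry504'));
$item->appendChild($link);
$doc->appendChild($item);

$out = $doc->saveXML($item); // serialize just the <item> node
echo $out, "\n";
// <item><link>http://localhost/forum/index.php?showtopic=100&amp;#entry504</link></item>
```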

#14
SitePoint Member
Join Date: Oct 2007
Location: Everywhere
Posts: 3
The problem occurs at the stage where the POST data is collected.
I spent a lot of time trying to solve this problem. I needed to POST XML data, which is UTF-8 encoded. I tried ISO-8859-1 as well, with the same result. I noticed that the POST data was truncated at the first occurrence of "&".
In valid XML, "&" must be written as an entity (&amp;). When you submit the same data through a form in a browser, the entire data is URL-encoded. But when the data was sent via POST from another application (in my case a VB program), it was truncated, even though I used the form encoding application/x-www-form-urlencoded. Next I shall try reading the raw POST data using php://input; that data must then be URL-decoded and passed through html_entity_decode() as well. I think reading the raw POST data should work. For now, I have replaced the special entity with a different substitute, as I need to finish the project.
Regards,
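The truncation described above is what happens when a value containing & is sent as application/x-www-form-urlencoded without being URL-encoded first: & is the field separator in that format, so everything after it is read as a new field. A minimal PHP sketch using parse_str(), which splits a query string on & the same way a form body is split; the sample XML is made up for illustration. The fix is either to URL-encode the value on the sending side, or to send the XML as the raw request body and read it with file_get_contents('php://input').

```php
<?php
$xml = '<note>Fish &amp; Chips</note>';

// Sent as-is: the & inside &amp; ends the "data" field early
parse_str('data=' . $xml, $bad);

// URL-encoded by the sender: & becomes %26 and survives the trip
parse_str('data=' . rawurlencode($xml), $good);

var_dump($bad['data']);            // string(11) "<note>Fish "
var_dump($good['data'] === $xml);  // bool(true)
```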