Dealing with special characters when parsing XML

Hi All,

I am trying to parse an XML file.

<en>View from plot</en>

The file contains place names that have accents, such as “Mazarrón”.

I am using the following to encode it:

htmlspecialchars($property_description_en, ENT_COMPAT, 'UTF-8');

However, that is changing “Mazarrón” to “Mazarrón”

What code should I be using?

The code follows:

foreach($xml->property as $property)
{

$property_kyero_id = $property->id;
$property_date_modified = $property->date;
$property_agent_ref = $property->ref;
$property_price = $property->price;
$property_price_frequency = $property->price_freq;
$property_part_ownership = $property->part_ownership;
$property_leasehold = $property->leasehold;
$property_type = $property->type->en;
$property_location_id = $property->location_id;
$property_town = $property->town;
$property_province = $property->province;
$property_location_detail = $property->location_detail;
$property_beds = $property->beds;
$property_baths = $property->baths;
$property_pool = $property->pool;
$property_description_en = $property->desc->en;
$property_description_en = htmlspecialchars($property_description_en, ENT_COMPAT, 'UTF-8');
$property_description_es = $property->desc->es;
$property_description_es = htmlspecialchars($property_description_es, ENT_COMPAT, 'UTF-8');
$property_description_de = $property->desc->de;
$property_description_de = htmlspecialchars($property_description_de, ENT_COMPAT, 'UTF-8');
$property_description_nl = $property->desc->nl;
$property_description_nl = htmlspecialchars($property_description_nl, ENT_COMPAT, 'UTF-8');
$property_description_fr = $property->desc->fr;
$property_description_fr = htmlspecialchars($property_description_fr, ENT_COMPAT, 'UTF-8');
$property_description_da = $property->desc->da;
$property_description_da = htmlspecialchars($property_description_da, ENT_COMPAT, 'UTF-8');
$property_description_ru = $property->desc->ru;
$property_description_ru = htmlspecialchars($property_description_ru, ENT_COMPAT, 'UTF-8');
$property_description_it = $property->desc->it;
$property_description_it = htmlspecialchars($property_description_it, ENT_COMPAT, 'UTF-8');
$property_description_pt = $property->desc->pt;
$property_description_pt = htmlspecialchars($property_description_pt, ENT_COMPAT, 'UTF-8');
}

You do not need to encode it or anything like that. Just send it as UTF-8 and it will be fine.
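To illustrate the point above, here is a minimal sketch. It assumes the source file is saved as UTF-8, and the one-element feed is a made-up stand-in for the real XML, which is not shown in the thread. SimpleXML returns node text as UTF-8, and htmlspecialchars() with 'UTF-8' only escapes &, <, > and quotes, so an accented letter should pass through unchanged.

```php
<?php
// Hypothetical sample feed; the real feed is not shown in the thread.
$xml = simplexml_load_string(
    '<?xml version="1.0" encoding="UTF-8"?>' .
    '<root><property><town>Mazarrón</town></property></root>'
);

foreach ($xml->property as $property) {
    // Property access returns a SimpleXMLElement, so cast it to get a plain string.
    $town = (string) $property->town;

    // Escape only at the point of HTML output, not before storing the value.
    $escaped = htmlspecialchars($town, ENT_QUOTES, 'UTF-8');

    echo $escaped, "\n";
}
```

If the accent still looks wrong on screen, the cause is more likely the page or connection character set than the string itself.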

If I do that, it can break the insert query and the text doesn’t go in as it should: character errors, etc.

This is the XML I am trying to work with:

xml

The parser is here:

http://www.spanishproperty.es/admin/parse_xml.php?agent_id=17

This is what it should be:

from 136,800€ (£94,345)

1 bedroom, 1 bathroom, frontline apartment on a 5* luxury golf resort. communal pools. 5* intercontinental hotel, gym, hairdressers, supermarket, restaurants, club house, 18 hole golf course. fantastic investment/rental opportunity.

payment terms
3,000 reservation (£2,069)
34,200€ - 1 month (£23,586)
20,520€ - 9 months (£14,152)
79,080€ - completion (£54,538)

  • 2 bed from 151,000€ – golf views
  • 3 bed from 230,000€ – golf views

mortgage available.

This is what happens when just accessing the node directly:

from 136,800€ (£94,345) 1 bedroom, 1 bathroom, frontline apartment on a 5* luxury golf resort. communal pools. 5* intercontinental hotel, gym, hairdressers, supermarket, restaurants, club house, 18 hole golf course. fantastic investment/rental opportunity. payment terms 3,000 reservation (£2,069) 34,200€ - 1 month (£23,586) 20,520€ - 9 months (£14,152) 79,080€ - completion (£54,538) * 2 bed from 151,000€ – golf views * 3 bed from 230,000€ – golf views mortgage available.

My code is:

foreach($xml->property as $property)
{

$property_kyero_id = $property->id;
$property_date_modified = $property->date;
$property_agent_ref = $property->ref;
$property_price = $property->price;
$property_price_frequency = $property->price_freq;
$property_part_ownership = $property->part_ownership;
$property_leasehold = $property->leasehold;
$property_type = $property->type->en;
$property_location_id = $property->location_id;
$property_town = $property->town;
$property_province = $property->province;
$property_location_detail = $property->location_detail;
$property_beds = $property->beds;
$property_baths = $property->baths;
$property_pool = $property->pool;
$property_description_en = $property->desc->en;

echo $property_description_en."<hr>";
// $property_description_en = htmlspecialchars($property_description_en, ENT_COMPAT, 'UTF-8');
//$property_description_en = utf8_encode($property_description_en);
$property_description_es = $property->desc->es;
//$property_description_es = htmlspecialchars($property_description_es, ENT_COMPAT, 'UTF-8');
$property_description_de = $property->desc->de;
//$property_description_de = htmlspecialchars($property_description_de, ENT_COMPAT, 'UTF-8');
$property_description_nl = $property->desc->nl;
// $property_description_nl = htmlspecialchars($property_description_nl, ENT_COMPAT, 'UTF-8');
$property_description_fr = $property->desc->fr;
// $property_description_fr = htmlspecialchars($property_description_fr, ENT_COMPAT, 'UTF-8');
$property_description_da = $property->desc->da;
// $property_description_da = htmlspecialchars($property_description_da, ENT_COMPAT, 'UTF-8');
$property_description_ru = $property->desc->ru;
// $property_description_ru = htmlspecialchars($property_description_ru, ENT_COMPAT, 'UTF-8');
$property_description_it = $property->desc->it;
//$property_description_it = htmlspecialchars($property_description_it, ENT_COMPAT, 'UTF-8');
$property_description_pt = $property->desc->pt;
// $property_description_pt = htmlspecialchars($property_description_pt, ENT_COMPAT, 'UTF-8');
}

You need to set the connection to MySQL to UTF-8 then (I’m assuming you are referring to a database).
http://www.phpwact.org/php/i18n/utf-8/mysql
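As a rough sketch of what setting the connection to UTF-8 could look like with mysqli: the table and column names (`properties`, `kyero_id`, `description_en`) and the connection arguments are assumptions for illustration, since the thread does not show the schema. A prepared statement is used so that quotes or accented characters in the description cannot break the query.

```php
<?php
// Make mysqli throw exceptions instead of failing silently.
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

/**
 * Open a connection whose character set is utf8mb4.
 * All four arguments are placeholders for your own settings.
 */
function open_utf8_connection(string $host, string $user, string $pass, string $name): mysqli
{
    $db = new mysqli($host, $user, $pass, $name);
    $db->set_charset('utf8mb4');
    return $db;
}

/**
 * Store one description. The table and column names are hypothetical,
 * not taken from the thread.
 */
function save_description(mysqli $db, int $kyeroId, string $descriptionEn): void
{
    $stmt = $db->prepare(
        'UPDATE properties SET description_en = ? WHERE kyero_id = ?'
    );
    // 's' = string, 'i' = integer; values are sent separately from the SQL text.
    $stmt->bind_param('si', $descriptionEn, $kyeroId);
    $stmt->execute();
    $stmt->close();
}
```

Inside the loop this might be called as save_description($db, (int) $property->id, (string) $property->desc->en), passing the raw UTF-8 text without htmlspecialchars().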

Why is it that even if I echo the contents of the XML node, it doesn’t reproduce the same text?

It messes up the accented letters.
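One possible cause, offered as an assumption since the page headers are not shown: the browser is not told that the output is UTF-8, so it decodes the bytes with a different character set. Line breaks are a separate matter, because HTML collapses newlines unless they are converted. A minimal sketch follows; render_description() is a hypothetical helper name.

```php
<?php
// Declare the page as UTF-8; this has to happen before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

/**
 * Escape a description for HTML and keep its line breaks visible.
 * Hypothetical helper, not part of the original code.
 */
function render_description(string $text): string
{
    // Escape first, then turn newlines into <br /> tags.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

echo render_description("Mazarrón\n2 bed from 151,000€"), "<hr>";
```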

When you talk about the connection, are we talking about the collation?

Do the fields in MySQL need to be set to UTF-8? If so, which one should I use if I want to deal with different languages?
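On the distinction being asked about: the character set decides which characters a column can store, while the collation only decides how stored text is compared and sorted. For mixed languages, utf8mb4 is the usual choice in MySQL, because the older `utf8` character set stores at most three bytes per character. The sketch below only builds the conversion statement; the table name `properties` is an assumption, and utf8mb4_unicode_ci is one reasonable collation rather than the only option.

```php
<?php
/**
 * Build an ALTER TABLE statement that converts a table and its text
 * columns to utf8mb4. Hypothetical helper for illustration.
 */
function utf8mb4_conversion_sql(string $table): string
{
    // Backtick-quote the identifier, doubling any embedded backticks.
    $quoted = '`' . str_replace('`', '``', $table) . '`';

    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

echo utf8mb4_conversion_sql('properties'), "\n";
```

Back up the table before running a statement like this, since converting existing data that was stored with the wrong character set can make it worse.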