Does anyone have any good resources on converting ISO-8859-1 to UTF-8?
I'm ending up with weird characters using this function I swiped from the PHP doc comments:
PHP Code:
function fixEncoding($in_str)
{
    $cur_encoding = mb_detect_encoding($in_str);
    if ($cur_encoding == "UTF-8" && mb_check_encoding($in_str, "UTF-8")) {
        return $in_str;
    } else {
        return utf8_encode($in_str);
    }
}
Is there any "straightforward" way to prevent the "weird" characters?
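A minimal sketch of a stricter variant, assuming every value is either already valid UTF-8 or is ISO-8859-1. The name to_utf8 is hypothetical. It skips mb_detect_encoding and relies only on mb_check_encoding, then converts with mb_convert_encoding. If the stored data is really Windows-1252 (curly quotes and similar), the source encoding passed to mb_convert_encoding would have to change, since ISO-8859-1 maps those bytes to control characters.

```php
<?php
// Hypothetical helper: assumes input is either valid UTF-8 or ISO-8859-1.
function to_utf8(string $in_str): string
{
    // Strings that already validate as UTF-8 (including plain ASCII) pass through untouched.
    if (mb_check_encoding($in_str, 'UTF-8')) {
        return $in_str;
    }
    // Anything else is treated as ISO-8859-1 and converted.
    return mb_convert_encoding($in_str, 'UTF-8', 'ISO-8859-1');
}
```

One caveat: a short ISO-8859-1 string can happen to be valid UTF-8 by coincidence, so this check is a heuristic and not a guarantee.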
I'm converting information stored in a table into XML. If I leave out the conversion above, the XML fails to parse with this error:
HTML Code:
XML Parsing Error: junk after document element
Location: http://local.project-padv5/cascade_module.php
Line Number 10, Column 1:
I'm not really sure which route to take here. The conversion function above isn't really "smart", which leads to weird characters (excuse my lack of proper terminology) being output in the XML in certain places.
These are the current functions that build the XML hierarchy from the domain-level objects.
PHP Code:
function parse_object(IActiveRecordDataEntity $entity, DOMDocument $dom, DOMElement $node = null) {
    if (is_null($node)) {
        $node = $dom->createElement(Inflector::underscore(get_class($entity)));
        $dom->appendChild($node);
    }
    foreach ($entity as $property => $value) {
        $objectNode = $dom->createElement($property);
        $node->appendChild($objectNode);
        if ($value instanceof IActiveRecordDataEntity) {
            parse_object($value, $dom, $objectNode);
        } else if ($value instanceof ActiveRecordCollection) {
            parse_collection($value, $dom, $objectNode);
        } else {
            // this line is the problem
            $textNode = $dom->createTextNode(is_null($value) ? '' : $value);
            $objectNode->appendChild($textNode);
        }
    }
    return $node;
}
function parse_collection(ActiveRecordCollection $collection, DOMDocument $dom, DOMElement $node = null) {
    if (is_null($node)) {
        $node = $dom->createElement('active_records');
        $dom->appendChild($node);
    }
    if (count($collection) != 0) {
        foreach ($collection as $object) {
            $childNode = parse_object($object, $dom);
            $node->appendChild($childNode);
        }
    }
}
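PHP's DOM extension expects the strings passed to it to be UTF-8, so the text-node line is where a conversion would have to happen. A rough sketch of that step in isolation, assuming the values are either valid UTF-8 or ISO-8859-1. The helper append_text and the element name example are made up for illustration and are not part of the code above.

```php
<?php
// Hypothetical helper: converts a scalar value to UTF-8 before it reaches the DOM.
function append_text(DOMDocument $dom, DOMElement $parent, $value): void
{
    $text = is_null($value) ? '' : (string) $value;
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    }
    $parent->appendChild($dom->createTextNode($text));
}

// Declare the output encoding when the document is created.
$dom = new DOMDocument('1.0', 'UTF-8');
$root = $dom->createElement('example');
$dom->appendChild($root);

// "caf\xE9" is "café" encoded as ISO-8859-1.
append_text($dom, $root, "caf\xE9");
echo $dom->saveXML();
```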
I'm not really familiar with the specifics of character encoding, so any help would be appreciated.
Thanks