  1. #1
    SitePoint Member
    Join Date
    Feb 2006
    Posts
    3

    XML / PHP News System

    Hi,

    I am coding a news system whereby an XML file can be uploaded via a web form, parsed with PHP and eventually read into a database. I can read the XML up to a point; at the moment I'm just printing the results out to the page.

    PHP Code:
    for ($x = 0; $x < count($story_array); $x++) {
        echo "\t<h2>" . $story_array[$x]->title . "</h2>\n";
        echo "\t\t\n";
        echo "\t<i>" . $story_array[$x]->when . "</i>\n";
        echo "\t\t\n";
        echo "\t<i>" . $story_array[$x]->body . "</i>\n";
        echo "\t<i>" . $story_array[$x]->about . "</i>\n";
    }

    This reads out almost perfectly, but it seems to misunderstand where each new story begins. The parsing script is:

    PHP Code:
    $xml_file = "temp.xml";

    $xml_title_key = "*NEWS*NEWSITEM*TITLE";
    $xml_when_key  = "*NEWS*NEWSITEM*WHEN";
    $xml_where_key = "*NEWS*NEWSITEM*WHERE";
    $xml_intro_key = "*NEWS*NEWSITEM*INTRODUCTION";
    $xml_body_key  = "*NEWS*NEWSITEM*BODY";
    $xml_about_key = "*NEWS*NEWSITEM*ABOUT";

    $story_array = array();
    $counter = 0;

    class xml_story {
        var $title, $when, $where, $intro, $body, $about;
    }

    function startTag($parser, $data) {
        global $current_tag;
        $current_tag .= "*$data";
    }

    function endTag($parser, $data) {
        global $current_tag;
        $tag_key = strrpos($current_tag, '*');
        $current_tag = substr($current_tag, 0, $tag_key);
    }

    function contents($parser, $data) {
        global $current_tag, $xml_title_key, $xml_when_key, $xml_where_key, $xml_intro_key, $xml_body_key, $xml_about_key, $counter, $story_array;
        switch ($current_tag) {
            case $xml_title_key:
                $story_array[$counter] = new xml_story();
                $story_array[$counter]->title = $data;
                break;
            case $xml_when_key:
                $story_array[$counter]->when = $data;
                $counter++;
                break;
            case $xml_where_key:
                $story_array[$counter]->where = $data;
                $counter++;
                break;
            case $xml_intro_key:
                $story_array[$counter]->intro = $data;
                $counter++;
                break;
            case $xml_body_key:
                $story_array[$counter]->body = $data;
                $counter++;
                break;
            case $xml_about_key:
                $story_array[$counter]->about = $data;
                $counter++;
                break;
        }
    }

    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startTag", "endTag");
    xml_set_character_data_handler($xml_parser, "contents");

    $fp = fopen($xml_file, "r") or die("Could not open file");
    $data = fread($fp, filesize($xml_file)) or die("Could not read file");

    if (!(xml_parse($xml_parser, $data, feof($fp)))) {
        die("Error on line " . xml_get_current_line_number($xml_parser));
    }

    xml_parser_free($xml_parser);
    fclose($fp);

    This almost works, but seems to split up the text very oddly, for example:

    Code:
     <h2>title</h2>
    		
    	<i></i>
    		
    	<i></i>
    	<i></i>
    	<h2></h2>
    		
    	<i>date</i>
    		
    	<i></i>
    	<i></i>
    	<h2></h2>
    		
    	<i></i>
    		
    	<i></i>
    	<i></i>
    	<h2></h2>
    		
    	<i></i>
    		
    	<i>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nullam a est. Praesent fringilla lacinia mauris. In tempus ante ut augue. Morbi vel massa. Etiam sit amet ligula. In fringilla feugiat nulla. Ut nonummy. Nullam nonummy nunc et est. Sed felis erat, pharetra in, bibendum vel, feugiat non, massa. Qui</i>
    	<i></i>
    	<h2></h2>
    		
    	<i></i>
    		
    	<i>sque condimentum mi id nunc. Pellentesque magna arcu, sagittis eget, volutpat nec, commodo id, ipsum. Ut vel lectus sed tortor gravida bibendum. In hac habitasse platea dictumst. Etiam egestas eros eu mauris. Aliquam leo elit, suscipit non, molestie eu, sollicitudin et, risus. Fusce dui. Praesent dolor. Etiam facilisis tortor vitae eros. Vivamus viverra enim ac nulla.</i>
    	<i></i>
    	<h2></h2>
    		
    	<i></i>
    		
    	<i></i>
    	<i>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nullam a est. Praesent fringilla lacinia mauris. In tempus ante ut augue. Morbi vel massa.</i>

    Notice all the extra empty <h2> and <i> tags, and that the word "Quisque" has been split mid-word ("Qui", then lots of additional tags, then "sque").

    Quite a lot of code, I'm sorry. Any ideas?

    Thanks.

  2. #2
    SitePoint Addict
    Join Date
    Sep 2002
    Posts
    225
    Just curious: what version of PHP are you using? SimpleXML, which is available in PHP 5.x, might make things a little easier.
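
    To give an idea of what that could look like, here is a minimal SimpleXML sketch (PHP 5 or later). The sample document is made up, and the element names are only inferred from the keys in the first post. SimpleXML is case-sensitive, so the names would have to match the real file exactly.

```php
<?php
// Hypothetical sample input; the real temp.xml may use different
// element names or casing (SimpleXML does not fold case).
$xml = '<NEWS><NEWSITEM><TITLE>First story</TITLE><WHEN>Feb 2006</WHEN>'
     . '<BODY><![CDATA[Lorem <p>ipsum</p>]]></BODY></NEWSITEM>'
     . '<NEWSITEM><TITLE>Second story</TITLE><WHEN>Mar 2006</WHEN>'
     . '<BODY>Quisque</BODY></NEWSITEM></NEWS>';

$news = simplexml_load_string($xml);
if ($news === false) {
    die('Could not parse XML');
}

$titles = array();
$bodies = array();
foreach ($news->NEWSITEM as $item) {
    // Casting an element to string yields its text, CDATA included.
    $titles[] = (string) $item->TITLE;
    $bodies[] = (string) $item->BODY;

    echo "\t<h2>" . $item->TITLE . "</h2>\n";
    echo "\t<i>" . $item->WHEN . "</i>\n";
    echo "\t<i>" . $item->BODY . "</i>\n";
}
```

    Each NEWSITEM comes back as one object, so there is no counter to keep in step with the tags.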

  3. #3
    SitePoint Member
    Join Date
    Feb 2006
    Posts
    3
    Thanks for the reply.

    PHP Version 4.3.7
    Windows NT

  4. #4
    SitePoint Member
    Join Date
    Feb 2006
    Posts
    3
    I've worked on this script a bit more today and it works slightly better. I think the problem is that the <body></body> XML element contains HTML. I've used preg_match_all and str_replace to wrap the body in CDATA containers to try to help with this.

    PHP Code:
    preg_match_all("/<NEWS>.*<\/NEWS>/Uis", $data, $matches);
    $matches[0][0] = str_replace("<Body>", '<BODY><![CDATA["', $matches[0][0]);
    $matches[0][0] = str_replace("</Body>", '"]]></BODY>', $matches[0][0]);

    But it still seems to skip large portions of the body text when parsing. Another example:

    This:
    Code:
    <Body>
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Fusce vitae dolor. Maecenas ut felis id enim tempor ultrices. Nullam libero erat, vehicula ut, sollicitudin vitae, laoreet et, lectus.
    <p>Morbi sit amet quam non urna molestie laoreet. Vestibulum tellus. Suspendisse potenti. Donec pretium. Vivamus erat nunc, rhoncus et, mattis ac, tristique eget, felis. Quisque non ipsum. Suspendisse potenti.
    </p>
    <h1>dolor sit amet</h1> Morbi vitae erat eu dolor mattis gravida. Fusce adipiscing. Nulla leo. Fusce nunc. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec gravida diam nec mauris.
    <p>
    <img src="http://www.url.com/image.jpg"/>
    <img src="http://www.url.com/image.jpg"/>
    </p>
    			<h1> sollicitudin </h1>Praesent pharetra, nibh ut condimentum pharetra, massa eros ullamcorper nisl, fringilla commodo massa massa id magna. Phasellus sit amet augue. 		<p>
    			<img src="http://www.url.com/image.jpg"/>
    			</p>
    			<h1> mauris </h1> Suspendisse elementum porta dui. Praesent in orci. Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio. Nam rhoncus. Donec tincidunt auctor sem. Donec quis purus. Donec sagittis pellentesque nulla. Etiam pharetra molestie diam. Mauris egestas, enim at imperdiet sagittis, leo purus blandit quam, sed aliquet augue nisi in nisl. Aliquam placerat neque quis dolor.	
    			<p>
    				<img src="http://www.url.com/image.jpg"/>
    				<img src="http://www.url.com/image.jpg"/>
    				<img src="http://www.url.com/image.jpg"/>
    			</p>
    			<h1>Donec sagittis</h1>
    Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio.
    </Body>

    in the XML comes out like this when parsed and printed to the screen:

    Code:
    pendisse elementum porta dui. Praesent in orci. Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio. Nam rhoncus. Donec tincidunt auctor sem. Donec quis purus. Donec sagittis pellentesque nulla. Etiam pharetra molestie diam. Mauris egestas, enim at imperdiet sagittis, leo purus blandit quam, sed aliquet augue nisi in nisl. Aliquam placerat neque quis dolor.	
    			<p>
    				<img src="http://www.url.com/image.jpg"/>
    				<img src="http://www.url.com/image.jpg"/>
    				<img src="http://www.url.com/image.jpg"/>
    			</p>
    			<h1>Donec sagittis</h1>
    Nulla sit amet risus. Praesent neque. Etiam leo nisl, ultricies non, pulvinar ut, adipiscing vitae, mauris. Cras et risus. In accumsan tristique odio.
    That is, it has cut out half of it! Any ideas?
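
    One possible explanation, offered as a sketch rather than a confirmed diagnosis: the expat-based parser may pass the text of a single element to the character data handler in several chunks, so a handler that assigns with = and bumps a counter on every call keeps only one chunk per slot. Any markup inside Body that is not wrapped in CDATA is also treated as child elements, so its text arrives under a longer path that matches none of the keys. The version below starts a story when NEWSITEM opens, appends each chunk with .=, and stores the story when NEWSITEM closes. The sample XML, the $fields map and the array-based story are assumptions made for illustration.

```php
<?php
// Hypothetical sample input; tag names are case-folded to upper case
// by default, so <Body> is reported as BODY.
$xml = '<News><NewsItem><Title>First</Title><When>Feb 2006</When>'
     . '<Body><![CDATA[Lorem <p>ipsum</p> dolor]]></Body></NewsItem>'
     . '<NewsItem><Title>Second</Title><Body>Quisque</Body></NewsItem></News>';

$path    = '';       // current element path, e.g. *NEWS*NEWSITEM*TITLE
$stories = array();  // completed stories
$current = null;     // story being built, or null outside NEWSITEM

// Element path => story field (paths taken from the keys in the first post).
$fields = array(
    '*NEWS*NEWSITEM*TITLE'        => 'title',
    '*NEWS*NEWSITEM*WHEN'         => 'when',
    '*NEWS*NEWSITEM*WHERE'        => 'where',
    '*NEWS*NEWSITEM*INTRODUCTION' => 'intro',
    '*NEWS*NEWSITEM*BODY'         => 'body',
    '*NEWS*NEWSITEM*ABOUT'        => 'about',
);

function startTag($parser, $name, $attrs) {
    global $path, $current;
    $path .= "*$name";
    if ($path === '*NEWS*NEWSITEM') {
        // A new story begins with the NEWSITEM element, not with its title.
        $current = array('title' => '', 'when' => '', 'where' => '',
                         'intro' => '', 'body' => '', 'about' => '');
    }
}

function endTag($parser, $name) {
    global $path, $current, $stories;
    if ($path === '*NEWS*NEWSITEM') {
        $stories[] = $current;  // story is complete
        $current = null;
    }
    $path = substr($path, 0, strrpos($path, '*'));
}

function contents($parser, $data) {
    global $path, $current, $fields;
    // Append rather than assign: this handler can fire more than once
    // for the text of a single element.
    if ($current !== null && isset($fields[$path])) {
        $current[$fields[$path]] .= $data;
    }
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'startTag', 'endTag');
xml_set_character_data_handler($parser, 'contents');

if (!xml_parse($parser, $xml, true)) {
    die('Error on line ' . xml_get_current_line_number($parser));
}
// On PHP 4 and 5, also release the parser with xml_parser_free($parser).
```

    After parsing, $stories holds one array per NEWSITEM, and the display loop can read $stories[$x]['title'] and so on.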

