Read specific tags of an XML document, then split the sentences into words

Hi,

I have many XML files whose extension is .FILE rather than .xml (I hope you understand).

I read them one by one with fopen($file, "r"), line by line, because their extension is not .xml and I couldn't use the normal XML reader functions on them.

So the problem is reading the content of specific XML elements. For example, look at the sample below:


<DOC>
<DATE>03/08/2009</DATE>
<AUTHOR>MotorStreet</AUTHOR>
<TEXT>The A8 never fails to amaze us. On a recent vacation we drove 700 miles north and on the way the kids never asked "Are we there yet?" once. The car is so relaxing to travel in and makes any journey comfortable. The gas mileage is equally impressive, on our vacation we averaged 28 mpg and drove most of the way at 80 mph. The 350hp direct injection 4.2L V8 has immediate power and even more refinement. The MMI is easy to use and just makes sense. We looked at the Lexus LS460, but it felt big, heavy, and was worse to drive than the old LS430; the Lexus had too many features and had millions of buttons. The 2007 BMW 750i had the stupid I-drive and the S550 was too expensive. The A8 is brilliant. </TEXT>
<FAVORITE>The engine, rear legroom, seats, suspension, steering, looks, quality, MMI, and more.</FAVORITE>
</DOC>
<DOC>
<DATE>03/31/2008</DATE>
<AUTHOR>driver</AUTHOR>
<TEXT>I've owned a 1979 5000S and a 2002 A4 1.8T, among other makes/models. This car is truly "the ultimate driving machine" bar none (sorry BMW). I've blown away many wanna-be's in BMW's from 3-series up to the 7-series on both highway and backroads with this baby and it's never misbehaved or thrown me a surprise. Incredible power, positive steering, and an amazing ride.</TEXT>
<FAVORITE>The power of the W12, fantastic feedback from the road. Stereo is amazing, Ipod interface is great.</FAVORITE>
</DOC>

So I want to read the contents of the TEXT and FAVORITE elements.
What is an efficient way to do this?
Then, if possible, I want to get all of the words in those sentences, but the first step is to get the contents of these two tags.

Sorry, I don’t. I have never heard of FILE format.

Are they XML or not?

Is it just that they are XML but not well-formed or valid?


It's an XML file, but the extension is not .xml.

Yes, they are XML. I can read them.

The XML parsers can be a bit picky about correctness.
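
One likely reason a strict parser rejects the posted sample is that it has several top-level <DOC> elements, while XML allows only one root element; the file extension itself does not matter to the parser. Here is a minimal sketch, assuming the files have no other markup problems (for example unescaped & characters). The $raw string is a shortened stand-in for a file's contents (in practice it could come from file_get_contents($file)), and the <ROOT> wrapper name is made up for illustration.

```php
<?php
// Shortened stand-in for one .FILE document (assumption: real files look like this).
$raw = <<<XML
<DOC>
<DATE>03/08/2009</DATE>
<AUTHOR>MotorStreet</AUTHOR>
<TEXT>The A8 never fails to amaze us.</TEXT>
<FAVORITE>The engine, rear legroom, seats.</FAVORITE>
</DOC>
<DOC>
<DATE>03/31/2008</DATE>
<AUTHOR>driver</AUTHOR>
<TEXT>Incredible power, positive steering, and an amazing ride.</TEXT>
<FAVORITE>The power of the W12.</FAVORITE>
</DOC>
XML;

// Collect parse errors instead of emitting warnings.
libxml_use_internal_errors(true);

// XML allows a single root element, so wrap the <DOC> siblings
// in a hypothetical <ROOT> element before parsing.
$xml = simplexml_load_string('<ROOT>' . $raw . '</ROOT>');

$reviews = [];
if ($xml !== false) {
    foreach ($xml->DOC as $doc) {
        $reviews[] = [
            'text'     => trim((string) $doc->TEXT),
            'favorite' => trim((string) $doc->FAVORITE),
        ];
    }
}

print_r($reviews);
```

If simplexml_load_string() returns false, libxml_get_errors() should list what the parser objected to.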

The DOM parser is a bit more forgiving though it can still throw errors.

http://php.net/manual/en/domdocument.loadhtml.php

Unlike loading XML, HTML does not have to be well-formed to load.

While malformed HTML should load successfully, this function may generate E_WARNING errors when it encounters bad markup. libxml’s error handling functions may be used to handle these errors.
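
A rough sketch of that approach, assuming the content is plain ASCII or carries its own encoding information (loadHTML() may otherwise guess the wrong charset). The $raw string is a shortened, made-up stand-in for a file's contents. As far as I know the HTML parser lowercases tag names, so the lookups below use lowercase names.

```php
<?php
// Shortened sample with two top-level <DOC> blocks (not well-formed XML).
$raw = '<DOC><TEXT>The A8 is brilliant.</TEXT><FAVORITE>The engine.</FAVORITE></DOC>'
     . '<DOC><TEXT>Incredible power.</TEXT><FAVORITE>The W12.</FAVORITE></DOC>';

// Keep libxml warnings (e.g. about unknown tags) out of the output.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($raw);

// Inspect or discard whatever the parser complained about.
$problems = libxml_get_errors();
libxml_clear_errors();

// Query with lowercase names (assumption: the HTML parser lowercased the tags).
$texts = [];
foreach ($dom->getElementsByTagName('text') as $node) {
    $texts[] = trim($node->textContent);
}

$favorites = [];
foreach ($dom->getElementsByTagName('favorite') as $node) {
    $favorites[] = trim($node->textContent);
}

print_r($texts);
print_r($favorites);
```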

Give DOMDocument a try. If all you need to do is extract a limited amount of string content, it may be good enough.
http://php.net/manual/en/book.dom.php
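
For the second step (getting the words), one possible sketch uses preg_split() on the extracted string. It assumes a "word" is a run of letters, digits, or apostrophes, so "wanna-be's" is split at the hyphen and "4.2L" would be split at the dot; adjust the pattern if that is not what you want. The $text value is a made-up sample built from the posted reviews.

```php
<?php
// Sample text standing in for the content of one TEXT element.
$text = "The A8 never fails to amaze us. I've blown away many wanna-be's.";

// Split on anything that is not a letter, digit, or apostrophe,
// and drop the empty pieces left by punctuation at the ends.
$words = preg_split("/[^\\p{L}\\p{N}']+/u", $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

PHP also has str_word_count($text, 1), but it drops digits by default (so "A8" would become "A"), which is why the regular expression is used here.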


Thanks
