  1. #1
    prefab (SitePoint Zealot)
    Join Date
    Jan 2003
    Location
    Belgium
    Posts
    133

    Parsing XML with SAX/CData/Int.Characters/Speed

    I've implemented some classes to edit and retrieve XML articles.

    Basically, each XML file contains some 'meta' type info (keywords, description) and some sections. The whole system is based on SAX callbacks, and works fine IMHO.

    The point is: how can I include normal HTML tags within the section content? Normally this is where a DTD or a CDATA section comes in, right?

    Say I would like to do this:

    Code:
    ...
    <section id="intro">
         <content>
               This is some <b>bold</b> text
         </content>
    </section>
    ...
    Is this simply a matter of declaring it in a DTD? And does PHP parse this info accordingly? Or do I need to set a CDATA handler (SAX) somehow?
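
    For illustration, here is a minimal sketch of the CDATA route. It is written in Python's xml.sax rather than PHP, and the <article> wrapper, the class name and the function name are assumptions made for the example. With a SAX parser, text inside a CDATA section normally arrives through the ordinary character-data callback, so no special handler is needed; the same idea should carry over to PHP's xml_set_character_data_handler().

```python
import xml.sax

# Hypothetical article file: the HTML is wrapped in CDATA so the parser
# treats "<b>" as plain text instead of as an XML element.
ARTICLE = """<article>
  <section id="intro">
    <content><![CDATA[This is some <b>bold</b> text]]></content>
  </section>
</article>"""


class CDataContentCollector(xml.sax.ContentHandler):
    """Collects the text of each <content> element, keyed by section id."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._section_id = None
        self._in_content = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "section":
            self._section_id = attrs.get("id")
        elif name == "content":
            self._in_content = True
            self._chunks = []

    def characters(self, data):
        # May be called several times per element, so collect the pieces.
        if self._in_content:
            self._chunks.append(data)

    def endElement(self, name):
        if name == "content":
            self.sections[self._section_id] = "".join(self._chunks).strip()
            self._in_content = False


def parse_cdata_sections(xml_text):
    handler = CDataContentCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.sections
```

    The trade-off is that the parser no longer checks the embedded HTML at all: whatever is inside the CDATA section is passed through as an opaque string.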

    Another thing I'm not sure about is whether to use ISO-8859-1 encoding (there's a lot of international character content) or UTF-8.
    I think it's best to convert these characters to HTML entities on display, not when saving to the XML file.

    Are those nasty HTML entities still recommended over a simple character set declaration? Everything should be safe for international characters and the Euro sign, though...
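
    One point worth checking here is the Euro sign, which is not part of ISO-8859-1 (it was added in ISO-8859-15), while UTF-8 covers it. The sketch below, in Python with hypothetical helper names, tests whether a string fits a given encoding and shows the "convert on display" idea by turning every non-ASCII character into a numeric character reference only at output time.

```python
from html import escape


def fits_encoding(text, encoding):
    """Return True if every character in text can be stored in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def to_display_html(text):
    """Escape markup characters, then replace non-ASCII characters with
    numeric character references, so the page is safe under any charset."""
    escaped = escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

    With this split, the XML file keeps the real characters (stored as UTF-8, say) and the entity conversion happens only when a page is rendered.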

    The last thing I'm wondering about is whether such a system can keep up speed-wise. Not that I really have doubts: everything is working as expected, SAX seems really quick, the content files are typically only around 5000 characters, and the intended use is a low to medium traffic site. I was planning to use a cache system where complete pages are cached once generated, but I'm not sure it's needed.
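
    If a cache does turn out to be useful, a whole-page cache can be quite small. The following is only a sketch in Python under stated assumptions: the function name, the one-cache-file-per-source-file layout and the render callback are all made up for the example. It re-renders a page only when the XML source is newer than the cached copy.

```python
import os


def cached_render(source_path, cache_path, render):
    """Return the cached page if it is at least as new as the XML source;
    otherwise call render(xml_text), store the result, and return it."""
    try:
        if os.path.getmtime(cache_path) >= os.path.getmtime(source_path):
            with open(cache_path, encoding="utf-8") as fh:
                return fh.read()
    except FileNotFoundError:
        pass  # no cached copy yet, fall through and render

    with open(source_path, encoding="utf-8") as fh:
        page = render(fh.read())
    with open(cache_path, "w", encoding="utf-8") as fh:
        fh.write(page)
    return page
```

    Comparing modification times means an edited article invalidates its own cached page without any extra bookkeeping.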

    Thanks

  2. #2
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Sorry, I don't use SAX, but here are answers to a few points:

    1) SAX is fast, i.e. it'll do the job, though XSL-T is a lot faster, which is why I'm using it 8)
    2) With Sablotron I get errors if I use iso-8859-1, so I'm using utf-8 at the moment, but personally I think it's an IE issue
    2.1) I've had problems in the past using iso-8859-1 and client-side XSL stylesheets
    3) If you need to embolden some text, for example, why not have <bold>...</bold> XML tags? I might be misguided here since I'm looking at it from the XSL-T perspective.

    Hope I've helped in some way................

  3. #3
    prefab (SitePoint Zealot)
    Join Date
    Jan 2003
    Location
    Belgium
    Posts
    133
    Quote Originally Posted by Dr Livingston
    1) SAX is fast - ie it'll do the job - though XSL-T is a lot faster - why I'm using it 8)
    The problem with XSLT is that only a handful of hosts support PHP with Sablotron. Otherwise I think XSLT is great. I'm not sure how to handle such a workflow when content comes from MySQL instead of XML files, as there *might* be some overhead in rendering XML from db result sets and then transforming it to another XML-like language (XHTML or HTML) again. Maybe you can convince me otherwise, though.

    2) With Sablotron I get errors if I use iso-8859-1 so I'm using utf-8 at the moment but personally I think it's an IE issue
    If I'm correct, it should be easy to convert to unicode with PHP (or back to iso) if needed. Maybe I can add this option in my class somewhere...
    2.1) I've had problems in the past using iso-8859-1 and client side XSL stylesheets
    I wish I could rely on those in the near future, but for now it's serverside XSL for me, if any
    3) If you need to embolden some text for example, why not have <bold>...</bold> XML tags ? I might be misguided here since I'm looking at it from the XSL-T perspective ?
    I think SAX won't work that way; with XSL-T you select (XPath) your tags from the complete XML and presto! With SAX you parse tag by tag from start to finish, so you have no simple way of knowing where exactly your '<bold>' tags are. All this can (and probably needs to) be done with callbacks of course, but you're only working with snippets at a time. That's where CDATA sections come in, I guess, but I'm still not sure how to handle this.
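
    For what it's worth, the callback route is workable without CDATA if the handler keeps a little state. The sketch below uses Python's xml.sax rather than PHP, and the <article> wrapper and all names are assumptions for the example. It tracks how deep the parser is inside <content> and writes any nested tags back out as markup, so inline elements such as <b> survive in the collected string.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

# Hypothetical article file with real inline markup (no CDATA).
INLINE_ARTICLE = """<article>
  <section id="intro">
    <content>This is some <b>bold</b> text &amp; more</content>
  </section>
</article>"""


class InlineMarkupCollector(xml.sax.ContentHandler):
    """Rebuilds the markup found inside each <content> element."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._section_id = None
        self._depth = 0  # 0: outside <content>, 1: directly inside, >1: in an inline tag
        self._chunks = []

    def startElement(self, name, attrs):
        if self._depth > 0:
            # Inline tag inside <content>: write it back out as markup.
            rendered = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
            self._chunks.append(f"<{name}{rendered}>")
            self._depth += 1
        elif name == "section":
            self._section_id = attrs.get("id")
        elif name == "content":
            self._depth = 1
            self._chunks = []

    def characters(self, data):
        if self._depth > 0:
            # The parser has already decoded entities, so re-escape the text.
            self._chunks.append(escape(data))

    def endElement(self, name):
        if self._depth > 1:
            self._chunks.append(f"</{name}>")
            self._depth -= 1
        elif self._depth == 1:  # this is the closing </content>
            self.sections[self._section_id] = "".join(self._chunks).strip()
            self._depth = 0


def parse_inline_sections(xml_text):
    handler = InlineMarkupCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.sections
```

    Unlike the CDATA approach, this requires the embedded markup to be well-formed XML, which in return means the parser catches unclosed tags when the file is read.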

  4. #4
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    After running into some trouble myself yesterday, I went digging to read up on a few things, and over at www.xml.com I came across the following link, which might help:

    http://www.xml.com/pub/rg/SAX_Tutorials

    I haven't looked through it, but something in there might help. Also, any overhead would probably come from MySQL and not directly from XSL-T LoL

    It's all pretty fast and I'm happy 8) But I get the point....

    -- EDIT--

    Try www.h2hosting.com for Sablotron w/ PHP 8) $79 per year so it's not too risky.

