  1. #1
    SitePoint Enthusiast
    Join Date
    Oct 2008
    Location
    Pretoria, South Africa
    Posts
    63

    Post XML data to HTML list conversion

    Hi all
    I am writing a WP plugin that takes a Word 2007 document and extracts the content to post it as a page/post.

    While adding new functionality for the next version, I came across this problem. Following is a snippet of the XML data extracted from the document:
    Code XML:
    <w:p>
        <w:pPr>
            <w:pStyle w:val="ListParagraph"/>
            <w:numPr>
                <w:ilvl w:val="0"/>
                <w:numId w:val="1"/>
            </w:numPr>
        </w:pPr>
        <w:r>
            <w:t>the value of the item</w:t>
        </w:r>
    </w:p>

    This is all the XML the file gives me to work with (the same block is repeated for each list item). Here is an explanation of the tags (as far as I could figure out):

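    1. W:P - Starts a normal paragraph
    2. W:PSTYLE - The style of the text that follows
    3. W:NUMPR - The numbering properties of the paragraph (wraps the next two tags)
    4. W:ILVL W:VAL = 0 - The level of indentation (0 = top level)
    5. W:NUMID W:VAL = 1 - The list being used (in my document 1 = ul | 2 = ol; the actual bullet or number format is defined in word/numbering.xml)
    6. W:T - The text being displayed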
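As an alternative to walking the tags one at a time (this is not the plugin's method, just a minimal sketch assuming the DOM extension is available), DOMXPath can read the tags listed above for each paragraph directly. The function name extractListItems and the returned array shape are hypothetical. Only paragraphs that are direct children of w:body are scanned (paragraphs inside tables are skipped), and the tag names are the real lowercase/camelCase ones from word/document.xml.

```php
<?php
// Namespace URI used by WordprocessingML (word/document.xml).
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Hypothetical helper: one entry per top-level paragraph,
 * ['level' => int|null, 'numId' => string|null, 'text' => string].
 * level and numId are null for ordinary (non-list) paragraphs.
 */
function extractListItems(string $documentXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xp = new DOMXPath($dom);
    $xp->registerNamespace('w', W_NS);

    $items = [];
    foreach ($xp->query('//w:body/w:p') as $p) {
        // Both values sit under w:pPr/w:numPr when the paragraph is a list item.
        $ilvl  = $xp->query('w:pPr/w:numPr/w:ilvl/@w:val', $p)->item(0);
        $numId = $xp->query('w:pPr/w:numPr/w:numId/@w:val', $p)->item(0);

        $text = '';
        foreach ($xp->query('.//w:t', $p) as $t) {
            $text .= $t->textContent; // a paragraph may be split over several runs
        }
        $items[] = [
            'level' => $ilvl ? (int) $ilvl->value : null,
            'numId' => $numId ? $numId->value : null,
            'text'  => $text,
        ];
    }
    return $items;
}

// Example: one list paragraph followed by one normal paragraph.
$xml = '<w:document xmlns:w="' . W_NS . '"><w:body>'
     . '<w:p><w:pPr><w:pStyle w:val="ListParagraph"/>'
     . '<w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>'
     . '<w:r><w:t>The first item of a UL</w:t></w:r></w:p>'
     . '<w:p><w:r><w:t>Plain paragraph</w:t></w:r></w:p>'
     . '</w:body></w:document>';
print_r(extractListItems($xml));
```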

    I extracted the data into an array using the xml_parse_into_struct function, which produces an array like this (snippet):

    Code PHP:
        [496] => Array
            (
                [tag] => W:P
                [type] => open
                [level] => 3
                [attributes] => Array
                    (
                        [W:RSIDR] => 00F775F1
                        [W:RSIDRDEFAULT] => 00F775F1
                        [W:RSIDP] => 00F775F1
                    )
     
            )
     
        [497] => Array
            (
                [tag] => W:PPR
                [type] => open
                [level] => 4
            )
     
        [498] => Array
            (
                [tag] => W:PSTYLE
                [type] => complete
                [level] => 5
                [attributes] => Array
                    (
                        [W:VAL] => ListParagraph
                    )
     
            )
     
        [499] => Array
            (
                [tag] => W:NUMPR
                [type] => open
                [level] => 5
            )
     
        [500] => Array
            (
                [tag] => W:ILVL
                [type] => complete
                [level] => 6
                [attributes] => Array
                    (
                        [W:VAL] => 0
                    )
     
            )
     
        [501] => Array
            (
                [tag] => W:NUMID
                [type] => complete
                [level] => 6
                [attributes] => Array
                    (
                        [W:VAL] => 1
                    )
     
            )
     
        [502] => Array
            (
                [tag] => W:NUMPR
                [type] => close
                [level] => 5
            )
     
        [503] => Array
            (
                [tag] => W:PPR
                [type] => close
                [level] => 4
            )
     
        [504] => Array
            (
                [tag] => W:R
                [type] => open
                [level] => 4
            )
     
        [505] => Array
            (
                [tag] => W:T
                [type] => complete
                [level] => 5
                [value] => The first item of a UL
            )
     
        [506] => Array
            (
                [tag] => W:R
                [type] => close
                [level] => 4
            )
     
        [507] => Array
            (
                [tag] => W:P
                [type] => close
                [level] => 3
            )
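For reference, a struct like the one above can be reproduced with a few lines of PHP. This sketch assumes the XML extension and uses a single hand-written paragraph. The parser folds tag and attribute names to upper case by default (XML_OPTION_CASE_FOLDING), which is why the dump shows W:P and W:VAL rather than w:p and w:val; the level values start at 1 here because the paragraph is the root element.

```php
<?php
// One list paragraph, written the way it appears in word/document.xml.
$xml = '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
     . '<w:pPr><w:pStyle w:val="ListParagraph"/>'
     . '<w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>'
     . '<w:r><w:t>The first item of a UL</w:t></w:r></w:p>';

$parser = xml_parser_create();                              // case folding is on by default
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);   // ignore whitespace-only text
xml_parse_into_struct($parser, $xml, $vals, $index);        // $index maps tag name => positions in $vals

foreach ($vals as $v) {
    printf("%-9s %-8s level=%d %s\n", $v['tag'], $v['type'], $v['level'], $v['value'] ?? '');
}
```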

    I then pass each array element to a function that first checks the type of tag (open|complete|close) and then uses a switch statement to test for specific tags; when a tag is recognized, it appends the corresponding HTML string to the output variable.
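Because each tag is handled in isolation, the handler needs somewhere to remember what it has already seen in the current paragraph. A sketch under that assumption (PHP 7.4+; the class ListParagraphCollector is hypothetical, not part of the plugin): W:ILVL and W:NUMID fill in the level and list id, W:T appends text, and only the closing W:P decides whether the finished paragraph was a list item.

```php
<?php
// Hypothetical state holder: it keeps what the one-tag-at-a-time handler
// has learned about the current paragraph between calls.
final class ListParagraphCollector
{
    private array $current = [];
    private array $items = [];

    /** Feed one element of the xml_parse_into_struct() output. */
    public function handle(array $tag): void
    {
        $name = $tag['tag'];
        $type = $tag['type'];
        if ($name === 'W:P' && $type === 'open') {
            // New paragraph: forget everything about the previous one.
            $this->current = ['level' => null, 'numId' => null, 'text' => ''];
            return;
        }
        switch ($name) {
            case 'W:ILVL':
                $this->current['level'] = (int) $tag['attributes']['W:VAL'];
                break;
            case 'W:NUMID':
                $this->current['numId'] = $tag['attributes']['W:VAL'];
                break;
            case 'W:T':
                $this->current['text'] .= $tag['value'] ?? '';
                break;
            case 'W:P':
                if ($type === 'complete') {          // empty paragraph <w:p/>
                    $this->items[] = ['level' => null, 'numId' => null, 'text' => ''];
                } elseif ($type === 'close') {       // paragraph finished: store it
                    $this->items[] = $this->current;
                }
                break;
        }
    }

    /** Paragraphs seen so far; numId !== null means "list item". */
    public function items(): array
    {
        return $this->items;
    }
}

$xml = '<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
     . '<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>'
     . '<w:r><w:t>The first item of a UL</w:t></w:r></w:p>'
     . '<w:p><w:r><w:t>Normal text</w:t></w:r></w:p></w:body>';
$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $vals);

$collector = new ListParagraphCollector();
foreach ($vals as $tag) {
    $collector->handle($tag); // one tag at a time
}
print_r($collector->items());
```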

    And now, finally, my problem: how do I generate an ordered or bulleted list (with multi-level support) from these tags, given that only one tag can be processed at a time?
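One possible approach, sketched under assumptions: keep a stack of the lists that are currently open. For each finished list item, close lists until the stack is no deeper than the item's level, close the previous li if the item is a sibling, and open new lists if the item is deeper. Only the stack has to survive from one item to the next, so the same logic fits a parser that sees one paragraph at a time. The function name renderLists is hypothetical, and the numId-to-tag map mirrors the "1 = ul | 2 = ol" observation in the first post (the real format is defined in word/numbering.xml).

```php
<?php
/**
 * Hypothetical renderer. $items: ['level' => int|null, 'numId' => string|null, 'text' => string]
 * in document order; $numIdToTag maps numId to 'ul' or 'ol' (assumed mapping).
 */
function renderLists(array $items, array $numIdToTag): string
{
    $html  = '';
    $stack = []; // tags of the lists that are currently open, outermost first

    foreach ($items as $item) {
        if ($item['numId'] === null) {
            // An ordinary paragraph ends every open list.
            while ($stack) {
                $html .= '</li></' . array_pop($stack) . '>';
            }
            $html .= '<p>' . htmlspecialchars($item['text']) . '</p>';
            continue;
        }
        $depth = $item['level'] + 1;                 // ilvl 0 => one open list
        $tag   = $numIdToTag[$item['numId']] ?? 'ul';

        // Going back up: close the deeper lists (and the <li> that holds each one).
        while (count($stack) > $depth) {
            $html .= '</li></' . array_pop($stack) . '>';
        }
        // Same level as the previous item: close the previous <li>.
        if (count($stack) === $depth) {
            $html .= '</li>';
        }
        // Going down: open nested lists inside the still-open parent <li>.
        while (count($stack) < $depth) {
            $stack[] = $tag;
            $html .= '<' . $tag . '>';
        }
        $html .= '<li>' . htmlspecialchars($item['text']);
    }
    while ($stack) {                                 // close whatever is still open
        $html .= '</li></' . array_pop($stack) . '>';
    }
    return $html;
}

echo renderLists([
    ['level' => 0, 'numId' => '1', 'text' => 'First item'],
    ['level' => 1, 'numId' => '2', 'text' => 'Nested item'],
    ['level' => 0, 'numId' => '1', 'text' => 'Second item'],
], ['1' => 'ul', '2' => 'ol']), "\n";
```

If a level is skipped (for example level 0 straight to level 2), this sketch opens two lists with no li between them, and a change of numId at a level that is already open reuses the list that is open; both cases would need extra handling.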

    Some more background information:
    * An Office 2007 (.docx) file is a .zip archive containing the XML data, with the images stored as separate files (they are not embedded in the XML itself).
    * I was able to correctly extract bold, italics, underline, superscripts and subscripts, images (resized), tables and even hyperlinks. All I still need is lists.
    * The plugin can be downloaded for free at http://wordpress.org/extend/plugins/docx-to-html-free/ and the Premium version can be bought at http://wpplugins.com/plugin/305/docx-to-html-premium
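Since the file is a zip archive, the XML parts can be read with ZipArchive. A minimal sketch, assuming the zip extension is installed; readDocxPart is a hypothetical helper name. The body is in word/document.xml, the list definitions that numId points to are in word/numbering.xml, and images are stored under word/media/.

```php
<?php
/**
 * Hypothetical helper: return the raw XML of one part of a .docx file.
 * Throws if the archive cannot be opened or the part does not exist.
 */
function readDocxPart(string $docxPath, string $part = 'word/document.xml'): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {          // open() returns true or an error code
        throw new RuntimeException("Cannot open $docxPath");
    }
    $xml = $zip->getFromName($part);               // false if the entry is missing
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("$part not found in $docxPath");
    }
    return $xml;
}
```

For example, readDocxPart('example.docx') would return the body XML and readDocxPart('example.docx', 'word/numbering.xml') the numbering definitions (example.docx is a placeholder path).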

  2. #2
    SitePoint Enthusiast
    Join Date
    Oct 2009
    Location
    Internet
    Posts
    49
    Is it not possible to simply loop through, find where these tags occur and collect them in a separate variable? Like:
    PHP Code:
    $ul = array(); 
    Then afterwards loop through this array and print the contents?
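The buffering idea can be sketched like this, assuming each paragraph has already been reduced to an array with level, numId and text keys (that shape and the function names are hypothetical). A list starts at the first paragraph that has a numId and ends at the first one that does not; at that point the buffered run is turned into a tree by level and printed recursively. For simplicity every list is printed as ul; choosing ul or ol from numId is left out.

```php
<?php
/** Turn a flat run of list items into nodes ['text' => ..., 'children' => [...]]. */
function buildListTree(array $run, int &$i = 0, int $level = 0): array
{
    $nodes = [];
    while ($i < count($run) && $run[$i]['level'] >= $level) {
        if ($run[$i]['level'] > $level && $nodes) {
            // Deeper item: it and its siblings become children of the last node.
            $nodes[count($nodes) - 1]['children'] = buildListTree($run, $i, $level + 1);
            continue;
        }
        $nodes[] = ['text' => $run[$i]['text'], 'children' => []];
        $i++;
    }
    return $nodes;
}

/** Print a tree recursively as nested lists. */
function renderTree(array $nodes, string $tag = 'ul'): string
{
    $html = "<$tag>";
    foreach ($nodes as $node) {
        $html .= '<li>' . htmlspecialchars($node['text']);
        if ($node['children']) {
            $html .= renderTree($node['children'], $tag);
        }
        $html .= '</li>';
    }
    return $html . "</$tag>";
}

function paragraphsToHtml(array $paragraphs): string
{
    $html = '';
    $ul = [];                              // the buffer suggested above
    foreach ($paragraphs as $p) {
        if ($p['numId'] !== null) {        // list paragraph: start or continue the buffer
            $ul[] = $p;
            continue;
        }
        if ($ul) {                         // first non-list paragraph: flush the buffer
            $html .= renderTree(buildListTree($ul));
            $ul = [];
        }
        $html .= '<p>' . htmlspecialchars($p['text']) . '</p>';
    }
    if ($ul) {                             // document ended inside a list
        $html .= renderTree(buildListTree($ul));
    }
    return $html;
}

echo paragraphsToHtml([
    ['level' => 0, 'numId' => '1', 'text' => 'A'],
    ['level' => 1, 'numId' => '1', 'text' => 'A.1'],
    ['level' => 0, 'numId' => '1', 'text' => 'B'],
    ['level' => null, 'numId' => null, 'text' => 'After'],
]), "\n";
```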

  3. #3
    SitePoint Enthusiast
    Join Date
    Oct 2008
    Location
    Pretoria, South Africa
    Posts
    63
    I was unable to make it work (mainly because I could not decide at which tag to start the loop) and concluded that the method I currently use for processing would not support a solution like this.
    Therefore I will leave this functionality for a future version.
    @Tribal_01, thank you for pointing out how this could have been done (had I used a different method).

