Creating Multipage HTML using XML

Hi,
I have chosen XML for a project that needs to produce a web page and a PDF from the same content. It could also be the first of many such projects, and I figured XML would make things a lot easier, as later ones could simply reuse the existing XML structure.

Anyway, I am finding the process a bit of a steep learning curve. At the moment I am looking at designing an .xsl to transform the .xml into my website and possibly just importing the .xml into Adobe InDesign for now to do my .pdf.

However, the thing I can’t seem to work out is the best way to design the website. I have created a .xsl file which makes the front page look perfect, but I don’t know how to start generating multiple .html pages. At the moment I am doing all my transforming in Oxygen XML Editor. I have looked into editing some of the Frameworks, Docbook HTML Chunks for example, but just can’t work out where to start. I basically need to change the structure a lot, and I already have some .css stylesheets which I was hoping to add, but I am struggling with it.

Is this the best way of doing it? By using one of the existing frameworks and customizing it? I am finding that a lot of the information out there on XML and XSL is so full of jargon and overly-descriptive instructions that it is confusing the hell out of me. If someone could offer some advice or maybe tell me where a good place is to find some information/tutorials on creating a website from XML in chunks I would be extremely grateful!!

Thanks
Russ

Do you mean XSLT, XSL-FO, or both (XSLT with XSL-FO as the output language)?

XSLT is used to turn the XML into XSL-FO (in this case), and XSL-FO is used as the language from which a PDF can be generated.

You can use Apache FOP to generate a PDF out of the XSL-FO output. If that doesn’t suit you, you can always search for “XSL-FO processor” (http://www.google.co.uk/search?hl=en&q=XSL-FO+processor).

Well, I’m actually talking about the webpage rather than the PDF, so I believe this would be using XSL or XSLT to style my XML. However, I still can’t work out the best way to do this. Has anyone here designed a website this way, and if so, what was the best way of doing it?

For instance, I could use one of the docbook templates, but this would involve some major customisations and I don’t know if this would be too big of a learning curve. Otherwise I could use the XML DOM to call up my content, but then it looks like this would have some major consequences with regard to SEO. I would really like to design my own XSL stylesheet which does the transform, but I really can’t find that much information about it. Does anyone have any guides or ideas that might help???

Thanks!
Russ

SitePoint’s book “No Nonsense XML Web Development with PHP” http://www.sitepoint.com/books/xml1/ creates CMS pages from XML files. It might be worth your time to at least check out the free sample chapters.

When you have a certain XML data source, there are different ways you can generate a web page and PDF from it (That’s the beauty of XML…).

  1. Use one XSLT to generate a web page, and another to generate XSL-FO, which would then be used by a FO processor (such as the mentioned Apache FOP) to generate the PDF.
    Advantage: Data is separated from presentation. Your PDF can be tailored the way you want it, and need not resemble your page style. In addition, adding a third style (say, a mobile version) would be easier. Last but not least, XSL-FO could also be converted by a FO processor to other things, like a PNG image for example.
    Disadvantage: Two (or more, if you later want more formats) XSLT stylesheets to worry about. When outputting XSL-FO, you should also take into account what your FO processor can process.

  2. Use one XSLT to generate a web page, and an HTML-to-PDF converter to generate the PDF. Such converters use their own rendering engine, and usually respect CSS @media print blocks.
    Advantage: The design of the PDF and the web page will be completely consistent, unless you have some tricky @media print blocks. Strictly speaking, XML is not needed in this case, since the converter processes the HTML, but you can still use XML with XSLT to generate the HTML in order to get the data separation benefit.
    Disadvantage: Think of the HTML-to-PDF converter as one more engine you need to adhere to. Also, it’s somewhat (but not very…) hard to get really diverse styles for your HTML and PDF, which may or may not be a good thing (depending on how you want it).

  3. Use one script (with or without XSLT) to generate the web page, and another such script (without XSLT) to generate the PDF.
    Advantage: Complete control over the appearance of both (or just one) of the formats, while still getting the data separation benefit. Also, no additional rendering engine to worry about. Anything your PDF creation API can do, you can do.
    Disadvantage: Any additional format or a variation of an existing one needs a new script. It is also very hard to tailor the PDF to be similar to the HTML page, while still editing only the HTML styles.
    (Note that when I say “a script”, I mean “a code block” that may be embedded into a single file or separated into more files, but which works as one when it is called.)
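
To make option 1 a bit more concrete, here is a minimal sketch of the second stylesheet, the one that outputs XSL-FO. It assumes a simplified, un-namespaced DocBook-like source with a "book" root containing "chapter" elements, each with a "title" and some "para" children. Those element names, the A4 page size and the margins are assumptions for illustration only, so adjust them to your actual schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:output method="xml" indent="yes"/>

  <!-- Root: define one A4 page layout and put every chapter into a single flow.
       Assumes at least one chapter exists, since fo:flow must not be empty. -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="A4"
            page-height="29.7cm" page-width="21cm"
            margin-top="2cm" margin-bottom="2cm"
            margin-left="2cm" margin-right="2cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="A4">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="book/chapter"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Each chapter becomes a block with some space above it -->
  <xsl:template match="chapter">
    <fo:block space-before="18pt">
      <xsl:apply-templates select="title | para"/>
    </fo:block>
  </xsl:template>

  <!-- Chapter title rendered as a bold heading -->
  <xsl:template match="chapter/title">
    <fo:block font-size="18pt" font-weight="bold" space-after="6pt">
      <xsl:value-of select="."/>
    </fo:block>
  </xsl:template>

  <!-- Paragraphs rendered as plain blocks -->
  <xsl:template match="para">
    <fo:block space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

With Apache FOP installed, something like "fop -xml content.xml -xsl to-fo.xsl -pdf content.pdf" should run the transform and render the PDF in one step (the file names are placeholders).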

BTW, what programming language are you using anyway? What is your host’s OS? Can you install or execute programs?

P.S. Hey, this is my 100th post… yay! :stuck_out_tongue:

Wow! Thanks for the information. I’ve actually already started this project and have got a clearer idea of what I’m doing now, but would be happy to get your thoughts on this.

The unique thing about this project is that it is a website which will be accompanied by a downloadable PDF, with both sharing the same content, and that this content will be updated every six months. Plus, there may also be further sites created that use this exact template.

As this first effort is somewhat of a trial, I’ve decided to do it in the easiest way possible. Hence, I have written the XML structure, which will be the same for all future websites, using one of the Docbook schemas. Now I am writing the XSL in Oxygen XML Editor and using its Saxon processing engine to transform that into HTML. I haven’t started work on the PDF side yet, but I have used Adobe InDesign quite a lot before, so I did a few tests with it and am thinking of simply using its Import XML function to generate the PDF that way.

In the future we will be looking to have all these transformations happening on the server but for the time being I think this is the best way of doing things. What are your thoughts? It has been really hard to get any advice on this subject, or even find documentation from people doing a similar project.

Yeah. It’s hard to find such documentation, because FO processors, as well as HTML-to-PDF converters, usually require more privileges on the server. Most sites run on a shared host, and therefore their authors can’t use those programs.

That’s why I asked you what kind of host and language you use. The solution you select will often depend on it.

If you have JSP or are willing to write a simple Java program to “build” your HTML and PDF files, you can easily use Apache FOP, since it ships as a JAR file.

If you have PHP on a shared host, using DOMPDF to convert the HTML to PDF may be the only pre-made option you have.

If you have PHP on a host that has the PHP/Java Bridge and lets you use it, you can use the XML_fo2pdf PEAR package.

If you have none of that, but still have some kind of a language with an XML parser and a PDF API at hand, you can write a script in it that does this kind of processing.
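
As for the original question of getting several .html pages out of one .xml file: since Saxon is already in use, one approach is XSLT 2.0's xsl:result-document instruction, which writes an extra output file each time it runs. The following is only a rough sketch under assumptions. It presumes an un-namespaced "book" root with "chapter" children that have a "title" and "para" elements, that the main output file is saved as index.html, and that a stylesheet called style.css sits next to it. All of those names are placeholders, not something taken from the actual project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default output (the front page) and a named format for the chapter pages -->
  <xsl:output method="html" indent="yes" encoding="UTF-8"/>
  <xsl:output name="page" method="html" indent="yes" encoding="UTF-8"/>

  <!-- Front page: goes to the main output file and links to every chapter page -->
  <xsl:template match="/book">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
        <link rel="stylesheet" type="text/css" href="style.css"/>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <ul>
          <xsl:for-each select="chapter">
            <li>
              <a href="chapter{count(preceding-sibling::chapter) + 1}.html">
                <xsl:value-of select="title"/>
              </a>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
    <!-- Generate one additional file per chapter -->
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <!-- Each chapter is written to its own file: chapter1.html, chapter2.html, ... -->
  <xsl:template match="chapter">
    <xsl:result-document
        href="chapter{count(preceding-sibling::chapter) + 1}.html"
        format="page">
      <html>
        <head>
          <title><xsl:value-of select="title"/></title>
          <link rel="stylesheet" type="text/css" href="style.css"/>
        </head>
        <body>
          <h1><xsl:value-of select="title"/></h1>
          <xsl:apply-templates select="para"/>
          <p><a href="index.html">Back to contents</a></p>
        </body>
      </html>
    </xsl:result-document>
  </xsl:template>

  <!-- Paragraphs map straight onto HTML paragraphs -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The href values are resolved relative to the location of the main output file, so the chapter pages should end up in the same folder as the front page. If the source uses the DocBook 5 namespace, adding xpath-default-namespace="http://docbook.org/ns/docbook" to the xsl:stylesheet element should let the same match patterns work unchanged.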