ISO a tool that combines many HTML files into one

I’m finishing up a documentation project which produced about three dozen HTML files. Now the client says he wants the content delivered in a single PDF.

I’d like to do this by creating a file that says in effect, “include this HTML file, then this one, then this one,…” A tool will read this file, assemble all of the HTML files into one big HTML file, and resolve all of the cross-links (links from one HTML file to another) into internal links (links from one part of the combined HTML to another). Then I can convert the combined HTML file to a PDF.

My object is not so much to create the PDF file more efficiently, as to avoid having to update two parallel versions of the content once I’ve done so. If I had this tool I could regenerate the PDF in a couple of steps whenever an HTML file changes.

Does such a tool exist?

Have you considered using Windows CHM (Help File) format?
The [free, I think] tool provided by Microsoft does what you are describing.

I appreciate your effort to be helpful, but this really doesn’t address the question. My client asked me for a PDF. If I give him a CHM, he’s just going to wonder why I don’t follow instructions.

Php to PDF has a library and a comment may save you some time:

If you only have PDFLib Lite installed, I would not recommend bothering with this library, as you can really only output text and import an image, and that’s about it. Forget about adding complexities such as color, blocks and other elements. Switch to an open source library such as FreePDF.


There are a lot of websites that offer free conversion of HTML pages to PDF, i.e

However, the nature of a ‘web’ site is the interconnected relationship among pages. This cannot be represented in a serial document such as a PDF.

Three out of four responders have misunderstood my question in different ways, so I’m persuaded that I did a bad job of stating it, and I’d like to have another go.

On one hand, I have a web site with several dozen pages. On the other, I have people who are effectively my clients (now at least two of them out of four) who tell me that their clients find it difficult to utilize information from web sites, and insist on a page-oriented format, customarily a PDF.

To provide that I can convert each page to a PDF with Acrobat, or with a free PDF driver like CutePDF, and then combine the PDFs with Acrobat. But this approach has a serious disadvantage: the pages are heavily interlinked, and the links would continue to point to their original targets. From the reader’s point of view, there would be all these links that obviously were supposed to point to other parts of the document, and they’d point to the same material on some web site instead. Dumb!

So, I want a tool that lets me combine the HTML documents – resolving the links from inter-document targets to intra-document targets, among other things – and then convert to PDF. If the tool is really nice it can automatically add front matter, page separators, and a table of contents (although I’ll probably have to add the page numbers by hand after the conversion).
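To make the link-retargeting requirement concrete, here is a minimal Python sketch of the combining step. It is an illustration of the idea under stated assumptions, not an existing tool: the function names (`slug`, `combine`, `combine_from_manifest`), the one-filename-per-line manifest format, and the `file--anchor` naming scheme are all hypothetical. It also assumes the pages sit in one flat directory, link to each other by bare filename, and use double-quoted `href` and `id` attributes. It relies on regular expressions rather than a real HTML parser, so unusual markup would need more care.

```python
import re
from pathlib import Path

# Assumption: attributes are double-quoted and each page has one <body>.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href="([^"#]*)(?:#([^"]*))?"', re.IGNORECASE)
ID_RE = re.compile(r'(?<![\w-])id="([^"]+)"', re.IGNORECASE)


def slug(filename):
    """Turn 'intro.html' into an anchor-safe prefix such as 'intro-html'."""
    return re.sub(r"[^A-Za-z0-9]+", "-", filename).strip("-")


def combine(pages):
    """Merge pages (an ordered dict of filename -> HTML text) into one HTML string."""
    sections = []
    for name, html in pages.items():
        match = BODY_RE.search(html)
        body = match.group(1) if match else html
        prefix = slug(name)

        # Prefix every id so anchors from different files cannot collide.
        body = ID_RE.sub(lambda m: f'id="{prefix}--{m.group(1)}"', body)

        def retarget(m, current=name):
            target, fragment = m.group(1), m.group(2)
            target = target or current  # href="#x" points into the current file
            if target not in pages:
                return m.group(0)  # not one of our pages: leave the link alone
            anchor = slug(target)
            if fragment:
                anchor += "--" + fragment
            return f'href="#{anchor}"'

        body = HREF_RE.sub(retarget, body)
        # Each original page becomes a section whose id is the link target.
        sections.append(f'<section id="{prefix}">\n{body}\n</section>')
    return "<html><body>\n" + "\n<hr>\n".join(sections) + "\n</body></html>"


def combine_from_manifest(manifest_path):
    """Read a manifest listing one HTML filename per line, in output order."""
    manifest = Path(manifest_path)
    lines = manifest.read_text(encoding="utf-8").splitlines()
    names = [line.strip() for line in lines if line.strip()]
    pages = {n: (manifest.parent / n).read_text(encoding="utf-8") for n in names}
    return combine(pages)
```

With this scheme a link such as `href="setup.html#install"` becomes `href="#setup-html--install"`, while links to anything outside the manifest are left pointing at their original targets. The combined file can then go through the same HTML-to-PDF step described above. Front matter, a table of contents, and page numbers are not attempted here.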

Returning briefly to the suggestions:

The people making the request are asking specifically for a PDF. Thus the whole point is to produce a page-oriented rendition of the web site. Some other page oriented format would probably be OK, although I’d have to clear it with them. Another interactive format, e.g., CHM, would be completely off point.

I don’t get to tell them that they don’t really want what they’re asking for because a web site cannot be (faithfully) represented in a serial document. That’s true, but irrelevant. This is one of those cases where the customer is always right; the customer’s customer is right squared!

Maybe some overriding constraint compels my clients’ clients to use a serial format, and it just hasn’t been explained to me. Maybe they’re asking for this because they put their heads on backwards when they get out of bed. It doesn’t matter.

I’m looking for an HTML tool, not a software component that I could use to create my own tool. My “client” does not have time to wait while I engage in a bout of software development, nor does my boss pay me to do that. In any case the response didn’t seem to imply that PDFLib can retarget links and do the other things required to solve this problem. I looked at its web site briefly, and got the impression that it would just let me do those things myself by manipulating PDF rather than HTML. It’s not clear what the percentage is in that… especially if it involves implementing the tool on a server, which would not be its natural environment.

By the way, the problem is now solved to the extent it can be, because today is my last day in this job. Everything I can do for my “clients” is done. I encountered a similar problem once before, though, so I foresee encountering one again in the future, and I’d like to be ready when I do. On top of that, the problem is technically interesting.

Please read the whole thread before replying. Online converters have already been suggested, and Orthoducks has explained why they are not an appropriate solution.

As Orthoducks has also said that the issue ceased to be his/her problem three months ago, there seems little point in reviving the discussion now.

Thread closed.