  1. #1
    SitePoint Zealot Rio's Avatar
    Join Date
    Nov 2001
    Location
    United Kingdom
    Posts
    171

    PDF to HTML Conversion

    Hi there!

    Does anyone know if it's possible to convert a PDF file into HTML using PHP?

    I'm building an online library of downloadable PDFs but, according to an expert, some African countries (which we deal with very often) have extremely low bandwidth. In some cases, I was told, even downloading a 250kb file will pose a problem.

    I thought converting them on the fly into HTML and displaying them in a browser may be one way to get around this problem.

  2. #2
    killall -9 lusers
    Join Date
    Oct 2002
    Location
    Cincinnati, Ohio, USA
    Posts
    390
    I don't know if PHP could do it for you, but it is certainly possible. Google does this when it caches PDF documents.

  3. #3
    As the name suggests... trickie's Avatar
    Join Date
    Jul 2002
    Location
    Melbourne, Australia
    Posts
    678
    It is probably feasible, but all the PDF classes/extensions that I have used are for PDF generation. I don't know of any that allow you to parse an existing PDF.

    Maybe you will have to pay for the PDF Import library at http://www.pdflib.org

  4. #4
    SitePoint Zealot Rio's Avatar
    Join Date
    Nov 2001
    Location
    United Kingdom
    Posts
    171
    Both Google and Adobe do this kind of thing, and both seem to be using Perl scripts for the job. Maybe I should post this thread in the Perl forum.

  5. #5
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,322
    Did you find an answer to this? I'd really like to know how to do this or, better still, get some code that'll convert PDF to plain text.

    I found this page with various PDF conversion links - www.technoir.nu/hplx/hplx-l/9908/msg00460.html

    Also, does anyone know what the situation with PDF is? Is there any danger of it becoming another GIF?

  6. #6
    SitePoint Zealot Rio's Avatar
    Join Date
    Nov 2001
    Location
    United Kingdom
    Posts
    171
    Hi,

    I've just found a couple of pages on the net, and they seem quite promising.

    http://www.sanface.com/pdfprint/
    http://pdftohtml.sourceforge.net/

    The problem is, I'm not quite sure how to go about incorporating these applications into the site. Also, as they involve shell scripts and Perl, I feel it's not quite right to go on much further in this forum (I'm not quite sure of the exact protocol regarding this. Can anyone advise me, please?)
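
    One possible way to wire a command-line converter such as pdftohtml into a site is to call it from a server-side script. The sketch below is only an illustration written in Python, not something taken from either project's documentation; the same shell-out idea should carry over to PHP's exec(). The function names build_command and convert_pdf are made up for this example, and it assumes a pdftohtml binary on the PATH that accepts the -i (ignore images) and -noframes (single HTML file) options and writes its result to "<output base>.html".

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, out_base, tool="pdftohtml"):
    """Return the argument list for one converter run (hypothetical helper)."""
    # -i drops images to keep pages small; -noframes asks for a single HTML file.
    # Passing a list (no shell string) avoids quoting problems with odd file names.
    return [tool, "-i", "-noframes", str(pdf_path), str(out_base)]


def convert_pdf(pdf_path, out_dir, tool="pdftohtml"):
    """Run the converter; return the expected HTML path, or None if it is missing."""
    if shutil.which(tool) is None:
        return None  # converter is not installed on this machine
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_base = out_dir / Path(pdf_path).stem
    # check=True raises CalledProcessError if the converter exits with an error
    subprocess.run(build_command(pdf_path, out_base, tool), check=True)
    # Assumption: the tool names its output "<out_base>.html"
    return out_dir / (Path(pdf_path).stem + ".html")


if __name__ == "__main__":
    # "library/report.pdf" and "html_cache" are placeholder paths
    print(convert_pdf("library/report.pdf", "html_cache"))
```

    Running the conversion once when a PDF is added to the library and serving the saved HTML afterwards would probably be kinder to the server than converting on every request, but that is a design choice rather than a requirement.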

  7. #7
    SitePoint Zealot jinx3's Avatar
    Join Date
    May 2002
    Location
    Vancouver, WA
    Posts
    127
    I am also trying to figure out this same problem. It would be for taking print advertisements saved as PDFs and converting them to HTML with some links and other features.

    pdftohtml seems to be the best option that I have found as well.

    Any PDF experts out there???

