Thread: pdf or html

  1. #1
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    New York city
    Posts
    212

    pdf or html

    I need to put about 30 or so articles on a web site. Most of them are not in electronic form. One suggestion was to scan them and save them as PDF files, then just link to the PDF files on the web site. The other option would be to scan them and convert them into HTML files. The first option seems like much less work to me. Any thoughts?

  2. #2
    SitePoint Zealot SuperFunZoo's Avatar
    Join Date
    Feb 2004
    Posts
    161
    I would go for the PDF format if I were you.

    It is easy to print the document and easy to add it as an attachment if you need to send it as an email sometime. And almost everyone can use it.

  3. #3
    SitePoint Zealot willoworks's Avatar
    Join Date
    May 2003
    Location
    Trappe, MD
    Posts
    140
    For optimal search engine coverage, you should probably convert them to HTML. Some search engines index PDFs, but not all of them (correct me if I'm wrong). Plus some people (me, for example) find PDFs annoying. I want to keep using my browser, not bring up another program to read web content.

  4. #4
    SitePoint Zealot SuperFunZoo's Avatar
    Join Date
    Feb 2004
    Posts
    161
    Quote Originally Posted by willoworks
    For optimal search engine coverage, you should probably convert them to HTML. Some search engines index PDFs, but not all of them (correct me if I'm wrong). Plus some people (me, for example) find PDFs annoying. I want to keep using my browser, not bring up another program to read web content.
    Yep, you're right. Another option is to make the web pages printable. It depends on what you want to do with it in the future. If you want the visitors to be able to download the articles, the PDF format is perfect. Why not make two versions?

  5. #5
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    New York city
    Posts
    212
    They would like them to be searchable. Do you know if Google searches PDFs? The main purpose is so that visitors can print them.
    Quote Originally Posted by willoworks
    For optimal search engine coverage, you should probably convert them to HTML. Some search engines index PDFs, but not all of them (correct me if I'm wrong). Plus some people (me, for example) find PDFs annoying. I want to keep using my browser, not bring up another program to read web content.

  6. #6
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    New York city
    Posts
    212
    Making two versions would be double the work, which for 30 articles can be a lot. That would mean scanning each article, converting it to a Word doc with an OCR program, then converting that to an HTML doc. Seems like too much work.
    Quote Originally Posted by SuperFunZoo
    Yep, you're right. Another option is to make the web pages printable. It depends on what you want to do with it in the future. If you want the visitors to be able to download the articles, the PDF format is perfect. Why not make two versions?

  8. #8
    Getting there... Willigogs's Avatar
    Join Date
    Aug 2003
    Location
    UK
    Posts
    394
    I have to do this on a regular basis, as a number of people I work for require document libraries on their websites. They have HTML versions for quick reference and PDFs for printing purposes.

    Creating the PDFs isn't a problem, as all their documents are in Word format, so it's a simple conversion.

    I usually create the HTML versions in Dreamweaver, so I can set up how it's going to look and then cut and paste the text in. However, this does cause problems when converting tables, charts, etc. I usually add these as images or re-create them with HTML tables (which can take a while).

    My policy has always been to provide HTML versions, but always inform the users that:

    "Please note that HTML documents may not reflect the full quality that a PDF document can provide. However, PDF files can sometimes be quite large, therefore they need to be allowed time to download."

    That usually does the trick

  9. #9
    Yugo full of anvils bronze trophy hillsy's Avatar
    Join Date
    May 2001
    Location
    :noitacoL
    Posts
    1,859
    HTML is about 750x better than PDF for online viewing, but if you print it, that's another story.

    Side note - if you just scan into PDF without doing any kind of OCR, then Google will not be able to index it. That's because all the PDF will be is a series of bitmaps with no actual text to index.
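A rough way to test this before uploading is to look for font resources in the file, since text that can be selected or indexed needs a font. The following is only a minimal sketch of that idea: `looks_image_only` is a made-up helper name, the two byte strings are hand-written fragments rather than complete PDFs, and the check is a crude heuristic, not a real PDF parser. It can wrongly report "image only" for files that store their objects in compressed streams, so a result of `True` is a prompt to open the file and try selecting some text.

```python
def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude heuristic: True when the raw PDF data names no font resources."""
    # Selectable (and indexable) text needs a font, which appears as a /Font entry.
    # A scan saved straight to PDF usually contains only image objects.
    return b"/Font" not in pdf_bytes


# Two tiny hand-made fragments standing in for real files (not complete PDFs).
scanned = b"%PDF-1.4\n1 0 obj\n<< /Type /XObject /Subtype /Image >>\nendobj\n"
with_text = b"%PDF-1.4\n1 0 obj\n<< /Type /Font /Subtype /Type1 >>\nendobj\n"

print(looks_image_only(scanned))
print(looks_image_only(with_text))
```

For a real scan, the bytes would come from reading the file in binary mode, for example `open("article.pdf", "rb").read()`, where the file name is a placeholder.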

  10. #10
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    New York city
    Posts
    212
    The side note is a good point. Since someone else is doing the scanning, I'd better make sure that he uses OCR. Since these articles are 5-7 pages long, most visitors would probably prefer to print them rather than read them on the screen. (At least I would.)
    Quote Originally Posted by hillsy
    HTML is about 750x better than PDF for online viewing, but if you print it, that's another story.

    Side note - if you just scan into PDF without doing any kind of OCR, then Google will not be able to index it. That's because all the PDF will be is a series of bitmaps with no actual text to index.

  11. #11
    Ensure you finish what you sta bronze trophy John Colby's Avatar
    Join Date
    Aug 2003
    Location
    University of Central England, U.K.
    Posts
    487
    Another angle - PDF is a proprietary file format inasmuch as Adobe retains control of the spec - HTML isn't - if that makes any difference to you.

    And there's print style sheets if you want to print.

    IMHO HTML files are smaller than the equivalent PDF - especially if you use external style sheets.
    John
    No electrons were harmed during the creation, transmission
    or reading of this posting. However, many were excited and
    some may have enjoyed the experience.

  12. #12
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    New York city
    Posts
    212
    The problem is that many of these articles are only in print. I would first have to convert them into a word processing doc (Word, for example), check that the formatting is right, convert them into HTML, check the formatting again, and then post them on the web. Too many steps.
    Quote Originally Posted by John Colby
    Another angle - PDF is a proprietary file format inasmuch as Adobe retains control of the spec - HTML isn't - if that makes any difference to you.

    And there's print style sheets if you want to print.

    IMHO HTML files are smaller than the equivalent PDF - especially if you use external style sheets.
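If the OCR program can save plain text, the word-processor step might be skipped for articles that are mostly running text. Below is a minimal sketch under that assumption; `text_to_html` and the sample text are invented for illustration, paragraphs are assumed to be separated by blank lines, and tables or charts would still need to be handled by hand.

```python
from html import escape


def text_to_html(title: str, text: str) -> str:
    """Wrap plain text in a minimal HTML page, one <p> per blank-line-separated block."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    # Join the hard line breaks that OCR output leaves inside a paragraph,
    # and escape characters such as & and < so they display correctly.
    paragraphs = "\n".join(
        "<p>" + escape(" ".join(b.split())) + "</p>" for b in blocks
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<title>{escape(title)}</title>\n</head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n{paragraphs}\n</body>\n</html>\n"
    )


sample = "First paragraph of the\nscanned article.\n\nSecond paragraph, with an & in it."
page = text_to_html("Sample article", sample)
print(page)
```

The formatting would still need one check per article, but only once rather than once in the word processor and again in HTML.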

  13. #13
    SitePoint Addict
    Join Date
    Dec 2002
    Posts
    386
    Also be aware that many people do not have a pdf reader, and still more hate them and do not use them.

    Just make sure that you warn your users of the size and file type of any PDF docs.

    A modern (X)HTML document can be made to look fine both on screen and in print.
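One way to build that warning into every link is to generate the link text with the file type and size included. This is just a sketch: `human_size`, `pdf_link`, the path, and the byte count are all made up for the example, and sizes are rounded using 1024 as the step between units.

```python
from html import escape


def human_size(num_bytes: int) -> str:
    """Format a byte count as KB or MB, using 1024 as the step."""
    if num_bytes < 1024 * 1024:
        return f"{num_bytes / 1024:.0f} KB"
    return f"{num_bytes / (1024 * 1024):.1f} MB"


def pdf_link(href: str, title: str, num_bytes: int) -> str:
    """Return an HTML link whose text states the file type and size."""
    label = f"{escape(title)} (PDF, {human_size(num_bytes)})"
    return f'<a href="{escape(href, quote=True)}">{label}</a>'


link = pdf_link("articles/sample.pdf", "Sample article", 1_572_864)
print(link)  # <a href="articles/sample.pdf">Sample article (PDF, 1.5 MB)</a>
```

For real files the byte count could come from `os.path.getsize` on each PDF instead of being typed in by hand.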

  14. #14
    One website at a time mmj's Avatar
    Join Date
    Feb 2001
    Location
    Melbourne Australia
    Posts
    6,282
    HTML is a common denominator on the world wide web - every browser supports it. If you do it in only one format, make sure it's HTML. Ideally, do it in HTML and include a link to the PDF version from the HTML.

  15. #15
    SitePoint Member
    Join Date
    Jan 2004
    Location
    Warrnambool, Vic, Australia
    Posts
    5
    Just a suggestion: you could place the PDFs on the site and add a short description of what each article covers.

    e.g.: This article contains information about this subject and that subject; please download it here to view offline.

    This would reduce your work and, if the descriptions are good, would also help with the search engines. This is not the optimal way, though.

  16. #16
    Ensure you finish what you sta bronze trophy John Colby's Avatar
    Join Date
    Aug 2003
    Location
    University of Central England, U.K.
    Posts
    487
    Just another comment from another mailing list:

    What about people reading documents on systems that are text-only, speech, or Braille? PDF causes them problems.

  17. #17
    SitePoint Addict
    Join Date
    Mar 2003
    Location
    New York city
    Posts
    212
    There seem to be lots of pros and cons on this subject. Perhaps the best way is to put them up as PDFs initially and then add HTML versions later. I would like to get them on the web as soon as possible, and it seems as if PDF would be the fastest way.

