  1. #1
    SitePoint Member
    Join Date
    May 2002
    Posts
    10

    Japanese characters / PDF

    Hi everyone!
    I have made a mistake and now I have to create a Japanese homepage. The only resources I have are PDF files, which display the kanji rather nicely.
    Now I have 2 big problems:

    1. Is there any way to convert the PDF files into HTML?
    (Or at least to get the data out of them? Cut-and-paste does not work with kanji...)

    2. Which text encoding should I use? Japanese Shift-JIS?


    I would be really grateful for any kind of tips.

    Cheers,

    John

  2. #2
    SitePoint Member
    Join Date
    May 2002
    Posts
    10

    RE: Mistake

    ... by "mistake" I only mean that I am at a disadvantage: I cannot write (or speak) Japanese, which makes the whole project quite difficult...

    John

  3. #3
    will code HTML for food Michel V's Avatar
    Join Date
    Sep 2000
    Location
    Corsica
    Posts
    552
    There is at least one PDF-to-HTML converter out there, but it's commercial. The free version only outputs one page out of two (every second page gets replaced by an advertisement for the company that publishes the converter).
    Even if you purchase the full version, the HTML it produces is so terrible that you'll regret the purchase (trust me, been there, done that; at least the money came from my employer, so I only lost time).

    What you can do is compile PHP with the PDF extension on your computer, then use PHP's PDF functions to open the PDF file and get the data out of it. I'm not sure exactly how you would go about this, but it seems a better method, since it could let you extract just the data you need.

    There are probably other ways to extract the data, since PDF is an open format. You could look for these other methods too.
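
    One such other method, offered only as a sketch: the pdftotext command-line tool (shipped with Xpdf and Poppler) can dump a PDF's text to a file. The choice of that tool, the helper names and the file names below are my assumptions, not something from the thread, and the result depends on the PDF itself: if its fonts carry no Unicode mapping, the extracted kanji may come out garbled.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path):
    """Build the argument list for pdftotext, asking for UTF-8 output.

    -enc UTF-8 selects the output text encoding; -layout tries to keep
    the page's physical layout. Both helper names here are hypothetical.
    """
    return ["pdftotext", "-enc", "UTF-8", "-layout", str(pdf_path), str(txt_path)]


def extract_text(pdf_path, txt_path):
    """Run pdftotext if it is installed and return the extracted text.

    Returns None when the tool is not on the PATH.
    """
    if shutil.which("pdftotext") is None:
        return None
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return Path(txt_path).read_text(encoding="utf-8")
```

    Calling extract_text("brochure.pdf", "brochure.txt") (hypothetical file names) would then give you the text as a Python string, ready to be pasted into HTML.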


    As for the encoding, I would say 'use Unicode', but I'm not sure how widely it is used in Japan itself. Shift-JIS is widespread though, more so than EUC-JP and other encodings, as far as I remember.
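
    To make the encoding question concrete, here is a small sketch that encodes the same Japanese phrase as Shift-JIS, EUC-JP and UTF-8 using Python's standard codecs. The sample phrase and the function names are made up for illustration. It also shows why the charset you declare in the page has to match the bytes you actually saved.

```python
# Sample text; the phrase means "Japanese homepage".
text = "日本語のホームページ"

CODECS = ("shift_jis", "euc_jp", "utf-8")


def encoded_sizes(s):
    """Return the number of bytes s occupies in each candidate encoding."""
    return {name: len(s.encode(name)) for name in CODECS}


def mojibake(s, written_as, read_as):
    """Return what a reader gets when bytes are decoded with the wrong charset."""
    return s.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    # Ten characters: two bytes each in Shift-JIS and EUC-JP, three in UTF-8.
    print(encoded_sizes(text))  # {'shift_jis': 20, 'euc_jp': 20, 'utf-8': 30}
    # Saving as Shift-JIS but reading as EUC-JP does not give the text back.
    print(mojibake(text, "shift_jis", "euc_jp") == text)  # False
```

    Whichever encoding you pick, save the HTML file in it and declare the same charset in the page, otherwise browsers will show garbage like the mismatched case above.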


    I hope this helps.
    Ganbatte kudasai!


    PS: when I read 'I made a mistake', I thought 'ouch, should have checked you had enough for some more sushi, now restaurant owner very mad', eheh
    Last edited by Michel V; May 6, 2002 at 04:26.

