  1. #1
    SitePoint Member
    Join Date
    Dec 2008
    Posts
    5

    Reading Word Document in PHP

    Hi All

    I am reading a Word document using the following code:

    <?php
    $filename = "C:/wamp/www/OpenID.doc";
    $new_filename = substr($filename, 0, -4) . ".txt";

    // new COM() throws a com_exception if Word cannot be started
    try {
        $word = new COM("Word.Application");
    } catch (com_exception $e) {
        die("Unable to instantiate Word: " . $e->getMessage());
    }

    try {
        $doc = $word->Documents->Open($filename);
        // the '2' parameter (wdFormatText) specifies saving in plain-text format
        $doc->SaveAs($new_filename, 2);
        $doc->Close(false); // close without saving changes to the .doc
    } finally {
        $word->Quit(); // always shut Word down, even if Open or SaveAs fails
        $word = null;
    }

    // read the converted text back, then delete the temporary file
    $contents = file_get_contents($new_filename);
    unlink($new_filename);
    echo "<pre>" . htmlspecialchars($contents) . "</pre>";
    ?>

    But when it prints $contents in the browser, the Word formatting is missing. Can anybody suggest how to preserve the formatting of the document, so that it is displayed in the browser exactly as it appears in Word?

  2. #2
    From Italy with love
    guido2004
    Join Date
    Sep 2004
    Posts
    9,506
    Quote Originally Posted by mudgil.gaurav
    the '2' parameter specifies saving in txt format
    I don't know this "word.application", but when you save as a text file, you lose all formatting. Isn't there a $word->Documents[1]->Display method, or something similar, that outputs the formatted document content?

  3. #3
    Programming Since 1978
    felgall
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,862
    I don't think there is any way of converting from a Word document to equivalently formatted HTML without manually coding all the HTML tags yourself. If there were then Microsoft would have incorporated that into Word itself instead of the "Word to garbage that looks a bit like HTML" filter that it currently uses.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  4. #4
    SitePoint Wizard
    Join Date
    Mar 2008
    Posts
    1,149
    PHP Code:
    // the '2' parameter specifies saving in txt format
    $word->Documents[1]->SaveAs($new_filename,2); 
    to

    PHP Code:
    // the '10' parameter specifies saving in filtered HTML format
    $word->Documents[1]->SaveAs($new_filename, 10);
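Folded back into the original script, that change might look like the minimal sketch below. It assumes Windows with Word installed and PHP's COM extension enabled; `htmlPathFor()` and `convertDocToHtml()` are hypothetical names, and the constant simply names Word's save format 10 (`wdFormatFilteredHTML`).

```php
<?php
// Word's WdSaveFormat value for "Web Page, Filtered" output.
const WD_FORMAT_FILTERED_HTML = 10;

// Hypothetical helper: build the output path by swapping the extension for .htm.
function htmlPathFor(string $docPath): string
{
    $info = pathinfo($docPath);
    $dir = $info['dirname'] ?? '.';
    $prefix = ($dir === '.') ? '' : rtrim($dir, '/\\') . '/';
    return $prefix . $info['filename'] . '.htm';
}

// Hypothetical helper: convert a .doc file to filtered HTML via Word automation
// and return the markup. Requires Windows, Word and the COM extension.
function convertDocToHtml(string $docPath): string
{
    if (!class_exists('COM')) {
        throw new RuntimeException('The COM extension is not available (Windows only).');
    }
    $htmlPath = htmlPathFor($docPath);
    $word = new COM('Word.Application');
    try {
        $doc = $word->Documents->Open($docPath);
        $doc->SaveAs($htmlPath, WD_FORMAT_FILTERED_HTML);
        $doc->Close(false); // do not save changes to the original .doc
    } finally {
        $word->Quit(); // shut Word down even if Open or SaveAs fails
    }
    $html = file_get_contents($htmlPath);
    if ($html === false) {
        throw new RuntimeException("Could not read $htmlPath");
    }
    unlink($htmlPath);
    return $html;
}
```

Since the result is already HTML, echo it directly (e.g. `echo convertDocToHtml('C:/wamp/www/OpenID.doc');`) rather than wrapping it in `<pre>`. Word may also write a companion folder of images next to the `.htm` file, which this sketch leaves in place.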

  5. #5
    Visible Ninja
    JeffWalden
    Join Date
    Sep 2002
    Location
    Los Angeles
    Posts
    1,709
    Quote Originally Posted by felgall
    I don't think there is any way of converting from a Word document to equivalently formatted HTML without manually coding all the HTML tags yourself. If there were then Microsoft would have incorporated that into Word itself instead of the "Word to garbage that looks a bit like HTML" filter that it currently uses.
    Gmail does a fairly decent job of this, although it's not perfect.
    TAKE A WALK OUTSIDE YOUR MIND.

  6. #6
    Programming Since 1978
    felgall
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,862
    Quote Originally Posted by Hyperbolik
    Gmail does a fairly decent job of this, although it's not perfect.
    You mean GMail does a reasonable job of understanding the Garbage that Microsoft programs create instead of proper HTML. I don't think GMail has the ability to read Word Documents directly.

  7. #7
    Visible Ninja
    JeffWalden
    Join Date
    Sep 2002
    Location
    Los Angeles
    Posts
    1,709
    Quote Originally Posted by felgall
    You mean GMail does a reasonable job of understanding the Garbage that Microsoft programs create instead of proper HTML. I don't think GMail has the ability to read Word Documents directly.
    Aye. Thanks for the correction.

  8. #8
    Programming Since 1978
    felgall
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,862
    The closest to HTML that you can get from a Word document is to open it in OpenOffice and use the HTML save option there, which will at least produce valid HTML, even though it will discard some of the formatting to do so. The formatting it discards has no proper HTML equivalent.

  9. #9
    SitePoint Wizard
    kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    OpenOffice does a fair job of reading Word documents, and it can write the output as HTML. You can run OpenOffice in "headless" mode, i.e. as a command-line utility, and use that to convert Word documents to HTML. I believe Google uses something like that, in some variant.
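The headless conversion can be driven from PHP with a shell call. This is a minimal sketch, assuming LibreOffice (a fork of OpenOffice.org) is installed and its `soffice` binary, whose `--headless`, `--convert-to` and `--outdir` options are used here, is on the PATH. `buildConvertCommand()` and `convertWithSoffice()` are hypothetical names, and the output file is assumed to be named after the input with an `.html` extension.

```php
<?php
// Hypothetical helper: build the shell command for a headless conversion to HTML.
// Every argument is escaped so paths with spaces or shell metacharacters are safe.
function buildConvertCommand(string $docPath, string $outDir, string $binary = 'soffice'): string
{
    return escapeshellcmd($binary)
        . ' --headless --convert-to html --outdir '
        . escapeshellarg($outDir) . ' '
        . escapeshellarg($docPath);
}

// Hypothetical helper: run the conversion and return the generated HTML.
// Assumes the converter writes <outDir>/<input name>.html.
function convertWithSoffice(string $docPath, string $outDir): string
{
    $output = [];
    exec(buildConvertCommand($docPath, $outDir) . ' 2>&1', $output, $status);
    $htmlPath = rtrim($outDir, '/\\') . '/' . pathinfo($docPath, PATHINFO_FILENAME) . '.html';
    if ($status !== 0 || !is_file($htmlPath)) {
        throw new RuntimeException("Conversion failed:\n" . implode("\n", $output));
    }
    $html = file_get_contents($htmlPath);
    if ($html === false) {
        throw new RuntimeException("Could not read $htmlPath");
    }
    return $html;
}
```

For example, `echo convertWithSoffice('/var/www/OpenID.doc', sys_get_temp_dir());` would print the generated page. Unlike the COM approach, this does not require Word, so it can also run on a Linux server.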

