Copy and Paste Word document to HTML form and preserve formatting

I would like to copy and paste a Word document into an HTML form, hit submit, and have the form data go into an MS SQL database, all while preserving formatting such as paragraphs, bold, italics, etc.

I can get the data inserted into the database, but I would like to know what I need to do to preserve the formatting so that when I pull it out of the database and display it as HTML, it will look almost identical to the Word version.

Any ideas on how to accomplish this? I would greatly appreciate the help. Thank you!

The problem with Word is that it uses a proprietary XML markup language with CSS… it’s not standard HTML… I would suggest doing a “Save as Web Page” and then copying and pasting… otherwise I’m not sure how it’ll work… Word also uses special characters, so you’ll have to convert them if you have them, e.g. smart quotes, where the opening and closing quotation marks curl in opposite directions instead of both being the same straight mark. Just stuff like that… you can try searching the forums to see if you can find how to convert Word special chars to HTML or whatever.
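For what it's worth, here is a minimal Python sketch of that special-character conversion. The function name `word_chars_to_html` and the choice of which characters to map are my own assumptions, not something Word or any library defines; extend the table for whatever your documents actually contain.

```python
# Hypothetical helper: replace common Word "smart" punctuation with
# named HTML entities so the text survives any page encoding.
WORD_CHARS = {
    "\u2018": "&lsquo;",   # left single quote
    "\u2019": "&rsquo;",   # right single quote / apostrophe
    "\u201c": "&ldquo;",   # left double quote
    "\u201d": "&rdquo;",   # right double quote
    "\u2013": "&ndash;",   # en dash
    "\u2014": "&mdash;",   # em dash
    "\u2026": "&hellip;",  # ellipsis
}

# Build the translation table once; str.translate does the replacement
# in a single pass over the text.
_TABLE = str.maketrans(WORD_CHARS)


def word_chars_to_html(text: str) -> str:
    """Return text with Word punctuation swapped for HTML entities."""
    return text.translate(_TABLE)
```

Calling `word_chars_to_html(pasted_text)` on the form field before storing it would cover the quotes and dashes; characters not in the table pass through unchanged.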


<html xmlns:w="urn:schemas-microsoft-com:office:word">
<xml>
    <w:WordDocument>
    <w:View>Print</w:View>
    <w:Zoom>BestFit</w:Zoom>
    <w:HideSpellingErrors/>
    <w:HideGrammaticalErrors/>
    <w:DoNotOptimizeForBrowser/>
    </w:WordDocument>

    <w:OfficeDocumentSettings>
          <w:ReadOnlyRecommended/>
    </w:OfficeDocumentSettings>

    <w:DocumentProperties>
          <w:Subject>Generating Dynamic Word</w:Subject>
          <w:Author>David Shafik</w:Author>
          <w:Description>
              An Article on generating Dynamic Word Documents in PHP using Office
              HTML
          </w:Description>
      </w:DocumentProperties>
</xml>

<style>
  <!--
   /* Styles for standard page setup */

    @page Section1{
         size: 8.5in 11.0in;
         margin: 1.0in 1.25in 1.0in 1.25in;
     }

    div.Section1 {
        page: Section1;
    }

    /*
    Styles for paragraph formatting.
    mso-pagination is an MS Word specific CSS that corresponds
    to the line breaking options. Possible values are "widow-orphan" and
    "lines-together". They can be combined with a space as in "widow-orphan
    lines-together".
    */

    p {
        margin: 0pt;
        mso-pagination: widow-orphan;
        font-size: 12.0pt;
        font-family: "Times New Roman";
    }
    -->
    </style>

This is pretty much what it looks like when you break it down…
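If you do end up with markup like the sample above in your form field, one option is to strip the Word-only parts before storing it. This is only a rough Python sketch: `strip_word_markup` is a made-up name, and regular expressions are a crude tool for HTML, so a proper HTML parser would be safer for anything beyond simple documents.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove the Word-specific pieces seen in Office HTML (rough sketch)."""
    # Drop the <xml>...</xml> block that holds Word's document settings.
    html = re.sub(r"<xml\b[^>]*>.*?</xml>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Drop mso-* declarations (e.g. mso-pagination) from CSS and style attributes.
    html = re.sub(r"\s*mso-[a-z-]+\s*:[^;\"'}]*;?", "", html,
                  flags=re.IGNORECASE)
    # Drop namespace declarations such as xmlns:w="..." from tags.
    html = re.sub(r"\s+xmlns:\w+=\"[^\"]*\"", "", html)
    return html
```

What is left is ordinary HTML and CSS that a browser understands, though page-level settings such as margins and page size will not carry over.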

http://pixelated-dreams.com/pages/dynamic_word_docs.html

This is an article I was using to do dynamic Word files. I was also trying to do an “upload .doc” script and have the same happen, but it wasn’t acting right…

You can’t do fread, fopen, include("word.doc"), or something else and store it in an output buffer, because it handles the data wrong…

I dunno, maybe I was just doing something wrong… if someone knows, then I’d still have use for this.

Absolutely impossible. Have the user upload a Word document and send it back to them if that’s what you want. This is simply not possible. 99% of the formatting of a Word document doesn’t get copied to the clipboard, 99% of what does doesn’t get transferred when pasting into the text box, and 99% of the original formatting can’t be reproduced systematically without a huge amount of coding.

The only way you could possibly do it is with Word’s COM object model and VB… not VBScript but Visual Basic. Then you could have Word translate the file to HTML and send it via FTP or HTTP to your database or script and do whatever you want to do with it. This would be really unattractive to your users… they would have to use your Word template for all their documents.

I’m not certain what your goal is… if you want file management, then I’d upload the doc and store it either as a file or as a blob in a database.

If you’re looking for Word to be some sort of CMS, then I think you should drop it.

Have a look at TinyMCE, a JS WYSIWYG webpage “text field” editor.

It has a number of add-ins. One of those is called “Paste from Word”, and from the little I have used it, it works very well in IE - FF is a bit dodgy.

1. Paste the contents of the file into the “Paste from Word” window
2. Save it to the database as straight HTML (or just write an HTML file!)
3. Display it from the database (or link to the HTML file)
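Steps 2 and 3 could look something like this in Python. Treat it as a sketch under assumptions: it uses the standard-library `sqlite3` module as a stand-in for MS SQL, and the `documents` table, its columns, and the function names are all invented for illustration. The part that matters is passing the HTML as a query parameter so its quotes and angle brackets are stored untouched.

```python
import sqlite3


def save_document(conn, title, html):
    """Store the editor's HTML as-is and return the new row id."""
    cur = conn.execute(
        "INSERT INTO documents (title, body_html) VALUES (?, ?)",
        (title, html),  # parameters, never string concatenation
    )
    conn.commit()
    return cur.lastrowid


def load_document(conn, doc_id):
    """Return the stored HTML for doc_id, or None if there is no such row."""
    row = conn.execute(
        "SELECT body_html FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


# Usage with an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body_html TEXT)"
)
doc_id = save_document(conn, "Report", "<p><b>Bold</b> and <i>italic</i></p>")
print(load_document(conn, doc_id))
```

Because the HTML is written back into the page unescaped, it is worth filtering out script tags and the like before saving if the people pasting are not fully trusted.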

Proviso: it’s not much good with copying tables or “I can use all the options in a single document” word processor-mad formatting demons.

The documentation is good, but a little js knowledge would help you.

HTH

this one also works fairly well in IE: https://www.cfdev.com/activedit/demo/edit.cfm

Hmm. Thanks for your replies everyone. This has been a huge help!
I need to think about the best way to accomplish this: whether I should just have them upload the Word doc and let people download Word docs, or whether it should be viewable in the browser as HTML. I know Google can search .docs, and that is my main concern.

TinyMCE may be all I need, so that is a possibility.

Thanks for the info, I really do appreciate it.

I know this isn’t a great solution, but you can use Adobe Acrobat and convert everything to PDF.

BTW - almost everyone formats things in MS Word incorrectly. Even if you could get the formatting, most people like to use spaces/tabs/paragraphs like candy. It won’t reflow correctly. I’ve been tackling this problem for days, and the only good solution is to save as a web page, or use Acrobat Distiller to bake everything to a common format.

Absolute bane of my life. I am sincerely thinking of wiring their keyboards to the mains supply so that they get a painful reminder whenever Word tables are used - they use tables to hold ALL the text… It’s like the good old days in web 1.0 - everything in nested tables… ahhh the joys.

There must be some way. I know there are WYSIWYG editors that allow (as foofoonet said) the copy-from-Word thing.

I think you could also try FCKeditor.

Click here for an interesting link.

I have never tried so can’t tell how efficient this would be, but you could always give it a try …