  1. #1
    SitePoint Member stonehinge

    Using Word Documents

    Hi Good People,

    I have a problem that can't possibly be new, except to a relative n00b like me. A client has product description text in Word format that they want to publish in my custom-designed web pages. When that Word file is saved as HTML, both Word and MS Publisher insert pages of proprietary markup that is unacceptable in a custom web page.

    I need to use this content without retyping it all, but I don't have a clue how to do that. Is there an intermediate step that strips out all the XML markup and leaves plain HTML that is easier to work with? Surely this problem has been solved before, and hopefully some of the forum members here have already solved it.

    Thanks...jon
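    A minimal sketch of such an intermediate stripping step, assuming the goal is to keep only a small whitelist of plain tags (paragraphs, bold/italic, lists, links, simple tables) and throw away Word's styles, XML blocks, class/style attributes and namespaced tags such as <o:p>. It uses only Python's standard-library html.parser; the tag whitelist, the hypothetical clean_word_html name and the sample markup are illustrative assumptions, and some Word exports may need extra handling.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: tags a plain product-description page is likely to need.
ALLOWED_TAGS = {
    "p", "br", "b", "strong", "i", "em", "u", "ul", "ol", "li",
    "h1", "h2", "h3", "h4", "h5", "h6", "a", "table", "tr", "td", "th",
}
# Attributes kept per tag; class="MsoNormal", style="...", lang=... are dropped.
ALLOWED_ATTRS = {"a": {"href"}}
# Elements whose whole content is discarded (Word puts its CSS and XML here).
DROP_CONTENT = {"head", "title", "style", "script", "xml"}
VOID_TAGS = {"br"}


class WordHtmlCleaner(HTMLParser):
    """Re-emit only whitelisted tags and attributes; unwrap everything else.

    Comments, including Word's <!--[if gte mso 9]>...<![endif]--> blocks,
    are dropped because handle_comment is left as the base-class no-op.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # <span>, <font>, <o:p> etc. are unwrapped: text kept, tag dropped
        keep = ALLOWED_ATTRS.get(tag, set())
        parts = [tag] + [
            f'{name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in keep
        ]
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_word_html(html: str) -> str:
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


sample = """<html xmlns:o="urn:schemas-microsoft-com:office:office">
<head><meta charset="utf-8"><style>p.MsoNormal {margin:0cm;}</style>
<!--[if gte mso 9]><xml><o:DocumentProperties></o:DocumentProperties></xml><![endif]-->
</head>
<body lang=EN-US>
<p class=MsoNormal style='margin:0cm'><b>Widget</b> <span style='color:red'>in red</span><o:p></o:p></p>
</body></html>"""

print(clean_word_html(sample))  # <p><b>Widget</b> in red</p>
```

    The whitelist approach is deliberately blunt: anything not explicitly allowed is unwrapped, so the text survives while Word's presentation markup does not. The result may still need styling with your own CSS.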

  2. #2
    SitePoint Wizard dc dalton
    Dreamweaver can clean all that MS crap out of the files, although I don't know how well it works because I've never used it.

    You could also save yourself some aggravation by just copying and pasting them into a text file (or having the client do it). Word is a mess, always has been, and usually doesn't play well with HTML.
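    Once the text is in a plain text file it still needs basic markup before it goes into a page. A minimal sketch, assuming paragraphs are separated by blank lines; the text_to_paragraphs name is hypothetical. It wraps each block in <p> and escapes &, < and > so the text is safe to drop into HTML:

```python
import re
from html import escape


def text_to_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <p> elements."""
    # A "blank" line may contain stray spaces or tabs, so split on \n\s*\n.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    paragraphs = []
    for block in blocks:
        # Join the block's lines with single spaces; skip blocks that end up empty.
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            paragraphs.append(f"<p>{escape(joined, quote=False)}</p>")
    return "\n".join(paragraphs)


pasted = "Widget 3000\nNow in red & blue.\n\n  \nShips in 2 days."
print(text_to_paragraphs(pasted))
```

    Bold, italics and lists are lost on the way through a text file, so any emphasis has to be added back by hand.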

  3. #3
    c2uk (doing my best to help)
    Quote Originally Posted by dc dalton
    You could also save yourself some aggravation by just copying and pasting them into a text file (or having the client do it). Word is a mess, always has been, and usually doesn't play well with HTML.
    That would be my advice as well.

    Or take the Word document, copy and paste it into your Dreamweaver document, and style it there the way you want it.

  4. #4
    SitePoint Member stonehinge
    OK... no Dreamweaver, and they want to keep their HTML formatting...

    I had them run it through their copy of FrontPage today and they claim that most of the crap was removed. I'll know more tomorrow after I see the updates.

    Thanks, and I'd appreciate any more ideas that may crop up.

    jon

  5. #5
    SitePoint Wizard dc dalton
    OH BOY, Word HTML regurgitated through FrontPage... that should be interesting. Let us know how it comes out!

  6. #6
    SitePoint Wizard Nadia P
    Quote Originally Posted by dc dalton
    OH BOY, Word HTML regurgitated through FrontPage... that should be interesting. Let us know how it comes out!
    LOL - should be good to look at

    Nadia

  7. #7
    SitePoint Member stonehinge
    I posted the same question on the CSSCreator forum and got some good suggestions. Here's a summary of everything so far.

    1) Use NotePad Lite, which has a module that claims to clean Word HTML. I tried this and got nowhere: none of the XML/URN markup was removed.

    2) Load the file into a browser, then copy the rendered page and paste it into a better editing environment. Surprisingly, this worked rather well when pasting from IE into the OpenOffice HTML editor: the bloat was removed, but the individual line formats were retained. It did not work as well from Firefox, because much more of the bloat was retained. This method requires a WYSIWYG editor to capture the rich-text formatting carried by the copy/paste operation; it does not work at all if you paste into a text editor such as TSW Webcoder or Notepad. The resulting code still needs work to validate.

  8. #8
    SitePoint Member
    HTML Tidy is supposed to be useful here:
    Tidy can now perform wonders on HTML saved from Microsoft Word 2000! Word bulks out HTML files with stuff for round-tripping presentation between HTML and Word. If you are more concerned about using HTML on the Web, check out Tidy's "Word-2000" config option! Of course Tidy does a good job on Word'97 files as well!
    http://www.w3.org/People/Raggett/tidy/
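    Tidy is a command-line program, so it can be scripted. A minimal sketch that drives it from Python, assuming the tidy executable is on the PATH; the helper names are hypothetical. It uses Tidy's word-2000, bare, show-body-only and force-output options. By default Tidy writes no output when it reports errors it cannot fix; force-output asks it to write output anyway, which may help with badly broken input (though the result then needs checking by eye).

```python
import shutil
import subprocess


def tidy_command() -> list[str]:
    """Hypothetical helper: arguments for a Tidy run on Word-generated HTML."""
    return [
        "tidy",
        "-q",                       # quiet: suppress the summary and informational chatter
        "--word-2000", "yes",       # strip Word 2000 round-tripping markup
        "--bare", "yes",            # strip Microsoft-specific HTML; &nbsp; becomes a plain space
        "--show-body-only", "yes",  # emit only the body content, for pasting into a page
        "--force-output", "yes",    # write output even if Tidy reports errors
        "--show-warnings", "no",
        "--char-encoding", "utf8",
    ]


def run_tidy(html: str) -> str:
    """Pipe HTML through Tidy and return the cleaned markup."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("HTML Tidy ('tidy') was not found on PATH")
    result = subprocess.run(
        tidy_command(), input=html, capture_output=True, encoding="utf-8"
    )
    # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors); with
    # --force-output the markup should be on stdout in each case.
    if not result.stdout.strip():
        raise RuntimeError(f"Tidy produced no output: {result.stderr.strip()}")
    return result.stdout
```

    Usage would be along the lines of run_tidy(open("page.htm", encoding="utf-8").read()); Word may save its HTML in a Windows code page rather than UTF-8, in which case the encoding arguments need adjusting.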

  9. #9
    SitePoint Member stonehinge
    The code is so bad that Tidy just upchucks and leaves a blank page...

    I'll post a report here when this is over.
    no noobs up in this ma'

