  1. #1
    SitePoint Zealot

    Simple Site Question

    We run a site that works like an online magazine, so for the most part we just publish articles. Writers usually send their articles as Microsoft Word documents, and we then convert them into HTML documents. The problem, as my web guy explains it, is that the code gets all screwy when it is converted from Word to HTML. What method do you think would work better? Thanks a bunch!

  2. #2
    SitePoint Zealot
    Word and OpenOffice both add a good deal of extra formatting and use some unusual characters, like curly quotes, that don't work well with HTML.

    If you are doing it manually, it is best to paste the text into Notepad first, because this removes all of Word's funky formatting. Then you can safely paste it into an HTML document.
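    Pasting through Notepad strips the formatting but keeps the characters themselves, so curly quotes and similar symbols survive the trip. A minimal sketch of replacing them with plain ASCII equivalents follows. The `SMART_CHARS` table and the `to_plain_ascii` name are hypothetical, chosen for illustration, and the table covers only a few common cases.

```python
# Hypothetical mapping of common Word "smart" characters to ASCII stand-ins.
# This list is an assumption for illustration and is not exhaustive.
SMART_CHARS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}

# Build the translation table once; str.translate applies it in one pass.
_TABLE = str.maketrans(SMART_CHARS)


def to_plain_ascii(text: str) -> str:
    """Replace the characters listed in SMART_CHARS; leave everything else alone."""
    return text.translate(_TABLE)
```

    Whether you want this depends on the site: if the pages are served as UTF-8, the curly quotes can stay as they are.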

  3. #3
    AlexDawson
    Correct. Word and rich text editors are generally not equipped to export HTML cleanly; they produce lots of “junk code”. What I would recommend builds on the above: copy and paste the text into Notepad or your default HTML editor, then take an HTML template you have created for all of your articles and place the text inside the correct markup tags.
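    The template idea can also be scripted. The sketch below assumes the article arrives as plain text with blank lines between paragraphs; the template markup and the `render_article` name are made up for illustration and are not anyone's actual site template.

```python
from html import escape
from string import Template

# Hypothetical article template; a real site would use its own markup.
ARTICLE_TEMPLATE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>$title</title>
</head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def render_article(title: str, plain_text: str) -> str:
    """Wrap each blank-line-separated paragraph in <p> tags and fill the template."""
    paragraphs = [p.strip() for p in plain_text.split("\n\n") if p.strip()]
    # escape() converts &, < and > so stray characters cannot break the markup.
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return ARTICLE_TEMPLATE.substitute(title=escape(title), body=body)
```

    Calling `render_article("My Title", text)` returns the full page as a string, which can then be written to a file with `encoding="utf-8"` to match the `charset` declared in the template.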

  4. #4
    SitePoint Author

    Originally posted by HeathMan:
    Word and OpenOffice both add a good deal of extra formatting and use some unusual characters, like curly quotes, that don't work well with HTML.
    They do, but the problem is that Word often uses a different character encoding (like Windows-1252) than the web site (often ISO 8859-1 or UTF-8 for Western sites). Some characters, like curly quotes, are encoded differently and will therefore not display correctly.

    For instance, the curly quotes, U+201C and U+201D, are encoded as 93 and 94 (hexadecimal) in Windows-1252, but as E2 80 9C and E2 80 9D in UTF-8. They cannot be expressed literally in ISO 8859-1, so you have to use character entity references (&ldquo; and &rdquo;) or numeric character references (&#8220; and &#8221;).
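    The byte values above can be checked with a few lines of Python. This is only a sketch: `describe` and `fits_latin1` are hypothetical helper names, while the codecs and `html.entities.codepoint2name` come from the standard library.

```python
from html.entities import codepoint2name


def describe(ch: str) -> dict:
    """Show how one character is represented in each encoding discussed above."""
    return {
        "cp1252": ch.encode("cp1252").hex(),  # Windows-1252 byte
        "utf-8": ch.encode("utf-8").hex(),    # UTF-8 byte sequence
        "entity": f"&{codepoint2name[ord(ch)]};",
        # xmlcharrefreplace turns unencodable characters into &#NNNN; references.
        "numeric": ch.encode("ascii", "xmlcharrefreplace").decode("ascii"),
    }


def fits_latin1(ch: str) -> bool:
    """Return True if the character exists in ISO 8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(describe("\u201c"))
# {'cp1252': '93', 'utf-8': 'e2809c', 'entity': '&ldquo;', 'numeric': '&#8220;'}
print(fits_latin1("\u201c"))  # False

# Reading UTF-8 bytes as Windows-1252 gives the familiar garbled result.
garbled = "\u201c".encode("utf-8").decode("cp1252")
```

    Here `garbled` holds three characters (â, €, œ) in place of the single opening quote, which is the kind of display problem a mismatched encoding causes.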

  5. #5
    Ed Seedhouse
    Require that submissions be simple text files unless special formatting is an absolute necessity.

