  1. #1
    SitePoint Guru DenverDave's Avatar
    Join Date
    Feb 2001
    Location
    Denver, Colorado
    Posts
    630
    Microsoft Word 97 and later versions include a feature to save a document as HTML. That sounds like a great capability. However, when I tried it on a Word 2000 document for a client, the output contained so many Microsoft-specific HTML tags and unsupported fonts that the document was basically worthless.

    I tried to strip out the extra HTML tags, but finally gave up and created the HTML document from scratch. The hand-built document was a fraction of the size of the generated one that I discarded.

    As a second test, I created a Word 2000 document with one line, "This is a Test". The HTML produced by the Save as Web Page option was 85 lines long!!! (Seems like half a dozen lines or so should do it.) I have had a similar experience with Excel files.

    My question is, does anyone know of a utility that will take Microsoft Word or Excel files and create standard, simple, easy to read HTML files?

    Thanks.
    Dave

  2. #2
    SitePoint Wizard creole's Avatar
    Join Date
    Oct 2000
    Location
    Nashvegas Baby!
    Posts
    7,845
    Dave...

    I had the same problem with a client-created file. Exported as HTML from Excel, it contained TONS of nasty stylesheet tags and classes, as well as lots and lots of unnecessary table cells.

    For a Spreadsheet of about 50 rows and 10 or so columns, the resultant HTML page was over 5000 lines long. Nice eh?

    One thing that was an amazing GODSEND to me was a text editor that could use regular expressions.

    For example, each line contained the same font specification except that the size was different. The result was like this:

    <style "font-size: 1.2em">

    and

    <style "font-size: 1.5em">

    All I did was use regular expressions to look for this:

    <style "font-size: 1. [followed by any one-digit number] em">

    Regex replaced every instance of the line with nothing.
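
    For anyone who would rather script this than do it in an editor, here is a minimal sketch of the same search-and-delete in Python. It assumes the tags look exactly like the two samples quoted above; the function name and the sample string are made up for illustration.

```python
import re

# Matches the literal text <style "font-size: 1. followed by exactly
# one digit and then em"> as in the two sample tags quoted above.
FONT_TAG = re.compile(r'<style "font-size: 1\.\dem">')


def strip_font_tags(html: str) -> str:
    """Delete every matching tag, leaving the surrounding text untouched."""
    return FONT_TAG.sub("", html)


# Hypothetical sample input, not taken from the real spreadsheet export.
sample = '<style "font-size: 1.2em">Row one<style "font-size: 1.5em">Row two'
print(strip_font_tags(sample))  # prints: Row oneRow two
```

    The `\d` is the "any one-digit number" part, and the backslash before the dot keeps it from matching any character.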

    The first time I did the page by hand, it took me nearly an hour using normal find and replace. Once I realized that many of the lines were similar and started using regex, the time dropped to about 5 minutes.

    That would be my suggestion.
    Adobe Certified Coldfusion MX 7 Developer
    Adobe Certified Advanced Coldfusion MX Developer
    My Blog (new) | My Family | My Freelance | My Recipes

  3. #3
    SitePoint Guru DenverDave's Avatar

    Little over my budget & Access Reports

    Regex looks interesting, but I gather that I must have Frontier for it to work. $899 is quite a bit over my budget. However, thanks for the reply; I feel better knowing that others have been in the same situation. Maybe I'll start looking for tools with better pattern matching, but I was hoping for a general-purpose freeware tool that would just clean this code up, or a competitor's tool that could read the Microsoft files and generate good HTML.

    On another project, I attempted to generate HTML from Microsoft Access 2000. The code was all right, but it did not carry over the cell background color. Any ideas on a cheap HTML report writer so I don't have to hand-code everything?

    Thanks.

    Dave

  4. #4
    SitePoint Wizard creole's Avatar
    NO NO NO....

    hehe...

    The editor that I use to work with RegEx costs about $30. It's called EditPlus (www.editplus.com) and you can even download a trial version to see if you like it. In my opinion, it is the best editor on the market for the PC.

    Then you'll need to review RegEx to see what you need to look for. Here are a few pages that explain regex in more detail.

    http://www.ajar.ch/French/Produits/P...pressions.html

    http://www.chemie.fu-berlin.de/chemn...gex/regex.html

    http://www.nmt.edu/tcc/swinv/a2ps/4.12/toc.html

    and this one:

    http://www.nmt.edu/tcc/swinv/a2ps/4.12/info/(regex)Common%20Operators.html

  5. #5
    SitePoint Guru DenverDave's Avatar
    Sounds better - thanks - I'll check it out.

    Dave

  6. #6
    SitePoint Addict Kakarot720's Avatar
    Join Date
    Feb 2001
    Location
    Washington DC
    Posts
    219
    Dreamweaver has an option to clean up HTML documents created by Word. It significantly reduced the amount of code for the "This is a Test" doc. It also has several options to customize exactly which tags you want removed. It seems to work pretty well.
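
    If you don't have Dreamweaver, a rough version of that kind of cleanup can be scripted. The sketch below is not Dreamweaver's algorithm, just a guess at a useful subset: it removes the conditional comments, the `o:` namespace tags, the `Mso` class names, and the `mso-` style attributes that Word's HTML export tends to emit. The function name and the sample paragraph are hypothetical.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip a few kinds of Word-specific markup from exported HTML."""
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.DOTALL)
    # Office namespace tags such as <o:p> and </o:p> (their text is kept)
    html = re.sub(r"</?o:[^>]*>", "", html)
    # class attributes that use Word's Mso* names, quoted or unquoted
    html = re.sub(r'\s+class="?Mso\w+"?', "", html)
    # Whole style attributes that mention an mso- property. This also
    # drops any ordinary CSS sitting in the same attribute.
    html = re.sub(r'\s+style="[^"]*mso-[^"]*"', "", html)
    return html


# Hypothetical one-line sample in the style of a Word export.
sample = ('<p class=MsoNormal style="mso-margin-top-alt:auto">'
          'This is a Test<o:p></o:p></p>')
print(clean_word_html(sample))  # prints: <p>This is a Test</p>
```

    Regular expressions are not a real HTML parser, so treat this as a first pass and check the result in a browser.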

    That is...if you have Dreamweaver

  7. #7
    SitePoint Guru DenverDave's Avatar
    One more reason to get Dreamweaver. I'm not there yet, but I do some ColdFusion, so I'm being pushed that way. I guess this is an indication that there is a real problem here.

    Thanks - Dave

  8. #8
    SitePoint Addict superbird's Avatar
    Join Date
    Aug 2000
    Location
    Swansea, UK
    Posts
    260
    I have to export racing championship points into one of my sites from Excel documents - to be honest the spreadsheets aren't that complicated so I just use the "extended replace" function in Homesite 4.5 to get rid of most of the unwanted tag attributes.
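
    The same attribute stripping can be roughed out in a few lines of Python. This is only a sketch of the idea, not what Homesite does internally; the sample row below is made up, and the pattern assumes no attribute value contains a ">" character.

```python
import re

# An opening tag name followed by whitespace and anything up to the
# closing bracket. Tags without attributes do not match and are kept.
TAG_WITH_ATTRS = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\s[^>]*>")


def strip_attributes(html: str) -> str:
    """Rewrite every opening tag as the bare tag name."""
    return TAG_WITH_ATTRS.sub(r"<\1>", html)


# Hypothetical row in the style of an Excel export.
row = ('<tr height=17 style="height:12.75pt">'
       '<td class=xl24 width=64>Driver</td>'
       '<td class=xl25 align=right>42</td></tr>')
print(strip_attributes(row))  # prints: <tr><td>Driver</td><td>42</td></tr>
```

    Note that this removes every attribute, including ones you may want to keep such as href on links, so it suits plain data tables best.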
    ...KartLink...

  9. #9
    midnight coder
    Join Date
    Dec 2000
    Location
    The flat edge of the world
    Posts
    838
    And you know where to go for hosting once you've got your site!

    http://jrap.org/misc/hosting/

    And look at the great packages!

    http://jrap.org/misc/hosting/packages.html

    5GB space and unlimited bandwidth to hold your 1GB HTML files.

