Thread: HTML Cleaner

  1. #1
    SitePoint Wizard jumpthru
    Join Date: Apr 2000
    Location: Los Angeles, California
    Posts: 1,008

    HTML Cleaner

    Does anyone know of a really good program that automatically removes redundant nested and empty tags and the like? I have some really bad code I need to clean up, and it's so bad it crashed the code cleaner in Dreamweaver UltraDev 4. You can see the code at www.lynwoodtheatre.com

    Thanks.

  2. #2
    SitePoint Wizard creole
    Join Date: Oct 2000
    Location: Nashvegas Baby!
    Posts: 7,845
    You could try HTML Tidy (from the folks at the W3C) [link]
    Adobe Certified Coldfusion MX 7 Developer
    Adobe Certified Advanced Coldfusion MX Developer
    My Blog (new) | My Family | My Freelance | My Recipes

  3. #3
    SitePoint Wizard jumpthru
    I tried it, and it didn't even scratch the surface. The code has about 5000 lines, and 4500 of them are bad.

  4. #4
    SitePoint Wizard creole
    5000 lines? The page you linked to only has about 830 lines. If you like, I could clean up your code for you. It would take me about an hour or two. We could work out a trade or some sort of recompense. It's up to you though.

    If that many lines of code are bad, then your best bet would be to create a blank HTML document and start from scratch.

    Wow...I just took a closer look at your code, lines 57-74 only have TWO real words, the rest is redundant tags. Wow..you weren't kidding. But the thing is that it wouldn't take much to fix all that. Actually less than one hour I would say, even if I had to rebuild that page from scratch.

    I would actually recommend using CSS. There are only 3 or 4 real styles on that page. You could EASILY use CSS and have the site be compatible even with Netscape 4.7.

    Let me know if you want some help.

  5. #5
    SitePoint Wizard creole
    Good LORD!

    I just spent a few minutes going through your code...that's terrible. I could use this as an example of why NOT to use a WYSIWYG editor.

    I did a find and replace for the </b> and it found 1073 of them. Worse than that, it found 2219 instances of </font>. That's just crazy.
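
    For anyone who wants to repeat that count without an editor's find and replace, here is a minimal Python sketch. The function name and the sample string are made up for illustration; it only counts closing tags, ignoring case, and does not parse the HTML.

```python
import re

def count_closing_tags(html, names=("b", "font")):
    """Count closing tags such as </b> and </font>, ignoring case."""
    counts = {}
    for name in names:
        # Allow stray whitespace inside the tag, e.g. "</font >".
        pattern = re.compile(r"</\s*" + re.escape(name) + r"\s*>", re.IGNORECASE)
        counts[name] = len(pattern.findall(html))
    return counts

# Hypothetical snippet in the style of the bloated page, not the real source.
sample = '<font face="Arial"><b><font size="2"></font></b></font><B>Now Showing</B>'
print(count_closing_tags(sample))
```

    To run it on a saved copy of the page, read the file into a string first and pass that in instead of the sample.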

  6. #6
    SitePoint Enthusiast
    Join Date: Oct 2000
    Posts: 30

    here ya go

    http://www.lowping.com/misc/bloat.txt

    Went on a delete spree for a few minutes till I got it down to a level of chaos that Dreamweaver could halfway handle.. that was by far _the_worst_ and most _bloated_ code I have ever seen in my life. But I guess that's what you get when you combine WYSIWYG with cluele.. err.. people that don't know any better.. :P

    Anyway, next time don't be afraid to just dive into the source and start deleting all of those redundant font tags... there were literally thousands of them.. ugh.. And whoever built that page should take a look at the HTML source that their WYSIWYG editor is creating before publishing it.
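
    If deleting thousands of them by hand is too slow, part of the job can be scripted. This is only a rough Python sketch under stated assumptions: it uses a regular expression rather than a real HTML parser, it only targets font tags that wrap nothing but whitespace, and the function name and sample string are invented for the example. Check the result by eye, since dropping a whitespace-only tag can occasionally remove a space between two words.

```python
import re

# A <font ...> opening tag immediately followed by its </font>,
# with nothing but optional whitespace in between.
EMPTY_FONT = re.compile(r"<font\b[^>]*>\s*</font\s*>", re.IGNORECASE)

def strip_empty_font_tags(html):
    """Remove empty <font>...</font> pairs, repeating until none are left."""
    while True:
        cleaned = EMPTY_FONT.sub("", html)
        if cleaned == html:
            return cleaned
        # Removing an inner pair can leave its parent empty, so go again.
        html = cleaned

# Hypothetical sample: two nested empty tags followed by one tag with real text.
sample = '<font face="Arial"><font size="4"> </font></font><font color="#339900">July 20</font>'
print(strip_empty_font_tags(sample))
```

    The loop matters because each pass only catches the innermost empty pair; the outer tag only becomes empty once the inner one is gone.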

    Moral of the story... WYSIWYG can be downright evil in the wrong hands, and a blessing in the right ones.

    hope this helps.. cya

  7. #7
    SitePoint Wizard jumpthru
    I am getting paid to clean that site up and another one too. You're right, I can easily go through and manually clean it, but I was hoping there was an easier way since I am pressed for time. By the way: I got the 5000-line number after running it through pretty print (http://selfpromotion.com/prettyprint.t) and having it formatted. When that was done, there were 5000 lines, and some of them were indented so far it was kind of funny.

    Oh, and the lady claims she used Dreamweaver 2 to make it, but I guarantee Dreamweaver doesn't make that kind of code. No matter how much you hate WYSIWYG editors, Dreamweaver is not that bad...hmm.

    Another thing: That page takes like a minute to load because of all the bloat. Interesting how much load time extra HTML adds.

  8. #8
    SitePoint Wizard creole
    It's because that page as it exists is 99k. I did a little test to see how much the file size would come down. I removed all of the </b> and </font> tags and it came down to 76k. Just by removing those tags!
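
    As a rough sanity check on that drop, the tag counts reported earlier in the thread (1073 closing b tags and 2219 closing font tags) can be turned into bytes. This sketch assumes one byte per character and nothing else removed; the names are made up for the example.

```python
# Counts reported earlier in the thread by the find and replace.
CLOSING_TAG_COUNTS = {"</b>": 1073, "</font>": 2219}

def bytes_used(tag_counts):
    """Total bytes taken by the given tags, assuming one byte per character."""
    return sum(len(tag) * count for tag, count in tag_counts.items())

total = bytes_used(CLOSING_TAG_COUNTS)
print(total, "bytes, about", round(total / 1024, 1), "KB")
```

    That comes to about 19 KB, which is in the same ballpark as the 23k difference between 99k and 76k. Where the remainder comes from is a guess: rounding in the reported sizes, or whitespace and line breaks that went along with the tags.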

  9. #9
    SitePoint Wizard jumpthru
    Okay, I ended up just recreating the page from scratch. It took me an hour. And it reduced the page from 99k to 18k. Yikes.

    http://www.jumpthru.com/lynwood_theatre/

    Edit:

    I pretty-printed it, to make it easier for her to read, and now it's 31k...oh well.
    Last edited by jumpthru; Jul 17, 2001 at 14:51.

  10. #10
    busy Steelsun
    Join Date: Mar 2001
    Location: Houston, Tejas; Future Capital of the World
    Posts: 2,474
    Isn't there a way (aside from CSS) to state:
    font face="Arial, Helvetica, sans-serif"
    just once per page and not once in every paragraph?
    That would also cut down the bloat.
    Brian Poirier
    SunStockPhoto: Stock Photos, Fine Art Photos, Event Photography

  11. #11
    SitePoint Wizard creole
    There is a tag called BASEFONT, but I have never been able to get it to work properly. Defining a font style using CSS works as far back as v4 of Netscape. You'll be safe doing it that way.

  12. #12
    SitePoint Wizard jumpthru
    I never have learned CSS because I never thought it was that useful. Maybe it's time I learn it. Can you recap the major advantages and why I should learn it?

    BTW: I don't want to put that page in CSS, to save confusion with the person I am doing the work for.

  13. #13
    SitePoint Wizard creole
    I'll recap CSS tomorrow. As for NOT using CSS to avoid confusion...if they are the reason that the page looks like it does now, then you can't afford NOT to use CSS.

    Define the styles for every piece of text on the page and all they will have to do is type in new text and save. No more assigning fonts, colors, or sizes to the text; CSS will take care of it all.

  14. #14
    SitePoint Wizard jumpthru
    I have no idea how the code got that way. I swear it couldn't have been Dreamweaver 2...

    Adding CSS will only give me more to explain to her. Giving her the clean code, having her get Dreamweaver 4, and teaching her how to use Dreamweaver properly is a better choice, imho.

  15. #15
    SitePoint Wizard creole
    Actually, I would guarantee that it was Dreamweaver.

    Let's say that someone types the word movie into DW. Then they want to set the word in Verdana. Then they go back at a later time and give the word font size="4". Then they go back later and make that word green. I would guarantee you that DW would create 3 different font tags to wrap that text in.

    Then (in the visual mode) if you go back and simply delete that word, the FONT tags will remain. So, I gather what happened is that whoever maintains the site would edit that page and reapply the styles time after time, resulting in the page as it is now.
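
    To make the pile-up concrete, here is a small Python sketch that collapses a stack of font tags wrapped around one piece of text into a single tag. It is an illustration under stated assumptions, not a general cleaner: it uses a regular expression instead of an HTML parser, it only handles font tags nested directly inside each other around plain text, and if two of the tags set the same attribute the merged tag ends up with a duplicate that still needs fixing by hand. The function name and sample are made up.

```python
import re

# An outer <font> whose only content is one inner <font> wrapping plain text.
NESTED_FONT = re.compile(
    r"<font\b([^>]*)>\s*<font\b([^>]*)>([^<]*)</font\s*>\s*</font\s*>",
    re.IGNORECASE,
)

def merge_nested_font_tags(html):
    """Collapse directly nested <font> tags into one tag, innermost pair first."""
    while True:
        # Keep the outer attributes, then the inner ones, around the same text.
        merged = NESTED_FONT.sub(r"<font\1\2>\3</font>", html)
        if merged == html:
            return merged
        html = merged

# The "movie" example from above, written the way three separate edits might leave it.
sample = '<font face="verdana"><font size="4"><font color="green">movie</font></font></font>'
print(merge_nested_font_tags(sample))
```

    Each pass merges the innermost pair, so three nested tags take two passes to become one.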

    CSS would be an imperative for this person who is maintaining the site. The only thing you need to tell them is "You don't need to apply ANY font styles." All they have to do is type the new text over the existing text, save it and upload it.

    Now, let me give you a quick rundown of the benefits of CSS. I just took a look at that site and it only has 8 different text styles. You could easily create a text style for each one of them and apply it to the text using a span tag. For example, the dates. They are in Arial, green and about a font size of 3. I would create these styles:

    body, td, p, div { font-family: arial; font-size: 10pt; }

    .date { font-size: 12pt; color: #339900; }

    The first style tells every piece of text that is contained in the BODY tag, a TD tag, a P tag or a DIV tag to be arial at 10pt (which correlates to a font size="2"). That way you don't EVER have to declare a font face again for the rest of the page. It is automatically applied to anything contained inside one of those tags.

    The second style ".date" is a "class". You have to explicitly apply this one. So, when you get to where your date will be you do this:

    <span class="date">July 20 - 26</span>

    It's that simple. Since they will never see the HTML, your client can then edit the text to his/her heart's content and will never screw up the underlying structure.

  16. #16
    SitePoint Wizard jumpthru
    You're right, maybe Dreamweaver 2 does what you say. But I tested it in Dreamweaver 4, and it combines those font tags, plus after deleting that text, it deletes the font tags.

    CSS looks interesting, I will look into it. Thanks for the help.

  17. #17
    Blissed off
    Join Date: Feb 2001
    Posts: 422
    I agree with creole that CSS is the way to go. I use CSS on my site and it works well, but I need to convert it all to a separate "style sheet" to make it work better overall.

    He's right about DW 4, creole. While it *might* be possible to get it to do dupe font tags, when you edit text now and change the font attributes, it doesn't. It just changes the one value and doesn't add anything else. Also, if you choose the colored text in visual mode and hit "delete", it gets rid of everything, font tags and all. Perhaps older versions of DW had this problem, but Macromedia isn't foolish. Most of this is fixed in 4 and will only get better when they do their next rev I would guess...
    Last edited by wert; Jul 18, 2001 at 13:48.

  18. #18
    SitePoint Wizard creole
    yup...I actually own a copy of DW 2, but I don't have it installed. DW 4 I'm sure does not do this, and I can't guarantee that 2 does it either, but that is plainly what it looks like.


    Is there a particular reason that you could not spend about 15 minutes with your client and explain to her the easier way of doing things?

  19. #19
    SitePoint Wizard jumpthru
    Because she barely knows HTML, and I barely know CSS, and I think someone who can screw up HTML that badly and not know how to fix it isn't ready to move to CSS, imho.

  20. #20
    SitePoint Wizard creole
    I respect knowing your limitations, and your client's. But, as I mentioned before, you can't afford NOT to use CSS. She doesn't need to know HTML to use it, and you don't need to know CSS to use it. All she would have to do is type over the previous information, easy as that. You set up the CSS in advance (I'll help you with it, no big deal) and then leave it alone.

    By the way, I just spent a little time taking out all of the redundant tags (out of curiosity) and I got the page size (minus images) down to 9k. It would be about 2 or 3k larger after all of the styles were set up, but that's a far cry from 99k (which is what it is now).

  21. #21
    SitePoint Wizard jumpthru
    But she will still be using Dreamweaver to change the page. It won't just be a cut-and-dried replace-the-text-in-Notepad job. She will probably be changing text, changing colors, changing fonts, etc., all in Dreamweaver.

  22. #22
    SitePoint Wizard creole
    Do the colors on the site change regularly? Is the green always there? Are the fonts always Arial? Are the sizes always the same? If she changes those regularly then yes, CSS is not the answer. But if she never changes them (or very rarely changes them), then CSS would be perfect.

  23. #23
    SitePoint Wizard jumpthru
    That, I do not know, which is why I feel it is better to keep things simple.

  24. #24
    SitePoint Wizard creole
    fair enough...

