  1. #1
    cydewaze (SitePoint Wizard)
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096

    I need a validator that catches high ascii chars

    My office used to use the CSE validator for our web pages, but since our new template is mostly CF includes, CSE throws tons of errors on a perfectly good page, as it validates the source code.

    We've switched to Tidy because it validates the rendered page, rather than the source, but it misses high ascii characters, and we're getting a LOT of those now because much of our content starts off in Word 2007, which uses high ascii for things like curly quotes and apostrophes.

    Some of the high ascii characters from Word appear as spaces in the code, so often you don't see the problem until you view the page in a browser and see things like euro symbols scattered throughout the code.

    Having to read every page to check for these is a bit tedious, especially when we have a 5000-word report, so I'm looking for a validator that validates the rendered page, but also looks for high ascii chars.

    Any ideas?
    <cfset myblog = "http://cydewaze.org/">
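
For the "looks for high ascii chars" half of the question, a small script can do the reading for you. The sketch below is only an illustration, not a replacement for a validator: the function name `find_non_ascii` and the sample markup are made up for this example, and it assumes the page text has already been decoded into a string.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, name) for every character above U+007F."""
    hits = []
    # Split on "\n" only: str.splitlines() also breaks on characters such as
    # U+0085, which would hide exactly the kind of stray character we want.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, ch, unicodedata.name(ch, "UNNAMED")))
    return hits


# Hypothetical page fragment containing Word-style curly punctuation.
sample = "<p>It\u2019s a \u201ctest\u201d</p>\n<p>plain ascii</p>"

for line_no, col_no, ch, name in find_non_ascii(sample):
    print(f"line {line_no}, col {col_no}: U+{ord(ch):04X} {name}")
```

Reporting the line and column means nobody has to read a 5000-word report to find one stray apostrophe.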

  2. #2
    Non-Member bronze trophy
    Join Date
    Nov 2009
    Location
    Keene, NH
    Posts
    3,760
    Can I assume CF means ColdFusion? If so, what is this, 1997?

    As to CSE -- I always thought that was a scam given the REAL validation services from the W3C are free.

    As far as importing is concerned, it sounds like you have character encoding differences to deal with -- what character encoding are you deploying the websites with? Microsoft Turd tends to want everything in windows-1252, so just have whatever you are using for copying from Word translate windows-1252 to UTF-8 (ISO-8859-1 would cover the accented letters, but it has no code points for Word's curly quotes)...

    I'm not sure about ColdFusion, since I've never seen anyone actually use it for new code after 2002, but I imagine it must have a function similar to PHP's iconv function.

    On one of the sites I maintain they cut/paste from word all the time -- I ended up making the form be accept-charset="windows-1252" and then running iconv('windows-1252','UTF-8',$test); on the input before dumping it into the database. I then reverse the process when they go to edit.

    Though it really depends on how complex the site in question is and if you're talking static pages or having a real CMS behind it.
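
The same round trip described above with PHP's iconv can be sketched in Python. This is a rough equivalent under stated assumptions, not the code the site actually runs: the helper names `cp1252_to_utf8` and `utf8_to_cp1252` are invented here, and the input bytes stand in for a form submission sent as windows-1252.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1252 bytes as UTF-8 before storing them."""
    # Python's cp1252 codec rejects the five unassigned bytes
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D) with UnicodeDecodeError.
    return raw.decode("cp1252").encode("utf-8")


def utf8_to_cp1252(raw: bytes) -> bytes:
    """Reverse the process when the text goes back into the edit form."""
    # errors="replace" writes "?" for characters windows-1252 cannot hold.
    return raw.decode("utf-8").encode("cp1252", errors="replace")


# Hypothetical pasted text: curly double quotes (0x93, 0x94) and an
# apostrophe (0x92) as Word emits them in windows-1252.
pasted = b"\x93smart quotes\x94 and it\x92s"

stored = cp1252_to_utf8(pasted)
print(stored.decode("utf-8"))
print(utf8_to_cp1252(stored) == pasted)
```

The conversion only works if the incoming bytes really are windows-1252, which is why the form's accept-charset matters.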

  3. #3
    cydewaze (SitePoint Wizard)
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Yes, contrary to not-so-popular belief, there are still new versions of ColdFusion coming out, and they're indeed quite powerful. In fact, one of our long-time PHP people recently admitted that there wasn't anything he could do in PHP that you couldn't do (more quickly) in CF.

    We use the W3C validator, but it doesn't catch the high ASCII characters. Some of the paid validators also let you batch-validate entire directories.

    These are pages that grab contact info from a database, but no CMS. We paste content into Dreamweaver (yes, that's still around too) and save it.

    I'm about to prepare an email instructing people how to turn curly quotes off, but that will likely only stop a small percentage of them, since we get a lot of reports from outside the office.

  4. #4
    SitePoint Member
    Join Date
    Dec 2010
    Location
    near Dallas, TX
    Posts
    3
    Hello,

    CSE HTML Validator Std/Pro can definitely check the source of a rendered page. There are several ways to do it. One of the easiest may be to use the integrated web browser which lets you browse the web while checking the source. You can also use server/path mappings to make this easier.

    There's also the Batch Wizard in the pro edition, which can check local files or make HTTP requests to get the "processed/rendered" page source (after the includes have been processed).

    With a little more work, you could also copy and paste the source from a browser to CSE HTML Validator's editor then hit F6 to validate.

    As to CSE HTML Validator being a "scam" - I ask is Windows a scam because Linux is free? Of course not... it has features and other qualities that Linux just doesn't have. The same applies to CSE HTML Validator vs other programs and services (more at http://www.htmlvalidator.com/htmlval...eisbetter.html).

    I hope this helps.

  5. #5
    Stomme poes (SitePoint Wizard)
    Join Date
    Aug 2007
    Location
    Netherlands
    Posts
    10,283
    Frankly, you shouldn't just ignore the garbage getting pumped into your code from Word. Even today (!) I run into sites that, on my Linux machine, render characters as ?s when there's ZERO reason for that.

    Every document created in Windows that I have ever viewed in vi was filled with ^M and "smart quotes" and other crap.

    HTMLvalidator may have given you some good ideas on how to validate the rendered source, but I support Jason's idea of cleaning the garbage out before saving to the DB in the first place if possible. On Windows machines the web page may seem fine. On other machines those same browsers may not bother trying to ignore or change the funky chars.
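
Cleaning the text before it is saved could look something like the sketch below. The mapping table and the name `clean_word_text` are assumptions chosen for illustration; a real site would decide for itself which characters to flatten and which to keep as proper UTF-8.

```python
# Hypothetical mapping from common Word punctuation to plain ASCII.
SMART_TO_PLAIN = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}
_TABLE = str.maketrans(SMART_TO_PLAIN)


def clean_word_text(text):
    """Normalize line endings (the ^M seen in vi) and flatten smart punctuation."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.translate(_TABLE)


print(clean_word_text("\u201cHi\u201d \u2013 it\u2019s\r\nok\u2026"))
```

Anything not in the table passes through untouched, so accented letters survive.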

    (btw I think it's nice when a software vendor can help a member out with particular software, thanks for posting HTMLvalidator and welcome to SitePoint. Just be sure not to cross the spam line or the mods will hunt you down and keep a trophy! : )

    Metrolyrics claims to send out its pages as UTF-8. This is what I get:
    Quién dice cuál es la bandera que sobre un pedazo de tierra ondea
    quién decide quién tiene el poder de limitar mi caminar dime quién
    Someone's getting that text from a Windows program, likely.
    I hit the back button and find another site with actually readable information.
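
One common way this kind of garbling arises is a mismatch between the encoding the bytes were written in and the encoding the page declares. The sketch below reproduces both directions of that mismatch on the first words of the lyric. It is a general demonstration and makes no claim about what that particular site actually does.

```python
original = "Qui\u00e9n dice cu\u00e1l"

# Case 1: bytes written as windows-1252 but served and decoded as UTF-8.
# The lone accented bytes are invalid UTF-8, so each becomes U+FFFD,
# which many browsers draw as a question mark.
garbled = original.encode("cp1252").decode("utf-8", errors="replace")

# Case 2: bytes written as UTF-8 but decoded as windows-1252.
# Each accented letter turns into two unrelated characters.
doubled = original.encode("utf-8").decode("cp1252")

print(ascii(garbled))
print(ascii(doubled))
```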

  6. #6
    cydewaze (SitePoint Wizard)
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    I might take another look at CSE, but the version we currently have is pretty useless on our CF pages where everything except for the actual content (header, footer, navigation, etc) are includes. If it's been updated to work on the rendered page, it might be a good option.

  7. #7
    SitePoint Member
    Join Date
    Dec 2010
    Location
    near Dallas, TX
    Posts
    3
    Originally posted by cydewaze:
    I might take another look at CSE, but the version we currently have is pretty useless on our CF pages where everything except for the actual content (header, footer, navigation, etc) are includes. If it's been updated to work on the rendered page, it might be a good option.
    Great. I'm glad you might take another look (please try the latest trial version - v10.00 pro)... and I'll monitor this thread if you have any questions. You are also welcome to post any questions on CSE HTML Validator's own support forum.

    This should provide additional information on working with pages with scripts:
    http://www.htmlvalidator.com/htmlval..._scripting.htm

  8. #8
    cydewaze (SitePoint Wizard)
    Join Date
    Jan 2006
    Location
    Merry Land, USA
    Posts
    1,096
    Thanks. I just installed the trial for 10, and posted a question on your forum on how to ignore CF tags. I still think we might have to use a second validator, because our pages basically look like this:

    Code CFM:
    <cfinclude template="/includes/hep/header.cfm" />
    <div id="pagecontents">
    content here
    </div>
    <cfinclude template="/includes/moddate.txt" />
    <cfinclude template="/includes/hep/footer.cfm" />

    So I don't think I could validate anything but the content in batch mode. Still, that might be OK, since the includes don't ever change and I know they're valid.

    Now if I could run CSE in batch mode using the browser, that would rock.

  9. #9
    SitePoint Member
    Join Date
    Dec 2010
    Location
    near Dallas, TX
    Posts
    3
    Originally posted by cydewaze:
    Thanks. I just installed the trial for 10, and posted a question on your forum on how to ignore CF tags.
    Thanks. I saw your message there and have replied to it. For anyone interested in following, here is the link:
    http://www.htmlvalidator.com/CSEForu...php?f=1&t=1081

    Originally posted by cydewaze:
    I still think we might have to use a second validator, because our pages basically look like this:
    As I believe you've found out already, you can check the pages directly (without running them through the server) and have CSE HTML Validator ignore all the "cf*" elements (by disabling a flag in the program) - or you could run the pages through the server (using http links) to get the HTML output and have it check that.
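
The second route, requesting each page over HTTP so the includes are already processed, can also be scripted for a batch character check. What follows is a minimal sketch using only Python's standard library and has nothing to do with how CSE HTML Validator works internally. The names `fetch_rendered` and `audit_pages` and the example.test URLs are hypothetical, and it assumes the pages are meant to be served as UTF-8.

```python
from urllib.request import urlopen


def fetch_rendered(url, timeout=10):
    """Fetch the server-processed HTML for one URL as raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def audit_pages(urls, fetch=fetch_rendered):
    """Map each URL to a list of character problems found in its output."""
    report = {}
    for url in urls:
        body = fetch(url)
        try:
            text = body.decode("utf-8")  # assumption: pages should be UTF-8
        except UnicodeDecodeError as exc:
            bad = body[exc.start]
            report[url] = [f"byte {exc.start}: not valid UTF-8 (0x{bad:02X})"]
            continue
        problems = []
        for line_no, line in enumerate(text.split("\n"), start=1):
            for ch in line:
                if ord(ch) > 127:
                    problems.append(f"line {line_no}: U+{ord(ch):04X}")
        report[url] = problems
    return report


# Demo with an in-memory stand-in for the server, so nothing is fetched.
fake_site = {
    "http://example.test/a.cfm": b"<p>ok</p>",
    "http://example.test/b.cfm": "<p>it\u2019s</p>".encode("utf-8"),
    "http://example.test/c.cfm": b"<p>it\x92s</p>",  # raw windows-1252 byte
}
result = audit_pages(fake_site, fetch=fake_site.__getitem__)
for url, problems in result.items():
    print(url, problems or "clean")
```

Passing the fetch function in as a parameter keeps the check testable without a running server; with the default, `audit_pages` would request each URL for real.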

