I don't use HTML entities, but the page still validates

I have a few pages with JavaScript calculators of various types. I also display the source code of the calculators on the page with:

<pre>
<?php include 'sourcefile.js'; ?>
</pre>

The pages have lots of quotes, apostrophes, >, and < that are not encoded as HTML entities, but they still validate at W3C and the code displays properly in IE7 and Firefox 2.

It’s a convenient way to do it, allowing me to keep the page automatically updated whenever I change the JS.

Is there any compelling reason not to do it this way?

Is it the <pre> tags that are allowing the code to validate?

Are you using UTF-8 for your character encoding throughout?

It’s a convenient way to do it, allowing me to keep the page automatically updated whenever I change the JS.

Why? What’s the difference in convenience with

<script type="text/javascript" language="javascript" src="sourcefile.js"></script>

Dan: No, the charset is windows-1252.

It’s a page that uses JS for its functionality (and for that, I use the line you gave), but it also displays the JS source code on the page (open source). That’s what the PHP include is for.

If I change the JavaScript, its display on the page also changes automatically, so I don’t have to open the .js, copy its code, paste it into the page, and republish the page.
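
Roughly, the page ends up with both of these on it (a simplified sketch):

<script type="text/javascript" src="sourcefile.js"></script>

<!-- ...the calculator's markup and the rest of the page... -->

<pre>
<?php include 'sourcefile.js'; ?>
</pre>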

Ah, now I understand :blush:
Sorry for the intrusion :smiley:

Just out of curiosity (I’m serious - I’m not trying to jump down your throat here), why are you using Windows-1252?

As long as it validates, there's no real reason to worry. If you're intrigued, you can hit View Source, copy and paste that into the validator's input field, and see whether it validates.

If he used the <script> tag to include the JavaScript, the validator would only confirm that the tag itself is correct, not that the JavaScript is.

I started using it because that seems to be the default in FrontPage 2003. In October 2005, I was starting with nearly zero knowledge about creating a site, so character encodings were about the farthest thing from my mind. There were plenty of other things to be confused about, so I used the defaults for that and many other things, in many cases not even knowing that there was a choice to be made, and I wouldn’t have understood many of the alternative options, anyway.

A while back, I tried changing a page to one of the other, non-Windows encodings (the equivalent Western European ISO encoding, ISO-8859-1?). I’m sure it would have worked fine online, but it affected how FP displayed the page in code view, which I use a lot. The characters are smaller, squarer, and closer together. It wasn’t that big a change, but I didn’t like it, so I decided to stick with what was working.

So it wasn’t an ideological, or even particularly reasoned, choice.

Based on your comment, I’ve made a note to experiment with UTF-8. Since my chars are mostly ASCII (and thus 8-bit even in UTF-8?), I don’t think it will affect page size, and it actually might fix a bug in IE7 where non-European language choices in the Google “translate this page” gadget don’t display in the native language charsets as they are supposed to. Although, oddly enough, those language choices do display properly in Firefox, even with the current Windows-1252 encoding of the primary page.
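
If I do experiment with it, the change itself looks small. A minimal sketch, assuming the pages are served through PHP (the header line is optional if the meta tag and the saved file encoding already agree):

<?php
// Declare UTF-8 in the HTTP header; this takes precedence over the meta tag
header('Content-Type: text/html; charset=utf-8');
?>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

The files themselves would also need to be re-saved as UTF-8, though with mostly ASCII content the bytes should barely change.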

FrontPage was so bad that even Microsoft decided to discontinue it. If you want to continue using a WYSIWYG editor, you might want to check out Dreamweaver.

Everyone starts somewhere. Character encodings tend to fall to the back of your “to learn” list while you’re learning everything else there is to know.

That is, as you know, because of the way it’s rendered. Different encodings render differently. You can set a different font family and letter-spacing in CSS to address the visual issues you don’t like.

That’s a good idea! XML parsers are required to support UTF-8. I don’t know if you know XML, but at some point you may want to learn it, and since XML parsers are required to support UTF-8 and UTF-16, you won’t have to change your pages much.

I did some quick experiments:

Tried replacing the <pre> tag with <code> and then with <p>. In each case, the code displays incorrectly on the page, but it still validates.

The code contained no ampersands, so I added one. It still validates. I know the W3C validator rejects unencoded ampersands in URLs; maybe in ordinary text these unencoded characters aren’t such a big deal.
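
My guess at the difference (not verified, just reasoning from how entity references are parsed; results.php is a made-up URL for illustration): an ampersand followed by whitespace can’t start an entity reference, but one followed by letters can, so the two cases look different to the validator.

<!-- probably tolerated: the & can't begin an entity reference here -->
<p>Tom & Jerry</p>

<!-- probably flagged: &copy looks like the &copy; entity -->
<a href="results.php?a=1&copy=2">results</a>

<!-- always safe -->
<a href="results.php?a=1&amp;copy=2">results</a>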

I’m going to keep doing it this way for a while. If I have to go back and fix these pages, by that time the code will have stabilized and won’t be changing so often.

FrontPage as a code text editor and file manager is plenty good enough for what I need. For WYSIWYG, all I need is an interface where I can type text without having to type <p> all the time. For table code and anything else, I use code view, but typing HTML tags for lists, list items, etc. is just begging for RSI.

It’s really the webbots, the FrontPage Extensions, and the other automated things associated with FrontPage that Microsoft was abandoning when it discontinued FP. If you only use it as a text editor and file manager, it’s just fine.

People tend to think of FrontPage as some sort of non-HTML “weirdo” editor. I even get visitors to my site looking for how to “convert a FrontPage website to HTML”, which makes me believe that a lot of people think that a site created with FP is not fundamentally HTML.

It’s true that anyone who incorporated the many available webbots into their FP site has a big conversion project if they want to go to another editor, but if you stay away from all that stuff, you can create a completely valid and standards compliant site with FP, just as you can with Notepad.

That is, as you know, because of the way it’s rendered. Different encodings render differently. You can set a different font family and letter-spacing in CSS to address the visual issues you don’t like.

Not sure whether this applies or not, though… I was talking about FrontPage code view, not page preview or rendering in a browser. The charset actually affects how FP renders the raw HTML in code view. I don’t think I’d be able to fix that with CSS. There are a few code view settings, though, for how code is displayed, font-size, etc. that might be able to fix the code view rendering.

That’s a good idea! XML parsers are required to support UTF-8. I don’t know if you know XML, but at some point you may want to learn it, and since XML parsers are required to support UTF-8 and UTF-16, you won’t have to change your pages much.

I validated all my pages to HTML 4.01 Transitional. I don’t see any need for XHTML or XML. For some reason that I don’t recall (maybe ad code), I decided at one point that my pages could never validate as XHTML.

Unless you really need XHTML or XML (for PHP), there’s no reason to switch. The W3C recommends using HTML 4.01 Strict, and I don’t really see why you’re using Transitional for your DOCTYPE. You should only be using it if you’re in a transition phase toward Strict. Are you?
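
For reference, the Strict declaration is just:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">

Switching it is the easy part; the validator will then tell you which deprecated elements and attributes (if any) need cleaning up.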

Final update: it was just an accident that this PHP include method was working. The mix of unencoded characters that the file happened to contain was one that the validator didn’t care about, for whatever reason. Maybe they just happened to be unambiguous and couldn’t be misinterpreted.

I tried the same method with a C++ file with different entities (and a lot more of them), and the validator threw many errors.

Not a good method for general use.

Edit: As an example (a guess) of what might be ok in one context but not another:

The .js file had < and > characters that the validator didn’t complain about, perhaps because each had whitespace on either side, as in if(a > b). The C++ file, on the other hand, contained text like “<ESC>”, which generated errors because it looks like an HTML tag but isn’t.
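
To illustrate, here’s roughly what escaping would have done to those two cases (a sketch using PHP’s htmlspecialchars(); htmlentities() behaves the same for these particular characters):

<?php
// Harmless either way: the bare > doesn't start or end a tag
echo htmlspecialchars('if(a > b)');   // if(a &gt; b)

// This is the one that needed escaping: <ESC> looks like a start tag
echo htmlspecialchars('"<ESC>"');     // &quot;&lt;ESC&gt;&quot;
?>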

Microsoft came to that conclusion themselves back in 2007, which is why they killed FrontPage off and replaced it with Expression Web, which can create HTML that works in modern browsers; updating FrontPage to do the same would have required too much work.

Of course, you may just be using the small part of FP that produces valid HTML any browser can understand, and not creating any of the proprietary code that only IE understands.

HTML validators (i.e., for HTML versions 2 to 4.01 inclusive, not XHTML) can give some fairly unexpected results in some situations because of their use of SGML.

In your case, it could be that the JS file contains > but not < (only < needs to be escaped in HTML), or perhaps some other behaviour of SGML.

For a fascinating essay on this sort of thing that may just open your eyes to some of the strangeness with HTML and some of its problems, read “The Dark Side of the HTML” (you have to view it in the Wayback Machine, as the site appears to be offline).
http://web.archive.org/web/20021206044812/sem.best.vwh.net/dark_side/index.html

I’d recommend this to anyone interested in HTML weirdness. It talks about HTML 2.0, but is still true today (as long as HTML 4.01 is with us).

I know this thread is 2 years old, but I started it, and I do like to follow up when I find an answer.

Original problem code, not good:

<pre>
<?php include 'sourcefile.js'; ?>
</pre>

Solution:

<pre>
<?php
echo htmlentities(file_get_contents('sourcefile.js'));
?>
</pre>
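
A small variation, in case it helps anyone finding this later: htmlentities() also accepts a flags argument and a character set, so since these pages are Windows-1252 that can be spelled out explicitly (both arguments are optional):

<pre>
<?php
// ENT_QUOTES also converts single and double quotes;
// the third argument names the source file's character set
echo htmlentities(file_get_contents('sourcefile.js'), ENT_QUOTES, 'Windows-1252');
?>
</pre>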