-
Nov 12, 2007, 13:56 #1
- Join Date
- Apr 2001
- Location
- New York, NY
- Posts
- 18
Problems with javascript & utf-8 encoding
Sorry in advance for the length of this... you can skip to "the problem" if you want to avoid extraneous backstory.
The background...
I'm a designer with some moderate knowledge of programming working with my client's ASP/javascript programmer. The client's site was created back in 1999 or so, when he used FrontPage 98 to develop all the pages. When I was brought on board I was also forced to use FP, reluctantly, because that was basically the only way to edit / upload his pages.
Fast forward to a few months ago, when I got a new computer with Vista. This meant I had to switch to Expression Web, since everything I'd read about FP & Vista compatibility was not very good. Anyway, xWeb was changing everything to UTF-8 by default, despite there being inconsistent charset definitions throughout the site; most were missing, some were ISO 8859-1, others were something else. This inconsistency was revealed once the pages went live, thanks to a bunch of odd characters, primarily curly quotes, trademarks, and other symbols that had come over when the client copied and pasted stuff from MS Word.
The problem...
I suggested that we go through page by page and switch everything to UTF-8. I argued that we've been lucky to have gotten away with crappy haphazard coding as long as we have; we need to standardize already. Fortunately a consensus was reached and the project began.
I was in charge of converting the static pages (i.e. not our shopping cart or other script-laden pages), which I did by opening the pages in xWeb, adding the proper charset declaration, and resaving/encoding as UTF-8. The pages I did this with ended up working fine, except for one or two places where old MS Word code was still used. Once the extra stuff was removed it worked fine.
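Roughly, "adding the charset declaration and resaving as UTF-8" is two separate steps: change the label, and change the bytes. A minimal Python sketch of both, assuming the old pages were Windows-1252; the function name is hypothetical, this is not what Expression Web does internally, and the regex is a rough stand-in for a real HTML parser (it would also rewrite a "charset=" that happens to appear inside a script).

```python
import re

META_UTF8 = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def convert_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode a legacy page as UTF-8 and make its charset declaration agree."""
    # Decode with the encoding the file was actually saved in (assumed cp1252).
    # Bytes undefined in that encoding raise UnicodeDecodeError instead of being guessed.
    text = data.decode(source_encoding)

    # Step 1 (the label): point any existing charset declaration at UTF-8,
    # keeping whatever quote character it used.
    text, replaced = re.subn(
        r"""(charset\s*=\s*["']?)[\w-]+""", r"\1utf-8", text, flags=re.IGNORECASE
    )

    # No declaration at all: add one right after the opening <head> tag.
    if replaced == 0:
        text = re.sub(
            r"(<head(?:\s[^>]*)?>)",
            lambda m: m.group(1) + "\n" + META_UTF8,
            text,
            count=1,
            flags=re.IGNORECASE,
        )

    # Step 2 (the bytes): the part that is easy to forget.
    return text.encode("utf-8")
```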
Meanwhile the ASP programmer understandably preferred to do the conversion of the vital ASP- and Javascript-laden pages herself. She uses Microsoft Script Editor rather than FP, specifically because MSE doesn't add extra bloated code.
But when she tested these pages last night, they were broken. There was a new line of code at the top (which I believe was something like %codePage="65001") and boxes (square characters) in the middle of her javascript where there should have been blank spaces. She's at a loss to understand what happened.
Now again, I'm no programmer. I completely cede to her knowledge of ASP and javascript. Nevertheless, it struck me that these errors implied that the pages weren't correctly saved as UTF-8. When we were trying to figure out what caused the problem, I asked if, in addition to adding the meta declaration tag, she actually encoded the files as utf-8.
She said she opened up the pages, added the meta charset definition, and closed them again. This concerned me, since I didn't hear anything about 'encoding' in there. So I asked if they were saved as UTF-8 encoded pages, and she said "MS Script Editor doesn't do Save As, it just saves the file."
I thought the problem was that simply adding the charset declaration isn't enough: the pages also have to be saved in the matching encoding. You usually have to tell your editor which encoding (character set) to save pages in. The programmer seemed to get irritated by my suggestion -- of course, she was frustrated, understandably -- and said that MS Script Editor is more advanced than FP (which is why she uses it), and basically implied that it knows what to do.
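That distinction can be checked mechanically: a charset declaration is only a label, and what matters is whether the file's bytes are actually valid in that encoding. A minimal Python sketch (the helper name `check_utf8` is hypothetical) that tests the bytes and ignores the label. Note that a file containing only ASCII characters is already valid UTF-8, because UTF-8 stores ASCII as the same single bytes; so only pages with non-ASCII characters (curly quotes, ™, non-breaking spaces) break when the label changes but the bytes don't.

```python
def check_utf8(data: bytes) -> str:
    """Report whether raw file bytes really are UTF-8, regardless of any <meta> tag."""
    try:
        data.decode("utf-8")  # strict: raises on the first invalid byte sequence
    except UnicodeDecodeError as err:
        return f"not UTF-8: byte 0x{data[err.start]:02X} at offset {err.start}"
    return "valid UTF-8"


print(check_utf8("caf\u00e9".encode("utf-8")))   # valid UTF-8
print(check_utf8("caf\u00e9".encode("cp1252")))  # not UTF-8: byte 0xE9 at offset 3
print(check_utf8(b"plain ASCII only"))           # valid UTF-8
```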
Since I'm not a programmer and I'm not nearly as knowledgeable about ASP or JavaScript as she is (and I have no experience whatsoever with Script Editor), I really couldn't argue with that, or offer any other suggestions. Also I think she resents me for the whole encoding mess anyway. Maybe she's right.
So my question to you gurus is: does anyone have ideas about what might have gone wrong? Does anyone have experience converting files with JavaScript and ASP code to Unicode? Can MSE save files encoded as UTF-8?
Need a break from work? Visit About Schuyler Falls.
-
Nov 12, 2007, 14:30 #2
- Join Date
- Apr 2006
- Posts
- 802
And you still have a job?
It would have been better to learn this before you 'standardized' your client's site.
-
Nov 12, 2007, 14:43 #3
- Join Date
- Apr 2001
- Location
- New York, NY
- Posts
- 18
Yeah, believe it or not! I guess they're as incompetent as I am.
In my defense, I didn't "standardize" the site on my own decision. (And by the way, all these changes were made to a test/backup site, not the live site. So nothing is permanently broken or anything.)
If the programmer had said flatly that "no, the asp/javascript will be screwed up if we do that," we'd of course have stuck with the old files and I'd have to either resign or use an old computer or something.
But she never said that the asp/javascript code would be screwed up, and indeed the pages on the test site that I converted that do have javascript & ASP are working without a problem. So I don't think it was "standardizing" the site that caused the issue -- it seems to be something to do with Microsoft Script Editor or the method the programmer used to make the change that caused the problem.
That's the main question here: what could have caused this?
-
Nov 12, 2007, 14:59 #4
- Join Date
- May 2006
- Location
- Central Florida
- Posts
- 2,345
First, regardless of any prejudice (toward or against a particular editor), ASP, JavaScript, and HTML should all be saved as plain text. The "square boxes" you described are a sign that the MS editor saved some characters beyond the limited 128-character ASCII set. {isn't that encoding?}
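One way to find exactly which characters turned into boxes is to list every byte outside the 128-character ASCII range, with its line and column. A minimal Python sketch; the function name and the sample script are hypothetical, and positions are counted in bytes, so a multi-byte UTF-8 character shows up as several entries.

```python
def non_ascii_positions(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line, column, byte) for each byte above 0x7F; line and column are 1-based."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):  # iterating over bytes yields ints
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits


script = b"function f(x) {\n\xa0\xa0return x;\n}\n"
for line_no, col, byte in non_ascii_positions(script):
    print(f"line {line_no}, column {col}: 0x{byte:02X}")
```

For this sample it reports two 0xA0 bytes at the start of line 2, which is where the indentation would be.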
Can't you open those files in Notepad and resave the changes?
Don't be yourself. Be someone a little nicer. -Mignon McLaughlin, journalist and author (1913-1983)
►Git is for EVERYONE
►Literally, the best app for readers.
►Make Your P@ssw0rd Secure
►Leveraging SubDomains
-
Nov 12, 2007, 15:19 #5
- Join Date
- Apr 2001
- Location
- New York, NY
- Posts
- 18
That's probably what she'll be doing. (I ain't going near those files, myself!) Actually, she'll probably just use the backup files. I believe even Notepad gives you the choice to save files as ANSI or in other encodings.
I have no idea what those boxes represent; there shouldn't be ANYthing there except blank space. They seem to have replaced the usual indenting one finds in scripts (be they PHP, ASP or JavaScript).
-
Nov 13, 2007, 14:11 #6
- Join Date
- May 2006
- Location
- Central Florida
- Posts
- 2,345
Those boxes represent "non-printable" characters, that is, characters outside the printable ASCII range (32 to 126, space through tilde), apart from whitespace such as Tab and line breaks. Even Notepad respects Tab!
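One plausible source of boxes "where blank spaces should be" is the non-breaking space; this is an assumption, since the thread never identifies the actual bytes. In Windows-1252 and ISO-8859-1 it is the single byte 0xA0, but in UTF-8 it is the two bytes 0xC2 0xA0. A lone 0xA0 is invalid UTF-8, so a browser told the page is UTF-8 substitutes U+FFFD, which many fonts draw as a box or a question mark in a diamond. A minimal Python sketch:

```python
NBSP = "\u00a0"  # non-breaking space

legacy_bytes = NBSP.encode("cp1252")  # b'\xa0'      (one byte)
utf8_bytes = NBSP.encode("utf-8")     # b'\xc2\xa0'  (two bytes)

# A line indented with two legacy non-breaking spaces, then decoded as UTF-8:
line = legacy_bytes * 2 + b"return x;"
shown = line.decode("utf-8", errors="replace")

# Each stray 0xA0 becomes one U+FFFD replacement character.
print(shown.count("\ufffd"))  # 2
```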
-
Nov 13, 2007, 15:07 #7
- Join Date
- Apr 2001
- Location
- New York, NY
- Posts
- 18
Thanks, ParkinT! I really appreciate your help.
Turns out that I was right -- the programmer did just add the meta tag without saving the files using utf-8 encoding. The important thing is that she was able to open up the files, save them as ASCII and that cleared up the formatting issues with the javascript.
For now we've decided to go back to Western European (ISO-8859-1), which means I'll go back in and save/re-encode all of the site's pages. Lesson learned: sometimes you have to go backward in order to go forward!
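One thing to watch when going back to ISO-8859-1: curly quotes and ™ don't exist in it (they are only in Windows-1252), so re-encoding can fail or, depending on the editor, quietly replace them. A minimal Python sketch of one common workaround, writing such characters as HTML numeric character references, which work in any page encoding. The function name is hypothetical.

```python
def to_latin1_html(text: str) -> bytes:
    """Encode page text as ISO-8859-1; characters it lacks become &#NNNN; references."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


print(to_latin1_html("It\u2019s Brand\u2122 caf\u00e9"))
# b'It&#8217;s Brand&#8482; caf\xe9'
```

This only helps in HTML text; inside JavaScript string literals, a `\u2019`-style escape is the usual equivalent.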
-
Nov 13, 2007, 15:23 #8
- Join Date
- Dec 2006
- Location
- Prague
- Posts
- 210
Seems like the BOM (byte order mark) signature: http://www.w3.org/International/questions/qa-utf8-bom
Last edited by Mirek Komárek; Nov 13, 2007 at 15:23. Reason: ups wrong url in clipboard
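For reference, a UTF-8 byte order mark is the three bytes EF BB BF at the very start of a file. Some editors add it when saving as UTF-8, and a program or browser reading the page as Windows-1252 shows it as "ï»¿". Whether a BOM caused the problem in this thread is a guess, not confirmed. A minimal Python sketch for detecting and removing one (the helper name is hypothetical; `codecs.BOM_UTF8` and the `utf-8-sig` codec are standard Python):

```python
import codecs


def strip_utf8_bom(data: bytes) -> tuple[bytes, bool]:
    """Remove a leading UTF-8 BOM (EF BB BF) if present; report whether one was found."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


page = codecs.BOM_UTF8 + b"<html></html>"
body, had_bom = strip_utf8_bom(page)
print(had_bom, body[:6])          # True b'<html>'
print(page[:3].decode("cp1252"))  # ï»¿  (a BOM misread as Windows-1252)
print(page.decode("utf-8-sig"))   # <html></html>  (utf-8-sig drops the BOM when decoding)
```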
-
Nov 13, 2007, 15:37 #9
- Join Date
- Jun 2004
- Location
- Copenhagen, Denmark
- Posts
- 6,157
There is no such thing as plain ASCII. No such thing. It's a myth. What she probably saved the files as is CP-1252 (which, coincidentally, is almost the same as ISO-8859-1). Judging from your description so far, you're probably better off using ISO-8859-1 for the charset, since it tends to be the default in most systems (no guarantees, though).
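The "almost the same" can be made concrete. Windows-1252 and ISO-8859-1 agree on every byte except 0x80 to 0x9F. In that range ISO-8859-1 has invisible C1 control codes, while Windows-1252 has mostly printable characters, including exactly the curly quotes, dashes, and ™ that MS Word produces (five of those bytes are unassigned). A minimal Python sketch that lists the differing bytes; `differs` is a hypothetical helper name.

```python
def differs(byte_value: int) -> bool:
    """True if this single byte decodes differently in Windows-1252 and ISO-8859-1."""
    raw = bytes([byte_value])
    return raw.decode("cp1252", errors="replace") != raw.decode("latin-1")


diff = [b for b in range(256) if differs(b)]
print(f"{len(diff)} bytes differ, from 0x{min(diff):02X} to 0x{max(diff):02X}")
# 32 bytes differ, from 0x80 to 0x9F
print(bytes([0x93, 0x94, 0x99]).decode("cp1252"))  # “”™ in Windows-1252
```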
Oh, and just to save you grief later on: meta tags are only relevant when the page isn't served by a web server that sends a charset in its HTTP Content-Type header. If it does, the header takes precedence.
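The precedence rule can be sketched as a small function: use the charset parameter of the HTTP Content-Type header if there is one, otherwise a `<meta>` declaration, otherwise a default. This is a simplified, hypothetical sketch. The regexes are not a real HTML parser, the `iso-8859-1` default is an assumption (browsers vary by version and locale), and browsers following the current HTML standard check for a byte order mark even before the HTTP header.

```python
import re
from typing import Optional

CHARSET_RE = r"""charset\s*=\s*["']?([\w-]+)"""


def effective_charset(header: Optional[str], html: str, default: str = "iso-8859-1") -> str:
    """Pick the encoding a browser would use: HTTP header, then <meta>, then a default."""
    # 1. A charset in the HTTP Content-Type header takes precedence.
    if header:
        found = re.search(CHARSET_RE, header, re.IGNORECASE)
        if found:
            return found.group(1).lower()
    # 2. Otherwise look for a <meta> declaration in the page itself.
    found = re.search(r"<meta[^>]+?" + CHARSET_RE, html, re.IGNORECASE)
    if found:
        return found.group(1).lower()
    # 3. Otherwise fall back to the client's default.
    return default


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(effective_charset("text/html", page))                      # utf-8 (meta tag counts)
print(effective_charset("text/html; charset=ISO-8859-1", page))  # iso-8859-1 (header wins)
print(effective_charset(None, "<p>no declaration</p>"))         # iso-8859-1 (default)
```

The first call matches a server that sends `Content-Type: text/html` with no charset, in which case the meta tag decides.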
-
Nov 13, 2007, 21:43 #10
- Join Date
- Apr 2001
- Location
- New York, NY
- Posts
- 18
Originally Posted by Mirek Komárek
Truthfully I *think* she might be talking about ANSI instead. Notepad offers that as an option instead of unicode, I know that. Not sure. Honestly, I dunno what the story is, there's kind of a political situation here and the less I question her at this point, the better.
Oh, and just to save you the grief later on; meta-tags are only relevant, when the page isn't served from a web server, which sends a HTTP-header. In this case, the header takes precedence.
Content-Type: text/html
Therefore, the meta tag is important, at least in our situation. Heck, what started us off on this merry adventure in the first place was my discovery that in the pages without any charset declaration (a majority), things were getting royally screwed up. Sigh. We were so innocent back then!
Thanks for your help, guys.
-
Nov 14, 2007, 11:21 #11
- Join Date
- Jun 2004
- Location
- Copenhagen, Denmark
- Posts
- 6,157