Problem with IE substituting GET request &not for ¬

OK… I have an interesting problem here…

Here is the code to reproduce the issue:

<script type="text/javascript">
	window.location = 'http://www.google.com?var=foo&not_var=foo2';
</script>

In Firefox, this works fine, and directs the browser to the correct URL, along with the GET params.

In IE8 however, it attempts to load the following address:

http://www.google.com?var=foo¬_var=foo2

Notice that it has interpreted the ‘&not’ part of the GET request as an HTML character entity and replaced it with the corresponding character, in this case ¬.

Has anyone else had this issue, and does anyone know how to deal with it?

Thanks

Because you can’t really change how Internet Explorer behaves there, the only advice that I have to give is “so don’t do that”.

Sorry if it doesn’t help, but sometimes you just have to accept the situation for what it is and do what you can to prevent further issues.

Thanks for the insight… I have managed to work round this issue although it involved changing a lot of code.

I spent a lot of time searching the net to determine the cause of this behaviour and, as far as I can see, this is a bug in IE. See the quote below for clarification:

Quote taken from
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references


Consume the maximum number of characters possible, with the consumed
characters matching one of the identifiers in the first column of the
named character references table (in a case-sensitive manner).

If no match can be made, then no characters are consumed, and nothing is
returned. In this case, if the characters after the U+0026 AMPERSAND
character (&) consist of a sequence of one or more characters in the
range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL
LETTER A to U+007A LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER
A to U+005A LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON
character (;), then this is a parse error.

If the character reference is being consumed as part of an attribute,
and the last character matched is not a U+003B SEMICOLON character (;),
and the next character is either a U+003D EQUALS SIGN character (=) or
in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041
LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN
SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for historical
reasons, all the characters that were matched after the U+0026 AMPERSAND
character (&) must be unconsumed, and nothing is returned.

Otherwise, a character reference is parsed. If the last character
matched is not a U+003B SEMICOLON character (;), there is a parse error.

Return one or two character tokens for the character(s) corresponding to
the character reference name (as given by the second column of the named
character references table).

If the markup contains (not in an attribute) the string I'm &notit; I
tell you, the character reference is parsed as "not", as in, I'm ¬it; I
tell you (and this is a parse error). But if the markup was I'm &notin;
I tell you, the character reference would be parsed as "notin;",
resulting in I'm ∉ I tell you (and no parse error).
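
To make the quoted rule a bit more concrete, here is a rough JavaScript sketch of the "consume the maximum number of characters" step. The lookup table is a tiny hand-picked subset of the real named character references table, and the function names (consumeNamedRef, decodeRefs) are made up for illustration. It only models the rule as quoted, not how any particular browser actually behaves.

```javascript
// Tiny, hand-picked subset of the named character references table (assumption:
// the real table is far larger; these entries are enough for the examples).
var NAMED_REFS = {
  'not': '\u00AC',    // legacy name, allowed without a semicolon
  'not;': '\u00AC',
  'notin;': '\u2209',
  'amp': '&',
  'amp;': '&'
};

function isAlnum(ch) {
  return /^[0-9A-Za-z]$/.test(ch);
}

// Try to consume a named reference that starts right after an "&".
// `start` is the index of the first character after the ampersand.
// Returns null when nothing is consumed.
function consumeNamedRef(input, start, inAttribute) {
  var best = null;
  for (var name in NAMED_REFS) {
    // Keep the longest name that matches at this position.
    if (input.substr(start, name.length) === name &&
        (best === null || name.length > best.length)) {
      best = name;
    }
  }
  if (best === null) {
    return null;
  }
  var next = input.charAt(start + best.length);
  // The "historical reasons" rule: inside an attribute, a match without a
  // semicolon that is followed by "=" or an alphanumeric is unconsumed.
  if (inAttribute && best.charAt(best.length - 1) !== ';' &&
      (next === '=' || isAlnum(next))) {
    return null;
  }
  return { text: NAMED_REFS[best], length: best.length };
}

// Replace every named reference in `input` according to the rule above.
function decodeRefs(input, inAttribute) {
  var out = '';
  for (var i = 0; i < input.length; i++) {
    if (input.charAt(i) === '&') {
      var ref = consumeNamedRef(input, i + 1, inAttribute);
      if (ref) {
        out += ref.text;
        i += ref.length; // skip the characters of the matched name
        continue;
      }
    }
    out += input.charAt(i);
  }
  return out;
}
```

With this sketch, decodeRefs("I'm &notin; I tell you", false) picks the longer name "notin;", while "I'm &notit; I tell you" falls back to the shorter name "not", matching the two examples in the quote.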

Couldn’t you encode the thing as a url first?

like turn the & into %26 instead?

I tried this. The amazing thing is that it doesn’t decode it. It seems like IE is treating a ‘_’ like a ‘;’. Therefore, &not_ will be substituted, but &notxyz will not be. The only option would be to use &amp_not, but that then becomes &not… not &not

My fix was to dynamically build a form and submit it.
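
For anyone wanting to try the same workaround, here is a minimal sketch of the idea, assuming a plain GET form with one hidden input per parameter. The helper name submitAsGetForm and its arguments are hypothetical (this is not the exact code I used), and the document object is passed in only to keep the helper easy to test.

```javascript
// Build a GET form from a plain object of parameters and submit it.
// `doc` is the page's `document`; `action` is the target URL without a query string.
function submitAsGetForm(doc, action, params) {
  var form = doc.createElement('form');
  form.method = 'get';
  form.action = action;

  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      // One hidden input per parameter; the browser builds the query string itself.
      var input = doc.createElement('input');
      input.type = 'hidden';
      input.name = name;
      input.value = params[name];
      form.appendChild(input);
    }
  }

  // Attach the form to the page before submitting it.
  doc.body.appendChild(form);
  form.submit();
  return form;
}

// In the page, something like:
// submitAsGetForm(document, 'http://www.google.com/', { 'var': 'foo', 'not_var': 'foo2' });
```

Because the parameter names are set as input names rather than written into a URL string, the ‘&not’ sequence never appears in the script at all.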

Wow… that’s a lot of ‘nots’… sorry :smiley:

Slightly off-topic question:

This is because the encodeURI function converts & to &amp;, which changes the string to:
http://www.google.com?var=foo&amp;not_var=foo2

Any URL strings should be properly encoded, and the encodeURI function is a useful way to ensure that this occurs.

When I was writing a cookie script and looking at others, I noticed that some of them encoded the paths and some didn’t. If I use encodeURI, does that necessitate decoding later in the code? My script worked without decoding, but I don’t know whether that was just dumb luck.

That depends.

If you’ve stored the encoded string as a string variable, then you will need to decode it to get the original.

If you’re getting the URI from the location bar, then it should already be decoded.

Ultimately, due to the number of situations that could be involved, testing should always be required to ensure that things work as you expect.

Did you test the above code?

Running tests in IE7 & IE8 using


window.location = encodeURI('http://www.google.com?var=foo&not_var=foo2');

gives me exactly the same results…

Below is the code I used:


<script type="text/javascript">
	window.location = encodeURI('http://www.google.com?var=foo&not_var=foo2');
</script>

Sorry, my memory is failing me. encodeURIComponent is what’s required, not encodeURI.

encodeURIComponent also encodes other characters including the ampersand.

This is tested to work.


window.location = encodeURIComponent('http://www.google.com?var=foo&not_var=foo2');
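
For reference, here is a small sketch of what the two functions return for the example URL. The buildQuery helper is hypothetical, added only to show encodeURIComponent applied to individual names and values. Note that on a whole URL it also escapes the ‘:’, ‘/’, ‘?’ and ‘=’ characters, so the result is worth checking in the browsers you care about.

```javascript
var url = 'http://www.google.com?var=foo&not_var=foo2';

// encodeURI leaves URL syntax characters such as : / ? & = untouched,
// so the ampersand before "not_var" is still a literal "&".
var viaEncodeURI = encodeURI(url);
// → 'http://www.google.com?var=foo&not_var=foo2'

// encodeURIComponent escapes those characters as well ("&" becomes "%26").
var viaComponent = encodeURIComponent(url);
// → 'http%3A%2F%2Fwww.google.com%3Fvar%3Dfoo%26not_var%3Dfoo2'

// Hypothetical helper: encode each name and value separately, then join
// them with literal "&" separators to form a query string.
function buildQuery(params) {
  var pairs = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      pairs.push(encodeURIComponent(name) + '=' + encodeURIComponent(params[name]));
    }
  }
  return pairs.join('&');
}

var query = buildQuery({ 'var': 'foo', 'not_var': 'foo2' });
// → 'var=foo&not_var=foo2'
```

If a stored value is later needed in its original form, decodeURIComponent reverses encodeURIComponent.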

Ah, great… thanks for the clarification pmw57

G