Encoding confusion: most odd!

AutisticCuckoo · March 10, 2010, 12:37pm

On the office website there’s a search form in the middle of the page. The site itself is using ISO 8859-1, for historical reasons. The search application, a Java app, uses UTF-8, so the search form has an accept-charset="utf-8" attribute.

We have serious trouble when searching for phrases that include non-ASCII characters, like the Swedish letters ‘å’, ‘ä’ and ‘ö’. For testing purposes you can try searching for ‘grön’ (which means ‘green’).

In Opera, everything works just fine.

In IE7, the first search is fine, but all subsequent searches with non-ASCII characters fail. Instead of ‘grön’ the search box on the result page shows ‘gr�’ (the last character is U+FFFD, i.e., the Unicode ‘replacement character’).
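One way to end up with U+FFFD is for bytes produced under ISO 8859-1 to be decoded as UTF-8. The sketch below shows that mechanism in Python (not the Java the search application uses). It is only an illustration under that assumption, not a confirmed diagnosis of what IE7 and the server are doing here.

```python
# Illustration only: 'grön' encoded as ISO 8859-1, then decoded as UTF-8.
latin1_bytes = "grön".encode("iso-8859-1")   # 'ö' becomes the single byte 0xF6

# 0xF6 can never start a valid UTF-8 sequence, so errors="replace"
# substitutes U+FFFD (the replacement character) for it.
decoded = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes)    # raw bytes as sent
print(ascii(decoded))  # escaped form, so U+FFFD is visible in any terminal
```

Python's decoder keeps the trailing ‘n’ here, whereas the result page described above seems to have lost it as well; different decoders recover from invalid bytes in different ways.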

In Firefox (we’re stuck with 3.0.7), Safari and Chrome, the first search fails if it contains non-ASCII characters, but all subsequent searches work. In this case the result page search box shows ‘grÃ¶n’, so it looks like UTF-8 that has been interpreted as ISO 8859-1 and then encoded as UTF-8.
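That guess can be reproduced in a few lines. This is a minimal Python sketch of the suspected round trip (UTF-8 bytes read as ISO 8859-1, then sent out again as UTF-8); where in the real request chain the misreading happens is not known at this point.

```python
original = "grön"

# Step 1: the browser submits the word as UTF-8 ('ö' becomes 0xC3 0xB6).
utf8_bytes = original.encode("utf-8")

# Step 2: something reads those bytes as ISO 8859-1, one character per byte.
misread = utf8_bytes.decode("iso-8859-1")

# Step 3: the misread text is written into a UTF-8 page.
resent = misread.encode("utf-8")

print(misread)  # the mojibake that shows up in the search box
print(resent)   # each wrong character now takes two bytes of its own

# The damage is reversible as long as no bytes were dropped along the way.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired)
```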

Does anyone have any ideas what could be going on here? Or should we just try to convince the rest of the world to use Opera once and for all?

AutisticCuckoo · March 11, 2010, 7:07am

Yep, that’s the site’s search engine and it uses ISO 8859-1 like the rest of the site so it doesn’t need the attribute. That’s just a ‘find on site’ feature, while the one in the middle of the page leads to an application for searching our databases for company information.

No, that form is generated by the application, and those pages are encoded as UTF-8, so they don’t need the attribute.

Yep. But if you then go back to the home page and click the ‘Sök’ button again, it works, right? That’s the weird thing that happens in Firefox, Safari and Chrome. The first search fails if it uses non-ASCII characters, but subsequent searches work. IE is the other way round. Opera and Lynx work every time.

ralphm · March 11, 2010, 9:25am

Don’t know if you’ve uploaded the fix yet or not, but even using those different words I can’t get the problem to happen on my machine (FF, Saf, IE7). It’s so annoying when things just work!

AutisticCuckoo · March 11, 2010, 9:28am

No, the fix isn’t live yet. It’s weird that you don’t get the error, but then this whole thing is weird from start to finish.
(Of course, it’s almost tomorrow where you are, so perhaps the fix has been uploaded then! :D)

Ain’t that the truth!

AutisticCuckoo · March 11, 2010, 9:01am

It looks like we may have solved the problem, although we don’t quite know why.

A developer added a ‘URL filter’ in JBoss on the application server that merely states that all request data is encoded as UTF-8. Then I added a hidden field with value="&#8211;" to the search form.

Neither of those two actions solves anything by itself, but together they appear to fix things. At least on the test server (the fix hasn’t been uploaded to the production server yet).

Oddly enough, adding the hidden field at least made IE behave the same way as Firefox, Safari and Chrome.
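A plausible reading of why the pair works, offered as an assumption since nothing in this thread confirms it: the en dash behind &#8211; (U+2013) has no ISO 8859-1 representation, so a browser cannot serialize the form in the page's own encoding and has to fall back on the UTF-8 named in accept-charset, while the filter makes the server decode what arrives as UTF-8. The Python sketch below imitates both halves; the field names `q` and `probe` are hypothetical, as the real ones are not shown here.

```python
from urllib.parse import parse_qs, urlencode

EN_DASH = "\u2013"  # the character that &#8211; refers to

# The en dash does not exist in ISO 8859-1, so form data containing it
# cannot be serialized in the page's own encoding.
try:
    EN_DASH.encode("iso-8859-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

# Hypothetical field names; this stands in for a UTF-8 form submission.
query = urlencode({"q": "grön", "probe": EN_DASH}, encoding="utf-8")

# Roughly what the filter amounts to: always decode request data as UTF-8.
as_utf8 = parse_qs(query, encoding="utf-8")

# Without it, a server that assumes ISO 8859-1 would produce mojibake.
as_latin1 = parse_qs(query, encoding="iso-8859-1")

print(fits_in_latin1)
print(query)
print(as_utf8["q"][0], as_latin1["q"][0])
```

If that reading is right, it would also explain why neither change helps alone: the hidden field only influences what the browser sends, and the filter only influences how the server reads it.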

ralphm · March 10, 2010, 1:22pm

I guess it would be a really dumb question to ask why you can’t change the encoding of the site. But perhaps it wouldn’t matter anyway. And why are you stuck with FF 3?

Anyway, FYI, I tried the grön search on FF 3.5, Safari 4, Chrome (Mac) and couldn’t replicate the behavior (but you expected that). But I opened IE7 and couldn’t replicate it there either! So what do you mean by “all subsequent searches”? I did the search with grön, then returned to the home page and did the search again. Then did the search again from the results page itself. Still the same result.

AutisticCuckoo · March 10, 2010, 2:14pm

The answer to both questions is: you’ll have to ask our management. :-/

Thank you very much, Ralph! That is very interesting. So it seems the problem only occurs internally within our network?

The procedure you describe is exactly what I meant: search, go back to home page, search again. When we do that in IE it succeeds the first time and fails on all subsequent attempts. Firefox/Safari/Chrome do the exact opposite. Opera works every time.

So why does this work for external visitors, but not for users behind our firewall?

Edit: Just tested in Lynx, and it’s like Opera: works every time.

johnyboy · March 10, 2010, 5:24pm

just a suggestion for something to try, to see if it makes any difference: what about stating the character encoding in an attribute on the search form? i can’t remember which attribute it is, but there’s a way of specifying a form’s character encoding, isn’t there? maybe that’d make a difference?

AutisticCuckoo · March 10, 2010, 5:49pm

As stated, I already do this. If I don’t, it won’t work in any browser (iirc) since the default is to use the same encoding as the page that contains the form, which would be ISO 8859-1 in this case.
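To make the difference concrete, here is a small Python sketch of the two query strings a browser would produce for the same search word, with and without accept-charset="utf-8". The field name `q` is hypothetical, and `urlencode` merely stands in for the browser's form serialization.

```python
from urllib.parse import urlencode

params = {"q": "grön"}  # "q" is a hypothetical field name

# No accept-charset: the form is submitted in the page's encoding,
# which is ISO 8859-1 on this site.
default_query = urlencode(params, encoding="iso-8859-1")

# With accept-charset="utf-8": the same word is submitted as UTF-8.
utf8_query = urlencode(params, encoding="utf-8")

print(default_query)  # 'ö' is one percent-encoded byte
print(utf8_query)     # 'ö' is two percent-encoded bytes
```

A search application that expects UTF-8 can only make sense of the second form.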

I’ve now tested with Opera 10.10, Firefox 3.5.8 and Chrome 5.0 beta on Linux, from home. Opera works fine, Firefox and Chrome behave as at the office. So it’s not a question of an internal problem only. I don’t know why Ralph was immune, unless his browser showed cached pages or something.

If anyone else would be kind enough to test this, you can try different search words, like ‘grön’, ‘blå’, ‘häst’ to avoid that particular pitfall.

johnyboy · March 10, 2010, 6:01pm

oh yeah sorry, missed that. but one of the two forms on the home page doesn’t have that attribute:

<!-- Sökformuläret -->
<form id="sok" action="/adm/verktyg/sok/sok.asp" method="get">
  <hr>
  <div>
    <label for="fraga">Sök på webbplatsen</label>
    <input id="fraga" name="fraga" type="text" accesskey="4" title="Sökord">
    <input id="sokknapp" type="submit" value="Sök">
  </div>
</form>

and i realise that’s not the form you’re talking about. sorry.

johnyboy · March 10, 2010, 6:08pm

the form on the results page (the page you get after searching from the middle of the home page) has no such encoding attribute.

and in safari i’m getting grÃ¶n in that form’s field
