Unicode escape non-ASCII chars

Hi, I’m looking for a function that will convert non-ASCII characters to a Unicode-escaped string.

For example, “あ” => “\u3042”.

A similar piece of code is below. However, it converts strings to “\\uxxxx” instead of “\uxxxx”. Changing “\\” to “\” in the code below still won’t work, because that results in ‘\u’ + ‘xxxx’, which prints as “uxxxx”.

I have been searching for a few days already and am starting to wonder whether this is possible at all. :(


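var unicodeEscape = function(str) {
    // Prefixes keyed by hex-digit count: \xNN for one or two digits, \uNNNN for three or four.
    var code, pref = {1: '\\x0', 2: '\\x', 3: '\\u0', 4: '\\u'};
    // Note: \W matches every non-word character, including ASCII spaces and punctuation.
    return str.replace(/\W/g, function(c) {
        return pref[(code = c.charCodeAt(0).toString(16)).length] + code;
    });
};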

Thanks in advance.
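
For what it's worth, in a JavaScript string literal '\\u' is only two characters (one backslash and a u), so code like the above may already be producing a single backslash. A doubled backslash usually shows up only when a console or debugger displays the value as a quoted literal. A minimal sketch to check this, assuming a simplified variant of the function; the name unicodeEscapeNonAscii and the non-ASCII character class are my own choices, not part of the original code:

```javascript
// Hypothetical helper: escapes only characters outside the ASCII range.
function unicodeEscapeNonAscii(str) {
    return str.replace(/[^\x00-\x7F]/g, function (c) {
        var hex = c.charCodeAt(0).toString(16);
        // Pad to four hex digits so the result is always \uXXXX.
        return '\\u' + ('0000' + hex).slice(-4);
    });
}

var escapedChar = unicodeEscapeNonAscii('あ');
console.log(escapedChar);                 // prints \u3042
console.log(escapedChar.length);          // prints 6
console.log(JSON.stringify(escapedChar)); // prints "\\u3042"
```

The length of 6 is the useful check: one backslash, one u, and four hex digits.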

The following Google search:
javascript unicode convert
turned up this site as the first link.

Unicode Code Converter v6
http://rishida.net/scripts/uniview/conversion.php

From which you should be able to reverse-engineer a solution for yourself.

Yes, I’ve found that site too.

It gives a similar result (e.g. \\uxxxx) when converting from characters to JavaScript escaped characters,
and it ignores the \\u and only uses the xxxx when converting back in the reverse direction.

Below is what is used in converter v6.


switch (code) {
	case 0: outputString += '\\0'; break;
	case 8: outputString += '\\b'; break;
	case 9: outputString += '\\t'; break;
	case 10: outputString += '\\n'; break;
	case 13: outputString += '\\r'; break;
	case 11: outputString += '\\v'; break;
	case 12: outputString += '\\f'; break;
	case 34: outputString += '\\"'; break;
	case 39: outputString += '\\\''; break;
	case 92: outputString += '\\\\'; break;
	default: if (code > 0x1f && code < 0x7F) { outputString += String.fromCharCode(code); }
			else if (code > 0xFFFF) {
				code -= 0x10000;
				outputString += '\\u'+ dec2hex4(0xD800 | (code >> 10)) +'\\u'+ dec2hex4(0xDC00 | (code & 0x3FF));
				}
			else {
				pad = '';
				if (listArray[i].length == 1) { pad = '000'; }
				else if (listArray[i].length == 2) { pad = '00'; }
				else if (listArray[i].length == 3) { pad = '0'; }
				outputString += '\\u'+pad+listArray[i];
				}
	}

Still, thanks for your reply.
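
The excerpt above depends on variables from the rest of the converter (outputString, listArray, dec2hex4), so it cannot be run on its own. Here is a self-contained sketch of the same idea, assuming it is acceptable to walk the string by UTF-16 code unit so that surrogate pairs come out as two \uXXXX escapes without special handling. The names escapeToJsLiteral and hex4 are hypothetical, not the converter's own:

```javascript
// Hypothetical helper: zero-padded, four-digit uppercase hex.
function hex4(n) {
    return ('0000' + n.toString(16).toUpperCase()).slice(-4);
}

// Sketch: escape a string so it could be pasted into a JS string literal.
function escapeToJsLiteral(str) {
    // Short escapes for the same character codes the excerpt handles.
    var named = { 0: '\\0', 8: '\\b', 9: '\\t', 10: '\\n', 11: '\\v',
                  12: '\\f', 13: '\\r', 34: '\\"', 39: "\\'", 92: '\\\\' };
    var out = '';
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (named.hasOwnProperty(code)) {
            out += named[code];
        } else if (code > 0x1f && code < 0x7f) {
            out += str.charAt(i);          // printable ASCII stays as-is
        } else {
            out += '\\u' + hex4(code);     // everything else becomes \uXXXX
        }
    }
    return out;
}

console.log(escapeToJsLiteral('あ\n"')); // prints \u3042\n\"
```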

It might pay though to approach the site owner and engage him in your particular query.

Being an authority in that rather specialized domain, he is likely to know whether what you’re after is possible at all and, if not, other ways you could achieve it.

I am able to get a correct conversion to \\uxxxx with either of the two pieces of code above; however, both get stuck at the point where the control character is converted automatically.

Is there a way to build a “raw” string in JS?
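
In engines that support ES2015 template literals, String.raw is probably the closest thing to a raw string. A small sketch, assuming such an engine is available; note that it only affects literals typed into the source, not strings built at runtime:

```javascript
// String.raw leaves escape sequences in the template uninterpreted.
var rawEscape = String.raw`\u3042`;
console.log(rawEscape);        // prints \u3042
console.log(rawEscape.length); // prints 6

// An ordinary literal interprets the same escape as one character.
var cooked = '\u3042';
console.log(cooked);           // prints あ
console.log(cooked.length);    // prints 1
```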

Hang on, you’re wanting to convert a Unicode string to an escaped string. Well, that seems to be quite easy using the encodeURIComponent method.

AFAIK, encodeURIComponent gives output in the format “%xx”.
Correct me if I’m wrong.
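
For reference, a quick check of what encodeURIComponent produces for the example character. It percent-encodes the UTF-8 bytes rather than emitting \uXXXX:

```javascript
var encoded = encodeURIComponent('あ');
console.log(encoded);                     // prints %E3%81%82
console.log(decodeURIComponent(encoded)); // prints あ
```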

It does, and you can decode it back to the original string.
I suspect that producing a string encoded the way we would write Unicode escapes in source code is not possible, at least not in a form that the system can usefully decode.

Yes, we can type those escapes in static strings, but that only gives us a way to get the character into the string itself. Once the Unicode characters are in the string, scripting has no further use for that form.

Or, to put it another way, the Unicode escape sequence applies ONLY to string literals.
http://docs.sun.com/source/816-6409-10/ident.htm#1009568
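
Because the engine only interprets \uXXXX inside literals, a runtime string that contains those six characters has to be decoded explicitly. One possible sketch, assuming only four-digit \uXXXX escapes need handling; unescapeUnicode is a hypothetical name:

```javascript
// Hypothetical helper: turns each literal \uXXXX sequence into its character.
function unescapeUnicode(str) {
    return str.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

var sixChars = '\\u3042';               // backslash, u, 3, 0, 4, 2
console.log(sixChars);                  // prints \u3042
console.log(unescapeUnicode(sixChars)); // prints あ
```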

I got it.

Thanks a lot