Textbox maxlength for multibyte characters differs between IE and Chrome/Firefox

For this textbox: <input type="text" maxlength="128">
if we enter multibyte characters, for example 𠀋 (4 bytes in UTF-8, a surrogate pair in UTF-16), Chrome and Firefox allow only 64 of them, whereas IE allows 128.
My requirement:
We should be able to enter 128 characters, whether they are single-byte, double-byte or multibyte, just as IE behaves.
How can we achieve this across all browsers?
The charset is set to UTF-8 in the HTML file.

Any help would be greatly appreciated.

Are you sure JavaScript is where you want this @avdp2111? The HTML/CSS forum seems more appropriate and is where you are more likely to get an answer. I will move it if you wish.

Thank you for pointing that out. Maybe it does belong there; could you please move it?


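This is a known JavaScript problem that is explained in detail here. It comes from how JavaScript represents strings internally: as sequences of UTF-16 code units. A character outside the Basic Multilingual Plane, such as 𠀋, is stored as a surrogate pair of two code units, so length counts it as 2. In short, this is a mess. Interestingly, from the user's point of view IE behaves more correctly. Chrome and Firefox appear to implement HTML's maxlength with the same counting algorithm (the HTML spec defines the limit in code units), so with maxlength=128 only 64 such characters fit.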
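To make the counting difference concrete, here is a small sketch (plain JavaScript, runs in Node or a browser console) comparing length, which counts UTF-16 code units, with Array.from, which iterates a string by code point. The helper names codeUnits and codePoints are purely illustrative.

```javascript
// "𠀋" is U+2000B, outside the Basic Multilingual Plane, so JavaScript
// stores it as a surrogate pair: two UTF-16 code units.
const ch = '\u{2000B}';

// length counts UTF-16 code units, the same unit maxlength counts
// in Chrome and Firefox.
function codeUnits(str) {
  return str.length;
}

// Array.from iterates by code point, keeping surrogate pairs together.
function codePoints(str) {
  return Array.from(str).length;
}

const value = ch.repeat(64);
console.log(codeUnits(ch), codePoints(ch));       // 2 1
console.log(codeUnits(value), codePoints(value)); // 128 64
```

So 64 of these characters already use up a maxlength of 128.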

The article I linked to offers some solutions for counting characters, which I used to implement a better maxlength in JavaScript. The script removes all maxlength attributes and instead applies a JavaScript check to those input fields, this time using a more accurate method of counting characters:

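// Replace the native maxlength check with a JavaScript one that counts
// characters the way users see them: a surrogate pair counts once, and
// combining marks attached to a character are not counted.
function fixMaxlength() {
	var inputs = document.querySelectorAll('input[maxlength]');
	var inp;
	
	for (var i = 0; i < inputs.length; i++) {
		inp = inputs[i];
		
		// maxLength is -1 when the attribute is missing or invalid.
		if (inp.maxLength > 0) {
			// Remember the limit, then remove the native code-unit-based check.
			inp.setAttribute('data-maxlength', inp.maxLength);
			inp.removeAttribute('maxlength');

			// Re-check the value after every edit (typing, pasting, dropping text).
			inp.addEventListener('input', applyMaxLength);
		}
	}
}

function applyMaxLength(e) {
	var inp = e.target;
	var maxLength = parseInt(inp.getAttribute('data-maxlength'), 10);
	var val = inp.value;
	
	// Only assign when the value is too long: assigning to inp.value
	// moves the caret to the end of the field.
	if (uniLen(val) > maxLength) {
		inp.value = uniLimitStr(val, maxLength);
	}
}

// Count characters: drop combining marks, then count each surrogate pair once.
function uniLen(string) {
	var regexSymbolWithCombiningMarks = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]+)/g;
	
	// Remove any combining marks, leaving only the symbols they belong to:
	var stripped = string.replace(regexSymbolWithCombiningMarks, function($0, symbol, combiningMarks) {
		return symbol;
	});
	
	// Replace each astral symbol (surrogate pair) with a single placeholder
	// character so that length counts it once:
	var regexAstralSymbols = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
	return stripped.replace(regexAstralSymbols, '_').length;
}

// Cut a string down to maxLength characters, counting the same way as uniLen.
function uniLimitStr(string, maxLength) {
	// Same combining-mark ranges as in uniLen.
	var combiningMark = /[\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/;
	// NFC composes sequences such as "n" + U+0303 into one code point where
	// a precomposed form exists; Array.from then splits by code point, so a
	// surrogate pair is never cut in half.
	var arr = Array.from(string.normalize('NFC'));
	var count = 0;
	var newStr = '';
	
	// Loop over arr.length rather than maxLength, so a short array never
	// appends "undefined".
	for (var i = 0; i < arr.length; i++) {
		var isAttachedMark = newStr !== '' && combiningMark.test(arr[i]);
		
		if (!isAttachedMark) {
			if (count === maxLength) {
				break;
			}
			count++;
		}
		// Marks stay with the character they follow and are not counted.
		newStr += arr[i];
	}
	
	return newStr;
}

// String.prototype.normalize and Array.from are ES2015 features that IE lacks,
// but IE already counts the way we want, so it simply keeps native maxlength.
if (document.addEventListener && String.prototype.normalize && Array.from) {
	document.addEventListener('DOMContentLoaded', fixMaxlength);
}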

This code uses some ECMAScript 2015 (ES6) features, String.prototype.normalize and Array.from, which are not available in IE or older browsers. IE doesn't need the fix anyway, so this is not a problem: the last lines only install the fix where those features exist. We cannot use the standard length or substr methods because they count UTF-16 code units rather than characters, so they miscount astral symbols and combining marks. Therefore I implemented equivalents (uniLen and uniLimitStr) using information from that article. I haven't tested it very thoroughly, but it appears to work for the characters you posted and others I have tried.
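On current browsers there is an alternative worth mentioning, which is a different technique from the article's regexes: Intl.Segmenter splits a string into grapheme clusters (user-perceived characters), which also covers combining marks from any script. The sketch below assumes an environment with Intl.Segmenter (modern browsers, Node 16+); graphemes, graphemeLength and graphemeTruncate are hypothetical helper names, and the Array.from fallback only counts code points.

```javascript
// Split a string into user-perceived characters. Uses Intl.Segmenter when
// available; otherwise falls back to code points via Array.from, which keeps
// surrogate pairs together but counts combining marks separately.
function graphemes(str) {
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
    return Array.from(segmenter.segment(str), (s) => s.segment);
  }
  return Array.from(str);
}

// Hypothetical stand-in for str.length that counts characters, not code units.
function graphemeLength(str) {
  return graphemes(str).length;
}

// Hypothetical stand-in for str.substr(0, max) that never splits a surrogate
// pair (or, with Intl.Segmenter, a grapheme cluster).
function graphemeTruncate(str, max) {
  return graphemes(str).slice(0, max).join('');
}

const sample = '\u{2000B}'.repeat(130);
console.log(graphemeLength(sample));                        // 130
console.log(graphemeLength(graphemeTruncate(sample, 128))); // 128
```

In applyMaxLength these could take the place of uniLen and uniLimitStr; whether grapheme clusters or the article's rules are the right unit depends on what your server-side limit counts.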


Hi,
Thanks for the reply, but I have a few doubts.
Could you please explain these lines:

var stripped = string.replace(regexSymbolWithCombiningMarks, function($0, symbol, combiningMarks) {
	return symbol;
});

// Account for astral symbols / surrogates, just like we did before:
var regexAstralSymbols = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
return stripped.replace(regexAstralSymbols, '_').length;


I understand that in the variable regexSymbolWithCombiningMarks you have listed the Unicode ranges needed to check the characters of the entered value (please let me know if I have understood this correctly), but I don't understand the lines after that. Could you please explain them?

Thank you in advance

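regexSymbolWithCombiningMarks matches a base character followed by one or more combining marks. Combining marks are Unicode characters that do not count as separate characters but modify the preceding one, for example by adding an accent or a bar over it. In the replace call, the callback receives the whole match ($0), the base character (symbol) and the marks (combiningMarks), and returns only symbol, so the marks are removed and only the actual characters are left to count.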
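A small illustration of what that replace does, reusing the same regex as uniLen. The sample string is just an example: "mañana" written with a plain "n" followed by U+0303 COMBINING TILDE, so the tilde is a separate code unit. stripCombiningMarks is a hypothetical wrapper name.

```javascript
// Same regex as in uniLen: group 1 is a base character (or surrogate pair),
// group 2 is the run of combining marks that follows it.
const regexSymbolWithCombiningMarks = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]+)/g;

function stripCombiningMarks(str) {
  // $0 is the whole match; returning only `symbol` discards the marks.
  return str.replace(regexSymbolWithCombiningMarks, function ($0, symbol, combiningMarks) {
    return symbol;
  });
}

const decomposed = 'man\u0303ana'; // "mañana" with a separate combining tilde
console.log(decomposed.length);                      // 7
console.log(stripCombiningMarks(decomposed));        // manana
console.log(stripCombiningMarks(decomposed).length); // 6
```

The stripped copy is only used for counting; the value in the field keeps its marks.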

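‘Astral symbols’ are characters above U+FFFF, outside the Basic Multilingual Plane. JavaScript stores each one as a surrogate pair (two UTF-16 code units), so length normally counts it as two characters. The second regex matches each surrogate pair and replaces it with a single placeholder character ('_'), so that length counts it as one.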
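Here is that second step on its own, as a minimal sketch: the regex is the one from uniLen, countSymbols is a hypothetical name, and the sample string is two ASCII letters followed by two copies of 𠀋.

```javascript
// Same regex as in uniLen: a high surrogate followed by a low surrogate,
// i.e. one astral symbol stored as two UTF-16 code units.
const regexAstralSymbols = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;

function countSymbols(str) {
  // Replace each surrogate pair with one placeholder so length counts it once.
  return str.replace(regexAstralSymbols, '_').length;
}

const text = 'ab\u{2000B}\u{2000B}'; // "ab𠀋𠀋"
console.log(text.length);        // 6
console.log(countSymbols(text)); // 4
```

On engines with ES2015 support, Array.from(text).length gives the same count without a regex, because string iteration goes by code point.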

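I’m not sure these regexes account for every character they should, though I believe they do for most text. They are taken from the article I linked to, which explains in more detail what they do. Note that the first regex only treats the four combining-mark blocks it lists (U+0300-U+036F, U+1DC0-U+1DFF, U+20D0-U+20FF and U+FE20-U+FE2F) as modifiers, so combining marks from other scripts are still counted as separate characters.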

