Replacing HTML ASCII codes with text characters

Hello,

I have a situation where I take some text out of an HTML page's source and paste it into a Word document or other places. The problem is that characters like an apostrophe are represented in the HTML as &#39; instead of '.
I am using the JavaScript clipboardData object to do the copying.

To solve my problem, I could write a function that runs before I call setData on clipboardData and uses a regular expression to replace the string I don't want (&#39; in this case) with the character '.
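A minimal sketch of that function might look like this. The name `replaceApostropheEntity` is hypothetical. The global `window.clipboardData` object only exists in Internet Explorer, so the call to it is left as a comment, and `selectedText` there is an assumed variable holding the copied text.

```javascript
// Replace every &#39; character reference with a plain apostrophe.
function replaceApostropheEntity(text) {
  return text.replace(/&#39;/g, "'");
}

// Hypothetical usage in Internet Explorer, where window.clipboardData exists:
// window.clipboardData.setData("Text", replaceApostropheEntity(selectedText));

console.log(replaceApostropheEntity("It&#39;s the user&#39;s text"));
// It's the user's text
```

This only handles the one entity; every other character reference needs its own replace call, which is why a general decoder is attractive.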

However, since this is probably a common problem, it seems to me there is probably already a JavaScript utility that does this. I have searched Google without any results, though. Is anyone familiar with anything like this?
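For named entities, a small lookup table is one possible approach. The sketch below is an assumption, not an existing utility: `NAMED_ENTITIES` and `decodeNamedEntities` are hypothetical names, and the table covers only a handful of the more than 2,000 named references defined by HTML5. (In a browser, the built-in HTML parser, for example `DOMParser`, can also decode entities.)

```javascript
// A few common named entities; the full HTML5 list is much longer.
var NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: "\"",
  apos: "'",
  nbsp: "\u00a0"
};

// Decode the entities above in a single pass. Because each match is replaced
// exactly once, "&amp;lt;" correctly becomes "&lt;" rather than "<".
function decodeNamedEntities(text) {
  return text.replace(/&([a-z]+);/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
      ? NAMED_ENTITIES[name]
      : match; // leave unknown entities untouched
  });
}

console.log(decodeNamedEntities("Tom &amp; Jerry &lt;3"));
// Tom & Jerry <3
```

Using one regular expression with a callback, instead of a chain of `replace` calls, avoids decoding the same text twice.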

Thanks in advance.

Those apostrophes look the exact same to me.

Word's curly quotes are “ and ” (U+201C and U+201D in Unicode). You can use a simple regular expression to replace both with a standard double quote:


text = text.replace(/[\u201c\u201d]/g, "\"");
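Word also substitutes curly single quotes, including the ’ apostrophe (U+2019), so the same idea can be extended to U+2018 and U+2019. The function name `straightenQuotes` is hypothetical; this is a small sketch, not a complete normalizer.

```javascript
// Replace Word's curly quotes with their plain ASCII equivalents:
// U+201C/U+201D (double) become " and U+2018/U+2019 (single) become '
function straightenQuotes(text) {
  return text
    .replace(/[\u201c\u201d]/g, "\"")
    .replace(/[\u2018\u2019]/g, "'");
}

console.log(straightenQuotes("\u201cIt\u2019s done,\u201d she said."));
// "It's done," she said.
```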

If you're generating HTML, it would probably be better to replace them with « and ».
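One way to follow that suggestion, assuming you want the named HTML entities &laquo; (U+00AB) and &raquo; (U+00BB) in the output rather than the raw characters, is a pair of replacements. The function name is hypothetical.

```javascript
// Hypothetical helper: swap Word's opening/closing curly double quotes
// for the HTML entities &laquo; and &raquo;.
function curlyQuotesToGuillemets(text) {
  return text.replace(/\u201c/g, "&laquo;").replace(/\u201d/g, "&raquo;");
}

console.log(curlyQuotesToGuillemets("\u201cHello\u201d"));
// &laquo;Hello&raquo;
```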

Oops, the forum turned my code into the apostrophe.
The HTML code in the source is &#39;.
I couldn't figure out how to get the forum to display it literally instead of rendering it as a character.

Ah, got it… Try this:


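var text = "entities &#39; &#x22; &#33;";

// Decode numeric character references: decimal (&#39;) and hex (&#x22;).
// Hex digits a-f must be matched too, which \d alone would miss.
text = text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function ($0, hex, dec) {
	return String.fromCharCode(hex ? parseInt(hex, 16) : parseInt(dec, 10));
});

alert(text); // entities ' " !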
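The same approach can be wrapped in a reusable function. This sketch (the name `decodeNumericEntities` is hypothetical) swaps `String.fromCharCode` for `String.fromCodePoint`, which also handles code points above U+FFFF such as emoji, and leaves out-of-range references untouched instead of letting `fromCodePoint` throw a `RangeError`.

```javascript
// Decode decimal (&#39;) and hexadecimal (&#x22; / &#X22;) numeric
// character references into the characters they stand for.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Code points beyond U+10FFFF are invalid; keep the original text.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities("entities &#39; &#x22; &#33; &#x1F600;"));
// entities ' " ! 😀
```

Because this runs without `alert` or the clipboard, it can be tested outside the browser before being wired into the copy code.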