Regular Expression - Strip "non-keyboard" characters?

I am getting data that I need to parse and store in a database, and it contains a bunch of characters that are not on a standard English keyboard. What I’d like to do is strip out anything that does not appear on a standard English keyboard, but keep the copyright, registered, and trademark symbols (©, ®, ™). Everything else should be stripped away.

An example of what I’m getting is:


In this case, I’d like to run that through preg_replace and return


Can anyone help with this?

Hi there,

This should work:

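header('Content-Type: text/html; charset=utf-8');
$String = "Texßßt®";
// The u modifier treats the pattern and subject as UTF-8, so ® and © are matched as whole characters, not byte by byte
echo preg_replace('/[^a-zA-Z®©]/u', '', $String);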

It strips out everything except for a-z, A-Z, ® and ©

So do I just have to specify the full list of characters I want to keep, then? I want to keep everything you can type on a standard English keyboard: all of the symbols above the number keys, the brackets, forward and backward slashes, punctuation, new lines, and tabs.

Ah ok.
Yeah, you basically do have to do that.
You can of course use ranges inside the character class, as above, to make your life easier.
Also try experimenting with \w, which matches any word character (letters, digits, and underscore).
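
To illustrate the range idea for the full keyboard case, here is a rough sketch rather than a definitive answer. It assumes the input is valid UTF-8 and PHP 7 or later, and the function name stripNonKeyboard is made up for this example. The range \x20-\x7E covers every printable ASCII character (space through tilde), which should be all the letters, digits, and symbols on a standard US keyboard; tabs, new lines, and the three symbols are added explicitly.

```php
<?php
// Hypothetical helper; the name is an assumption, not from the original posts.
function stripNonKeyboard(string $text): string
{
    // Keep printable ASCII (\x20-\x7E), tab, CR, LF, and the © ® ™ symbols.
    // Everything else is removed. The u modifier makes PCRE read the
    // pattern and the subject as UTF-8.
    $result = preg_replace('/[^\x20-\x7E\t\r\n©®™]/u', '', $text);

    // preg_replace returns null on failure (for example, invalid UTF-8 input);
    // falling back to an empty string is an assumption made for this sketch.
    return $result ?? '';
}

echo stripNonKeyboard("Texßßt®");
```

If the data is not UTF-8 (Windows-1252, for instance), it would need converting first, since the u modifier rejects invalid UTF-8 input.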

Also, maybe this will help: