I’m sending XML files to a client, but they’re complaining that some strings fail to validate in their app. Strings like Åó.
My XML declares itself as UTF-8, and I think these characters are UTF-8, so I don’t think the XML itself is invalid; the client’s application just doesn’t like it. Without an explicit list of the characters their app disallows, what sanitisation of generally “weird” characters would you suggest, in either PHP or MySQL? I could just remove non-ASCII characters, but that seems a bit heavy-handed.
I’m trying to find out, but I don’t even know if they know!
I know it’s a vague question, but I could fix the allowed characters for this client and then get another client with a slightly different set of allowed characters. That’s why I was aiming for something a bit more general. I mean, if you didn’t know what characters your client could accept, how would you cleanse your data?
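One general approach that is less drastic than deleting non-ASCII characters is to write them as numeric character references, so the file on the wire is pure ASCII but any conforming XML parser still sees the original text. Below is a minimal sketch of that idea. The function name `xml_ascii_safe` is made up for illustration, it assumes the input is already valid UTF-8 and that the mbstring extension is loaded, and it will not help if the client's app rejects the characters themselves rather than their byte encoding.

```php
<?php
// Hypothetical helper (not from any library). Assumes $text is valid UTF-8.
function xml_ascii_safe(string $text): string
{
    // 1. Drop code points that XML 1.0 does not allow at all
    //    (most C0 control characters, U+FFFE, U+FFFF).
    $text = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // 2. Escape markup characters such as < > & and quotes.
    $text = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // 3. Replace every remaining non-ASCII character with &#xNNNN;.
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#x' . strtoupper(dechex(mb_ord($m[0], 'UTF-8'))) . ';';
        },
        $text
    );
}
```

For example, `xml_ascii_safe("\u{C5}\u{F3}")` (the string Åó) returns `&#xC5;&#xF3;`. Apply it to text content and attribute values only, not to the whole document, or the markup itself gets escaped.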
Are you sure it is the characters that are invalid? I would assume the encoding is broken somewhere along the chain, producing malformed non-ASCII characters.
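A quick way to test that assumption is to look at the bytes you are actually sending. The sketch below is a rough diagnostic, not a definitive check: `diagnose_encoding` and its return labels are names invented for this example, it needs the mbstring extension, and the double-encoding test is only a heuristic that assumes the stray conversion was from ISO-8859-1 to UTF-8.

```php
<?php
// Hypothetical diagnostic helper; the name and labels are illustrative only.
function diagnose_encoding(string $bytes): string
{
    // Bytes that are not well-formed UTF-8, e.g. raw Latin-1 output
    // placed in a document that declares encoding="UTF-8".
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        return 'invalid-utf8';
    }

    // Undo one assumed "ISO-8859-1 -> UTF-8" conversion. If the result
    // still contains multi-byte sequences that form valid UTF-8, the
    // text was probably encoded twice (heuristic, can misfire).
    $undone = mb_convert_encoding($bytes, 'ISO-8859-1', 'UTF-8');
    if (preg_match('/[\x80-\xFF]/', $undone) === 1
        && mb_check_encoding($undone, 'UTF-8')) {
        return 'probably-double-encoded';
    }

    return 'ok';
}
```

Run it on a value as it comes out of MySQL and again on the same value as written to the XML file. If the label changes between those two points, the break is in between, for instance in the connection character set or in an extra `utf8_encode()` call.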