Identify extremely similar rows
I have a table in which I'd like one column to contain no duplicate data. I'm familiar with the standard approach of using GROUP BY and HAVING to identify exact matches, and with adding a hard UNIQUE constraint to guard against future duplicates. That is proving not to be good enough, though.
What I'd like is a query or algorithm that identifies extremely similar data (e.g. if only 2-3 characters differ in a 200-character string, I need to kill one of those rows), so that I can scrub my data a bit better. It's making my brain spin. Any hints?
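
To make the "2-3 characters off" requirement concrete, here is a minimal sketch of one possible check, assuming the column can be exported as (id, text) pairs. It uses Levenshtein edit distance, which is my own choice of measure and not something the table or database provides. The names `edit_distance`, `near_duplicates`, and the sample rows are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str, limit: int) -> int:
    """Levenshtein distance between a and b, capped at limit + 1."""
    # A length gap larger than the limit already rules the pair out.
    if abs(len(a) - len(b)) > limit:
        return limit + 1
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        # Row minimums never decrease, so the pair is already too far apart.
        if min(current) > limit:
            return limit + 1
        previous = current
    return min(previous[-1], limit + 1)


def near_duplicates(rows, limit=3):
    """Return (id_a, id_b, distance) for every pair within `limit` edits."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(rows, 2):
        distance = edit_distance(text_a, text_b, limit)
        if distance <= limit:
            pairs.append((id_a, id_b, distance))
    return pairs


# Hypothetical sample data standing in for the exported column.
rows = [
    (1, "The quick brown fox jumps over the lazy dog"),
    (2, "The quick brown fox jumped over the lazy dog"),
    (3, "A completely different sentence lives here"),
]
print(near_duplicates(rows))  # [(1, 2, 2)]
```

This compares every pair of rows, so the work grows quadratically with the row count. For a large table it would probably need some pre-filtering (for example, only comparing rows of similar length or with a shared prefix) before it becomes practical.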