Using 'Lower' in queries vs storing lowercase columns (MSSQL)

When I was looking through the schema for the ASP.NET default membership provider I noticed a few things that I wanted to change, and a few things that I wasn’t sure about.

I noticed that the username and some other nvarchar 255 fields were stored in regular mixed case, and again lowercase. The reason for this was for comparisons in the stored procedures - they would cast the parameter to lower and compare it against the lowered column. I was wondering how much the performance savings of doing

 WHERE lower(@username) = table.lowered_username

instead of

 WHERE lower(@username) = lower(table.username)

actually were, and if this was a best practice, or if the space savings of not storing and keeping track (making sure they are consistent) of all of the extra lowered tables.

It seems that this might be a way of avoiding table scans by putting an index on the lowered columns. I’m not sure that using lower would force a table scan if there was an index on the column, but it seems likely.

I’m worried that if I do use this technique it would be very possible that the values in the lowered columns could become out of synch with the columns that they mimic, say if someone wrote a procedure to change the Username, that forgot to update the loweredUsername column.

Does anyone know if storing a lowered column is the way to go, or is changing the schema just to accommodate a few (important) queries not really a good choice in practice?

I saw that too, but I surmised it was to make things bulletproof:

  1. While most people leave Sql server running in case insensitive mode, it can very easily be run in case sensitive mode. Storing a lowered representation of the strings could make alot of sense then.
  2. The main consumer of this database (.NET apps) is a case sensitive environment, so having everything in lowered case makes more of a “lowest common denominator” for it to work with.

Now, once you realize there is a requirement somewhere for treating everything as lowered, and you realize that most of the stuff in question is very SELECT heavy with likely few UPDATES, storing 2 copies of relatively small values can make alot of sense.

I’m worried that if I do use this technique it would be very possible that the values in the lowered columns could become out of synch with the columns that they mimic, say if someone wrote a procedure to change the Username, that forgot to update the loweredUsername column.

In all honesty, this database should be considered “black box” and you should not update it except via the membership API. If you want to see how they keep things in synch, I think they published the source code to the SqlMembershipProvider soon after ASP.NET 2.0 was released.

Thanks for your fast reply!
I had always heard that it is not good practice to assume case insensitivity, it is much easier to go from case sensitive to case insensitive than the other way around.

As for 2, I try not to think about the consumer at all in ‘database design mode’, as it might not always just be ASP.NET.

I’m wondering if this is a good thing to do in other situations, not just for using the .net membership provider. (That’s just how I came across the method. :slight_smile: )

I wonder if it’s a good idea to add extra columns (that don’t really store meaningful info) to your schemas to facilitate comparisons in a new database scenario. :?

If a new proc needs to be written in a generic scenario and you use this method, you have to remember the gotcha that your value is actually in two places, and you have to remember to only store lowercase values in the lowered column or break all of your selects. (Small thing, I know. But still a possible trap - better to get these worked out at design time.)