Handling telephone numbers

Code_Warrior · March 3, 2011, 12:58pm

Dr_John:

a telephone number is NOT a number - there is no meaning to the concept of adding, subtracting multiplying or dividing them. A phone number IS a string of characters, from the domain of characters, which just by coincidence all happen to be in the range 0123456789. But they are not numbers. So don’t use a numerical field.

storing as a string gives you options on the way the data is entered that at the present moment you may not have thought about or think you don’t need. for example we all split up phone numbers into groups such as 01324 897 465 because this makes reading them and remembering them easier. While at present you think that 01324897465 is easier, at some point in the future you or another user may choose to enter 01324-897-465 or use spaces, which will not be accepted. Your idea is universally considered wrong, so why choose something that everyone says is wrong.

Now the phone number example you have forgotten about:
01324 897 465 ext 2305

think about that one as a number…

I’m not sure I agree. By your definition anything that isn’t added, subtracted, multiplied or divided should not be stored in a database as a number. As an example I happen to store status codes or error codes in a database as INTs, these codes are only EVER comprised of numbers but no arithmetic is ever performed on them. Should I change these to CHARs?

With regards to telephone representation, I’ve already covered this in posts above. Storing them as strings doesn’t solve this issue because as you pointed out the number can be represented as 01234567890 or 01234 567890 or 01234 567 890 or (01234) 567890 or 01234-567-890… which format do you store? Therefore the need to manipulate the input/output cannot be avoided if you wish to handle these cases. Therefore I cannot see any advantage to storing as a string instead of a number in this case.

This is why I’m questioning the “universally” accepted method of storing as a string. It’s important to question these things to have a better understanding of why. This is after all a discussion forum so I want to find out why “everyone” says it’s wrong when it seems right to me in this particular situation.

Telephone extensions aren’t used in my case but that would be an easy addition - store them in their own field.

Code_Warrior · March 3, 2011, 1:03pm

Lemon_Juice:

I would do it in 1 column - at least at the beginning. Either store the unformatted number (digits only) or formatted (digits + spaces and other extra characters) depending on your needs.

When using the latter approach use a function while searching that will strip out special characters. In the future, if the database grows large and you need to index the numbers for searches then use another column as r937 suggests. Or do it immediately if you want to be prepared for that already.

I have a similar situation in a client table but with tax id numbers - they are stored as entered by users (sometimes with dashes and/or spaces). The table now has 12000 rows and doing a search for the number using REPLACE functions is lightning fast - so I’m not even considering the need to add another column just to be able to use index. But it may depend on how much data you have in your rows - for example, if each row stores large amount of data in TEXT or BLOB columns then doing a full table scan over 12000 rows can be time-consuming. My client table has only 12 varchar and 2 numeric columns, and occupies 1.5MB so that is not an issue and I expect it won’t be still for quite a long time.

I thought about storing them as entered but doesn’t it make more sense (even in your tax id case) to store them all in a uniform formatted way. This makes searching and displaying them easier because you know what format they are in and you only need to worry about validating / formatting the data entered by the user which is easy enough especially if it’s only digits you are interested in.

Code_Warrior · March 3, 2011, 1:12pm

Thanks, that’s actually how I’m currently doing it but I still don’t see why it needs to be stored as a string when I’m stripping out “any space and non-nuemric characters”? Granted the leading zero won’t be stored but that’s not issue (if anything it’s a good thing since it’s redundant information). The zero and spaces can be later added when displaying the number or stripped out from user input when performing a search.

This seems to be the best solution. But still willing to hear any valid reasons against doing it this way.

Lemon_Juice · March 3, 2011, 2:12pm

What makes sense in my case doesn’t have to make sense in your case. I don’t need tax id numbers all in a uniform format, I want to allow people to store them in their preferred format and this is what they get on the invoices from us. If a uniform format makes more sense in your case then use it.

The ‘valid’ reasons have more to do with logic than any real implementation issues. You seem to have different logical thinking than most other people. While you ask why not use INT, I would ask why use INT - because I don’t see any real advantages of INT over VARCHAR and my logic says a phone number is a string of digits and not a number. There’s only one ‘real’ disadvantage of INT I can think of: you cannot use index when doing a search using LIKE ‘123%’.

Just try INT, your application will work fine, not a big deal. BTW, INT may be not enough for all numbers, you would need BIGINT or DECIMAL.

IBazz · March 3, 2011, 6:59pm

fwiw, my way would be to store it as a set of numbers, with - ( ) etc removed from whatever input the user made. Keeing it a string would enable me to keep the leading 0.

So when I want to search, I can but when I want to output, say, in a webpage, I can format that consistently stored string, into however I want to display it.

Doesn’t matter (from a sotrage perspective), how someone may input their phone number. IMV they should be stored consistently and formatted for whatever output is required.

But as has been written several times, each application has its own requirements and then so does the app designer.

What I would suggest it that the issue shouldn’t be discussed on the basis that it requires to store only UK phone numbers. That may be how it works now but, what if the scenario changes? Much better to build it once to work for any requirement relating to phone numbers and then, to use it as needed.

example of change: coule of years ago, it was recommended to me to store VAT simply as a value of 17.5% (in the case of the UK). ie.


create table vat_rates
( country char(3) not null
, vat_rate DECIMAL not null
, primary key (country, vat_rate)

Because I am so old; I remember when vat was 7.5% and that it ahs changed several times since and until it reached now 20%. with it not being set in concrete, I decided to store it on a date-related basis. And what happened? The rate changed.

By storing it with the flexibility built in, I was able to change the vat rate from 2010-01-04 well ahead of time so that when the flip-over came, it was painless and pricing was correct throughout.

OK, that’s a bit different from the phone number issue but, I hope it helps with the discussion with regards to future-proofing and scaleability and data management.

my 2c

bazz

Code_Warrior · March 4, 2011, 2:40pm

The whole point of using (BIG)INT instead of VARCHAR is for size and performance benefits. You can’t see any real advantages to using INT over VARCHAR? What about:

Smaller storage requirements: BIGINT = 8 bytes, VARCHAR(12) = 13 bytes
Faster database queries.

Code_Warrior · March 4, 2011, 2:45pm

IBazz: thanks for the example, I agree future-proofing is important though it can be a fine line to tread.

r937 · March 4, 2011, 2:57pm

if you can give a quantitative answer to the question “faster by how much?” then you may have a valid point, otherwise it’s just another urban database myth

me, i prefer to use the most appropriate datatype for each element

unless you are seriously going to find AVG(phonenumber) or SUM(phonenumber) then it is ~not~ numeric data, and should be stored as VARCHAR

my advice is to store two values, formatted for display and unformatted for search – but they’re both VARCHARs

Code_Warrior · March 5, 2011, 2:40pm

I can’t give a quantitative answer because I haven’t measured it but it’s certainly not an urban database myth; it’s a fact that queries on (BIG)INT fields are going to be faster than on larger sized (VAR)CHAR fields, especially when indexes are involved. It might not be much faster but it WILL be faster.

It that case please tell me what datatype you would use to store a TCP or UDP port number?

I appreciate the advice, I think I will probably stick with VARCHAR but just have the one unformatted value. Can you explain why you would choose to store both?

r937 · March 5, 2011, 2:56pm

oh come on, it’s not that hard, find just one authoritative source that backs you up on this

you’ll have to give examles of those numbers because i am not familiar with them

let’s say that the phone number you want to store is (937) 555-1212

i would store two VARCHARs: 9375551212 for searching, and B 555-1212[/B] for display

then when you run a search, use

SELECT display_number
  FROM phonenumbers
 WHERE search_number = REPLACE(REPLACE(REPLACE(REPLACE('$form_number','(',''),')',''),'-',''),' ','')

this strips out the formatting characters from the user-entered value in the form field, and allows the index on the search_number to be used

of course, you could also strip the extraneous characters in your application language prior to running the search (e.g. regular expression which ditches everything except digits)

so if the user enters 937-555-1212 or (937)5551212 or other formats, they will all be found – efficiently

Code_Warrior · March 5, 2011, 8:37pm

MySQL :: Re: Primary key performance int vs. char

Really? List of TCP and UDP port numbers - Wikipedia, the free encyclopedia

What datatype would you use for the port numbers?

To me that doesn’t seem like a good solution, for the following reasons:

You are using two fields when you could just use one.
On insert you need to ensure the numbers of both fields match.
On updates you need to change the number in two places and also ensure they match.

With one field (formatted by the application on display) I can avoid the above problems, plus:

If I decide I want to change the display format from b 555-1212[/b] to 937-555-1212 I can do that without having to update all the records in the database.
I have the ability to allow the user to decide which format the number is displayed to them.

Lemon_Juice · March 5, 2011, 10:43pm

Even if that is true the difference in speed is so small that it shouldn’t be a concern.

What datatype would you use for the port numbers?

I don’t know what r937 will say but to me they qualify to be int because port numbers are single numeric entities, they are not written as formatted in any way, and - as the table shows - they have certain meaningful ranges, which means that it makes sense to make numeric comparisons and searches using < and >, while it doesn’t make sense to search port numbers using LIKE ‘%…%’. With the phone numbers it’s the opposite: numerical searches with < and > make no sense while textual searches with LIKE ‘%…%’ make sense, that’s why it’s more logical to use a text field for phone numbers. And, if you want to find all phone numbers from a certain area (by first few digits) then your argument about speed is lost completely because no index is used on an INT column with LIKE ‘…%’.

r937 · March 6, 2011, 1:26am

excellent

SMALLINT

Code_Warrior · March 6, 2011, 12:33pm

Search for all numbers in the Oxford (01865) area:


SELECT fields
 FROM table
 WHERE phone_number > 1865000000 AND phone_number < 1865999999

Code_Warrior · March 6, 2011, 12:41pm

But you wouldn’t seriously find AVG(portnumber) or SUM(portnumber).

r937 · March 6, 2011, 12:58pm

i’m weird like that

Lemon_Juice · March 6, 2011, 4:40pm

It’s as weird as trying to use

WHERE numeric_field LIKE '1__'

instead of

WHERE numeric_field BETWEEN 100 AND 199

Now consider the fact that not all numbers are 10-digit long and your where clause becomes even weirder. I’m not saying it won’t work. It’s just a clumsy workaround.

Code_Warrior · March 6, 2011, 8:44pm

Yes you are.

Code_Warrior · March 6, 2011, 9:05pm

It would be very easy to setup the correct query within the application based on the number length.

But the point was that searches with < and > can make sense.