  1. #1
    SitePoint Zealot
    Join Date
    May 2007
    Posts
    163
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Generate unique values not more than 16 characters

    I need to generate a unique value for each user. Currently I'm
    doing a sha1 on parameters unique to each user, e.g. email address and
    timestamp, but there is a requirement that the value be 16
    characters long or less, and the timestamp alone may not be unique. What do
    you suggest?

    Thanks.

  2. #2
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    How about you store the (truncated) value in a database table, put a unique index on that column, and in the case of a clash on insert ("Duplicate entry" on MySQL, error number 1062) generate another key and try again?
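A minimal sketch of that insert-and-retry idea, assuming PHP 7+ with PDO. The table name `user_keys`, the helper `insert_unique_key()` and the in-memory SQLite database are illustrative assumptions, not part of the original suggestion; on MySQL the same clash surfaces as error 1062 under the same SQLSTATE 23000.

```php
<?php
// Hypothetical helper: inserts a fresh 16-character key, retrying on a clash.
function insert_unique_key(PDO $db, int $maxAttempts = 5): string
{
    $stmt = $db->prepare('INSERT INTO user_keys (value) VALUES (:value)');
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $key = bin2hex(random_bytes(8)); // 8 random bytes -> 16 hex characters
        try {
            $stmt->execute([':value' => $key]);
            return $key; // the unique index accepted the key
        } catch (PDOException $e) {
            // SQLSTATE 23000 = integrity constraint violation (duplicate key).
            // Anything else is a real failure, so rethrow it.
            if ((string) $e->getCode() !== '23000') {
                throw $e;
            }
        }
    }
    throw new RuntimeException('Could not generate a unique key');
}

// Assumed setup: an in-memory SQLite table with a unique column.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE user_keys (value CHAR(16) NOT NULL UNIQUE)');

$key = insert_unique_key($db);
```

The database does the duplicate check here, so there is no separate "look it up first" query to write.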

  3. #3
    SitePoint Enthusiast
    Join Date
    Nov 2012
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You should add a column called 'id' or something similar to your database table and set it to auto-increment.

  4. #4
    SitePoint Zealot
    Join Date
    May 2007
    Posts
    163
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, but I wanted to avoid the database route if there is an algorithm that generates
    16 or fewer characters. If there is none, I might have to take that route...

  5. #5
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,597
    Mentioned
    24 Post(s)
    Tagged
    1 Thread(s)
    No hashing algorithm can guarantee unique values: all a hash guarantees is that a small change in the source will result in a completely different hash. Because the output has a fixed length, there will always be a large number of completely different source values that map to the same hash. Of course, if the source values are of limited length in the first place, the chance of two of them generating the same hash is relatively low.
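To make the point concrete, here is a small sketch, assuming the hash is simply cut down to fit the length limit. The helper name `truncated_hash()` and the example addresses are made up for illustration. Cutting the hash to a single hex character leaves only 16 possible outputs, so any 17 distinct inputs must contain a repeat; 16 hex characters behave the same way, only with a vastly larger pool.

```php
<?php
// Hypothetical helper: keep only the first $length hex characters of a SHA-1.
function truncated_hash(string $input, int $length = 16): string
{
    return substr(sha1($input), 0, $length);
}

// One hex character allows only 16 different outputs, so 17 distinct
// inputs are guaranteed to produce at least one repeated hash.
$seen = [];
$collision = null;
for ($i = 0; $i < 17; $i++) {
    $input = "user{$i}@example.com";
    $hash = truncated_hash($input, 1);
    if (isset($seen[$hash])) {
        // Record the two different inputs that share the same hash.
        $collision = [$seen[$hash], $input, $hash];
        break;
    }
    $seen[$hash] = $input;
}
```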
    Stephen J Chapman


  6. #6
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    925
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by tentim View Post
    I need to generate a unique value for each user. Currently I'm
    doing a sha1 on parameters unique to each user, e.g. email address and
    timestamp, but there is a requirement that the value be 16
    characters long or less, and the timestamp alone may not be unique. What do
    you suggest?

    Thanks.
    It would be easier for us if you explained what you need this value for. From your description it is not clear whether you want to generate the unique value once and store it in the db for future use, or generate the value multiple times with a different result each time. Do you have any requirements for which characters the value may contain? From your limited description, the advice from localhost8080 seems to satisfy your requirements perfectly if you are willing to get the id from the db.

  7. #7
    SitePoint Zealot
    Join Date
    May 2007
    Posts
    163
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    It would be easier for us if you explained what you need this value for. From your description it is not clear whether you want to generate the unique value once and store it in the db for future use, or generate the value multiple times with a different result each time. Do you have any requirements for which characters the value may contain? From your limited description, the advice from localhost8080 seems to satisfy your requirements perfectly if you are willing to get the id from the db.
    The value is generated and sent to another site for use, but it has to be a different value
    each time and I don't want to go through the 'troubles' of storing in db and checking each time.

    Quote Originally Posted by felgall
    No hashing algorithm can be used to provide unique values...
    Really? If that is the case, then I'll have to go the db route.

  8. #8
    Keeper of the SFL StarLion's Avatar
    Join Date
    Feb 2006
    Location
    Atlanta, GA, USA
    Posts
    3,747
    Mentioned
    64 Post(s)
    Tagged
    0 Thread(s)
    "Roll a die." Okay. I rolled a 6.
    "Now never roll that number again. But you can't remember what number you just rolled."

    Uh... yeah. You're going to need some form of database/flat file/whatever to remember what has already been used. Any 'random' system will inevitably have clashes at the rate of <used>/<available>.

    Consider the die above.

    I roll a die. As I have not yet rolled it, I can roll any number. My chance of a collision is 0 (0/6).
    I've rolled a six. The next time I roll the die, my chance of a collision is 1/6.
    If I don't know I've rolled a 6, I have a blind 1/6 chance of sending a bad result.
    The more I roll the die, the higher the chance goes, until every roll is a collision (6/6).

    Your pool of available keys will be far larger than 6, of course (X^16, where X is the number of characters in your pool), but as the system gets used more and more, the chance of a bad key steadily grows. That is why your system needs to remember what has already been used.
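The arithmetic above can be sketched in a few lines. Both function names are hypothetical, the 62-character pool (a-z, A-Z, 0-9) is an assumption, and the second function uses the standard birthday approximation 1 - exp(-n(n-1)/2N) rather than an exact count.

```php
<?php
// Chance that the next random key clashes with one already issued:
// <used>/<available>, exactly as in the die example.
function next_key_collision_chance(float $used, float $available): float
{
    return min(1.0, $used / $available);
}

// Approximate chance of at least one clash among $count random keys
// drawn from $available possibilities (birthday approximation).
function any_collision_chance(float $count, float $available): float
{
    return -expm1(-$count * ($count - 1) / (2 * $available));
}

$die = next_key_collision_chance(1, 6);   // one face already used
$keySpace = 62 ** 16;                     // X^16 with an assumed X = 62
$afterMillion = any_collision_chance(1e6, $keySpace);
```

With a pool that size the chance stays tiny for a long time, but it never reaches zero unless the used keys are remembered somewhere.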
    Never grow up. The instant you do, you lose all ability to imagine great things, for fear of reality crashing in.

  9. #9
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    925
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by tentim View Post
    The value is generated and sent to another site for use, but it has to be a different value
    each time and I don't want to go through the 'troubles' of storing in db and checking each time.
    Are you going to use these values temporarily? I assume that is the case if you are not going to store them in the database. In that case you might simply generate a 16-character random string and use it as an ID. The probability of a collision is so low that in practical terms it's insignificant, unless you are using the values in systems where absolute security is critical. You could also generate a random hash for this purpose. As felgall said, there is no guarantee that each value will be unique, but the probability of uniqueness is extremely high.

    Here is some code for a fairly good random 16-character string generator:
    PHP Code:
    $random_bytes = mcrypt_create_iv(12, MCRYPT_DEV_URANDOM);
    $string = base64_encode($random_bytes);
    The string will have alphanumeric characters and / and +. For a hexadecimal string you can use this:
    PHP Code:
    $random_bytes = mcrypt_create_iv(8, MCRYPT_DEV_URANDOM);
    $string = bin2hex($random_bytes);
    However, as you can see, the first one is better because it packs more random bytes (12 versus 8) into the same 16 characters and therefore has less chance of a collision.
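For readers on current PHP: the mcrypt extension was deprecated in PHP 7.1 and removed from core in 7.2, so the two snippets above no longer run there. A sketch of the same two generators using `random_bytes()` (PHP 7+) follows. The function names are hypothetical, and the swap to URL-safe characters is an assumption based on the value being sent to another site.

```php
<?php
// 12 random bytes -> exactly 16 base64 characters (no "=" padding needed).
function random_key_base64(): string
{
    $key = base64_encode(random_bytes(12));
    // Assumption: the value may travel in a URL, so replace "+" and "/"
    // with the URL-safe "-" and "_".
    return strtr($key, '+/', '-_');
}

// 8 random bytes -> 16 hexadecimal characters.
function random_key_hex(): string
{
    return bin2hex(random_bytes(8));
}

$base64Key = random_key_base64();
$hexKey = random_key_hex();
```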

    Alternatively, you could use uniqid('', true), a pretty good function for unique ID generation that combines microtime with a pseudo-random number generator. Its output is longer than 16 characters (23 with the extra entropy enabled), but you might be able to get rid of the dot and pack the data in some more efficient way.
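One possible way to do that packing, as a rough sketch only. It assumes the usual uniqid('', true) layout of 13 hex characters followed by a decimal part such as "4.12345678"; the helper name `packed_uniqid()` is made up. Note that the PHP manual itself warns that uniqid() does not guarantee uniqueness, so this shortens the value without making it any more unique.

```php
<?php
// Hypothetical helper: squeeze uniqid('', true) into 15 base64 characters.
function packed_uniqid(): string
{
    $id = uniqid('', true);          // 13 hex chars + e.g. "4.12345678"
    $hex = substr($id, 0, 13);       // seconds and microseconds, in hex
    // Drop the dot and treat the remaining digits as one integer.
    $entropy = (int) str_replace('.', '', substr($id, 13));
    // 7 bytes for the (zero-padded) hex part + 4 bytes for the integer.
    $bytes = hex2bin('0' . $hex) . pack('N', $entropy);
    // 11 bytes encode to 15 base64 characters plus one "=" of padding.
    return rtrim(base64_encode($bytes), '=');
}

$packed = packed_uniqid();
```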

  10. #10
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    925
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by StarLion View Post
    Your pool of available keys will be far larger than 6, of course (X^16, where X is the number of characters in your pool), but as the system gets used more and more, the chance of a bad key steadily grows. That is why your system needs to remember what has already been used.
    This might be a problem, but it depends on how the value is used (we don't have enough information from the OP). If each random value is only used temporarily, for example it is valid for a few hours and then discarded, then this would not be a problem: even with millions of users there would be only a small number of random IDs in use at a given time, so the chance of a collision would be extremely low. But yes, if those IDs were to accumulate over time in a database, the collision probability would grow with each new ID.

  11. #11
    Keeper of the SFL StarLion's Avatar
    Join Date
    Feb 2006
    Location
    Atlanta, GA, USA
    Posts
    3,747
    Mentioned
    64 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    This might be a problem, but it depends on how the value is used (we don't have enough information from the OP). If each random value is only used temporarily, for example it is valid for a few hours and then discarded, then this would not be a problem: even with millions of users there would be only a small number of random IDs in use at a given time, so the chance of a collision would be extremely low. But yes, if those IDs were to accumulate over time in a database, the collision probability would grow with each new ID.
    Personally, I'd rather my chance of a bad key be 0. *shrug*

  12. #12
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    925
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by StarLion View Post
    Personally, I'd rather my chance of a bad key be 0. *shrug*
    Each to his own, but there are cases where the work needed to guarantee a 0 chance is too expensive to be practical. There's no need for 100% perfection if 99.999999999% will suffice and save people a lot of work and complications. There are cases where a guarantee is outright impossible, for example revision keys in distributed version control systems.

  13. #13
    SitePoint Wizard wonshikee's Avatar
    Join Date
    Jan 2007
    Posts
    1,223
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    You can just use the timestamp + the first 6 characters of the email if you can't rely on persistent storage to double check. The chance of a collision is so tiny it's a waste of time to consider it.
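A quick sketch of that scheme, assuming a 10-digit Unix timestamp so that the result is at most 16 characters. The function name and the example addresses are hypothetical. It also shows the case to watch for: two addresses that share their first 6 characters and arrive in the same second produce the same value, as do two requests from one user within a second.

```php
<?php
// Hypothetical helper: Unix timestamp followed by the first 6 characters
// of the email address (10 + 6 = 16 characters at most).
function timestamp_email_key(string $email, ?int $timestamp = null): string
{
    $timestamp = $timestamp ?? time();
    return $timestamp . substr($email, 0, 6);
}

// Same second, same first 6 characters ("alice@"): identical keys.
$a = timestamp_email_key('alice@example.com', 1355000000);
$b = timestamp_email_key('alice@example.org', 1355000000);

// One second later the key differs.
$c = timestamp_email_key('alice@example.com', 1355000001);
```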

