Session ID Collision

I’m working on a site that receives a moderate amount of traffic, ~5,000 unique visitors daily. On this site, I’m using session_id() to set the visitors’ session identifier and then storing that ID in a database which is used to track other information.

Yesterday we had two users assigned the exact same session ID. My understanding of how these session IDs are generated is that PHP prevents this type of thing from happening. Moreover, it’s a random number so it seems incredibly unlikely that this should ever happen. But it did and I have the database records to show for it.

How is this possible? More importantly, what can I do to guarantee that it doesn’t happen in the future? I suppose rather than relying on the session_id() from PHP I could insert a record into a table and use the auto_increment from MySQL to generate unique session IDs. My only concern there is that we’re going to run out of numbers at some point.

Anybody else ever had issues with a session ID collision?

As far as I know, PHP doesn’t -prevent- the collision from occurring; it’s just considered rare for anything but very high traffic systems.
A quick google-fu turns up that you might want to edit your server’s session.entropy_length setting if this becomes a notable problem.
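
To put a rough number on “rare”, here is a minimal sketch using the birthday-bound approximation. It assumes session IDs behave like uniformly random 128-bit values and that each daily visitor gets exactly one session; both are assumptions for illustration, not a description of how any particular PHP version builds its IDs. The function name collisionProbability is made up for this example.

```php
<?php
// Birthday-bound approximation: p ≈ n^2 / (2 * 2^bits)
// for n random IDs drawn uniformly from a space of 2^bits values.
// Only meaningful while the result is much smaller than 1.
function collisionProbability(float $sessions, int $bits): float
{
    return ($sessions ** 2) / (2 * (2 ** $bits));
}

// Assumption: 5,000 visitors a day, one session each, for a full year.
$sessionsPerYear = 5000 * 365;
$p = collisionProbability($sessionsPerYear, 128);

printf("Approximate chance of any collision in a year: %.1e\n", $p);
```

If the IDs really are that random, the result is vanishingly small, which is why a collision at this traffic level usually points to something other than bad luck.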

Maybe this is a stupid question, but by increasing the session.entropy_length I would be increasing the pool of random strings generated as a session_id? Any recommendation as to what I should increase it to?

Do not use auto_increment values as session IDs, ever! It makes session hijacking so very easy. If I have session id 20, the previous guy must have session id 19, and the guy before that 18, etc.

Open up firebug, find cookie, change it to 19, voila, session hijacked.

Seriously, it’s a disaster waiting to happen.

(don’t even get the ids and md5() them, or hash it together with the IP or something. It’s insecure to use incremental ids no matter how much you sugar coat it)

Off Topic:

This is the exact reason DNS pollution is so easy to pull off; it relies on incremental IDs. DNSSec doesn’t anymore.

Point taken, thanks.

Here’s what I’ve decided to do:

  1. I’ll generate a random number using PHP and hash it with md5. Insert that number into the table and that is the new session ID.
  2. When generating a number, I’ll verify it’s unique with a database query. If not, start the process over again.
  3. Automatically archive session IDs after 24 hours to keep the table as small as possible.

Am I still opening myself up to a world of problems here?

Yes that sounds workable, although basing it on one single random number doesn’t give you a whole lot of entropy. But it should indeed work.
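
A minimal sketch of the generate-check-retry plan above, with more entropy than a single random number. It assumes PHP 7+ for random_bytes(); the function name generateSessionId and the $isTaken callback (a stand-in for your database lookup) are made up for illustration.

```php
<?php
// Generates a 128-bit random ID and retries if the storage layer
// reports that the candidate is already in use.
function generateSessionId(callable $isTaken, int $maxAttempts = 5): string
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        // 16 bytes from the OS CSPRNG, hex-encoded to 32 characters
        $id = bin2hex(random_bytes(16));
        if (!$isTaken($id)) {
            return $id;
        }
    }
    throw new RuntimeException('Could not generate a unique session ID');
}

// Stand-in for the database table: an in-memory array of issued IDs.
$issued = [];
$id = generateSessionId(function (string $candidate) use (&$issued): bool {
    return isset($issued[$candidate]);
});
$issued[$id] = true;
```

Note that check-then-insert is racy when two requests arrive at the same moment, so with a real table you would probably still want a UNIQUE index on the ID column and retry when the insert fails.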

FYI, you might also want to read this DevNetwork Forums topic: “Accidental” Session ID Collision…. Interesting stuff.

Thinking about this for a bit, the chances of PHP assigning the same session ID twice are so extremely small that I think it’s fair to say the chances of a session hijack are bigger than the chances of PHP (okay, the OS’s RNG) screwing up. Have you looked into the possibility that the session was hijacked and that that was the reason two people had the same ID?
Or maybe someone created an image of their HDD, and restored it on a different PC?

All good points. The session could have been hijacked, but here’s why it’s not likely:

  1. The site is a very simple ecommerce site. The session IDs help assign the shopping cart contents to the correct customer. We only saw the duplicate session IDs once both customers called to say that their shopping cart was mysteriously adding and removing items. In short, both users with the duplicate session ID were confused and reported it to us.

  2. No financial information is stored or transmitted using this system. We stay completely outside of PCI-DSS scope by using a hosted payments page. The only benefit to hijacking a session is to adjust the cart contents before the order is complete. The product being sold, gift certificates, aren’t even mailed out. They’re picked up so a hijacker wouldn’t even be able to change the delivery address.

How would a duplicate image of a HDD restored elsewhere cause this issue? I would imagine that each browser instance would generate a unique session ID given that they are indeed separate instances.

A cookie is a file. It has no sense of ownership. If the cookie is in the ‘cookie jar’ of the browser when it connects, it sends the cookie.

If you duplicate a HDD with a cookie file on it, and put that duplicate into another machine, it still has the cookie, and will send it.

The browser doesn’t generate the session ID, PHP does, but yeah it was a really long stretch and very (very!) unlikely.

Edit:

Indeed, it would work as StarLion outlined above.

Anyway, maybe this function will help


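// Generates a random string of $length characters.
// $pattern characters: 'd' = digit, 'c' = lowercase letter, 'C' = uppercase letter.
// Uses random_int() (PHP 7+), which draws from the OS CSPRNG, instead of rand().
function generateRandomString($length, $pattern = null)
{
	// ASCII code ranges for each pattern character
	$ranges = array('d' => array(48, 57), 'c' => array(97, 122), 'C' => array(65, 90));
	if (!is_null($pattern) && $pattern !== '')
	{
		// Repeat or truncate the pattern so it is exactly $length characters long
		$pattern = substr(str_repeat($pattern, (int) ceil($length / strlen($pattern))), 0, $length);
	}
	else
	{
		// No pattern given: build a random one
		$chars = array_keys($ranges);
		$pattern = '';
		for ($i = 0; $i < $length; $i++)
			$pattern .= $chars[random_int(0, 2)];
	}
	$str = '';
	for ($ch = 0; $ch < strlen($pattern); $ch++)
	{
		// Reject unknown pattern characters instead of reusing a stale value
		if (!isset($ranges[$pattern[$ch]]))
			throw new InvalidArgumentException("Unknown pattern character: {$pattern[$ch]}");
		list($min, $max) = $ranges[$pattern[$ch]];
		$str .= chr(random_int($min, $max));
	}
	return $str;
}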

It’s what I use to generate random strings. It has an optional $pattern parameter that you can use if you want a random string following a pattern, e.g. ‘dCCdcc’ will give a password of digit-capital-capital-digit-char-char (chars are lowercase). Omit the $pattern and it will generate a pattern at random and then generate a random string using that random pattern (which, believe it or not, I’m fairly certain does not make it more random).

This is great, thanks for the help from both of you. I’ll give that random string generator a go and hopefully avoid this in the future.

I sadly have an obsession with UUIDs…or GUIDs, as SQL Server (Microsoft) calls them. I use a version 4 UUID, which is a pseudo-random UUID, for unknown agents. When a user logs in, I generate a new UUID and use their user UUID (I don’t use incremented integers for IDs) to create a version 5 UUID. Complicated, I know…If you want to know more about that just ask.

Anyways a function to create version 4 UUIDs. Plucked off the PHP manual because I’m too lazy at the moment to write my own comments.


// format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
//      y: 8, 9, A, or B
function uuidv4 ()
{
  return sprintf( '%04x%04x-%04x-%04x-%04x-%04x%04x%04x',

    // 32 bits for "time_low"
    mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ),

    // 16 bits for "time_mid"
    mt_rand( 0, 0xffff ),

    // 16 bits for "time_hi_and_version",
    // four most significant bits holds version number 4
    mt_rand( 0, 0x0fff ) | 0x4000,

    // 16 bits, 8 bits for "clk_seq_hi_res",
    // 8 bits for "clk_seq_low",
    // two most significant bits holds zero and one for variant DCE1.1
    mt_rand( 0, 0x3fff ) | 0x8000,

    // 48 bits for "node"
    mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff )
  );
}
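
One caveat with the function above: mt_rand() is not a cryptographically secure generator. If the UUID doubles as a session identifier, a variant built on random_bytes() is probably the safer choice. The following is only a sketch assuming PHP 7+; the name uuidV4Secure is made up for this example.

```php
<?php
// Version 4 UUID built from 16 bytes of OS-level randomness.
function uuidV4Secure(): string
{
    $bytes = random_bytes(16);

    // Set the four most significant bits of byte 6 to 0100 (version 4)
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the two most significant bits of byte 8 to 10 (RFC 4122 variant)
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters split into eight groups of four, then laid out as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4Secure(), "\n";
```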

As mentioned, it’s very rare for this to happen, especially on a site with only 5k unique visitors a day.

Though this is a non-problem even if it happens, provided you set up the application the right way.

If you use a custom session handler, you can store the sessions in a database, a memcache layer, etc. This allows you to require more from a visitor than just the correct session ID before releasing the session data.

To keep things short: you create a user key, then require both the session ID and the user key to match before releasing the session data. You would build the key from environment data collected from the user, though don’t base it on the user’s IP address, as that can change between requests depending on the ISP.

This way you can have countless users with the session “A” without any of them ever being aware that someone else is using exactly the same session ID.

It will still be possible for someone to hijack a session, but if you implement this properly it would be very hard unless the person is on the same network and has the user’s information.
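
A rough sketch of the user key idea, leaving out the actual session handler. It assumes the key is derived from the User-Agent and Accept-Language request headers, which is only one possible choice of environment data; the function names, the array-based store and the cart values are all hypothetical.

```php
<?php
// Builds a key from request data that tends to stay stable for one browser.
// Deliberately ignores the IP address, which can change between requests.
function buildUserKey(array $server): string
{
    $parts = [
        $server['HTTP_USER_AGENT'] ?? '',
        $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ];
    return hash('sha256', implode('|', $parts));
}

// Session data is released only when both the session ID and the user key match.
function loadSessionData(array $store, string $sessionId, string $userKey): ?array
{
    return $store[$sessionId][$userKey] ?? null;
}

// Two different browsers that happen to share session ID "A".
$firefox = ['HTTP_USER_AGENT' => 'Firefox', 'HTTP_ACCEPT_LANGUAGE' => 'en-US'];
$chrome  = ['HTTP_USER_AGENT' => 'Chrome',  'HTTP_ACCEPT_LANGUAGE' => 'nl-NL'];

$store = [];
$store['A'][buildUserKey($firefox)] = ['cart' => ['gift-50']];
$store['A'][buildUserKey($chrome)]  = ['cart' => ['gift-100']];

$firefoxCart = loadSessionData($store, 'A', buildUserKey($firefox));
$chromeCart  = loadSessionData($store, 'A', buildUserKey($chrome));
```

In a real application the lookup would sit inside your custom session handler, with the session ID and the user key stored as two columns of the sessions table.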

That approach makes a lot of sense. I’ll give the user key route a go and see where that takes me.

While this application isn’t so large as to require overwhelming layers of security, it’s good to know.

There are some good ideas here for you to choose from, but I figured that I’d throw in a possibility too.


Also look into uniqid() in the PHP manual.


If you take 2 long random strings and encode them along with a string that is unique to the user at the time of login (like IP address or user agent or both), the probability of a collision is astronomically small. With two random strings plus the IP address, a collision is pretty close to impossible.

Do not use uniqid on its own. It’s actually just “microtime” with fancy formatting.

Again: If you take 2 long random strings and encode them along with a string that is unique to the user at the time of login (like IP address or user agent or both), the probability of a collision is astronomically small. With two random strings plus the IP address, a collision is pretty close to impossible.

example:


 
// Best-effort client IP lookup. HTTP_CLIENT_IP and HTTP_X_FORWARDED_FOR are sent
// by the client or a proxy, so they can be spoofed.
if (!function_exists('get_real_ip')) {
    function get_real_ip()
    {
        $ip = FALSE;
        if (!empty($_SERVER['HTTP_CLIENT_IP'])) {
            $ip = $_SERVER['HTTP_CLIENT_IP'];
        }
        if (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
            $ips = explode(", ", $_SERVER['HTTP_X_FORWARDED_FOR']);
            if ($ip) {
                array_unshift($ips, $ip);
                $ip = FALSE;
            }
            // Take the first address that is valid and not in a private range
            for ($i = 0; $i < count($ips); $i++) {
                if (!preg_match("/^(10|172\\.16|192\\.168)\\./i", $ips[$i])) {
                    // ip2long() returns FALSE for invalid addresses as of PHP 5
                    if (ip2long($ips[$i]) != FALSE) {
                        $ip = $ips[$i];
                        break;
                    }
                }
            }
        }
        return ($ip ? $ip : $_SERVER['REMOTE_ADDR']);
    }
}

// Assumption: $randNum was not defined in the original post; random_int() needs PHP 7+
$randNum = random_int(0, PHP_INT_MAX);
$randString = sha1(uniqid() . get_real_ip() . $randNum);
 

LE is not saying your method is incorrect; he’s saying the use of uniqid might not be the best choice since it is derived from the current time. So if I know your IP and an approximation of when you logged in, there is a finite set of possible IDs you could have been assigned within that time frame, which makes it easier (not easy, easier) for someone to hijack the session.
less entropy = better predictability = less security
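
To see how much of the current time leaks out of uniqid(), here is a small sketch that decodes one. It assumes the default call with no prefix and no extra entropy, where the result is 13 hex characters: 8 for the seconds and 5 for the microseconds.

```php
<?php
$id = uniqid();

// First 8 hex characters: Unix timestamp in seconds
$seconds = hexdec(substr($id, 0, 8));
// Last 5 hex characters: microseconds within that second
$micros = hexdec(substr($id, 8, 5));

printf(
    "%s was generated at %s UTC plus %d microseconds\n",
    $id,
    gmdate('Y-m-d H:i:s', $seconds),
    $micros
);
```

Anyone who can estimate the login time to within a second only has to try about a million candidates for the uniqid() part.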

Oh I know he wasn’t saying that, no worries was just making sure that point was read.

With the hashing used there, this is sufficient if you are the only one seeing the source of the script.

One can also just generate 2x 15 character strings to use as salt for salting a random string generated at login. This would be impossible to break unless you own a super computer or a really really large botnet and at that, would take a very long time.

3 total random strings of 15 random numbers for example, I believe the number of possible combinations would come out to:

and that isn’t including the variations needed once encrypted.
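
As a rough check on that kind of estimate, and assuming each of the three strings is 15 independent decimal digits (the thread does not spell this out), the combined space is 10^45 values. A quick sketch of the arithmetic:

```php
<?php
// Assumption: 3 strings of 15 decimal digits each, all chosen independently.
$digits = 3 * 15;

// Each decimal digit carries log2(10) bits, roughly 3.32
$bits = $digits * log(10, 2);

printf("10^%d combinations, about %.1f bits of entropy\n", $digits, $bits);
```

That only holds if the digits come from a good random source; digits from rand() or the clock are worth far less than the raw count suggests.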

The best defense, imo, is also never giving ANYONE access to your scripts.

Sorry if that seemed directed at you. I wasn’t directing that post at any one person. Just want to make sure someone doesn’t think they can just use uniqid and be done with it. You failed to mention they shouldn’t use uniqid on its own, so I took the liberty to remind them.