Help: Need "Who Is Online" script for very large site

Greetings,

Let’s say I have a site with millions of members and about 500,000 users are typically logged in at any one time. The site runs on Linux, Apache, PHP, MySQL with jQuery/AJAX.

I would like to have a “Who Is Online” badge on my homepage which updates minute by minute and displays a count of all the people currently logged into the site. If you click on it, it shows a list of the users currently online. If people go offline, I need to be able to show when they were last online (like 15 minutes ago, 1 day ago, etc.).

What would be the best way to do this? I tried looking into this, and one idea was an AJAX/jQuery script which calls a PHP file every 60 seconds, and the PHP file updates the database. The problem is this would require 500,000 MySQL UPDATE queries every 60 seconds. Is there a better way of doing this for an extremely large site? How do the really big sites do it?

Thanks
Kind regards

Are you using files or a database table to store sessions?

I don’t believe I’m using either of these; I think the sessions are just stored on the Linux server? I send a cookie to the member’s browser after they log in and store the cookie in the database, so it automatically logs them back in whenever the session itself expires.

Is there a strategy involved with sessions to solve my problem?

You could use some kind of cache, like Redis or Memcached. It seems like overkill for something so simple, but 500k active sessions is not a trivial amount.
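With Redis, for example, a sorted set keyed on last-seen time makes both the count and the list cheap. A minimal sketch, assuming the phpredis extension and a Redis server on localhost (the key name `online_users` is made up for illustration):

```php
<?php
// Sketch: track online users in a Redis sorted set, score = last-seen timestamp.
// Assumes the phpredis extension and a Redis server on localhost.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$userId = 123; // whoever made the current request

// On every authenticated request (or heartbeat), refresh the user's score:
$redis->zAdd('online_users', time(), $userId);

// Count everyone seen in the last 10 minutes - one in-memory call, no SQL:
$onlineCount = $redis->zCount('online_users', time() - 600, '+inf');

// Page through the actual user IDs for the "who is online" list:
$onlineIds = $redis->zRevRangeByScore('online_users', '+inf', time() - 600,
    ['limit' => [0, 50]]);

// Occasionally prune entries older than a day:
$redis->zRemRangeByScore('online_users', 0, time() - 86400);
```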

With my own site I store the sessions in a database table. One of the fields of that table is the user ID (which defaults to 0); when someone logs in I use an UPDATE query to set it to that person’s user ID.

That then lets me quickly do a JOIN query to get a list of all online users.
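In code it looks roughly like this (a sketch only - the table and column names here are placeholders, not my actual schema):

```php
<?php
// Sketch of the session-table approach; table/column names are assumed.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

$userId = 123; // the member who just logged in

// On login: attach the member's ID to their session row
// (user_id defaults to 0 for guests).
$stmt = $pdo->prepare('UPDATE sessions SET user_id = ? WHERE session_id = ?');
$stmt->execute([$userId, session_id()]);

// Who is online: join the sessions table against users, skipping guests.
$online = $pdo->query(
    'SELECT u.id, u.username
       FROM sessions s
       JOIN users u ON u.id = s.user_id
      WHERE s.user_id <> 0'
)->fetchAll(PDO::FETCH_ASSOC);
```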

@SpacePhoenix - How do you know when a user goes offline if they don’t press the “Logout” button? And how many sessions does your site deal with at any one time? Sites like Facebook with millions of sessions can somehow see when you’ve left the page on a minute-by-minute basis (without checking for expired sessions).

They hire thousands of the world’s best engineers.

lol

You will need to use some type of heartbeat monitor. A common approach is a mix of AJAX on the client and a server-side scheduled job to monitor the heartbeats.
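On the client that’s usually just a timer, e.g. `setInterval(function () { $.post('/heartbeat.php'); }, 60000);`. The receiving side could look something like this (a sketch - the `heartbeat.php` name, the `user_activity` table and the session handling are all my assumptions):

```php
<?php
// heartbeat.php - sketch of the server side of a heartbeat endpoint.
session_start();

// Only record heartbeats for logged-in members.
if (empty($_SESSION['user_id'])) {
    http_response_code(401);
    exit;
}

$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

// Upsert "last seen now" for this user (assumes a unique key on user_id).
// A scheduled job can then treat anyone whose last_seen is older than a
// couple of minutes as offline.
$stmt = $pdo->prepare(
    'INSERT INTO user_activity (user_id, last_seen)
     VALUES (?, NOW())
     ON DUPLICATE KEY UPDATE last_seen = NOW()'
);
$stmt->execute([$_SESSION['user_id']]);
```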

If I do the AJAX/MySQL method every minute with 500,000 online users, that amounts to 500,000 MySQL UPDATE queries arriving all at once every 60 seconds. Does this sound like the right way, or is there a more efficient approach?

I think you are going to have to trade off between efficiency and accuracy.

The heartbeat way is more accurate but less efficient (though not perfectly accurate, as it will omit anyone with JavaScript turned off).

Simply tracking the number of open sessions is less accurate but a lot more efficient (it can’t properly track those who have simply forgotten to log out, or who have left the page open while they go to lunch).

I find it very difficult to believe you’re running a site with 500K concurrent users and need to ask this question. A site with 500K concurrent users surely has an entire dev and IT department to manage not only the application but the infrastructure. Be reasonable when anticipating the future: planning for 500K concurrent users isn’t very realistic, because it requires a completely different mindset than what most standard, and even most enterprise-level, sites/apps would require to run effectively.


A site with a high volume of concurrent traffic would also need full-page caching for GET requests. When a significant number of requests aren’t hitting the application layer, the only thing you can really rely on for the heartbeat is JavaScript.

A more efficient way might be not to update MySQL on every heartbeat request but to save the heartbeats to a file or to Memcached, and then have a cron job move them to MySQL in one bulk operation every minute.

But… 500,000 AJAX requests per minute is about 8,333 requests per second if they are evenly distributed - and they certainly will not be evenly distributed, so you might easily see around 30,000 requests per second at the busier moments. To this you must add the normal non-AJAX requests of users doing things on your site, which might well double that, to roughly 60,000 requests per second. At that frequency even the most efficient in-memory storage method will not be enough, and your server will die from the sheer number of requests hitting your PHP scripts. You would need a distributed multi-server setup for that - with lots of resources (and money) to manage it.


@Lemon_Juice - Thanks for the advice. How would I save this to Memcached, and what is the process for doing that? Are there any examples or a guide for saving the heartbeats to Memcached and then transferring them to MySQL during a cron job?

The OP could, say, treat any session with a last-seen time within ten minutes of the current time as “Online”, any session with a last-seen time more than 10 minutes ago but less than 20 as “Idle”, and any with a last-seen time more than 20 minutes ago as “Offline”.
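Something along these lines, using the thresholds above (a sketch; it also covers the “15 minutes ago” display the OP wanted):

```php
<?php
// Sketch: bucket a last-seen timestamp into Online / Idle / Offline using the
// thresholds described above, plus a rough "last online" formatter.
function presence(int $lastSeen): string
{
    $age = time() - $lastSeen;
    if ($age < 600)  return 'Online';   // seen within the last 10 minutes
    if ($age < 1200) return 'Idle';     // 10-20 minutes ago
    return 'Offline';                   // more than 20 minutes ago
}

function lastOnline(int $lastSeen): string
{
    $age = time() - $lastSeen;
    if ($age < 3600)  return floor($age / 60) . ' minutes ago';
    if ($age < 86400) return floor($age / 3600) . ' hours ago';
    return floor($age / 86400) . ' days ago';
}

echo presence(time() - 700);   // "Idle"
echo lastOnline(time() - 900); // "15 minutes ago"
```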

Sorry, I don’t have enough experience with Memcached, but there should be many tutorials on the web. The important thing is to make sure that the process of transferring data from the cache to the DB does not disrupt receiving new heartbeats. The transfer itself may take only a fraction of a second if done efficiently, but many new heartbeats may come in while it runs. This is relatively easy to do with files: you just rename the file and transfer data from the renamed (static) file while new heartbeats are saved to a fresh file, so the two processes are independent. I don’t know how to achieve the same thing with Memcached.
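For the file version, a rough sketch (the file paths, table name and line format are all my assumptions, and in production you would chunk the bulk insert):

```php
<?php
// In heartbeat.php: append one line per ping. Short locked appends keep
// concurrent writers from interleaving.
file_put_contents(
    '/var/spool/heartbeats/current.log',
    $_SESSION['user_id'] . "\t" . time() . "\n",
    FILE_APPEND | LOCK_EX
);
```

```php
<?php
// Cron script, run every minute: rename the spool file so new heartbeats go
// to a fresh one, then bulk-load the renamed (now static) file into MySQL.
$spool = '/var/spool/heartbeats/current.log';
$batch = '/var/spool/heartbeats/batch-' . time() . '.log';

if (file_exists($spool) && rename($spool, $batch)) {
    $pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

    $rows = $params = [];
    foreach (file($batch, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        [$userId, $ts] = explode("\t", $line);
        $rows[]   = '(?, FROM_UNIXTIME(?))';
        $params[] = $userId;
        $params[] = $ts;
    }

    if ($rows) {
        // One bulk upsert (assumes a unique key on user_id); with 500k rows
        // you would split this into chunks to stay under placeholder limits.
        $pdo->prepare(
            'INSERT INTO user_activity (user_id, last_seen) VALUES '
            . implode(',', $rows)
            . ' ON DUPLICATE KEY UPDATE last_seen = VALUES(last_seen)'
        )->execute($params);
    }

    unlink($batch);
}
```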

If you click on it, it shows a list of the users currently online.

All 500,000 of them?

Which you already have with 500K concurrent users.

Thanks for the additional advice. I will have to test out a number of the suggestions here and find out which one works most efficiently.

No, I know better than to display a list of 500,000 listings at once :) . I will probably just display the first 10,000 with pagination and then suggest that users use the search features to filter the listings down further, if necessary.