Memcache to Limit Update Queries

I have a stats server that gets hammered with update queries. All the queries are basically like this:

$addweekview = mysqli_query($conn,"UPDATE row SET views=views+1 WHERE id=$id LIMIT 1");

Basically, we’re counting impressions on content. Instead of this, I was thinking of having memcache do the counting and run an update query every time the count gets to some number like 50.

So something like:

    $memkey = "page_count_". $id;
    $views = $memcache->get($memkey);

    $views = $views+1;

    if($views == 50) {
    //update query here, adding +50 to the row
    $memcache->set($memkey, 0);
    } else {
    $memcache->set($memkey, $views);
    }

Would this work, or is there a better way to integrate memcache with counting and updating? And, while it saves about 50 MySQL queries at a time, how hard is this on the webserver running memcache?

Cheers!
Ryan

No feedback on this?

Why not just use Google Analytics? Then all the tracking load is taken off your server.

Instead of updating on every X visits, set it up to run every X hours.

We have used a similar approach with success on an API service that received thousands of hits per second; there we bundled it with the system that temporarily blocks keys for abuse. Since that system already stored the keys, IPs and hits per hour, we just updated it so it kept storing the information instead of removing it after an hour.

Then, twice per day, the information gets processed and inserted into the database.

What is important here is to make certain the memcache instance(s) the servers have access to are large enough, so the data is not evicted before it gets processed.
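
As a rough sketch, the twice-daily processing step could look something like the following. Note that memcache cannot enumerate its keys, so the $pendingIds list of pages with buffered counts is a hypothetical detail that has to come from the application itself:

    // cron: flush buffered view counts from memcache into MySQL
    $memcache = new Memcache();
    $memcache->connect('127.0.0.1', 11211);
    $conn = mysqli_connect('localhost', 'user', 'pass', 'stats');

    mysqli_begin_transaction($conn);
    foreach ($pendingIds as $id) {   // $pendingIds: tracked by the app
        $views = $memcache->get("page_count_" . $id);
        if ($views === false || $views < 1) {
            continue; // nothing buffered for this page
        }
        $stmt = mysqli_prepare($conn, "UPDATE row SET views = views + ? WHERE id = ?");
        mysqli_stmt_bind_param($stmt, 'ii', $views, $id);
        mysqli_stmt_execute($stmt);
        // subtract what was flushed so hits arriving mid-run aren't lost
        $memcache->decrement("page_count_" . $id, $views);
    }
    mysqli_commit($conn);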

Please note that in your case you could also map-reduce the access log and get the same result, i.e. views per page, though doing it with memcache would most probably be easier if you have a cluster of web servers.

In some cases you cannot use Google Analytics or similar systems.

In addition, even if you use it for tracking, there are many business reports that would be difficult to create if you don’t also have the information stored in a way that allows you to generate the reports management or marketing ask for.

Interesting,

But wouldn’t having a particular time to run the updates just swamp the MySQL server? I’d probably have to update about 150,000 rows with new counts all at once.

We have our access log disabled most of the time, as it grows to gigs and gigs in size and slows down the server.

Cheers!
Ryan

I was thinking about that also. Use Google Analytics and then make requests to their Core Reporting API at particular times during the day in order to update the numbers.

So you have about 150,000 unique pages each one getting a certain number of hits every day? Or do you mean 150,000 hits total for all pages and the number of pages is actually smaller? How many hits per second are we talking about?

Have you measured whether the MySQL updates are the bottleneck? There are some ways to make updates faster, like using MyISAM tables, tuning the innodb_flush_log_at_trx_commit variable, or updating many rows in a transaction. You could also use the Memory storage engine as a temporary table for storing current hits and then periodically move the data to the proper tables for permanent storage.
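
A sketch of that Memory-engine idea, with made-up table and column names (the per-hit upsert replaces the direct UPDATE against the main table):

    // one-time setup: an in-memory staging table for current hits
    mysqli_query($conn, "CREATE TABLE IF NOT EXISTS hits_staging (
        page_id INT NOT NULL PRIMARY KEY,
        views   INT NOT NULL DEFAULT 0
    ) ENGINE=MEMORY");

    // per page view: a cheap upsert into the staging table
    mysqli_query($conn, "INSERT INTO hits_staging (page_id, views)
        VALUES ($id, 1)
        ON DUPLICATE KEY UPDATE views = views + 1");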


The content pages get about 300,000 views per day. Tack on player counting and add another 250,000 views, with the view updates spread across two tables: Video Pages (115,000 rows) and Library Pages (30,000 rows). All rows have a chance of being visited.

Right now we’ve been just doing +1 updates to a database server dedicated strictly to counting views, but are contemplating shutting that down to save money.

Cheers!
Ryan

*should point out we’ve been slow this season. Our content pages can spike to 1.5M views in a day.

You have to realize the essential unreliability of memcache: it loses data by design, whether due to TTL, lack of memory, or anything else.

Memcache can be used only to replicate data that already exists somewhere else. Otherwise you’ll be losing your views, by the tens.

Thus memcache is absolutely not the tool you’re looking for. If you want to stick with an external solution, a NoSQL database like Redis or Mongo would be more reliable. However, I believe that MySQL is still good for this number of writes, if used properly.

There are basically two problems with writing data.

The first one is the ACID compliance of InnoDB tables, which makes each write take a lot of time. If you have tables dedicated to storing views and nothing else, I’d suggest using the MyISAM engine for them, which can speed up your write queries roughly 70-fold on average compared to InnoDB with innodb_flush_log_at_trx_commit at its default of 1. Just relaxing that variable to 2 will be a great relief to your servers too, but it depends.
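
For completeness, a sketch of relaxing that setting at runtime (it needs the SUPER privilege; otherwise put it in my.cnf):

    // flush InnoDB's log to disk roughly once per second instead of at
    // every commit; a crash can then lose up to ~1s of counted views,
    // which is usually acceptable for view counters
    mysqli_query($conn, "SET GLOBAL innodb_flush_log_at_trx_commit = 2");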

Another problem is more generic: the index update. Each write triggers an index update, which for huge tables can be slow. To overcome this, usually the same approach you proposed with memcache is used - collect your views in temp tables, then flush them into the main tables once in a while. The difference from memcache is that you always keep everything you wrote.
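
The periodic flush could then be a single cross-table update, sketched here against the hypothetical hits_staging table from earlier in the thread:

    // cron: fold the buffered counts into the main table, then reset.
    // (A MEMORY staging table is not transactional, so a crash between
    // the two statements could double-count one batch.)
    mysqli_query($conn, "UPDATE row r
        JOIN hits_staging s ON s.page_id = r.id
        SET r.views = r.views + s.views");
    mysqli_query($conn, "DELETE FROM hits_staging");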


This is about 17.3 visits per second - a busy site, but not so busy that you can’t manage counting visits in an efficient way. One way to speed things up is to use the MySQL solutions @colshrapnel and I were talking about above. If you are updating an InnoDB table in MySQL’s standard configuration then these writes can be resource-consuming, since each one triggers an fsync - a physical write to disk. You probably don’t need such rock-solid data retention in this case.

Another solution might be to simply use plain files for counting visits - files are usually much faster for writes than the database, and writing to them takes advantage of the OS buffer, which is physically flushed only every X seconds; this improves performance a lot. For example, you can have files with the row ID in their names and on each page visit do this:

file_put_contents('visits-' . $id, '-', FILE_APPEND);

and then a cron job can periodically inspect the length of these files, issue an update for each row to increment the value, and clear the files. If you do the cumulative update of all rows in a transaction it will perform quite well even for InnoDB tables.
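
A minimal sketch of that cron job, reusing the visits-{id} files from the snippet above (there is a small race between measuring a file and clearing it, which this ignores):

    // cron: turn the append-only counter files into one batched update
    $conn = mysqli_connect('localhost', 'user', 'pass', 'stats');

    mysqli_begin_transaction($conn);
    foreach (glob('visits-*') as $file) {
        $id    = (int) substr($file, strlen('visits-'));
        $views = filesize($file); // one '-' per visit, so size == count
        if ($views < 1) {
            continue;
        }
        $stmt = mysqli_prepare($conn, "UPDATE row SET views = views + ? WHERE id = ?");
        mysqli_stmt_bind_param($stmt, 'ii', $views, $id);
        mysqli_stmt_execute($stmt);
        file_put_contents($file, ''); // clear the counter
    }
    mysqli_commit($conn);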

I once wanted to implement a mechanism that logged each visit to a busy site in a database and was looking for a fast solution. Inserting data into MyISAM tables was much faster than into InnoDB. However, the fastest solution turned out to be writing each visit to a text file as a separate line and then having a cron job run every 5 minutes to transfer that data to the database.

Another idea to consider: count visits with javascript. In many cases web robots constitute about 80% of all page hits, so using js you decrease your count update volume a lot, and additionally you ignore most of the traffic that is not human - which can be a good or bad thing depending on your goal. Triggering a js script can be done asynchronously, so your visitors will never be slowed down by it, and you can even use a separate server for this task. However, this will increase the number of http requests, so that is also something to consider, especially if you want to use the same server.
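
The server side of such a js counter can be tiny - for example, a hypothetical count.php that the page requests asynchronously (e.g. via new Image().src = '/count.php?id=123'), reusing the memcache counter idea from earlier:

    <?php
    // count.php - only real browsers running javascript ever reach this
    $id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
    if ($id > 0) {
        $memcache = new Memcache();
        $memcache->connect('127.0.0.1', 11211);
        if ($memcache->increment("page_count_" . $id) === false) {
            $memcache->set("page_count_" . $id, 1);
        }
    }
    // answer with a 1x1 transparent GIF so it behaves like a normal image
    header('Content-Type: image/gif');
    echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');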


Are you currently saving up the updates, or doing each one as a user views whatever pages they view?

Storing the information in the database should not be a problem with that few views. Your problem is not storing it in the database, but the way it is done.

When you update a row (from what I understand, one dedicated to that page), it is locked (with MyISAM the entire table is locked), and depending on your settings you might or might not allow concurrent reads of the same data. If you get more than one hit to the same page at the same time, each one has to wait for the previous lock to be released before doing its update.

Create linked tables where you just insert each hit. Then you can map-reduce that once a day if you want, for example down to hits per day, per hour, etc.
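
Roughly, with made-up table names, that could be:

    // per hit: a plain insert - no contention on a shared counter row
    mysqli_query($conn, "INSERT INTO hits_raw (page_id, hit_at)
        VALUES ($id, NOW())");

    // once a day: reduce the raw hits down to per-page daily counts
    // (hits inserted while this runs are lost to the truncate - fine
    // for trend statistics, not for mission-critical data)
    mysqli_query($conn, "INSERT INTO hits_daily (page_id, hit_day, views)
        SELECT page_id, DATE(hit_at), COUNT(*)
        FROM hits_raw
        GROUP BY page_id, DATE(hit_at)
        ON DUPLICATE KEY UPDATE views = views + VALUES(views)");
    mysqli_query($conn, "TRUNCATE TABLE hits_raw");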

It depends how important the data is, i.e. if you lose a day of data, is it mission critical? In our case it was not, as it was just used for trend statistics on the API usage.

Though so far I do not believe we have lost even one day of data (we have been running this system for over two years); the key, however, as you mention as well, is that you need to be certain you have more than enough memory on the memcache servers.

If the data is mission critical and you receive too much data, then NoSQL is a good solution. Though I would use Cassandra in that case, set up with multiple nodes (mainly as its fault tolerance is better).

Thanks for the replies,

In terms of memcache and how critical the info is, it’s not critical. If memcache crashes, the memory clears or the server restarts, at worst we’ve missed reporting up to 100 views for some content, which is nothing.

The Library table is MyISAM, so it will experience table locks during an update, while the Video table is InnoDB and will experience row locking for the update.

Cheers!
Ryan

Well,

That’s not factoring in requests from our Apps, Feed and API. Our API alone runs about 3 million select requests against the database daily.

The Feed runs only about 300 a day now thanks to memcache. Mobile is about another 100K per day, but we’re moving that to memcache soon too.

Cheers!
Ryan

Just to be sure.

Do you realize that memcache loses data not only on restart? That it’s intended to lose data, by design? And that you’d likely get no views at all for the less popular videos that are viewed just once in a while?

I’m not exactly sure what you mean. Why would it just lose the data? I have 2GB of my 32GB memory set aside for it, of which the max I’ve used is 600MB.

In terms of the page counting, I’d set the key to never expire, so it’s only evicted when space is needed - which should be never.

Right now I’m hammering a dedicated database with +1 updates every time a page is viewed. But would like to save up a bundle of views (like 50 or 100) and then do a +100 to the table.

Imagine there is a less popular video that is watched once a day. It will take almost two months to hit the 50-view threshold. Likely there will be a restart during that time, and all that data will be lost.

By the way, what TTL do you set?

Don’t know if it would work better, but try having it write the needed info to an XML file for each day, then have a cron job start a new file at the start of each day and do a bulk insert into the database based on the data in the XML file (I’ve never worked with XML files, so I can’t say if it’ll be viable).

When it comes to keeping count, there will be no TTL.

On my actual memcached pages and archives, the TTL is sometimes 6 minutes, while other times it’s 29 days, with keys only getting deleted and recreated when a row is edited or something is added. If the key doesn’t exist, the content makes a request to the database and creates the key again.

I’m going to have two dedicated InnoDB tables for collecting the updated views, and they will sync their data over to the primary tables once per day.

Cheers
Ryan