How to overwrite a file atomically?

I have a PHP script that is executed quite often via AJAX; its purpose is to record the time a user was last ‘seen’ on the site (this is for an availability status in a chat). I decided to write this information to small files because it might be too heavy for the database.


file_put_contents("100.txt", date('Y-m-d H:i:s'), LOCK_EX);

In 99% of cases this overwrites a file with the same name. I thought that the LOCK_EX flag would lock the file before writing and guarantee it’s either written or not. But it does not appear to be so. I read the file in the following way and just out of curiosity I log every occurrence of it being corrupt (wrong date format):

$timeFormatted = @file_get_contents("100.txt");

if ($timeFormatted !== false && !preg_match('/^20[0-9]{2}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$/', $timeFormatted)) {
	// corrupt file - log this event
	// ...
}
Occasionally, file_get_contents() reads an empty file - an empty string is returned. I don’t get partial writes, but an empty string is certainly not correct. I’ve found that file_put_contents() with FILE_APPEND and LOCK_EX works well in concurrent scenarios - and that’s what is most often discussed on other discussion boards. But when I don’t use FILE_APPEND it becomes problematic. What would you use for an atomic file write?

Maybe check the return value of file_put_contents(…).

If it is false or zero then try clearstatcache() then apply the file_put_contents(…) again and again…

Also maybe try using touch(…) to set the file modified time instead of writing date('Y-m-d H:i:s').

The modified file time can be retrieved using filemtime(…).
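
Since touch() doesn’t write any content, there is nothing to get corrupted mid-write. A minimal sketch of that suggestion, reusing the "100.txt" per-user file from the opening post:

```php
<?php
// Sketch of the touch()/filemtime() suggestion: let the filesystem's
// modification time be the "last seen" timestamp instead of writing
// a date string into the file.
$file = '100.txt';

touch($file);                    // creates the file if needed, updates mtime

clearstatcache(true, $file);     // stat results are cached per filename
$lastSeen = filemtime($file);    // Unix timestamp of the last touch()
echo date('Y-m-d H:i:s', $lastSeen), "\n";
```

The clearstatcache() call matters here because filemtime() is one of the functions whose results PHP caches per filename.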

Hi @Lemon_Juice, may I ask you a couple of questions?

  1. Does each user have a dedicated file for that data?
  2. When are you writing to that file and why do you run into concurrent write scenarios?

I haven’t tried that; I may capture the return value and log it if it’s false or 0, and after a few days see whether it errors out. A failed write should produce a PHP warning, but unfortunately error logging on the site was broken for some reason, so I didn’t get that information.

Great - this looks like a more efficient way of handling this particular task and I’m sure it would work well. I’ll put it away for later because I’m still interested in solving the concurrent write issue, but then I may switch to touch().


Because the write is triggered by ajax every couple of seconds and it’s also triggered by normal page requests. A user often has several tabs open so it can happen that two requests will arrive at the same time.


I think having AJAX requests every 2 seconds to write that data may be overkill and problematic. Perhaps I would limit the write to normal page requests based on the current user’s session data, as there should not be a considerable gap between actions for that user.

Maybe try clearstatcache() to refresh the directory contents.

This function caches information about specific filenames, so you only need to call clearstatcache() if you are performing multiple operations on the same filename and require the information about that particular file to not be cached.

It begins at every 2 seconds but the interval increases up to 12 seconds when the user is inactive. Increasing the interval would only make the problem less frequent, not solve it.

clearstatcache() is only important for affected functions listed in the manual but I don’t use any of those functions so I don’t see how this would change anything?

I just had a quick glance and thought that if file_get_contents() was being called many times in a short period of time then it might check the file times and not bother reading the contents again, especially if the size of the file seldom changes.

Trusting the preciseness of the results is being optimistic :slight_smile:

It could very well be something like this happening

It will use memory mapping techniques if supported by your OS to enhance performance.

Maybe use the less performant fread instead?


No one seems to have ideas about atomic file writes - I know from what I have researched that not many people dive into that territory in PHP :slight_smile:

So in order to try out some solutions I’ve implemented this code for now:

    function atomicFileWrite($file, $contents) {
        $fp = fopen($file, 'c');
        if ($fp === false) {
            return false;
        }
        $ok = false;
        if (flock($fp, LOCK_EX | LOCK_NB)) {
            ftruncate($fp, 0);
            $ok = (fwrite($fp, $contents) !== false);
            flock($fp, LOCK_UN);
        }
        fclose($fp);
        return $ok;
    }
I want to see how flock() copes with the problem. I don’t need to use ftruncate() but I included it in the script just to test if the file is really locked - if it is then it should not cause problems.

Ironically, I’ve found an article that says that flock() is not atomic in PHP and therefore doesn’t guarantee a successful lock in concurrent usage and that hard links should be used instead! :open_mouth: Really weird…

For now I’ve deployed the above script and will watch if I get any corrupt writes in the next few days.


Hm, maybe. I’m still not sure if the problem is with reading or writing (or both). But with fread how would I make sure the operation is atomic from fopen() to fread()? The file can change in that time slot - would it cause problems? I think I can only test out different solutions since there doesn’t seem to be anything definite written about it in the manual.

It is an interesting problem. I usually write to the database and the only times I’ve used file writes are from an admin page. Hence I’ve never needed to lock a file to prevent concurrency problems.

This about flock isn’t exactly reassuring

On some operating systems flock() is implemented at the process level. When using a multithreaded server API like ISAPI you may not be able to rely on flock() to protect files against other PHP scripts running in parallel threads of the same server instance!

fcntl looks interesting, but I have no experience with it.

I’m afraid I can’t think of anything now that wouldn’t be a kludgy hacky mess more likely to introduce more issues than resolve any.

Which means we should assume flock() may not work at all?

[quote=“Mittineague, post:12, topic:268982”]
I’m afraid I can’t think of anything now that wouldn’t be a kludgy hacky mess more likely to introduce more issues than resolve any.[/quote]
I think a solid solution could be an overwrite via rename() because rename() is supposed to be atomic - at least that’s what people say… But I don’t know about performance - a rename is an additional file operation.
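
A minimal sketch of that write-then-rename idea (atomicOverwrite is a made-up name, and this assumes the temporary file lands on the same filesystem as the target, which rename() needs in order to stay atomic on POSIX systems):

```php
<?php
// Sketch: write the new contents to a temporary file, then rename it over
// the target. Readers then see either the old complete contents or the new
// complete contents, never a truncated or half-written file.
function atomicOverwrite($file, $contents) {
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    if (file_put_contents($tmp, $contents) === false) {
        return false;
    }
    if (!rename($tmp, $file)) {
        @unlink($tmp);   // clean up the orphaned temp file on failure
        return false;
    }
    return true;
}
```

With this approach no reader locking is needed at all, at the cost of one extra file operation per write.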

Interestingly, I’ve now found a comment that describes my problem exactly. According to it the problem is not with file_put_contents() but with file_get_contents(), and it’s necessary to lock the file when reading, too. It’s odd that the author uses file_get_contents() after opening the file with fopen(), but I think that could be changed to fread(). I’ll need to test that out, too.
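
A sketch of what such a locked read could look like, assuming the writer takes LOCK_EX on the same file (lockedRead is a made-up helper name; advisory locks only help when both sides cooperate):

```php
<?php
// Sketch of a cooperating reader: take a shared lock before reading so the
// read waits while a writer holds LOCK_EX on the same file.
function lockedRead($file) {
    $fp = @fopen($file, 'r');
    if ($fp === false) {
        return false;
    }
    $contents = false;
    if (flock($fp, LOCK_SH)) {   // shared lock: readers don't block each other
        $contents = stream_get_contents($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $contents;
}
```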

Linux + most other OSes use Advisory Locking, which means locking only works if all readers + writers optionally cooperate + use the same locking strategy.

Linux Mandatory Locking can be achieved + requires a good bit of complexity…

  1. Mounting your filesystem with the -o mand mount option (or adding mand in /etc/fstab).

  2. Then you have to manage the set-group-ID related permission bits on the actual file.

Linux Mandatory Locking == only for the stout of heart.

I’ve read this. The real question is do php readers and writers, I mean php functions, cooperate? I realize an external program running in the system may not respect the advisory lock but shouldn’t php respect its own locks?

Sounds like a perfect time to use an SQLite3 database instead. One file, less I/O overhead, portable and all the locking logic already handled for you.

Work smarter, not harder :stuck_out_tongue:

Except SQLite is very slow compared to plain file_put_contents() and the like. Sure, you can tweak it not to fsync with every update, but its overhead is still relatively large. When you have to do 50 updates (or more) and 50 selects every second then every millisecond matters. Establishing an SQLite connection alone takes more time than file_put_contents().

Might hold true if you only had one person chatting. You are not just doing file_put_contents() though. Shouldn’t be a problem anywhere near the 2 seconds you talk about earlier.

Easy enough to actually time/test (very few lines of code needed). When dealing with multiple files for this sort of thing SQLite3 can actually be faster in many cases with such small records.
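
A rough sketch of such a timing test (file, database and table names are made up; the numbers will vary a lot between machines, filesystems and SQLite settings, so treat them as ballpark only):

```php
<?php
// Crude micro-benchmark: N locked file writes vs. N SQLite REPLACEs.
$n = 1000;

$t0 = microtime(true);
for ($i = 0; $i < $n; $i++) {
    file_put_contents('bench.txt', date('Y-m-d H:i:s'), LOCK_EX);
}
printf("file_put_contents: %.1f ms\n", (microtime(true) - $t0) * 1000);

$db = new SQLite3('bench.sqlite');
$db->exec('PRAGMA journal_mode = WAL');   // cheaper commits for write-heavy use
$db->exec('CREATE TABLE IF NOT EXISTS seen (id INTEGER PRIMARY KEY, at TEXT)');
$t0 = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $db->exec("REPLACE INTO seen (id, at) VALUES (1, '" . date('Y-m-d H:i:s') . "')");
}
printf("sqlite REPLACE: %.1f ms\n", (microtime(true) - $t0) * 1000);
```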

I used to write networking locking B-tree code in C, not easy stuff to do right. I consider my programming time more costly than computer time though…

It’s certainly an interesting PHP problem - has memcached written all over it for a larger scale.

In general, no.

This is code you have to write or inject into existing code, to manage the locks.

There’s a suggestion to use SQLite3, which might resolve your situation, as all the locking is managed in SQLite3 for you.

2 seconds is the interval for one user only; with 100 users at the same time that is roughly 50 writes per second. I don’t do 50 writes per second to a single file because each user has their own file, but if I had a single SQLite database it would be written 50 times per second - I’m worried performance would be poor with so many SQLite connections trying to lock and write to the database almost simultaneously.

Unless I used a separate sqlite database for each user. I might actually try out the solution and time it since I’m curious myself how this will perform. Now this is just for learning purposes since I don’t think anything will beat the idea of using touch() for just storing a timestamp. But I’m willing to give sqlite a chance!
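
For reference, a sketch of the single-database variant being debated above (database, table and column names are made up for illustration; WAL mode is one common tweak for concurrent write-heavy use):

```php
<?php
// Sketch: one shared SQLite database for all users' "last seen" times.
$db = new SQLite3('last_seen.sqlite');
$db->exec('PRAGMA journal_mode = WAL');   // readers don't block the writer
$db->exec('CREATE TABLE IF NOT EXISTS last_seen (
               user_id INTEGER PRIMARY KEY,
               seen_at TEXT NOT NULL
           )');

// One REPLACE per heartbeat; the PRIMARY KEY makes it an upsert.
$stmt = $db->prepare('REPLACE INTO last_seen (user_id, seen_at) VALUES (:id, :t)');
$stmt->bindValue(':id', 100, SQLITE3_INTEGER);
$stmt->bindValue(':t', date('Y-m-d H:i:s'), SQLITE3_TEXT);
$stmt->execute();

echo $db->querySingle('SELECT seen_at FROM last_seen WHERE user_id = 100'), "\n";
```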