I have a PHP script that is executed quite often via AJAX; its purpose is to record the time a user was last ‘seen’ on the site (this is for an availability status in a chat). I decided to write this information to small files because hitting the database this often might be too heavy.
In 99% of cases this overwrites a file of the same name. I thought that the LOCK_EX flag would lock the file before writing and guarantee the write is all-or-nothing, but that does not appear to be so. I read the file in the following way and, just out of curiosity, I log every occurrence of it being corrupt (wrong date format):
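Roughly like this (a sketch - the exact timestamp format and the $file path are placeholders for how I store it):

```php
// $file is the per-user "last seen" file
$data = file_get_contents($file);

// Log anything that doesn't parse as the expected timestamp
if (DateTime::createFromFormat('Y-m-d H:i:s', $data) === false) {
    error_log("Corrupt last-seen data in $file: '$data'");
}
```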
Occasionally, file_get_contents() reads an empty file - an empty string is returned. I don’t get partial writes, but an empty string is certainly not correct. I’ve found that file_put_contents() with FILE_APPEND and LOCK_EX works well in concurrent scenarios - and that’s the case most often discussed on other boards. But when I don’t use FILE_APPEND it becomes problematic. What would you use for an atomic file write?
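For reference, the two write variants I’m comparing look like this (sketch; $file is the per-user path):

```php
// Overwrite mode - this is the one that occasionally leaves readers an empty file
file_put_contents($file, date('Y-m-d H:i:s'), LOCK_EX);

// Append mode - reportedly fine under concurrency, but appending isn't what I want here
file_put_contents($file, date('Y-m-d H:i:s') . "\n", FILE_APPEND | LOCK_EX);
```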
I haven’t tried that; I may catch the return value and log it if it’s false or 0, and after a few days see whether it errors out. A failed write should produce a PHP warning, but unfortunately error logging on the site was broken for some reason, so I didn’t get that information.
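Something along these lines (sketch):

```php
// Log failed or zero-byte writes so I can check back in a few days
$bytes = file_put_contents($file, $timestamp, LOCK_EX);
if ($bytes === false || $bytes === 0) {
    error_log("file_put_contents() failed or wrote 0 bytes to $file");
}
```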
Great - this looks like a more efficient way of handling this particular task and I’m sure it would work well. I’ll put it away for later because I’m still interested in solving the concurrent write issue, but then I may switch to touch().
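For the record, the touch() idea reduces the whole thing to something like this (sketch; the 30-second threshold is arbitrary):

```php
// Record "last seen" as the file's mtime - no contents left to corrupt
touch($file);

// Later, for the availability check:
clearstatcache(true, $file);                  // filemtime() results are cached
$online = (time() - filemtime($file)) < 30;
```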
Yes.
Because the write is triggered by AJAX every couple of seconds, and it’s also triggered by normal page requests. A user often has several tabs open, so it can happen that two requests arrive at the same time.
I think having AJAX requests every 2 seconds to write that data may be overkill and problematic. Perhaps I would limit the write to normal page requests, based on the current user’s session data, as there should not be a considerable gap between actions for that user.
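For example (sketch; the 10-second minimum gap is arbitrary):

```php
// Only rewrite the "last seen" file if this session hasn't done so recently
$now = time();
if ($now - ($_SESSION['last_seen_write'] ?? 0) >= 10) {
    file_put_contents($file, date('Y-m-d H:i:s'), LOCK_EX);
    $_SESSION['last_seen_write'] = $now;
}
```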
Note:
This function caches information about specific filenames, so you only need to call clearstatcache() if you are performing multiple operations on the same filename and require the information about that particular file to not be cached.
It begins at every 2 seconds but increases up to 12 seconds when the user is inactive. But increasing the interval would only make the problem less frequent, not solve it.
clearstatcache() is only important for the affected functions listed in the manual, and I don’t use any of those functions, so I don’t see how this would change anything?
I just had a quick glance and thought that if file_get_contents() was being called many times in a short period of time, it might check the file times and not bother reading the contents again, especially if the size of the file seldom changes.
Trusting the precision of the results is being optimistic.
I want to see how flock() copes with the problem. I don’t need to use ftruncate(), but I included it in the script just to test whether the file is really locked - if it is, it should not cause problems.
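The script is along these lines (sketch - details may differ slightly from exactly what I deployed):

```php
// Locked-write test: if the exclusive lock really holds,
// truncating before writing should never be visible to readers
$fp = fopen($file, 'c');          // 'c' creates the file but doesn't truncate on open
if ($fp) {
    if (flock($fp, LOCK_EX)) {
        ftruncate($fp, 0);
        fwrite($fp, date('Y-m-d H:i:s'));
        fflush($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}
```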
Ironically, I’ve found an article that says flock() is not atomic in PHP and therefore doesn’t guarantee a successful lock under concurrent usage, and that hard links should be used instead! Really weird…
For now I’ve deployed the above script and will watch if I get any corrupt writes in the next few days.
Hm, maybe. I’m still not sure whether the problem is with reading or writing (or both). But with fread(), how would I make sure the operation is atomic from fopen() to fread()? The file can change in that time slot - would that cause problems? I think I can only test out different solutions, since there doesn’t seem to be anything definitive written about it in the manual.
It is an interesting problem. I usually write to the database and the only times I’ve used file writes are from an admin page. Hence I’ve never needed to lock a file to prevent concurrency problems.
This warning about flock() isn’t exactly reassuring:
Warning
On some operating systems flock() is implemented at the process level. When using a multithreaded server API like ISAPI you may not be able to rely on flock() to protect files against other PHP scripts running in parallel threads of the same server instance!
fcntl looks interesting, but I have no experience with it.
I’m afraid I can’t think of anything now that wouldn’t be a kludgy hacky mess more likely to introduce more issues than resolve any.
Which means we should assume flock() may not work at all?
[quote=“Mittineague, post:12, topic:268982”]
I’m afraid I can’t think of anything now that wouldn’t be a kludgy hacky mess more likely to introduce more issues than resolve any.[/quote]
I think a solid solution could be an overwrite via rename(), because rename() is supposed to be atomic - at least that’s what people say… But I don’t know about performance - a rename is an additional file operation.
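Something like this, assuming the temp file is created on the same filesystem as the target (rename() is only atomic within one filesystem):

```php
// Write to a unique temp file in the same directory, then rename over the target.
// On POSIX systems rename() atomically replaces the destination, so readers
// see either the old contents or the new - never an empty file.
$tmp = tempnam(dirname($file), 'seen_');
file_put_contents($tmp, date('Y-m-d H:i:s'));
rename($tmp, $file);
```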
Interestingly, I’ve now found a comment on php.net that describes my problem exactly. According to it, the problem is not with file_put_contents() but with file_get_contents(), and it’s necessary to lock the file when reading, too. It’s weird that the guy uses file_get_contents() after he opens the file with fopen(), but I think that could be changed to fread(). I’ll need to test it out, too.
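My adaptation of the read side would be something like this (sketch; I’ve used stream_get_contents() instead of fread() so I don’t have to deal with lengths):

```php
$data = '';
$fp = fopen($file, 'r');
if ($fp) {
    if (flock($fp, LOCK_SH)) {    // shared lock: waits while a LOCK_EX writer holds the file
        $data = stream_get_contents($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}
```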
Linux + most other OSes use advisory locking, which means locking only works if all readers + writers cooperate + use the same locking strategy.
Linux mandatory locking can be achieved + requires a good bit of complexity…
Mount your filesystem with the -o mand mount option (in /etc/fstab).
Then you have to set the setgid bit (and clear the group execute bit) on the actual file.
I’ve read this. The real question is: do PHP readers and writers - I mean PHP functions - cooperate? I realize an external program running on the system may not respect the advisory lock, but shouldn’t PHP respect its own locks?
Sounds like a perfect time to use an SQLite3 database instead. One file, less I/O overhead, portable and all the locking logic already handled for you.
Except SQLite is very slow compared to plain file_put_contents() and the like. Sure, you can tweak it not to fsync with every update, but its overhead is still large - relatively. When you have to do 50 updates (or more) and 50 selects every second, every millisecond matters. Establishing an SQLite connection alone takes more time than file_put_contents().
That might hold true if you only had one person chatting. You are not just doing file_put_contents(), though. It shouldn’t be a problem anywhere near the 2 seconds you talked about earlier.
Easy enough to actually time/test (very few lines of code needed). When dealing with multiple files for this sort of thing SQLite3 can actually be faster in many cases with such small records.
I used to write network locking B-tree code in C - not easy stuff to do right. I consider my programming time more costly than computer time, though…
It’s certainly an interesting PHP problem - has memcached written all over it for a larger scale.
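e.g. with the Memcached extension it boils down to something like this (sketch; the server address and the 60-second TTL are placeholders):

```php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Store the last-seen timestamp with a short TTL; expiry doubles as "offline"
$mc->set("lastseen:$userId", time(), 60);

// Availability check: a cache miss means the user hasn't been seen within the TTL
$online = ($mc->get("lastseen:$userId") !== false);
```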
2 seconds is the interval for one user only; when I have 100 users at the same time, that’s roughly 50 writes per second. Now, I don’t do 50 writes per second to a single file, because each user has their own separate file, but if I had a single SQLite database it would be written to 50 times per second - I’m worried performance would suffer with so many SQLite connections trying to lock and write to the database almost simultaneously.
Unless I used a separate SQLite database for each user. I might actually try out the solution and time it, since I’m curious myself how it will perform. This is just for learning purposes, since I don’t think anything will beat the idea of using touch() for just storing a timestamp. But I’m willing to give SQLite a chance!
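If I get around to timing it, something like this minimal setup is what I’d measure (sketch; WAL journal mode to cut down fsync overhead, table layout just illustrative):

```php
$db = new SQLite3('/path/to/lastseen.sqlite');
$db->exec('PRAGMA journal_mode = WAL');    // readers don't block the single writer
$db->exec('PRAGMA synchronous = NORMAL');  // fewer fsyncs per commit
$db->exec('CREATE TABLE IF NOT EXISTS last_seen (user_id INTEGER PRIMARY KEY, seen_at INTEGER)');

// One upsert per "I'm alive" ping
$stmt = $db->prepare('INSERT OR REPLACE INTO last_seen (user_id, seen_at) VALUES (:id, :t)');
$stmt->bindValue(':id', $userId, SQLITE3_INTEGER);
$stmt->bindValue(':t', time(), SQLITE3_INTEGER);
$stmt->execute();
```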