Atomic file exists + file read?

I wonder if I can do something like this but in an atomic way:

if (file_exists($file)) {
    $contents = file_get_contents($file);
    
    // do something...
}

In a scenario where deletes happen frequently (like caching) a file can be deleted between file_exists() and file_get_contents(). Is there a way to solve it gracefully?

This is the best I could think of:

if (($contents = @file_get_contents($file)) !== false) {
    // do something...
}

However, the @ operator is not nice, since there may be other read errors apart from a non-existent file, and I'd probably like them to be raised rather than suppressed.
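One possible way to keep the single read while still surfacing other errors is sketched below. It assumes a missing file should count as a cache miss and anything else should be raised; `readIfExists()` is a made-up helper name, not something from the thread.

```php
<?php
/**
 * Hypothetical helper: returns the file contents, or null when the file
 * is missing. Any other read failure is raised as a RuntimeException.
 */
function readIfExists(string $file): ?string
{
    // Suppress the warning for this one call only.
    $contents = @file_get_contents($file);
    if ($contents !== false) {
        return $contents;
    }

    // Grab the suppressed warning before another call overwrites it.
    $error = error_get_last();

    // Drop PHP's cached stat result so file_exists() sees the current state.
    clearstatcache(true, $file);
    if (!file_exists($file)) {
        return null; // deleted or never there: treat as a cache miss
    }

    // The file is still there, so the failure was something else
    // (permissions, I/O error, ...).
    throw new RuntimeException($error['message'] ?? "Could not read $file");
}
```

The check after the failed read is not atomic either, but the worst case is that an unrelated error on a file deleted a moment later gets reported as a miss.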

Given that you are working in a high-level language and not machine code, the file is far more likely to be deleted part way through file_get_contents() than exactly between the machine code the two statements convert to, and your alternative solution doesn't cater for that.

If you consider it likely enough that the file will be deleted somewhere during that code, the better option would be to use a try block to test whether an error occurs - such as the file being deleted just before the $contents field finishes loading.
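Note that PHP's file functions emit warnings rather than throw exceptions, so a try block only helps if warnings are converted first. A minimal sketch of that idea follows; `readOrThrow()` and the path are illustrative names, not part of the original suggestion.

```php
<?php
/**
 * Hypothetical helper: reads a file, turning any PHP warning raised during
 * the read into an ErrorException so it can be caught with try/catch.
 */
function readOrThrow(string $file): string
{
    // Convert warnings raised while this handler is active into exceptions.
    set_error_handler(static function (int $severity, string $message, string $errfile, int $errline): bool {
        throw new ErrorException($message, 0, $severity, $errfile, $errline);
    });

    try {
        $contents = file_get_contents($file);
    } finally {
        // Always put the previous handler back, even when an exception is thrown.
        restore_error_handler();
    }

    if ($contents === false) {
        throw new RuntimeException("Could not read $file");
    }
    return $contents;
}

try {
    $contents = readOrThrow('/path/to/cache/entry'); // illustrative path
    // do something...
} catch (ErrorException | RuntimeException $e) {
    // The read failed: missing file, permissions, I/O error, ...
}
```

This catches every read error, so the caller still has to decide which ones count as "file is gone".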

Surely the server OS would prevent a file from being deleted while another process is reading it?

I never thought it could be possible but maybe? Probably this depends on the file length and read speed? I'm sure that if I read a large file slowly byte by byte then there is nothing to stop another process from deleting it during the read operation. However, when I have small files (around 50 KB) then maybe this is not a problem?

Also, an important question: if a file happens to be deleted while file_get_contents() is running, what will the result be? Will file_get_contents() error out, returning false and a warning - or will it just return partial content without me knowing the content is partial?

I don't know what you mean by deleted just before the $contents field finishes loading? How would it be possible to detect that? I don't really understand the reasoning behind it because 'just before' seems a little fuzzy and what is it supposed to mean exactly?

I wouldn't think filesize has anything to do with it - to me, if a process attempts to delete a file, the OS will (or should) check whether any other process has the file open, and return a "file is in use" error to the delete if it finds one or more. I can't say I have any specific in-depth server OS knowledge (or not with a server OS you're likely to be using), but it would seem like a design fault if this is allowed.
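Whether the delete is refused depends on the OS. On POSIX systems (Linux, macOS, BSD), unlink() only removes the directory entry, and a process that already has the file open can keep reading it until it closes its handle. Windows typically refuses to delete a file that another process has open. A small sketch to try this on a POSIX machine, with a temp file made up for the demo:

```php
<?php
// Sketch for POSIX systems only; behaviour on Windows is expected to differ.
$file = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($file, str_repeat('x', 50 * 1024)); // about 50 KB

$handle  = fopen($file, 'rb'); // the "reader" opens the file
$deleted = unlink($file);      // the "deleter" removes it while it is still open

// Drop the cached stat result, then check whether the name is still there.
clearstatcache(true, $file);
$stillListed = file_exists($file);

// The open handle still refers to the data, so the whole file can be read.
$contents = stream_get_contents($handle);
fclose($handle);

var_dump($deleted, $stillListed, strlen($contents));
```

If that holds on the server in question, a delete that lands after file_get_contents() has opened the file should not produce partial content. The only window left is before the open, where the read fails outright.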

It is far more likely that the delete request would arise while the contents are being read (so that by the time the following processing of that data starts the file is gone) than that the file could get deleted in the nanosecond or so between testing if the file exists and starting to read it - that's why no one worries about that possibility.

Where it is essential that even such remote possibilities be catered for the solution is to use a database instead of separate files so that the whole process can be handled as a transaction that locks everything else out.

@Lemon_Juice

I just stumbled on this clear, concise and easily understandable explanation. It is well worth a read and may not only solve your problem but also reduce script size and make your app faster.

http://blog.rodneyrehm.de/archives/12-Improving-Disk-IO-in-PHP-Apps.html

Thanks - I remember reading that article a long time ago and later I forgot about it. It doesn't directly answer my question but it deals with the practical side of the issue. I'll leave it at that, and I'm finally using @filemtime() as the function that will check both the file's existence and its timestamp.
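A minimal sketch of what that @filemtime() check might look like, assuming a time-to-live style cache. `isFresh()` and `$ttl` are hypothetical names, not taken from the article or the thread.

```php
<?php
/**
 * Hypothetical helper: returns true when the cache file exists and was
 * modified less than $ttl seconds ago.
 */
function isFresh(string $file, int $ttl): bool
{
    // Make sure a cached stat result from earlier in the request is not reused.
    clearstatcache(true, $file);

    // filemtime() returns false (plus a warning, suppressed here) when the
    // file does not exist, so one call covers existence and timestamp.
    $mtime = @filemtime($file);

    return $mtime !== false && (time() - $mtime) < $ttl;
}
```

The file can still disappear between this check and the read that follows, so the read itself needs to tolerate failure as well.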

The only unsolved thing from this discussion is felgall's suggestion that a file can be deleted while file_get_contents is running. Maybe that is true, but I doubt it can be problematic in any way, because even if that happens I would expect file_get_contents to fail, which would make it suitable for checking if the file exists. If it ever returned partial content, I would expect the authors of file-intensive libraries (like Smarty or cache implementations) to have encountered this issue.
