How to overwrite a file atomically?

Okay, so I’ve done some real-life tests in recent days to see which methods work well for concurrent writes and how they perform. Let me share the results. This was done with PHP 7 on a Debian Linux server. I don’t know its exact configuration because it was set up by my hosting company, but it certainly runs on traditional hard drives and has Opcache enabled.

I decided to do a heavier test - a single file for all users so as to increase concurrency and the chance of collisions. I don’t get 50 requests per second yet, but sometimes there may be up to 20. In this test each request randomly did either one write or one read on the same file, so sometimes there could be up to 10 writes per second.

Also, I made the written content self-checking so that I would immediately detect any kind of corruption in reading or writing. An example:

2017-07-21 15:09:43|8aafed7218a3d566644c45b5094c171b

The hash is md5 of the timestamp and I verified the hash on every read.
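
The self-checking record described above could be built and verified roughly like this. This is only a sketch: the function names makeRecord() and verifyRecord() are made up for illustration and are not from my test script, and it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helpers; the names are not from the original test script.

/** Build a record of the form "<timestamp>|<md5 of the timestamp>". */
function makeRecord(?int $time = null): string
{
    $stamp = date('Y-m-d H:i:s', $time ?? time());
    return $stamp . '|' . md5($stamp);
}

/** Return true only if the record has both parts and the hash matches. */
function verifyRecord(string $record): bool
{
    $parts = explode('|', $record);
    if (count($parts) !== 2) {
        return false; // empty or truncated content
    }
    [$stamp, $hash] = $parts;
    return hash_equals(md5($stamp), $hash);
}
```

An empty string, which is what a reader gets when it catches the file right after a truncate, fails the check just like a half-written record does.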

First, my last idea I posted above failed the concurrency test when reading with file_get_contents:

function atomicFileWrite($file, $contents) {
    $fp = fopen($file, 'c');

    if (flock($fp, LOCK_EX | LOCK_NB)) {
        ftruncate($fp, 0);
        fwrite($fp, $contents);
        flock($fp, LOCK_UN);
    }

    fclose($fp);
}

I was getting many empty string results from file_get_contents. The reason is that flock is only advisory and file_get_contents does not take a lock at all, so a reader could see the file in the moment between ftruncate and fwrite, when it was empty. When I removed ftruncate, this code did not fail in concurrent runs, but then it would work only as long as the length of the content never changes. So I kept looking further.

Now I will post other methods and all of them passed the concurrency test for me.

Method 1:

Write code:

file_put_contents($file, $contents, LOCK_EX);

Read code:

function fileRead($file) {
    $fp = fopen($file, 'r');

    if ($fp === false) {
        // file missing or unreadable
        return false;
    }

    // blocks until a writer holding LOCK_EX has released it
    $locked = flock($fp, LOCK_SH);

    if (!$locked) {
        // this actually never executed - not needed:
        fclose($fp);
        return false;
    }

    $cts = stream_get_contents($fp);

    flock($fp, LOCK_UN);
    fclose($fp);

    return $cts;
}

Now this is interesting because I didn’t know about stream_get_contents(), which is often a better alternative to fread() because it doesn’t need the length to be specified. The whole fileRead() function also performed pretty well, roughly 0.04 ms on average, while file_get_contents() ran in 0.03 ms in previous tests. Most writes with file_put_contents() took about 0.09 ms, with occasional spikes reaching even 500 ms, but that’s understandable on a live server where sometimes the write buffer needs to be flushed to disk.

Conclusion: if read locks cooperate with file_put_contents’ LOCK_EX then they seem to actually work. This may be treated as a modernized version of this.
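
For anyone who wants to repeat measurements like the ones above, a per-call timing can be taken with a small wrapper like the following. This is a sketch, not my actual measurement code: timedCall() and the scratch file name are made up for illustration. It uses microtime(true) so that it runs on any PHP 7; on PHP 7.3 or later hrtime(true) would probably be the better clock for very short operations.

```php
<?php
// Hypothetical helper, not the measurement code used for the numbers in this post.

/** Run $fn once and return [result, elapsed time in milliseconds]. */
function timedCall(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    $elapsedMs = (microtime(true) - $start) * 1000.0;
    return [$result, $elapsedMs];
}

// Usage sketch: time one locked write to a scratch file in the temp directory.
$file = sys_get_temp_dir() . '/timing-demo.txt';
[$bytes, $ms] = timedCall(function () use ($file) {
    return file_put_contents($file, 'demo', LOCK_EX);
});
printf("wrote %d bytes in %.3f ms\n", $bytes, $ms);
```

A single call tells you little, so the values would need to be logged on every request and averaged afterwards.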

Method 2:

Read code:

$contents = @file_get_contents($file);

Write code:

$tmpFile = "$dir/" . uniqid('', true);
file_put_contents($tmpFile, $contents);
rename($tmpFile, $file);

As expected, this was also a solid performer and rename seems to be really atomic. However, writes were slower with this method because of the rename call - about 0.2 ms on average, still pretty fast but measurably slower than plain file_put_contents. There were also occasional spikes to larger values but they were less frequent and not as high.
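
The same temp-file-plus-rename idea with some error handling could look like the sketch below. The function name atomicWriteViaRename() is made up for illustration, and I swapped uniqid() for tempnam(), which creates the temporary file in the target's directory so that rename() stays on one filesystem. This variant is not what I benchmarked.

```php
<?php
// Hypothetical helper; a sketch of the temp-file-plus-rename approach with error checks.

function atomicWriteViaRename(string $file, string $contents): bool
{
    // Create the temp file next to the target so rename() does not cross filesystems.
    $tmpFile = tempnam(dirname($file), 'tmp');
    if ($tmpFile === false) {
        return false;
    }

    if (file_put_contents($tmpFile, $contents) !== strlen($contents)
        || !rename($tmpFile, $file)
    ) {
        @unlink($tmpFile); // do not leave a stray temp file behind
        return false;
    }

    return true;
}
```

Note that tempnam() creates the file readable and writable by its owner only, so the target ends up with those permissions after the rename. If other users need to read it, a chmod() on the temp file before the rename would be needed.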

Method 3:

Using SQLite (version 3.8.7.1) with a very simple 1-row table.

Write code:

$dbExists = is_file($dbFile);
$db = new PDO('sqlite:' . $dbFile, '', '', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
]);
$db->exec("PRAGMA synchronous=OFF");

if ($dbExists) {
    $db->exec("UPDATE t SET val=" . $db->quote($contents));
}
else {
    $db->exec("CREATE TABLE t (
        val text NOT NULL
    )");
    $db->exec("INSERT INTO t VALUES(" . $db->quote($contents) . ")");
}

Read code:

$db = new PDO('sqlite:' . $dbFile);
$contents = $db->query("SELECT val FROM t")
    ->fetchColumn();

Obviously, I didn’t get any corrupt writes or reads. As expected, SQLite managed the locking for me very well. However, the performance was much worse than with plain files. An average write (including the db connection) took about 0.7 ms with occasional spikes up to 500 ms. An average read took about 0.45 ms. Still very fast, but a lot slower than plain files.

When I tried the same without PRAGMA synchronous=OFF the performance went downhill immediately - roughly 40 ms per write with frequent spikes to 500 or 1000 ms - clearly SQLite began choking during the more frequent requests.

Method 4:

The clear winner - using touch() just to store the timestamp.

Write code:

touch($file);

Read code:

$timestamp = @filemtime($file);

Simple, efficient and concurrency-safe. Each write (touch) took roughly 0.02 ms (often it went down to 0.017 ms). Interestingly, high spikes like with the previous methods were very rare and only went up to 0.5 ms. Reads (filemtime) were 0.006 ms (6 microseconds) on average. A very solid performer!
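
One thing to keep in mind with this method is that PHP caches stat results within a request, so a script that reads the timestamp more than once may get a stale value. A minimal sketch of the touch/filemtime pair with that in mind follows. The names writeTimestamp() and readTimestamp() are made up for illustration, and it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helpers wrapping the touch()/filemtime() pair.

function writeTimestamp(string $file, ?int $time = null): bool
{
    // touch() creates the file if needed and sets its modification time.
    return touch($file, $time ?? time());
}

function readTimestamp(string $file)
{
    // Clear the cached stat entry for this file so that repeated reads in
    // the same script also see touches made by other processes.
    clearstatcache(true, $file);
    return @filemtime($file); // int on success, false if the file is missing
}
```

The value returned is a Unix timestamp, so date('Y-m-d H:i:s', $timestamp) gives the same format as in the example record earlier.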


To sum up - there seem to be ways to overwrite a file atomically, however it’s a pity this is not documented well. I don’t know how this would perform on other systems. In particular, I’m not sure whether method 1, which uses flock, is portable. I suppose the other methods should work fine everywhere.
