PHP - By Bruno Skvorc

Git and WordPress: How to Auto-Update Posts with Pull Requests

At Bitfalls.com, we also use WordPress for now, and use the same peer review approach for content as we do at SitePoint.

We decided to build a tool which automatically pulls content from merged pull requests into articles, giving us the ability to fix typos and update posts from GitHub, and see the changes reflected on the live site. This tutorial will walk you through the creation of this tool, so you can start using it for your own WordPress site, or build your own version.

The Plan

The first part is identifying the problem and the situation surrounding it.

  • we use WPGlobus for multi-language support, which means content gets saved like this: {:en}English content{:}{:hr}Croatian content{:}.
  • authors submit PRs via GitHub; the PRs are peer reviewed and merged, and then (currently) manually imported into WP’s Posts UI through the browser.
  • every post has the same folder layout: author_folder/post_folder/language/final.md
  • the manual import is slow and error-prone, and sometimes mistakes slip by. It also makes updating posts tedious.

The solution is the following:

  • add a hook processor which will detect pushes to the master branch (i.e. merges from PRs)
  • the processor should look for a meta file in the commit which would contain information on where to save the updated content
  • the processor automatically converts the MD content to HTML, merges the languages in the WPGlobus format, and saves them into the database
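To make the target format concrete, here is a minimal sketch of the merge step from the last bullet. The toWpGlobus() helper is a hypothetical name used only for illustration; it assumes each language's HTML has already been produced and simply wraps it in the {:key}...{:} delimiters shown earlier.

```php
<?php
// Hypothetical helper (not part of the finished tool): wraps each language's
// HTML in WPGlobus delimiters, producing {:en}...{:}{:hr}...{:}.
function toWpGlobus(array $htmlByLanguage): string
{
    $merged = '';
    foreach ($htmlByLanguage as $key => $html) {
        $merged .= '{:' . $key . '}' . $html . '{:}';
    }
    return $merged;
}

echo toWpGlobus(['en' => 'English content', 'hr' => 'Croatian content']), PHP_EOL;
// Prints: {:en}English content{:}{:hr}Croatian content{:}
```

Keeping the delimiters in one place like this would also make it easier to swap in another output format later.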

Bootstrapping

If you’d like to follow along (highly recommended), please boot up a good virtual machine environment, install the newest version of WordPress on it, and add the WPGlobus plugin. Alternatively, you can use a prepared WordPress box like VVV. Additionally, make sure your environment has ngrok installed – we’ll use that to pipe GitHub hook triggers to our local machine, so we can test locally instead of having to deploy.

Hooks

For this experiment, let’s create a new repository. I’ll call mine autopush.

In the settings of this repository, we need to add a new hook. Since we’re talking about a temporary Ngrok URL, let’s first spin that up. In my case, entering the following on the host machine does the trick:

ngrok http homestead.app:80

I was given the link http://03672a64.ngrok.io, so that’s what goes into the webhook, with an arbitrary suffix like githook. We only need push events. The JSON content type (application/json) is cleaner, so that’s selected as a preference, and the final webhook setup looks something like this:

Webhook setup

Let’s test this now.

git clone https://github.com/swader/autopush
cd autopush
touch README.md
echo "This is a README file" >> README.md
git add -A
git commit -am "We're pushing for the first time"
git push origin master

The ngrok log screen should display something like this:

POST /githook/                  404 Not Found

This is fine. We haven’t made the /githook endpoint yet.

Processing Webhooks

We’ll read this new data into WordPress with custom logic. Due to the spaghetti-code nature of WP itself, it’s easier to circumvent it entirely with a small custom application. First, we’ll create the githook folder in the WordPress project’s root, and an index.php file inside it. This makes the /githook/ path accessible, and the hook will no longer return 404, but 200 OK.

According to the docs, the payload will have a commits field with a modified field in each commit. Since we’re only looking to update posts, not schedule them or delete them – those steps are still manual, for safety – we’ll only be paying attention to that one. Let’s see if we can catch it on a test push.
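As a rough illustration of that shape, the sketch below decodes a trimmed, hand-written payload and collects the modified files. The sample JSON and the modifiedFiles() helper are assumptions made for this example; a real payload carries many more fields.

```php
<?php
// Trimmed, hand-written stand-in for a push payload: only the fields this
// tutorial reads are included, and the file names are made up.
$json = <<<'JSON'
{
  "ref": "refs/heads/master",
  "commits": [
    {"added": ["testfile.md"], "removed": [], "modified": []},
    {"added": [], "removed": [], "modified": ["README.md"]}
  ]
}
JSON;

// Hypothetical helper: gathers the "modified" lists of every commit in a push.
function modifiedFiles(array $payload): array
{
    $files = [];
    foreach ($payload['commits'] ?? [] as $commit) {
        foreach ($commit['modified'] ?? [] as $file) {
            $files[] = $file;
        }
    }
    // Drop duplicates and reindex from zero.
    return array_values(array_unique($files));
}

$payload = json_decode($json, true);
print_r(modifiedFiles($payload));
```

Added and removed files are ignored on purpose here, matching the decision to handle only updates.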

First, we’ll save our request data to a text file, for debugging purposes. We can do this by modifying our githook/index.php file:

<?php
// Dump the raw request body to test.json so we can inspect the payload.
file_put_contents('test.json', file_get_contents('php://input'));

Then we’ll create a new branch, add a file, and push it online.

git checkout -b test-branch
touch testfile.md
git add testfile.md
git commit -am "Added test file"
git push origin test-branch

Sure enough, our test.json file is filled with the payload now. This is the payload I got. You can see that we have only one commit, and that commit’s modified field is empty, while the added field has testfile.md. We can also see this happened on refs/heads/test-branch, ergo, we’re not interested in it. But what happens if we make a PR out of this branch and merge it?

Our payload looks different. Most notably, we now have refs/heads/master as the ref field, meaning it happened on the master branch and we must pay attention to it. We also have 2 commits instead of just one: the first one is the same as in the original PR, the adding of the file. The second one corresponds to the change on the master branch: the merging itself. Both reference the same added file.

Let’s do one final test. Let’s edit testfile.md, push that, and do a PR and merge.

echo "Hello" >> testfile.md
git add testfile.md
git commit -am "Edited test file"
git push origin test-branch

Ahh, there we go. We now have a modified file in the payload.

Now let’s do a “real” scenario and simulate an update submission. First we’ll create a post’s default folder, and then we’ll PR an update into it.

git checkout master
git pull
mkdir -p authors/some-author/some-post/{en_EN,hr_HR,images}
echo "English content" >> authors/some-author/some-post/en_EN/final.md
echo "Croatian content" >> authors/some-author/some-post/hr_HR/final.md
touch authors/some-author/some-post/images/.gitkeep
git add -A
git commit -am "Added some author"
git push origin master

Then we do the edit.

git checkout -b edit-for-some-post
echo "This is a new line" >> authors/some-author/some-post/en_EN/final.md
git add -A
git commit -am "Added an update on the English version of the post"
git push origin edit-for-some-post

New post edit PR suggested

If we turn this into a pull request in the GitHub web UI and merge the PR, we’ll get this payload.

If we follow the path from the modified files in the payload, we can easily discern the folder we’re talking about. Let’s modify the index.php file from before.

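<?php
// Read the raw request body sent by the webhook before decoding it.
$json = file_get_contents('php://input');
$payload = json_decode($json, true);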
$last_commit = array_pop($payload['commits']);

$modified = $last_commit['modified'];

$prefix = 'https://raw.githubusercontent.com/';
$repo = 'swader/autopush/master/';
$lvl = 2;

$folders = [];
foreach ($modified as $file) {
    $folder = explode('/', $file);
    $folder = implode('/', array_slice($folder, 0, -$lvl));
    $folders[] = $folder;
}
$folders = array_unique($folders);
var_dump($folders);

We fetch the last commit in the payload, extract its modified files list, and find the post folder of each modified file. How far up that folder sits is dictated by the $lvl variable – in our case it’s 2, because array_slice has to drop two path segments: the file name (final.md) and the language folder (en_EN).
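To see the array_slice() logic above in isolation, here is a small sketch. The postFolder() wrapper is a hypothetical name introduced for this example only.

```php
<?php
// Hypothetical wrapper around the explode/array_slice/implode steps:
// drops the last $lvl path segments of a file path.
function postFolder(string $file, int $lvl = 2): string
{
    $parts = explode('/', $file);
    return implode('/', array_slice($parts, 0, -$lvl));
}

echo postFolder('authors/some-author/some-post/en_EN/final.md'), PHP_EOL;
// Prints: authors/some-author/some-post

// A file with fewer than $lvl segments collapses to an empty string.
var_dump(postFolder('README.md'));
// Prints: string(0) ""
```

The second call shows that files near the repository root produce an empty folder name, which a stricter version of the loop might want to skip.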

And there we have it – the path of the folder that holds the files that need to be updated. Now all we have to do is fetch the contents, turn the Markdown of those files into HTML, and save it into the database.

Processing Markdown

To process Markdown, we can use the Parsedown package. We’ll install this dependency in the githook folder itself, to make the app as standalone as possible.

composer require erusev/parsedown

Parsedown is the same flavor of Markdown we use at Bitfalls while writing with the Caret editor, so it’s a perfect match.

Now we can modify index.php again.

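<?php

require_once 'vendor/autoload.php'; // Composer autoloader, needed for Parsedown

$json = file_get_contents('php://input');
$payload = json_decode($json, true);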
$last_commit = array_pop($payload['commits']);

$modified = $last_commit['modified'];

$prefix = 'https://raw.githubusercontent.com/';
$repo = 'swader/autopush/';
$branch = 'master/';

$languages = [
    'en_EN' => 'en',
    'hr_HR' => 'hr'
];
$lvl = 2;

$folders = [];
foreach ($modified as $file) {
    $folder = explode('/', $file);
    $folder = implode('/', array_slice($folder, 0, -$lvl));
    $folders[] = $folder;
}
$folders = array_unique($folders);

foreach ($folders as $folder) {
    $fullFolderPath = $prefix.$repo.$branch.$folder.'/';
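    $content = '';
    foreach ($languages as $langpath => $key) {
        $url = $fullFolderPath.$langpath.'/final.md';
        $markdown = getContent($url);
        if ($markdown === '') {
            continue; // language file missing or download failed
        }
        $content .= "{:$key}".mdToHtml($markdown)."{:}";
    }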
    if (!empty($content)) {
        // Save to database
    }
}

function getContent(string $url): string {
    $ch = curl_init();
    // Certificate verification stays at cURL's secure default.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // The nonce makes every URL unique, so a cached copy is not served.
    curl_setopt($ch, CURLOPT_URL, $url.'?nonce='.md5(microtime()));
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
    $data = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // curl_exec() returns false on failure; treat that like a non-200 response.
    if ($data === false || $code != 200) {
        return '';
    }
    return $data;
}

function mdToHtml(string $text): string {
    $p = new Parsedown();
    $p->setUrlsLinked(true);
    return $p->parse($text);
}

We made some really simple functions to avoid repetition. We also added a mapping of language folders (locales) to their WPGlobus keys, so that when iterating through all the files in a folder, we know how to delimit them in the post’s body.

Note: we have to update all language versions of a post when doing an update to just one, because WPGlobus doesn’t use an extra field or a different database row to save another language of a post – it saves them all in one field, so the whole value of that field needs to be updated.

We iterate through the folders that got updates (there might be more than one in a single PR), grab the contents of the file and convert it to HTML, then store all this into a WPGlobus-friendly string. Now it’s time to save this into the database.

Note: we append a nonce to the URL so that every request is unique, which sidesteps possible caching of raw GitHub content.

Saving Edited Content

We have no idea where to save the new content. We need to add support for meta files.

First, we’ll add a new function which gets this meta file:

function getMeta(string $folder): ?array {
    $data = getContent(trim($folder, '/').'/meta.json');
    if (!empty($data)) {
        return json_decode($data, true);
    }
    return null;
}

Simple: if the file exists, the function returns its decoded contents. The meta files will be JSON, so all the parsing we’ll ever need is already built into PHP.

Then, we’ll add a check to our main loop so that the process skips any folder without a meta file.

foreach ($folders as $folder) {
    $fullFolderPath = $prefix.$repo.$branch.$folder.'/';

    $meta = getMeta($fullFolderPath);
    if (!$meta) {
        continue;
    }

    // ...

We’ll use the WP CLI to make updates. The CLI can be installed with the following commands:

curl -O https://raw.githubusercontent.com/wp-cli/builds/gh-pages/phar/wp-cli.phar
sudo mv wp-cli.phar /usr/local/bin/wp
sudo chmod +x /usr/local/bin/wp

This downloads the WP-CLI tool, puts it into the server’s path (so it can be executed from anywhere), and adds “executable” permission to it.

The post update command needs a post ID, and the field to update. WordPress posts are saved into the wp_posts database table, and the field we’re looking to update is the post_content field.

Let’s try this out in the command line to make sure it works as intended. First we’ll add an example post. I gave it an example title of “Example post” in English and “Primjer” in Croatian, with the body This is some English content for a post! for the English content, and Ovo je primjer! for the Croatian content. When saved, this is what it looks like in the database:

Example post in the database

In my case, the ID of the post is 428. If your WP installation is fresh, yours will probably be closer to 1.

Now let’s see what happens if we execute the following on the command line:

wp post update 428 --post_content='{:en}This is some English content for a post - edited!{:}{:hr}Ovo je primjer - editiran!{:}'

Sure enough, our post was updated.

Updated post

This looks like it might become problematic when dealing with quotes which would need to be escaped. It’s better if we update from file, and let this tool handle the quotes and such. Let’s give it a try.

Let’s put the content {:en}This is some English 'content' for a post - edited "again"!{:}{:hr}Ovo je 'primjer' - editiran "opet"!{:} into a file called updateme.txt. Then…

wp post update 428 updateme.txt

Yup, all good.

Screenshot of update from file

Okay, now let’s add this into our tool.

For now, our meta file will only have the ID of the post, so let’s add one such file to the content repo:

git checkout master
git pull
echo '{"id": 428}' >> authors/some-author/some-post/meta.json
git add -A
git commit -am "Added meta file for post 428"
git push origin master

Note: update the ID to match yours.

At this point, our content repo should look like this (version saved as release, feel free to clone).
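Since the meta file decides which post gets overwritten, it may be worth validating it more strictly than a plain json_decode. The following is only a sketch: parseMeta() is a hypothetical helper, and the rule that id must be a positive integer is an assumption based on the {"id": 428} example above.

```php
<?php
// Hypothetical stricter parser for meta.json contents: returns the decoded
// array only when it holds a positive integer "id", otherwise null.
function parseMeta(string $json): ?array
{
    $meta = json_decode($json, true);
    if (!is_array($meta) || !isset($meta['id'])) {
        return null;
    }
    if (!is_int($meta['id']) || $meta['id'] < 1) {
        return null;
    }
    return $meta;
}

var_dump(parseMeta('{"id": 428}') !== null); // bool(true)
var_dump(parseMeta('{"id": "428; rm"}'));    // NULL
var_dump(parseMeta('not json'));             // NULL
```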

Replace the // Save to database line in the code from before and its surrounding lines with:

    if (!empty($content) && is_numeric($meta['id'])) {
        file_put_contents('/tmp/wpupdate', $content);
        exec('wp post update '.$meta['id'].' /tmp/wpupdate', $output);
        var_dump($output);
    }

We make sure that both content and the ID of the post to be updated are somewhat valid, and then we write the contents into a temporary file, from which we then feed it to the wp cli tool.
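Because the command string is handed to a shell, a slightly more defensive way to assemble it could look like the sketch below. The buildUpdateCommand() helper is hypothetical; it forces the ID to an integer and escapes the file path with PHP's escapeshellarg().

```php
<?php
// Hypothetical helper: builds the WP-CLI call with the ID cast to an
// integer and the file path shell-escaped.
function buildUpdateCommand($postId, string $file): string
{
    return 'wp post update ' . (int) $postId . ' ' . escapeshellarg($file);
}

// The exact quoting of the path depends on the operating system.
echo buildUpdateCommand('428', '/tmp/wpupdate'), PHP_EOL;
```

The result can be passed to exec() in place of the concatenated string. Writing to a unique file, for instance one created with tempnam(sys_get_temp_dir(), 'wpupdate'), would also stop two simultaneous hook calls from overwriting each other's content.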

We should also add some more checks to the beginning of the script to make sure we only execute the updates we want to execute:

// ...

$payload = json_decode($json, true);

if (empty($json)) {
    header("HTTP/1.1 500 Internal Server Error");
    die('No data provided for parsing, payload invalid.');
}

if ($payload['ref'] !== 'refs/heads/master') {
    die('Ignored. Not master.');
}

$last_commit = array_pop($payload['commits']);

// ...
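One further check worth considering is confirming that the request really came from GitHub. This sketch assumes a secret was entered in the webhook settings, which the hook set up earlier in this tutorial does not have. With a secret configured, GitHub sends an X-Hub-Signature-256 header containing "sha256=" followed by the hex HMAC of the raw request body. The isValidSignature() helper and the example values are made up for illustration.

```php
<?php
// Assumption: the webhook was configured with a secret. Compares the
// header sent by GitHub with an HMAC computed over the raw request body.
function isValidSignature(string $rawBody, string $header, string $secret): bool
{
    $expected = 'sha256=' . hash_hmac('sha256', $rawBody, $secret);
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($expected, $header);
}

// Self-check with made-up values.
$secret = 'example-secret';
$body   = '{"ref":"refs/heads/master"}';
$header = 'sha256=' . hash_hmac('sha256', $body, $secret);

var_dump(isValidSignature($body, $header, $secret));       // bool(true)
var_dump(isValidSignature($body . ' ', $header, $secret)); // bool(false)
```

In index.php, the body would be the $json string read from php://input, and the header would typically be available as $_SERVER['HTTP_X_HUB_SIGNATURE_256'].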

The full index.php file looks like this now:

<?php

require_once 'vendor/autoload.php';

$json = file_get_contents('php://input');
file_put_contents('test.json', $json);
$payload = json_decode($json, true);

if (empty($json)) {
    header("HTTP/1.1 500 Internal Server Error");
    die('No data provided for parsing, payload invalid.');
}

if ($payload['ref'] !== 'refs/heads/master') {
    die('Ignored. Not master.');
}

$last_commit = array_pop($payload['commits']);

$modified = $last_commit['modified'];

$prefix = 'https://raw.githubusercontent.com/';
$repo = 'swader/autopush/';
$branch = 'master/';

$languages = [
    'en_EN' => 'en',
    'hr_HR' => 'hr'
];
$lvl = 2;

$folders = [];
foreach ($modified as $file) {
    $folder = explode('/', $file);
    $folder = implode('/', array_slice($folder, 0, -$lvl));
    $folders[] = $folder;
}
$folders = array_unique($folders);

foreach ($folders as $folder) {
    $fullFolderPath = $prefix.$repo.$branch.$folder.'/';

    $meta = getMeta($fullFolderPath);
    if (!$meta) {
        continue;
    }

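    $content = '';
    foreach ($languages as $langpath => $key) {
        $url = $fullFolderPath.$langpath.'/final.md';
        $markdown = getContent($url);
        // getContent() returns null on failure, which mdToHtml() cannot accept.
        if ($markdown === null || $markdown === '') {
            continue;
        }
        $content .= "{:$key}".mdToHtml($markdown)."{:}";
    }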
    if (!empty($content) && is_numeric($meta['id'])) {
        file_put_contents('/tmp/wpupdate', $content);
        exec('wp post update '.$meta['id'].' /tmp/wpupdate', $output);
        var_dump($output);
    }
}

function getContent(string $url): ?string {
    $ch = curl_init();
    // Certificate verification stays at cURL's secure default.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // The nonce makes every URL unique, so a cached copy is not served.
    curl_setopt($ch, CURLOPT_URL, $url.'?nonce='.md5(microtime()));
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);

    $data = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // curl_exec() returns false on failure; treat that like a non-200 response.
    if ($data === false || $code != 200) {
        return null;
    }
    return $data;
}

function mdToHtml(string $text): string {
    $p = new Parsedown();
    $p->setUrlsLinked(true);
    return $p->parse($text);
}

function getMeta(string $folder): ?array {
    $data = getContent(trim($folder, '/').'/meta.json');
    if (!empty($data)) {
        return json_decode($data, true);
    }
    return null;
}

At this point, we can test things. Perfect chance for a brand new branch, too.

git checkout -b post-update
echo 'Adding a new line yay!' >> authors/some-author/some-post/en_EN/final.md
git add -A; git commit -am "Edit"; git push origin post-update

Once the branch has been turned into a pull request and merged, as before, let’s check our post out.

Final updated post

It works – deploying this script now is as simple as deploying the WP code of your app itself, and updating the webhook’s URL for the repo in question.

Conclusion

In true WordPress fashion, we hacked together a tool that took us less than an afternoon, but saved us days or weeks in the long run. The tool is now deployed and functioning adequately. There is, of course, room for updates. If you’re feeling inspired, try adding the following:

  • modify the post updating procedure so that it uses stdin instead of a file, making it compatible with no-writable-filesystem hosts like AWS, Heroku, or Google Cloud.
  • custom output types: instead of fixed {:en}{:}{:hr}{:}, maybe someone else is using a different multi-language plugin, or doesn’t use one at all. This should be customizable somehow.
  • auto-insertion of images. Right now it’s manual, but the images are saved in the repo alongside the language versions and could probably be easily imported, auto-optimized, and added into the posts as well.
  • staging mode – make sure the merged update first goes through to a staging version of the site before going to the main one, so the changes can be verified before being sent to master. Rather than having to activate and deactivate webhooks, why not make this programmable?
  • a plugin interface: it would be handy to be able to define all this in the WP UI rather than in the code. A WP plugin abstraction around the functionality would, thus, be useful.
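The first idea in the list, feeding content through stdin instead of a temporary file, could be sketched as follows. The runWithStdin() helper is hypothetical and is demonstrated with cat so it runs on a Unix-like machine without WordPress installed.

```php
<?php
// Hypothetical helper: runs a shell command, writes $input to its stdin,
// and returns [exit code, stdout, stderr].
function runWithStdin(string $command, string $input): array
{
    $spec = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }

    // Writing everything before reading is fine for small inputs; very
    // large bodies would need non-blocking I/O to avoid a pipe deadlock.
    fwrite($pipes[0], $input);
    fclose($pipes[0]); // signals end of input to the child

    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    return [proc_close($process), $stdout, $stderr];
}

// Demonstrated with `cat`, which echoes its stdin back.
[$code, $out] = runWithStdin('cat', '{:en}Hello{:}');
var_dump($code, $out);
```

WP-CLI's documentation for post update describes passing - as the file name to read the post content from STDIN, so the real call might look like runWithStdin('wp post update 428 -', $content). Treat that as an assumption to verify against your installed WP-CLI version.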

With this tutorial, our intention was to show you that optimizing workflow isn’t such a big deal when you take the time to do it, and the return on investment for sacrificing some time on getting automation up and running can be immense when thinking long term.

Any other ideas or tips on how to optimize this? Let us know!
