By Harry Fuecks

Pick of the Wikis: Dokuwiki

Been slowly evaluating some of the wikis out there over the past few months and am going to (dare to) recommend one or two of the PHP ones.

The main reason for shopping around is the current WACT wiki, which uses PHPWiki.

While there’s nothing fundamentally wrong with PHPWiki, the more documents and versions of documents you have, the slower it seems to get. The times I’ve tried to research why it’s getting slow, I’ve found the code to be lacking in transparency – could be that’s just me, but I’m really not interested in spending a lot of time getting to know the code base. So right now PHPWiki amounts to a “beast” I don’t want to be messing around with.

Also, as Jeff mentions:

the wiki markup style seems almost non-deterministic. I never know what the page is going to look at when I edit it.

Another problem (a complaint we’ve had) is the lack of some kind of index / site map. There are more than a few pages in the wiki which aren’t referenced anywhere, and the only way to make them discoverable is by manually building pages of links (as we’ve done here).

Otherwise, with the PHPWiki code base being a “thing” I don’t think any of us want to touch, we can’t really take advantage of it elsewhere, such as truly integrating the generated API docs with the wiki (there’s a separate blog here about integrating generated API docs with dynamic content, for user comments etc., that I’ll get to another time) or generating some kind of downloadable “manual” from it.

Finally, I feel uncomfortable about the content being stored in a database – among many other things, it makes it difficult to modify the wiki via anything but the web interface.

Anyway, one or two of the other wikis I’ve looked at (these are all PHP based but others like TinyWiki were very tempting):

MediaWiki – now Wikipedia is powered by MediaWiki (as they mention here – the ultimate case for PHP and scaling?) which clearly makes it a very strong candidate. It has some great features like docbook export and the code base is generally pleasant to explore. What I don’t like though is that it uses a database again (although it does come with an excellent selection of supporting admin tools) and I get the general feeling that it would become “the center of the universe” if we used it, making things like integration with the API docs tricky.

Yawiki, which is a work in progress from Paul Jones, who you may know from Savant. In general I like where it’s going, particularly the clean code base. It’s using a DB again though, and I’m really looking for something more evolved (0.17 last time I looked). May well take advantage of Text_Wiki when it comes to migrating from PHPWiki though.

PmWiki is a files based wiki and almost the outright “winner” for what I’m looking for. Exploring the code, I get a good feeling about the sanity of the developers, in the sense that they seem to have used the simplest implementation possible in all cases, and I suspect it would scale well in terms of the volume of information it is managing. Not so good is the keyword ‘global’, which is all over the code. Also I have my doubts about the format content is stored in; it’s a kind of ini-file, which is going to need a specialised parser in addition to the parser for wiki markup. Also changes (diffs) are stored along with the content itself (in a single file), which is likely to result in some pretty big files as a page undergoes multiple edits (note it does store the most recent version of the page as a single entity, which is good – it’s not reconstructing the page from the diffs). Otherwise the markup is similar to PHPWiki’s and may well lead to the problems of determinism again.

DokuWiki – the more I look at it, the more I like it (the code is here). I think Andreas Gohr, the author, has managed to get the fundamentals exactly right…

+ It’s files based and what gets stored is exactly what you typed in. That helps a lot if you need to use standard filesystem tools for editing. For example DokuWiki uses the Unix find(1) and grep(1), by default, for searching.

+ It uses namespaces when creating pages. Namespaces relate to the URL. Each namespace is a directory so if my page ID is wact:tags:list it will correspond to a file / directory structure like ./wact/tags/list.txt – that makes it possible to build useful indexes of the WIKI.

+ The wiki syntax looks deterministic, plus DokuWiki comes with an editing toolbar to help with markup and also supports keyboard shortcuts.

+ Old revisions of documents are stored separately in zipped archives. Online diffs are supported.

+ DokuWiki manages conflicts, when two people are editing the same document, in a similar way to CVS – the second person to save their document is forced to manage the change (with help from a diff).

+ It’s buzzword compliant (CSS, XHTML, RSS etc.).
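
To make the files and namespaces points a bit more concrete, here is a minimal Python sketch (not DokuWiki's own code, which is PHP) of how a page ID could map to a file, and how a site index and a grep-style search fall out of that layout almost for free. The function names page_path, page_id, build_index and search are made up for illustration, and the layout of .txt files directly under a root directory is assumed from the wact:tags:list example above rather than taken from DokuWiki's source.

```python
from pathlib import Path
from typing import List


def page_path(root: Path, page: str) -> Path:
    """Map a page ID such as 'wact:tags:list' to root/wact/tags/list.txt."""
    parts = page.split(":")
    # Every part but the last is a namespace (directory); the last is the page.
    return root.joinpath(*parts[:-1], parts[-1] + ".txt")


def page_id(root: Path, path: Path) -> str:
    """Inverse of page_path: turn a file under root back into a page ID."""
    rel = path.relative_to(root)
    name = rel.name[: -len(".txt")]
    return ":".join(rel.parts[:-1] + (name,))


def build_index(root: Path) -> List[str]:
    """List every page by walking the directory tree (a simple site map)."""
    return sorted(page_id(root, p) for p in root.rglob("*.txt"))


def search(root: Path, needle: str) -> List[str]:
    """Rough stand-in for find + grep: pages whose raw text contains needle."""
    hits = []
    for p in root.rglob("*.txt"):
        if needle in p.read_text(encoding="utf-8"):
            hits.append(page_id(root, p))
    return sorted(hits)
```

Because the stored text is exactly what was typed, the index and the search need nothing beyond the filesystem. That is the property that makes pages with no inbound links discoverable, and it is the hook I would expect to use for integrating other content.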

On the downside, it doesn’t seem to handle i18n character sets yet (which may involve re-writing its wiki parser, which currently relies on PCRE) but the code base is (surprisingly) small and generally easy to understand – there’s some room for improvement in the code (seen from my angle anyway) but nothing major and I imagine it would be very easy to make changes incrementally. Because the fundamentals are right (in particular the way files are stored) I can see myself being able to integrate this with WACT’s API docs.

Whether it scales is also an unanswered question. The index generation, for example, may need some examination and some of the configuration files (like the wordblock file) may need breaking into smaller pieces as they grow. Again, because I think Dokuwiki does the right thing in storing files, tuning it for capacity shouldn’t be a major issue.

Anyway – hope that’s some useful research. Thanks to Andreas for Dokuwiki – any chance of getting it on Sourceforge?
