Brion Vibber on Wikipedia and Mediawiki


by Harry Fuecks

Looking at the top 20 of Alexa’s global 500 most popular sites, one thing that stands out is that the majority are primarily “read only” sites - news, search or otherwise - where updates to content are mostly managed by those running the site.

The big three exceptions here though are MySpace (running .NET now I believe - was ColdFusion), eBay (have they migrated fully to J2EE yet or is some of the home-grown C++ still around?) and Wikipedia (LAMP). All of these are, in some way, collaborative sites where content is created primarily by users. In other words, they have to be able to support a significant volume of writes as well as reads. That’s interesting because, in terms of scaling, the more volatile the data you’re providing, the harder it gets to scale - it raises questions like “how do you cache?”, “how do you handle transactions / locking?”, “how do you distribute updates?” etc.

Anyway, that Wikipedia runs LAMP makes it something of a poster child and, as you may know, the software used on Wikipedia is MediaWiki, written in PHP. Given the scale of the technical problem the Wikimedia Foundation has had to solve, what’s been a little frustrating in the past is finding detail from those involved on how they do it. Thanks to Brion Vibber we now have more information…

First up is his talk to Google, delivered at the end of last month. Some fascinating details and trivia in there (e.g. they’re currently averaging about 1 update / sec) and, considering they “only” have about 100 application servers (running the MediaWiki code), the overall impression is almost “is that all it takes? How small the Internet is” - Brion plays down the effort that has gone into making it possible with remarks like “It takes a little work”. He also mentions some of the problems they’re having with their wiki syntax parser, which are similar to those we’ve seen before elsewhere - they seem to be attempting to replace it with a C-based parser exposed as a PHP extension but, given the date of the last change, is that an effort which has stalled? Also wryly noted was the number of questions related to how Wikimedia is financed - given it was a technical talk and the location, makes you go “Hmmmm…”.

Following that, more detail (with a stronger PHP slant) comes from php|architect’s webcast interview with Brion Vibber - Marcus does a great job of asking pertinent questions - perhaps the biggest item was that Wikipedia’s servers are already running PHP 5, even if the code isn’t yet taking advantage of the fact. Side note: imagine if Wikipedia was running on something like .NET - can you imagine how much marketing noise there’d be following a successful move to the latest version? Funny how the LAMP world moves differently. Anyway - there’s lots more detail in there that you’ll have to listen to.

Great stuff and thanks to Brion for doing it.

This post has 6 responses so far

  1. I know there is some work being done on a WYSIWYG editor for MediaWiki. I guess this could be a possibility for replacing parsing.

     
  2. Thanks to Brion, he rocks!

     
  3. “I know there is some work being done on a WYSIWYG editor for MediaWiki. I guess this could be a possibility for replacing parsing.”

    How so? You mean eliminate use of wiki markup completely and use (X)HTML purely?

     
  4. From the small conversation I had with someone implementing it, it seems the wiki markup has been XML-ified.

     
  5. “From the small conversation I had with someone implementing it, it seems the wiki markup has been XML-ified.”

    OK - now that makes more sense of some of the things I’ve seen on their mailing list. They already seem to have some kind of MediaWiki-to-XML parsing going on in here and the changes look recent. Interesting.

     
  6. Yahoo has more writes / sec than Wikipedia, and it’s mostly PHP (although not completely, of course). I’d use that as the “posterchild” for PHP before Wikipedia.

     
