How To Make Microsoft Word Documents with PHP

Taylor Ren

As I pointed out in my previous article, PHP and WMI – Dig deep into Windows with PHP, we live in a world where PHP devs have to deal with the Windows operating system from time to time. WMI (Windows Management Instrumentation) is one such occasion and Microsoft Office Interop is another, an even more important and more frequently used one.

In this article, we will see a simple integration between Word and PHP: to generate a Microsoft Word document based on the inputs in an HTML form using PHP (and its Interop extension).

Preparations

First, please make sure a typical WAMP environment has been set up on your Windows development machine. As Interop is purely a Windows feature, we will have to host Apache and PHP under Windows. In this instance, I am using EasyPHP 14.1, which is quite easy to install and configure.

Next, we will have to install Microsoft Office. Its version is not that critical. I am using Office 2013 Pro but any Office version later than 2007 should work.

We then have to make sure the libraries to develop an Interop application (called PIA, Primary Interop Assemblies) are installed. To ascertain this, we can open the Windows Explorer and navigate to: <Windows Directory>\assembly and we will see a bunch of installed PIAs:

We see a Microsoft.Office.Interop.Word entry (underlined in the snapshot). This will be the PIA we use in this demo. Please pay special attention to its “Assembly Name”, “Version” and “Public Key Token”. These are to be used in our PHP scripts very soon.

In this directory, we can also see other PIAs (including the whole Office family) available for programming (not only for PHP, but also for VB.net, C#, etc.).

If the PIA list does not include the whole Microsoft.Office.Interop package, we can either re-install Office and include the PIA features, or manually download the package from Microsoft and install it. Please consult this MSDN page for detailed instructions.

NOTE: Only Microsoft Office 2010 PIA Redistributable is available to download and install. The PIA version in this package is 14.0.0. Version 15 only comes with Office 2013 installation.

Finally, we have to enable the PHP extension php_com_dotnet.dll in the php.ini file and restart the server.
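Before moving on, it can help to confirm that the extension was really picked up after the restart. The snippet below is a minimal sketch, assuming PHP 5.4 or later; the helper name com_dotnet_status() is made up for this illustration and is not part of any library.

```php
<?php
// Hypothetical helper: reports whether the COM/.NET bridge is usable.
function com_dotnet_status()
{
    return [
        // True when php_com_dotnet.dll was loaded via php.ini.
        'extension_loaded' => extension_loaded('com_dotnet'),
        // The DOTNET class is provided by that extension.
        'dotnet_class'     => class_exists('DOTNET', false),
        // PHP_OS starts with "WIN" on Windows builds (e.g. "WINNT").
        'is_windows'       => stripos(PHP_OS, 'WIN') === 0,
    ];
}

foreach (com_dotnet_status() as $check => $ok) {
    echo $check, ': ', ($ok ? 'yes' : 'no'), PHP_EOL;
}
```

If any of the three checks prints "no", the back end script later in this article will most likely fail at the new DOTNET(...) line.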

Now we can move on to the programming.

The HTML form

As the focus of this demo is on the back end processing, we will create a simple front end with a simple HTML form, which looks like the figure below:

We have a text field for “Name”, a radio button group for “Gender”, a range control for “Age” and a text area for “Message”; and finally, of course, a “Submit” button.

Save this file as “index.html” in a directory under the virtual host’s root directory so that we can access it with a URI like http://test/test/interop.

The back end

The back end PHP file is the focus of our discussion. I will first list the code of this file, and then explain it step by step.

<?php

$inputs = $_POST;
$inputs['printdate']=''; 
// A dummy value to avoid a PHP notice as we don't have "printdate" in the POST variables. 

$assembly = 'Microsoft.Office.Interop.Word, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c';
$class = 'Microsoft.Office.Interop.Word.ApplicationClass';

$w = new DOTNET($assembly, $class);
$w->visible = true;

$fn = __DIR__ . '\\template.docx';

$d = $w->Documents->Open($fn);

echo "Document opened.<br><hr>";

$flds = $d->Fields;
$count = $flds->Count;
echo "There are $count fields in this document.<br>";
echo "<ul>";
$mapping = setupfields();

foreach ($flds as $index => $f)
{
    $f->Select();
    $key = $mapping[$index];
    $value = $inputs[$key];
    if ($key == 'gender')
    {
        if ($value == 'm')
            $value = 'Mr.';
        else
            $value = 'Ms.';
    }
    
    if($key=='printdate')
        $value=  date ('Y-m-d H:i:s');

    $w->Selection->TypeText($value);
    echo "<li>Mapping field $index: $key with value $value</li>";
}
echo "</ul>";

echo "Mapping done!<br><hr>";
echo "Printing. Please wait...<br>";

$d->PrintOut();
sleep(3);
echo "Done!";

$w->Quit(false);
$w=null;



function setupfields()
{
    $mapping = array();
    $mapping[0] = 'gender';
    $mapping[1] = 'name';
    $mapping[2] = 'age';
    $mapping[3] = 'msg';
    $mapping[4] = 'printdate';
    

    return $mapping;
}

After setting up the $inputs variable to hold the values posted from our form, and creating a dummy value for printdate – we will discuss why we need this later – we come across these four critical lines:

$assembly = 'Microsoft.Office.Interop.Word, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c';
$class = 'Microsoft.Office.Interop.Word.ApplicationClass';

$w = new DOTNET($assembly, $class);
$w->visible = true;

A COM manipulation in PHP requires an instantiation of a “class” within an “assembly“. In our case, we are to operate with Word. If we reflect on the first screenshot we showed, we will be able to construct the full signature of the Word PIA:

  • “Name”, “Version”, “Public Key Token” are all taken from the information displayed when we browse to “c:\Windows\assembly“.
  • “Culture” is always neutral.

The class we are to invoke is always the assembly’s name plus “.ApplicationClass“.

With these two parameters set, we will be able to instantiate a Word object.
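To make the structure of the assembly signature explicit, here is a small sketch that builds the string from its parts. The function name word_assembly_name() is hypothetical; the version and public key token in the usage lines are the ones shown for my installation, so yours may differ.

```php
<?php
// Hypothetical helper: assembles the full signature of the Word PIA
// from the values shown under <Windows Directory>\assembly.
function word_assembly_name($version, $publicKeyToken)
{
    return sprintf(
        'Microsoft.Office.Interop.Word, Version=%s, Culture=neutral, PublicKeyToken=%s',
        $version,
        $publicKeyToken
    );
}

// Values as used in this article (Office 2013).
$assembly = word_assembly_name('15.0.0.0', '71e9bce111e9429c');
$class    = 'Microsoft.Office.Interop.Word.ApplicationClass';

echo $assembly, PHP_EOL;
```

Keeping the version and token in one place makes it easier to switch to another PIA version later.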

This object can stay in the background or we can bring it to the foreground by setting its visible attribute to true.

Next, we open the document to be processed and assign the “document” instance to a $d variable.

In that document, to create content based on the inputs from the HTML form, we have a few options.

The least favorable way is to hard code all the contents in PHP and then output them into the Word document. I strongly discourage this for the following reasons:

  1. There will be no flexibility. Any change in the output will require modification of the PHP script.
  2. It violates the separation between control and presentation.
  3. It will drastically increase the lines of code if we are to apply styles to the document contents (alignment, font, style, etc). Programmatically changing styles is too cumbersome.

Another way is to do a “search-replace”. PHP has strong built-in capabilities in doing this. We can create a Word document putting some special delimiters around the placeholder contents that are to be replaced. For example, we can create a document containing something like this:

{{name}}

and in PHP, we can simply replace this with the “Name” value we retrieved from the form submission.

This is straightforward and avoids all the disadvantages of the first option. We just need to find the right delimiter. In effect we are doing template rendering, except that the template is now a Word document.
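The replacement step itself can be sketched in a few lines of PHP. This is only an illustration on a plain string: getting the text out of a Word document and back in is not shown, and the function name render_placeholders() is an assumption of this sketch.

```php
<?php
// Hypothetical helper: replaces {{key}} placeholders with form values.
function render_placeholders($template, array $values)
{
    $pairs = [];
    foreach ($values as $key => $value) {
        // Wrap each key in the delimiters used in the template.
        $pairs['{{' . $key . '}}'] = (string) $value;
    }
    // strtr() replaces all pairs in one pass and does not re-scan
    // the text it has already replaced.
    return strtr($template, $pairs);
}

echo render_placeholders(
    'Dear {{name}}, you are {{age}} years old.',
    ['name' => 'Taylor', 'age' => 30]
), PHP_EOL;
// prints "Dear Taylor, you are 30 years old."
```

Placeholders without a matching value are left untouched, which makes missing form fields easy to spot in the output.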

The third option is my recommendation and is an advanced topic in Word. We will use fields to represent the placeholders, and in our PHP code, we will directly update the fields with respective form values.

This approach is flexible, fast and conforms with Word’s best practices. It also avoids full text search in the documents, which helps performance. Note that this option has its drawbacks too.

Word, ever since its debut, has never supported named indexes for fields. Even though we provided a name for the fields we created in the Word document, we still have to use number subscripts to access each field. This also explains why we have to use a dedicated function (setupfields) to do the manual mapping between the field index and the name of the form fields.

To learn how to insert fields in a Word document (click here for a ready-made version), please consult the relevant Word help topics and manuals. For this demo, we have a document with 5 MERGEFIELD fields. Also, we placed the document in the same directory as the PHP script for easy access.

Please note that the field printdate does not have a corresponding form field. That is why we added a dummy printdate key to the $inputs array. Without this, the script still runs, but PHP raises a notice saying that the index printdate is not present in the $inputs array.
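The index-to-name lookup and the special cases can also be pulled out of the loop into one function. The following is a sketch of that idea, not the code used in the demo: field_value() is a hypothetical name, and a missing key falls back to an empty string, which removes the need for the dummy printdate entry.

```php
<?php
// Hypothetical helper: returns the text to type into the field at $index.
function field_value($index, array $mapping, array $inputs)
{
    if (!isset($mapping[$index])) {
        return ''; // field without a mapping: leave it empty
    }

    $key   = $mapping[$index];
    $value = isset($inputs[$key]) ? $inputs[$key] : '';

    if ($key === 'gender') {
        // Same rule as in the main script: 'm' becomes 'Mr.', anything else 'Ms.'
        return $value === 'm' ? 'Mr.' : 'Ms.';
    }
    if ($key === 'printdate') {
        // Not a form field; generated at print time.
        return date('Y-m-d H:i:s');
    }
    return (string) $value;
}

$mapping = [0 => 'gender', 1 => 'name', 2 => 'age', 3 => 'msg', 4 => 'printdate'];
$inputs  = ['gender' => 'm', 'name' => 'Taylor', 'age' => '30', 'msg' => 'Hello'];

foreach (array_keys($mapping) as $index) {
    echo $index, ' => ', field_value($index, $mapping, $inputs), PHP_EOL;
}
```

Inside the foreach loop over $flds, the call would then be $w->Selection->TypeText(field_value($index, $mapping, $inputs)).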

After updating the fields with form values, we will print the document using:

$d->PrintOut();

The PrintOut method has a few optional parameters and we are using its simplest form. This will print one copy to the default printer connected to our Windows machine.

We can also choose to use PrintPreview to take a look at the output before we decide to print the document. In a purely automated environment, we will of course use PrintOut instead.

We have to wait for a few seconds before we quit the Word application because the printing job needs some time to be fully spooled. Without sleep(3), $w->Quit gets executed immediately and the printing job gets killed too.
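A fixed delay is a guess, so a polling loop with a timeout may be more robust. The sketch below is generic: wait_until_idle() is a hypothetical helper that takes any callable returning the number of pending jobs. With Word, that callable could read the Application object's BackgroundPrintingStatus property; whether this behaves well through the DOTNET wrapper is an assumption I have not verified here.

```php
<?php
// Hypothetical helper: polls $pendingJobs until it reports 0 or the
// timeout expires. Returns true when idle, false on timeout.
function wait_until_idle(callable $pendingJobs, $timeoutSeconds = 10, $pollMicroseconds = 200000)
{
    $deadline = microtime(true) + $timeoutSeconds;
    while ($pendingJobs() > 0) {
        if (microtime(true) >= $deadline) {
            return false; // give up; the caller decides what to do next
        }
        usleep($pollMicroseconds);
    }
    return true;
}

// Stand-in for Word: a fake queue that drains by one job per poll.
$remaining = 3;
$idle = wait_until_idle(function () use (&$remaining) {
    return $remaining--;
}, 5, 1000);

echo $idle ? 'Queue is empty.' : 'Timed out.', PHP_EOL;

// With Word (assumption, not verified in this sketch):
// wait_until_idle(function () use ($w) { return $w->BackgroundPrintingStatus; });
```

The timeout keeps the web request from hanging forever if the printer is offline.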

Finally, we call $w->Quit(false) to close the Word application invoked by our PHP script. The only parameter provided here is to specify if we want to save changes before quitting. We did make changes to the document but we really don’t want to save them because we want to keep a clean template for other users’ input.

After we complete the code, we can load the form page, input some values and submit the form. The below images show the output of the PHP script and also the updated Word document:


Improving the coding speed and understanding more about PIA

PHP is a weakly typed language. A COM object is of type Object. During our PHP coding, there is no way to get a meaningful code insight out of an object, be it a Word Application, a Document, or a Field. We don’t know what properties it has, or what methods it supports.

This will greatly slow down our development speed. To make it faster, I would recommend we develop the functions in C# first and then migrate the code back to PHP. A free C# IDE I would recommend is called “#develop” and can be downloaded here. I prefer this one to the VS series because #develop is smaller, cleaner, and faster.

The migration of C# code to PHP is not scary at all. Let me show you some lines of C# code:

Word.Application w=new Word.Application();
w.Visible=true;
			
String path=Application.StartupPath+"\\template.docx";
			
Word.Document d=w.Documents.Open(path) as Word.Document;
			
Word.Fields flds=d.Fields;
int len=flds.Count;
			
foreach (Word.Field f in flds)
{
	f.Select();
	int i=f.Index;
	w.Selection.TypeText("...");
}

We can see that C# code is almost identical to the PHP code we showed previously. C# is strongly typed so we see a few type casting statements and we have to explicitly give our variables a type.

With variable type given, we can enjoy code insight and code completion so the development speed is much faster.

Another way to speed up our PHP development is to tap into Word macros. We perform the actions we need and record them with a macro. The macro is in Visual Basic, which can also be easily translated to PHP.

Most importantly, Microsoft’s official documentation on the Office PIAs, especially the namespace documentation for each Office application, is always the most detailed reference material. The three most used applications are:

Conclusion

In this article, we demonstrated how to populate a Word document using PHP COM libraries and Microsoft Office Interop capabilities.

Windows and Office are widely used in everyday life. Knowing what both Office/Windows and PHP can do will be essential for any PHP + Windows programmer.

With PHP’s COM extension, the door to mastering this combination is opened.

If you are interested in this area of programming, please leave your comments and we will consider having more articles on this topic. I look forward to seeing more real world applications developed using this approach.


  • http://www.php.net/ Kalle Sommer Nielsen

    The PHP Examples are crippled

    • Taylor Ren

      The PHP code output gets distorted. I can’t edit a post when it is posted.

      Bruno, can you help?

      • http://www.bitfalls.com/ Bruno Skvorc

        Sorry guys, fixed!

        • Taylor Ren

          Thanks!

  • Mathieu SAVELLI

    Even if your tutorial and the Interop extension are both quite interesting, I can see only drawbacks in this method:
    * Windows only, while most hosting offers use Linux,
    * Office Word must be installed on the machine.

    I prefer the solution from https://github.com/PHPOffice that is OS agnostic and uses only pure PHP.
    It only requires XML and ZIP extensions, which are quite common, even on shared hosting.

    • Taylor Ren

      Hi @mathieusavelli

      Thanks for your comment and recommendation.

      Yes and No.

      The DLL is purely Windows. Office PIAs can be installed (though an older version) instead of a full Office installation.

      Understand your recommendation on a pure PHP implementation to manipulate Word (and other Office application). But this has its drawbacks too.

      First of all, I don’t recommend direct manipulation of a file, which I believe that lib does. It fails when there are changes to the file format and organization. The lib will have to be rewritten, though not very often, of course. But it is an issue. It is just like how we would never directly manipulate a database table file by modifying its bits and bytes, but rely on APIs instead.

      Next, there may be limitations of the document to be created by that lib. Some advanced features may be too difficult, if not impossible to implement. I have not tested yet, but would like to hear your comments and input on this.

      • Mathieu SAVELLI

        > First of all, I don’t recommend direct manipulation on a file, which I believe that lib does. It fails when there are changes to the file format and organization.

        The old .doc format is not developed anymore so it won’t change, and as modern Office’s OpenXML format is standardized by ECMA, modifications to the format are well documented (and quite rare).
        Modern .docx and other MS Office documents (and Open Document Format too) are nothing more than XML files in a ZIP archive. Writing XML to a bunch of files and zipping them are common processes, and the XML document structures of Office OpenXML are well documented due to standardization.

        > Next, there may be limitations of the document to be created by that lib. Some advanced features may be too difficult, if not impossible to implement.

        I (partially) agree on this point : a native solution to manipulate specific proprietary files is often better for advanced usage.
        But with softwares as complex as MS Office suite, 80% of people use only 20% of the available features. So I don’t think it’s such a loss. And, then again, Office’s OpenXML formats are XML files in Zip archive, there should be nothing “too” hard to implement.

        • johnBas5

          The files produced by MS Office are different from the standards. Basically there exist dozens of OOXML standards, all very similar but slightly different.

  • Arka

    its great if you have vps, for shared server phpdoc is better.

  • Ben

    A simple way on linux would be to build an HTML file, save it to the server, run libreoffice in command-line mode to convert the HTML to a docx, delete the html file. Don’t know how scalable this would be, but it works pretty well, especially for simple documents like in this example.

  • Taylor Ren

    Nice point here.

    I have no objection, nor detachment to any pure PHP lib that manipulates Word (and all other Office) documents.

  • Naseer Ahamed

    What about bookmarks??