FPDF 2 (Call for API input)

Michael_Morris1 · December 2, 2011, 12:14am

FPDF is a library written for PHP 4.x by Oliver Pathey. It’s a wonderful piece of work and I have nothing but the highest respect for the original author. That said, it was written for PHP 4.x. The latest version on the site is 1.6, last updated about 3 years ago. It’s used frequently enough by the users here that I see a question about it roughly once a month. And after 7 months of heavy use some of its API shortcomings have really gotten on my nerves. So I’ve resolved to build a better version of it compatible with PHP 5.3.

Before I do that though I want to discuss API goals. Now, I’d put this in the PHP Application design forum, but that forum is - frankly - dead. The last post I made there a month ago is still the 4th thread from the top - and in my opinion it’s time to just merge that forum into this one and be done with it.

While I’m doing this mostly for myself, I will be releasing this under an open source license (the current one has a license with no copyleft protections, which I view as a problem). That means the API will be used by others, and it therefore needs to make sense. Since this is a major version change I don’t feel obligated to make sure it’s backwards compatible. I may end up renaming the project though.

The reason for the rewrite is that the current API, frankly, sucks. This is the argument list of the Cell command:

Cell(float w [, float h [, string txt [, mixed border [, int ln [, string align [, boolean fill [, mixed link]]]]]]])

So even if you don’t need to define a width or height and just want to write some text, you still have to supply those arguments. Making a link is a headache since you must traverse 3 unneeded arguments to get to it. It works, but it’s a pain.

The library is also emphatically not very extensible because it’s monolithic. It’s really just one 1700 line class. You can extend it, but it’s trickier than it needs to be.

Ok, enough ranting, this is what I plan.

Primary Project Goals

Minimize external code through use of logical defaults.
Rely on chaining for most property setting to allow for self-documenting code.
Allow PDF elements to be generated out of sequence and modified arbitrarily up until output time.
Modularize code into as many classes as necessary rather than presenting one monolithic class.
Allow positioning of elements to be absolute to the page, or relative to other objects on the page.
Have a little fun.

First, since we are going for PHP 5.3 floor this will be a namespaced library composed of multiple classes. Each class corresponds to a document element, or the document itself. Another reason for the namespacing is to allow these classnames to be generic and intuitive without fear of collision with other products out there. The current planned classes are:

Document: This is the core object, which represents a single PDF document. Documents will be able to merge, and usually you get the other objects from the document.
Section: Documents break up into sections, which share common margins, headers, body, footers.
Page: A single sheet of paper within the document
Header: Header objects can be attached to documents to be document wide, to sections to be section wide, and to pages to be page specific.
Body: A section or document body may span multiple pages, might be broken into columns. Bodies attach to sections or documents.
Footer: As with headers, footers can be attached to documents, sections or pages.
Cell: A single text cell, which is the fundamental unit of PDFs in the system.
Image: An image object
Font: A font definition object.
Drawing: A vector calculated drawing, such as a circle or ellipse - as opposed to Image objects which are raster based.

The largest change would be making use of chaining. Most of these objects will return themselves. Here’s the tutorial minimal example from the site.


require('fpdf.php');

$pdf = new FPDF();
$pdf->AddPage();
$pdf->SetFont('Arial','B',16);
$pdf->Cell(40,10,'Hello World!');
$pdf->Output();

Under the new API it would be


require('fpdf.php');

$pdf = new FPDF\Document();
$pdf->createCell('Hello World')->face('Arial')->size(16)->bold();
$pdf->Output();

Internally, asking the document to create a cell when it has no sections or pages will force a default section and page to be created. This doesn’t have to be explicitly called. The cell height and width don’t need to be defined in either version to get the text to appear properly, but because of the current API design you have to pass something in for the first example.
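
A minimal sketch of how the chaining could work, assuming each setter simply stores a value and returns $this. The class name SketchCell, its properties and describe() are hypothetical and exist only for illustration; face(), size() and bold() are the method names from the example above.

```php
<?php
// Hypothetical sketch: not the actual FPDF 2 implementation.
class SketchCell
{
    protected $text;
    protected $face = null;   // null means "not set on this cell"
    protected $size = null;
    protected $bold = false;

    public function __construct($text = '')
    {
        $this->text = $text;
    }

    // Each setter records one property and returns the cell itself,
    // which is what makes ->face()->size()->bold() possible.
    public function face($face) { $this->face = $face; return $this; }
    public function size($size) { $this->size = $size; return $this; }
    public function bold()      { $this->bold = true;  return $this; }

    // Exposed only so the sketch can be inspected.
    public function describe()
    {
        return array(
            'text' => $this->text,
            'face' => $this->face,
            'size' => $this->size,
            'bold' => $this->bold,
        );
    }
}

$cell = new SketchCell('Hello World');
$info = $cell->face('Arial')->size(16)->bold()->describe();
```

Because every setter returns the same object, the order of the chained calls does not matter.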

Moving on to something a bit more complex, the second tutorial - a 2 page document with header and footer. The current FPDF library requires extension of the base class.


require('fpdf.php');

class PDF extends FPDF
{
// Page header
function Header()
{
    // Logo
    $this->Image('logo.png',10,6,30);
    // Arial bold 15
    $this->SetFont('Arial','B',15);
    // Move to the right
    $this->Cell(80);
    // Title
    $this->Cell(30,10,'Title',1,0,'C');
    // Line break
    $this->Ln(20);
}

// Page footer
function Footer()
{
    // Position at 1.5 cm from bottom
    $this->SetY(-15);
    // Arial italic 8
    $this->SetFont('Arial','I',8);
    // Page number
    $this->Cell(0,10,'Page '.$this->PageNo().'/{nb}',0,0,'C');
}
}

// Instantiation of inherited class
$pdf = new PDF();
$pdf->AliasNbPages();
$pdf->AddPage();
$pdf->SetFont('Times','',12);
for($i=1;$i<=40;$i++)
    $pdf->Cell(0,10,'Printing line number '.$i,0,1);
$pdf->Output();

I think PHP 5.3 and some chaining will allow for better…


require('fpdf.php');

$pdf = new FPDF\Document();
$header = $pdf->createHeader();
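// FPDF's Image('logo.png',10,6,30) means x=10, y=6, width=30; yPos() is assumed here by analogy with xPos()
$header->createImage('logo.png')->xPos(10)->yPos(6)->width(30);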
$header->createCell('Title')->face('Arial')->bold()->center()->border()->marginLeft(80);
$header->marginBottom(20);

$pdf->createFooter(function() use ($self) {
    $self->createCell( 'Page '.$self->pageNumber.' of '.$self->pageTotal )->italic()->face('Arial')->size(8)->positionBottomLeft()->height(15);
});

$pdf->font->face('Times')->size(12);

for($i=1;$i<=40;$i++) {
    $pdf->createCell('Printing line number '.$i)->block();
}

$pdf->Output();

Note it’s much shorter. Other notes.

The biggest difference is the header isn’t a function in an extended class. That wasn’t necessary. The footer however does have to get page number and total on the fly, so a closure is used. For even more demanding customization the Footer and Header class can be extended, as can any other object in the system. Documents, sections, et al can be instructed to use your custom object(s) as necessary.

Font assignments and object properties will cascade - though not as much as in CSS (implementing such in PHP would be a nightmare). That is, if you don’t set a cell’s font, it will use the font of the object that contains it. If that object doesn’t have a font, the page, section, then document are checked.
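
The cascade described above could be sketched roughly like this, assuming every object keeps a reference to its container and leaves unset properties as null. SketchNode, resolveFace() and the 'Helvetica' fallback are hypothetical names chosen for the example.

```php
<?php
// Hypothetical sketch of the font cascade; names are assumptions.
class SketchNode
{
    protected $parent;
    protected $face = null; // null = not set on this object

    public function __construct($parent = null)
    {
        $this->parent = $parent;
    }

    public function face($face)
    {
        $this->face = $face;
        return $this;
    }

    // Walk up the containment chain until some object has a face set.
    public function resolveFace($default = 'Helvetica')
    {
        for ($node = $this; $node !== null; $node = $node->parent) {
            if ($node->face !== null) {
                return $node->face;
            }
        }
        return $default;
    }
}

$document = new SketchNode();
$section  = new SketchNode($document);
$page     = new SketchNode($section);
$cell     = new SketchNode($page);

$document->face('Times');
$inherited = $cell->resolveFace();   // nothing set on cell, page or section
$cell->face('Arial');
$own = $cell->resolveFace();         // the cell's own setting wins
```

The lookup happens when the value is needed rather than when it is set, so changing the document font later still affects every cell that has no font of its own.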

The largest difference though, which is implementation level and only hinted at here, is that this library will only create the PDF buffer at output time. Until that time you can hold references to cells, pages and so on and modify their contents or appearance based on whatever is going on in your code. This as opposed to the current FPDF library, which requires you to create cells, pages, etc. sequentially in the order they appear in the document. The flexibility of not being required to do that is, more than anything else, the goal that’s driving me to do this work.

Thoughts?

Cups · December 2, 2011, 10:50am

I think it’s a brilliant idea, Michael.

I too used FPDF though it was years ago and was one of my first forays into OOP at that time - although it performed absolutely fine, I was somewhat scarred by the number and order of arguments each method required.

I cannot think of any immediate API requirement, I’ll let it stew - some use cases will come to mind.

Just to say, well done, good idea, thanks for asking. (ps I think you’ll find it’s Olivier)

Michael_Morris1 · December 2, 2011, 12:55pm

Off Topic:

I have mild dyslexia. A name like Olivier vs. Oliver is a bear for me to sort out. Same with homonyms - your/you’re, its/it’s, where/were/wear… Number sequences are particularly hard - how I got into programming is beyond me.

Michael_Morris1 · December 2, 2011, 2:32pm

Another thought - This might be better suited to PHP 5.4 because traits might be very helpful, especially in the cascading of object properties. Also, in the closure example above, PHP 5.4 would allow $this to be used by the closure instead of aliasing to $self, which is a workaround for PHP 5.3. Not that I’m averse to making it PHP 5.3, but if I can pick up some functionality that would be good.

I was tired when I made the original post so I’m going to explicitly state goals for this project. REQUEST TO MODERATOR: Please copy the list that follows to the first post since I cannot edit that post any longer.

Primary Project Goals

Minimize external code through use of logical defaults.
Rely on chaining for most property setting to allow for self-documenting code.
Allow PDF elements to be generated out of sequence and modified arbitrarily up until output time.
Modularize code into as many classes as necessary rather than presenting one monolithic class.
Allow positioning of elements to be absolute to the page, or relative to other objects on the page.
Have a little fun.

Ren · December 3, 2011, 12:50am

Not sure how applicable this is, but one of the nicest APIs I’ve come across is the Protovis API.

It’s more geared to charts and such, but still might be of some inspiration.

There is also D3 but the examples on the protovis page looked better.

rpkamp · December 3, 2011, 1:50am

I absolutely love the idea and the chaining looks very shiny! However, I wonder. How would you know when you can actually put the contents on the page?
Since you’re chaining and just returning $this from every function, when do you know the chaining stops and you can actually put the element on the page? Or do you just keep it in memory and write everything to the page when you call some sort of flush() function? Or are there “special” functions that actually put stuff on the page? If so, which ones?

Michael_Morris1 · December 3, 2011, 4:50am

The actual writing of the PDF object will occur when the output function is called. Until then the document holds all the objects as they are created in the order they are created.

In the example above, when $pdf->createCell(); is called the document class makes a section, and that makes a page, and that makes a cell. Each object stores an array of the objects they hold, and I’ll probably implement ArrayObject for them (so foreach iteration over a document would traverse the sections, foreaching the sections traverses pages, and pages traverse cells/images/et al. )
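
A rough sketch of that containment idea, assuming each container extends ArrayObject so that foreach walks its children. SketchContainer, create() and render() are hypothetical names; render() here only builds an indented outline rather than a PDF.

```php
<?php
// Hypothetical sketch; class and method names are assumptions.
class SketchContainer extends ArrayObject
{
    public $label;

    public function __construct($label)
    {
        parent::__construct();
        $this->label = $label;
    }

    // Create a child, store it, and hand it back to the caller.
    public function create($label)
    {
        $child = new SketchContainer($label);
        $this->append($child);
        return $child;
    }

    // Nothing is written until render() walks the tree in creation order.
    public function render($depth = 0)
    {
        $out = str_repeat('  ', $depth) . $this->label . "\n";
        foreach ($this as $child) {
            $out .= $child->render($depth + 1);
        }
        return $out;
    }
}

$document = new SketchContainer('document');
$page     = $document->create('section')->create('page');
$cell     = $page->create('cell: Hello World');

$cell->label = 'cell: Hello My World';   // modified before output
$rendered = $document->render();
```

Since the outline is only produced by render(), the late change to the cell's label shows up in the output.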

After the cell is created it is returned, but not before it is stored within the creating object. Remember that PHP passes objects by reference (strictly speaking it copies the object handle, not the object), so if you catch the returned cell

$cell = $pdf->createCell('Hello World');

you can easily modify it later

$cell->setText("Hello My World");

The fact that the language, to save memory, passes objects by reference rather than copying them is an underused and usually surprising gotcha. But here I’m relying on that behavior for the API.
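
The behavior being relied on can be shown with plain PHP. SketchText is a hypothetical stand-in class, and the $stored array plays the role of the document's internal list.

```php
<?php
// Plain PHP 5+ behaviour, shown with a hypothetical stand-in class.
class SketchText
{
    public $text;

    public function __construct($text)
    {
        $this->text = $text;
    }
}

$stored   = array();
$cell     = new SketchText('Hello World');
$stored[] = $cell;                  // the "document" keeps the same object handle

$cell->text = 'Hello My World';     // change made through the caller's variable
$seenByDocument = $stored[0]->text; // the stored entry sees the change

$copy = clone $cell;                // an explicit clone is a separate object
$copy->text = 'Something else';
```

Only an explicit clone produces an independent object; assigning or storing the variable does not.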

One final note - #output will only be available from the Document object. $cell->output() won’t work. It’s actually one of three output methods.

$pdf->output – Send the HTTP Headers for a PDF response, parse the document, then echo it.
$pdf->send – Send the HTTP Headers for a PDF attachment response, then echo it.
$pdf->__toString – Get the string value of the PDF. From there do what you want (put in a database, stream to a file, stream into IPP)

In the coming days I’ll go over each object in turn and what they’ll do.

felgall · December 3, 2011, 7:10am

I am looking forward to seeing the finished library. The FPDF library is one of the few pieces of PHP that I use on several of my sites that I didn’t write for myself and this sounds like it will make it far easier to maintain the modifications that I made.

rpkamp · December 3, 2011, 9:49am

Sounds like a sensible approach to me! How will you develop this? Will you put it on GitHub (or similar)?

oddz · December 3, 2011, 10:36pm

I have never used the fpdf library myself. Nonetheless, I’m all for updating existing libraries and frameworks to make use of the many new things 5.3 offers. It seems lately there has been a real lack in that department for existing open source projects…

jeffvdovjak · December 4, 2011, 4:10pm

Off Topic:

I have mild dyslexia. A name like Olivier vs. Oliver is a bear for me to sort out. Same with homonyms - your/you’re, its/it’s, where/were/wear… Number sequences are particularly hard - how I got into programming is beyond me.

I have exactly the same problem

arborint · December 4, 2011, 9:15pm

I notice that there is a more recent release v1.7 (2011-06-18). It has improvements but does not add a fluent interface or multi-class design. Have you contacted the developer about your ideas?

Michael_Morris1 · December 5, 2011, 4:54pm

The only response I got from Olivier was a couple years ago when I mentioned the idea of simply updating to protected / private properties. At the time chaining hadn’t occurred to me. I mentioned the chaining in a subsequent post on the FPDF forum but got no response. I would like his input on this.

Second, I’m seriously thinking PHP 5.4 and using traits. The factory methods really seem to want the new trait system from PHP 5.4. Let me explain. Document, Header, Footer, Section and Page all have a createCell method. That method is always the same and something like this…


public function createCell( $text = '', $class = null ) {
  if (is_null($class)) {
    $class = $this->defaultCellClass;
  }

  $cell = new $class($text, $this);

  $this->registerCell( $cell );

  return $cell;
}

Some error correction code would obviously go in there, but the general direction is clear. I could use an inheritance tree, but I’m afraid of it becoming convoluted.

I’m torn on doing that though. Making it PHP 5.3 compatible keeps it available to the largest segment of the PHP community. But using traits would help the code be as concise as possible by spreading responsibility around the objects.
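
For comparison, here is a sketch of what the trait route might look like on PHP 5.4 or later. The trait name CreatesCells, the Sketch* classes and cellCount() are hypothetical; createCell(), registerCell() and defaultCellClass are taken from the method above.

```php
<?php
// Hypothetical sketch, requires PHP 5.4+ for traits.
class SketchCell
{
    public $text;
    public $parent;

    public function __construct($text, $parent)
    {
        $this->text   = $text;
        $this->parent = $parent;
    }
}

trait CreatesCells
{
    protected $defaultCellClass = 'SketchCell';
    protected $cells = array();

    public function createCell($text = '', $class = null)
    {
        if (is_null($class)) {
            $class = $this->defaultCellClass;
        }
        $cell = new $class($text, $this);
        $this->registerCell($cell);
        return $cell;
    }

    protected function registerCell($cell)
    {
        $this->cells[] = $cell;
    }

    // Present only so the sketch can be checked.
    public function cellCount()
    {
        return count($this->cells);
    }
}

// Unrelated classes pick up the same factory without a shared base class.
class SketchPage   { use CreatesCells; }
class SketchHeader { use CreatesCells; }

$page   = new SketchPage();
$header = new SketchHeader();
$cell   = $page->createCell('Hello World');
$header->createCell('Title');
$header->createCell('Subtitle');
```

Each class that uses the trait gets its own copy of the properties, so the page and the header keep separate cell lists.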

Immerse · December 8, 2011, 2:12pm

I still use fpdf extensively at work, it works fine.
I’d love to work with an updated version though, as it’s… getting a little old.

Two suggestions, if I may:

keep it 5.3 for now. No traits, that sucks, but lots more users (in production environments, so more than just Hello Worlds tests).
I use FPDI to be able to import existing PDFs. This should be standard functionality ($fpdf->loadTemplate() or something?)

Michael_Morris1 · December 8, 2011, 3:28pm

PHP 5.3/5.4 is still very much up in the air - but it doesn’t affect the external API so that can be tabled for the moment. Incorporating FPDI or some other PDF loading schema would be useful.

I’m going to look at the Cell object in this post and what it needs to do, without thinking too much about which of these methods need to be inherited.

Unless otherwise noted, these functions return $this for chaining.

Positioning Related
#cursorX( $x )
#cursorY( $y )
#cursorZ( $z )
#position ( $x, $y, $z )

These set the respective position of the cursor relative to parent (by default the parent is a page, but cells can hold cells for relative positioning). The z index is the drawing depth of the object. Lowest Z objects get drawn first (so by changing Z you can force one object to draw behind another ).
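
The drawing order could be derived from z along these lines. This is only a sketch: drawOrder() and the array layout are hypothetical, and breaking ties by creation order is an assumption, since the post does not say what happens when two objects share a z value.

```php
<?php
// Hypothetical sketch: draw order derived from a z value, lowest first.
// Creation order breaks ties explicitly, since usort() is only
// guaranteed to be stable from PHP 8.0 onwards.
function drawOrder(array $objects)
{
    $indexed = array();
    foreach (array_values($objects) as $i => $object) {
        $indexed[] = array($object['z'], $i, $object);
    }

    usort($indexed, function ($a, $b) {
        if ($a[0] != $b[0]) {
            return $a[0] < $b[0] ? -1 : 1;
        }
        return $a[1] - $b[1];
    });

    $names = array();
    foreach ($indexed as $entry) {
        $names[] = $entry[2]['name'];
    }
    return $names;
}

$order = drawOrder(array(
    array('name' => 'caption',    'z' => 1),
    array('name' => 'background', 'z' => -1),
    array('name' => 'photo',      'z' => 0),
    array('name' => 'stamp',      'z' => 1),
));
```

Objects drawn later end up on top, so the background comes first and the two z=1 objects come last in the order they were created.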

#top( $x, $y )
#bottom( $x, $y)
#left( $x, $y )
#right( $x, $y )
#topLeft( $x, $y )
#topRight( $x, $y )
#bottomLeft( $x, $y )
#bottomRight( $x, $y )

These position the cell at the respective positions within the parent. The first four only address the relevant coordinate. If you call $cell->cursorY(20)->right() the $y value will be retained. The X and Y arguments of all these functions are offsets from the start position the function name implies. So $cell->top(-5); will move the cell to the top of its parent (again, usually a page) and then 5 more units up from there. All of these functions ignore paddings and margins.

#above( $x, $y )
#below( $x, $y )
#toLeft( $x, $y )
#toRight( $x, $y )
#upperLeft( $x, $y )
#upperRight( $x, $y )
#lowerLeft( $x, $y )
#lowerRight( $x, $y )
#behind ( $z )
#inFront ( $z )

These move the cell to a position outside the parent. Again, the offsets adjust the position further. Padding and margins are respected by these functions.

#sendToFront()
#sendToBack()

Give the object the highest or lowest z within the parent.

#absolutize()
Detach the cell from its parent cell and attach it to the page at the exact same calculated coordinates, unless it’s already attached to the page.

Dimensions Related
#height( $h )
#width ( $w )
#dimensions ( $w, $h )

This should be pretty obvious.

#padding ( [ $v ] || [$v, $h] || $t, $r, $b, $l )
#paddingTop( $v )
#paddingRight( $v )
#paddingBottom( $v )
#paddingLeft( $v )
#margin ( [ $v ] || [$v, $h] || $t, $r, $b, $l )
#marginTop( $v )
#marginRight( $v )
#marginBottom( $v )
#marginLeft( $v )

I’m not a fan of byzantine argument sequences, but the padding() and margin() methods here are mirroring how CSS behaves. Give them one value, and all margins gain that value. Give them two, and the first argument is the top and bottom value and the second argument is the left and right. Give them four and all four margins can be set with the one function call in clockwise order starting from the top. Again, not a fan of this, but I anticipate incoming users will be familiar with this setup from mucking with CSS.

Padding and margins will behave the way they are supposed to in CSS. Each cell has a boundary - if you set a border line you’ll be able to see it. The space between the border and the text is the padding. The space between the border and the next cell is the margin. Padding is hard - never overlapping. It also changes the dimensions of the object silently - if you set a height of 10 and a padding of 10 the object will be 30 high (30 of whatever you set the base unit to). If you are familiar with CSS this is old hat. If not, prepare to get confused.

Margins overlap each other. If two objects with margins of 10 are beside each other they will be 10 apart, not 20 apart.
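
The arithmetic behind those two rules, as a small sketch. outerSize() and gapBetween() are hypothetical helpers. Taking the larger margin when the two differ is an assumption borrowed from CSS margin collapsing; the post only covers the case where both margins are 10.

```php
<?php
// Hypothetical helpers illustrating the arithmetic only.

// Padding is added on both sides of the content dimension.
function outerSize($content, $paddingBefore, $paddingAfter)
{
    return $paddingBefore + $content + $paddingAfter;
}

// Adjacent margins overlap: the gap is the larger of the two, not their sum.
function gapBetween($marginA, $marginB)
{
    return max($marginA, $marginB);
}

$height = outerSize(10, 10, 10);   // height 10 with padding 10 top and bottom
$gap    = gapBetween(10, 10);      // two neighbours, each with margin 10
```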

Text Related

#face( $s ) - The face of the font of the cell.
#size( $x ) - The size of the font.
#bold()
#unBold()
#italic()
#unItalic()
#smallCap()
#unSmallCap()
#underline()
#noUnderline()
#strikethrough()
#noStrikethrough()

Text styling toggles.

#font( $face, $size, [$flags] )

The font function does this all in one call. The face and size must come first. The subsequent arguments will be namespace constants. You can pass them in separately in overload fashion or use bitwise operators to add them up. The constants are
FPDF\BOLD = 2^0
FPDF\ITALIC = 2^1
FPDF\SMALLCAP = 2^2
FPDF\UNDERLINE = 2^3
FPDF\STRIKETHROUGH = 2^4

Making the code read ( registered fonts are namespaced constants as well )


// Assuming we are in the FPDF namespace or are using it.
$cell->font( FACE_ARIAL, 14, BOLD, UNDERLINE );

// Otherwise
$cell->font( FPDF\FACE_ARIAL, 14, FPDF\BOLD, FPDF\UNDERLINE );

// Since the styles are bitflags this works too (bitwise OR combines them)
$cell->font( FACE_ARIAL, 14, BOLD | UNDERLINE );
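
A sketch of how the style flags might be combined and tested. Two things worth keeping in mind: in PHP the ^ operator is XOR, so the constants need literal values (1, 2, 4, ...) rather than an expression like 2^3, and flags are combined with | (a bitwise AND of two different flags is 0). The STYLE_* names, combineStyles() and hasStyle() are hypothetical. Plain prefixed constants stand in for the proposed namespaced ones so the block runs on its own.

```php
<?php
// Hypothetical sketch. The thread proposes namespaced constants
// (FPDF\BOLD etc.); prefixed global constants are used here instead.
const STYLE_BOLD          = 1;   // 2 to the power 0
const STYLE_ITALIC        = 2;   // 2 to the power 1
const STYLE_SMALLCAP      = 4;   // 2 to the power 2
const STYLE_UNDERLINE     = 8;   // 2 to the power 3
const STYLE_STRIKETHROUGH = 16;  // 2 to the power 4

// Accepts flags either as separate arguments or already OR-ed together.
function combineStyles()
{
    $flags = 0;
    foreach (func_get_args() as $flag) {
        $flags |= $flag;
    }
    return $flags;
}

// True when every bit of $style is present in $flags.
function hasStyle($flags, $style)
{
    return ($flags & $style) === $style;
}

$separate = combineStyles(STYLE_BOLD, STYLE_UNDERLINE);
$combined = combineStyles(STYLE_BOLD | STYLE_UNDERLINE);
$mistake  = STYLE_BOLD & STYLE_UNDERLINE;   // AND of two different flags is 0
```

Both calling styles end up with the same integer, which is what lets font() accept either form.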

Text can of course be positioned within the cell.
#align()

And the text align options are constants LEFT, RIGHT, CENTER. I’d like to implement JUSTIFY as well at some point.

Traversal
Cells can create each other, so it will be necessary to move from one object reference to another to traverse the document as one would traverse the DOM.

#parent() - Return parent cell, or the page.
#nthChild() - Return nth child cell
#firstChild() - Return the first child cell, or false
#lastChild() - Return the last child cell, or false
#children() - Return an array of children cells.
#next() - Return the next child cell, or false if none.
#prev() - Return the previous child cell, or false if none.

DOM Mimicry

#url( $url ) - If the cell is a hyperlink the page it is to link to goes here.

#id( $string )
#addClass( $string )
#removeClass( $string )
#hasClassName( $string ) returns bool.
#getCellWithID( $string )
#getCellsWithClass( $string ) returns array of cells.

This is mimicry of DOM traversal and the rules for cell IDs and classes work the same way. Each object searches within itself only. So $cell->getCellWithID() only considers cells descended from it. To search the whole document, $pdf->getCellWithID() will be required.
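
The scoped search could be sketched as a depth-first walk over descendants. Only the method names getCellWithID() and getCellsWithClass() come from the list above. The SketchCell class, add() and returning false on a miss (by analogy with the traversal methods) are assumptions.

```php
<?php
// Hypothetical sketch of scoped lookup.
class SketchCell
{
    public $id;
    public $classes;
    public $children = array();

    public function __construct($id = null, array $classes = array())
    {
        $this->id      = $id;
        $this->classes = $classes;
    }

    public function add(SketchCell $child)
    {
        $this->children[] = $child;
        return $child;
    }

    // Depth-first search of descendants only; false when nothing matches.
    public function getCellWithID($id)
    {
        foreach ($this->children as $child) {
            if ($child->id === $id) {
                return $child;
            }
            $found = $child->getCellWithID($id);
            if ($found !== false) {
                return $found;
            }
        }
        return false;
    }

    // Collect every descendant carrying the given class name.
    public function getCellsWithClass($class)
    {
        $matches = array();
        foreach ($this->children as $child) {
            if (in_array($class, $child->classes, true)) {
                $matches[] = $child;
            }
            $matches = array_merge($matches, $child->getCellsWithClass($class));
        }
        return $matches;
    }
}

$root    = new SketchCell('root');
$sidebar = $root->add(new SketchCell('sidebar', array('box')));
$note    = $sidebar->add(new SketchCell('note', array('box', 'small')));
$main    = $root->add(new SketchCell('main'));
```

Searching from $main does not find the note, because the note lives under the sidebar. Searching from $root does.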

Other

#createCell( $text, $class )

Creates a cell, using class (or the default cell object class if not stated ) containing text.

#defaultClass( $class )

Sets the default class that will be used to create cells within this cell.

#convert( $class )

Transfer the current cell’s data to a new cell that is a member of class and return it.

#text( $string )

Set the text of the cell and return the cell.

Ok, I’m tired and will pick this up later - but that should be enough to go through for now.

rpkamp · December 8, 2011, 4:54pm

Wow, that’s some good thinking there!

A few pointers if I may:

Regarding the paddings and margins, CSS can take 1, 2, 3 or 4 parameters; you forgot the option with 3.
I would define the function something like


function margin($param1 = null, $param2 = null, $param3 = null, $param4 = null)
{
  $top = $right = $left = $bottom = 0;
  if (isset($param1) && isset($param2) && isset($param3) && isset($param4))
  {
    $top = $param1;
    $right = $param2;
    $bottom = $param3;
    $left = $param4;
  }
  else if (isset($param1) && isset($param2) && isset($param3))
  {
    $top = $param1;
    $right = $left = $param2;
    $bottom = $param3;
  }
  else if (isset($param1) && isset($param2))
  {
    $top = $bottom = $param1;
    $right = $left = $param2;
  }
  else if (isset($param1))
  {
    $top = $bottom = $right = $left = $param1;
  }
  else
  {
     throw new \InvalidArgumentException("Improper arguments given. Supply either one, two, three or four values, and don't skip values.");
  }
}

It would be nice if the margins and paddings could work both old hat and new hat using some sort of static variable as a flag?
You have unBold(), unItalic(), unSmallCap() but noUnderline() and noStrikethrough(). I would use either un, or no, but don’t mix them, as that will get terribly confusing (which one was this again?)
I’d add an extra function noStyles() that removes all styles, so you can do bold()->italic()->text('Hello')->noStyles()->text('look, all styles are gone now').
Instead of getCellWithID and getCellsWithClass I’d use getCellById and getCellsByClass, because that closer resembles the functions we all know.

Michael_Morris1 · December 8, 2011, 6:26pm

It’s going to have to be no because unUnderline just don’t look right

As for implementation, my thoughts…



public function margin() {
  return $this->parseCSSStyleMultiArgumentsForMarginsOrPadding( func_get_args(), 'margin' );
}

protected function parseCSSStyleMultiArgumentsForMarginsOrPadding( array $args, $functions ) {
  if (count($args) == 1 ) {
    $finalArguments = array (
      'top' => $args[0],
      'right' => $args[0],
      'bottom' => $args[0],
      'left' => $args[0]
    );
  } elseif (count($args) == 2 ) {
    // stub: top/bottom from $args[0], left/right from $args[1]
  } elseif (count($args) == 3 ) {
    // stub
  } elseif (count($args) == 4 ) {
    // stub: top, right, bottom, left in clockwise order
  }

  // $functions names the caller; compare it, since any non-empty string is truthy
  return $functions === 'margin' ? $this->setMultipleMargins($finalArguments) : $this->setMultiplePaddings($finalArguments);
}

protected function setMultipleMargins( array $args ) {
  $this->marginTop( $args['top'] );
  $this->marginRight( $args['right']);
  $this->marginBottom( $args['bottom'] );
  $this->marginLeft( $args['left'] );

  return $this;
}

protected function setMultiplePaddings( array $args ) {
  $this->paddingTop( $args['top'] );
  $this->paddingRight( $args['right']);
  $this->paddingBottom( $args['bottom'] );
  $this->paddingLeft( $args['left'] ); 

  return $this;
}

Trying to keep things DRY by channeling both these shortcut methods through one mitigator. The actual set margin process remains in the respective long form functions. From an internal API standpoint this means that if you change what happens when, say, marginTop is set, that change will be invoked whether you used marginTop or margin. Internally each function does as little as possible to make the classes as polymorphic as possible.

rpkamp · December 8, 2011, 7:14pm

How about end? endBold, endUnderline, etc

And yes, your way looks better than mine. Nice idea to use func_get_args to get it in an array
There should be an else for empty and/or > 4 size arrays though (IMHO).

Come to think of it btw, the top is always the value of the first argument, and if the second argument is set then that’s always the value of right.

So


protected function parseCSSStyleMultiArgumentsForMarginsOrPadding( array $args, $functions ) {
  if (count($args) == 0 || count($args) > 4) {
    throw new \InvalidArgumentException('invalid argument');
  }
  $finalArguments = array (
    'top' => $args[0],
    'right' => $args[0],
    'bottom' => isset($args[2]) ? $args[2] : $args[0],
    'left' => isset($args[1]) ? $args[1] : $args[0]
  );
  if (count($args) > 1 ) {
    $finalArguments['right'] = $args[1];
  }
  if (count($args) == 4 ) {
    $finalArguments['left'] = $args[3];
  }
 
  // compare the name, since any non-empty string is truthy
  return $functions === 'margin' ? $this->setMultipleMargins($finalArguments) : $this->setMultiplePaddings($finalArguments);
}
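
A standalone version of the same expansion, written as a plain function so it can be run without the rest of the class. expandBoxShorthand() is a hypothetical name; the rules are the CSS shorthand rules for one to four values.

```php
<?php
// Hypothetical standalone helper; follows the CSS shorthand rules
// for one to four values.
function expandBoxShorthand(array $args)
{
    $args  = array_values($args);
    $count = count($args);
    if ($count < 1 || $count > 4) {
        throw new InvalidArgumentException('Expected one to four values.');
    }

    $top    = $args[0];
    $right  = $count > 1 ? $args[1] : $top;    // second value, else same as top
    $bottom = $count > 2 ? $args[2] : $top;    // third value, else same as top
    $left   = $count > 3 ? $args[3] : $right;  // fourth value, else same as right

    return array(
        'top'    => $top,
        'right'  => $right,
        'bottom' => $bottom,
        'left'   => $left,
    );
}

$one   = expandBoxShorthand(array(5));
$two   = expandBoxShorthand(array(5, 10));
$three = expandBoxShorthand(array(5, 10, 15));
$four  = expandBoxShorthand(array(5, 10, 15, 20));
```

Returning the four values as an array keeps the parsing separate from whichever setter (margin or padding) ends up consuming them.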

Michael_Morris1 · December 8, 2011, 8:45pm

Using “no” because it’s affecting the styling of all the text of that cell, not a snippet of it. “end” would imply that only a section was a certain style. Note I think you’re misunderstanding the pseudocode a little.

$cell->text('hello world')->bold()->text('hello my world'); will output 'hello my world', not a combination of the two. There probably should be an append() function to add more text to whatever has been passed. In any event, a PDF Cell is like an HTML element. Everything about the text in that cell will be the same. If the user wants to change text stylings he’ll have to use multiple cells.

Higher level objects (page, zone, document) will have functions to address this. But cells are pretty dumb things on the whole.

I’m not too worried about error returns yet. Philosophically I prefer using trigger_error to fire off notices and warnings when a logical default is possible, and throwing exceptions only for situations that the programmer must address. For the setting of margins, if they pass no parameters to the function then the function does nothing and an E_USER_WARNING is fired. If they pass three then we leave the 4th parameter (left) alone and fire E_USER_NOTICE. This allows the library to be forgiving of small mistakes.
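
That policy might look something like the sketch below. sketchMargin() is a hypothetical free function that takes the current margins and the arguments passed. Throwing on more than four values is an assumption, as the post does not say what counts as a case the programmer must address. The error handler is installed only to record which levels were raised.

```php
<?php
// Hypothetical sketch: recoverable misuse raises a user-level warning or
// notice and falls back to a default; unrecoverable misuse throws.
function sketchMargin(array $current, array $args)
{
    $n = count($args);
    if ($n > 4) {
        // Assumed here to be a case with no sensible default.
        throw new InvalidArgumentException('margin() takes at most four values');
    }
    if ($n === 0) {
        trigger_error('margin() called without values; nothing changed', E_USER_WARNING);
        return $current;
    }
    if ($n === 3) {
        trigger_error('margin() got three values; left margin kept as is', E_USER_NOTICE);
        return array('top' => $args[0], 'right' => $args[1],
                     'bottom' => $args[2], 'left' => $current['left']);
    }

    $top   = $args[0];
    $right = $n > 1 ? $args[1] : $top;
    return array(
        'top'    => $top,
        'right'  => $right,
        'bottom' => $n === 4 ? $args[2] : $top,
        'left'   => $n === 4 ? $args[3] : $right,
    );
}

// Record raised error levels instead of printing them.
$raised = array();
set_error_handler(function ($level, $message) use (&$raised) {
    $raised[] = $level;
    return true; // handled; skip PHP's default handler
});

$start     = array('top' => 1, 'right' => 1, 'bottom' => 1, 'left' => 1);
$unchanged = sketchMargin($start, array());
$three     = sketchMargin($start, array(5, 10, 15));
$two       = sketchMargin($start, array(5, 10));

restore_error_handler();
```

Note that the three-value case here follows the rule stated in this post (left untouched), which differs from the CSS rule where left would copy right.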