Programming Amazon S3, Part II

In the first part of this series, Jeff showed us how to get set up to start programming Amazon’s S3 storage service. If you followed along with that tutorial, you’ll have your AWS account set up, and have downloaded the CloudFusion library and set it up with your authentication keys. You will also have seen some simple ways to create buckets and retrieve lists of buckets from your PHP code.

This week we’ll take it a little further: we’ll show you how to upload files to S3, create a simple thumbnail storage service, and distribute your files with the CloudFront CDN.

Both of these articles were excerpted from SitePoint’s latest book release, Host Your Web Site in the Cloud: Amazon Web Services Made Easy. They were both taken from Chapter 4, which is also available as part of the free PDF sample of the book.

I’m assuming you’ve already worked through Part I. Now that you know how to create buckets and list them in a web page, we can turn to the primary feature of Amazon S3: storing files. Before we do, though, we need to take a little detour.

important: Running the Code

The programs in this article that start with a shebang (#!/usr/bin/php) are meant to be run from the command line. The others are meant to be run via a web server.

Processing Complex CloudFusion Data Structures

The CloudFusion functions that I have told you about thus far all return simple data structures—arrays of strings. However, the functions that I will use later in this article return a more complex data structure called a ResponseCore. S3 returns its results in the form of an XML document; CloudFusion parses the XML using PHP’s SimpleXML package and includes the parsed objects in the response where they can be referenced by name.^[1]

warning: CloudFusion Has Become the AWS SDK for PHP

In the time since we published the book and the first article in this series, the CloudFusion library has become the official AWS SDK for PHP. This release comes with improved support for AWS regions, as well as a bevy of other improvements. It might mean that you’ll need to make some slight adjustments to the code if you’re not using the 2.5 version of the library, but since the new 1.0 SDK is entirely based on the old CloudFusion codebase, those changes should be minor, and few and far between.

The following code calls S3 to list the first 1,000 objects in the bucket BOOK_BUCKET, and then calls PHP’s handy print_r function to display the resulting object tree:

Example 1. chapter_04/list_bucket_objects_raw.php (excerpt)

#!/usr/bin/php
<?php
error_reporting(E_ALL);
require_once('cloudfusion.class.php');
require_once('include/book.inc.php');
$s3 = new AmazonS3();
$res = $s3->list_objects(BOOK_BUCKET);
print_r($res);
exit(0);
?>

The resulting output is far too long to display in its entirety (465 lines for my buckets). Let’s look at some excerpts instead. Here’s the first part:

$ php list_bucket_objects_raw.php
ResponseCore Object
[header] => Array
(
  [x-amz-id-2] => Ya7yAuUClv7HgR6+JJpz0sYDM1m4/Zy+d0Rmk5cSAu+qV+v+69gLSHlytlD77wAn
  [x-amz-request-id] => 14AA13F3F0B76032
  [date] => Thu, 28 May 2009 06:51:26 GMT
  [content-type] => application/xml
  [transfer-encoding] => chunked
  [connection] => close
  [server] => AmazonS3
  [_info] => Array
  (
    [url] => https://sitepoint-aws-cloud-book.s3.amazonaws.com/
    [content_type] => application/xml
    ⋮

The first line indicates that the data is of type ResponseCore. Further on, we find some standard PHP arrays. If we need to, we can access the data like this:

$res->header['transfer-encoding']
$res->header['_info']['url']

$res is an object and header is one of the object’s instance variables, so it’s accessed using the -> operator. The header instance variable is a PHP array, so its members are accessed using the array syntax.

In the second line, the _info member of header is itself an array, so a second set of brackets is used to access the url value inside.
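To make the two access styles concrete, here is a minimal sketch that runs without S3. It uses a hypothetical stand-in class, FakeResponse, in place of the real ResponseCore object; the class name and the sample values are assumptions for illustration only and are not part of CloudFusion.

```php
<?php
// Hypothetical stand-in for the response object; the real class is ResponseCore.
class FakeResponse
{
    public $header;

    public function __construct(array $header)
    {
        $this->header = $header;
    }
}

// Sample values modeled on the output excerpt shown earlier.
$res = new FakeResponse(array(
    'transfer-encoding' => 'chunked',
    '_info' => array(
        'url'          => 'https://sitepoint-aws-cloud-book.s3.amazonaws.com/',
        'content_type' => 'application/xml',
    ),
));

// The -> operator reaches the instance variable; [] reaches the array member.
$encoding = $res->header['transfer-encoding'];

// A second set of brackets reaches into the nested _info array.
$url = $res->header['_info']['url'];

print($encoding . "\n");
print($url . "\n");
```

If a header might be missing, wrapping the lookup in isset() avoids an undefined index notice.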

A little bit further down in the output, we find the following:

[body] => SimpleXMLElement Object
(
  [Name] => sitepoint-aws-cloud-book
  ⋮

The body instance variable is of type SimpleXMLElement. It starts out with a Name instance variable, which can be accessed as $res->body->Name.

Even further down we finally find what we came here for—the list of objects in the bucket:

[Contents] => Array
(
  [0] => SimpleXMLElement Object
  (
    [Key] => images/2008_shiller_housing_projection.jpg
    [LastModified] => 2009-05-22T23:44:58.000Z
    [ETag] => "e2d335683226059e7cd6e450795f3485"
    [Size] => 236535
    [Owner] => SimpleXMLElement Object
    (
      [ID] => 7769a42be4e57a034eeb322aa8450b3536b6ca56037c06ef19b1e1eabfeaab9c
      [DisplayName] => jeffbarr
    )
    [StorageClass] => STANDARD
  )
  ⋮

You can see that body contains an instance variable called Contents, which is another array containing all the files in the bucket. Each file in the bucket is represented by a SimpleXMLElement object; each has Key, ETag, Size, Owner, and StorageClass members, accessed like this:

$res->body->Contents[0]->Key
$res->body->Contents[0]->ETag
$res->body->Contents[0]->Size
$res->body->Contents[0]->Owner->ID
$res->body->Contents[0]->Owner->DisplayName
$res->body->Contents[0]->StorageClass

Of course, you’re free to use intermediate variables to make this code shorter or more efficient.
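As a rough sketch of what that could look like, the example below parses a small, made-up XML document with PHP's SimpleXML and uses intermediate variables for the first object and its owner. The XML is simplified sample data modeled on the excerpt above, not a real S3 response, and the variable names are my own. Each value is a SimpleXMLElement object, so the sketch casts values to plain strings or integers before using them.

```php
<?php
// Simplified, made-up sample data standing in for the parsed response body.
$xml = <<<XML
<ListBucketResult>
  <Name>sitepoint-aws-cloud-book</Name>
  <Contents>
    <Key>images/2008_shiller_housing_projection.jpg</Key>
    <Size>236535</Size>
    <Owner>
      <ID>example-owner-id</ID>
      <DisplayName>jeffbarr</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>
XML;

$body = simplexml_load_string($xml);

// Intermediate variables keep the access chains short.
$first = $body->Contents[0];
$owner = $first->Owner;

// Cast SimpleXMLElement values to plain PHP types before using them.
$key       = (string) $first->Key;
$size      = (int) $first->Size;
$ownerName = (string) $owner->DisplayName;

print("{$key} ({$size} bytes), owned by {$ownerName}\n");
```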

You may be wondering where the object names (Contents, Key, Size, and so forth) come from. The list_objects method makes an HTTP GET request to S3 to fetch a list of the first 1,000 objects in the bucket. The request returns an XML document, and CloudFusion parses and returns it as the body object. The object names are taken directly from the XML tags in the document.
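The link between XML tags and object names is easy to check with SimpleXML on its own. The sketch below parses a small, made-up document (simplified sample data, not a real S3 response) and collects the tag name of each top-level element; those are the same names you would use after the -> operator.

```php
<?php
// Simplified, made-up sample data; a real response carries many more elements.
$xml = <<<XML
<ListBucketResult>
  <Name>sitepoint-aws-cloud-book</Name>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>images/example.jpg</Key>
    <Size>1024</Size>
  </Contents>
</ListBucketResult>
XML;

$body = simplexml_load_string($xml);

// Collect the tag name of each top-level child element.
$names = array();
foreach ($body->children() as $child) {
    $names[] = $child->getName();
}
print(implode(', ', $names) . "\n");

// Each tag name doubles as a property name on the parsed object.
$bucketName  = (string) $body->Name;
$isTruncated = ((string) $body->IsTruncated) === 'true';
```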

If we were to modify the previous script to print out some of these values, it might look like this:

#!/usr/bin/php
<?php
error_reporting(E_ALL);
require_once('cloudfusion.class.php');
require_once('include/book.inc.php');
$s3 = new AmazonS3();
$res = $s3->list_objects(BOOK_BUCKET);
print("Bucket Url: " . $res->header['_info']['url'] . "\n");
print("Bucket Name: " . $res->body->Name . "\n");
print("First Key:  " . $res->body->Contents[0]->Key . "\n");
print("Second Key: " . $res->body->Contents[1]->Key . "\n");
exit(0);
?>

In the above example we output the bucket’s URL and name, followed by the keys of the first two items in the bucket.

We have now come to the end of the detour. I hope that the ride was scenic, yet educational. Next, we will use this newfound knowledge to create a very handy utility function.

Listing Objects in a Bucket as a Web Page

Before we can write a script that outputs a list of all the objects in a bucket within a web page, we first have to write a rather complex function. We’ll add this function to our book.inc.php file and call it getBucketObjects:

Example 2. chapter_04/include/book.inc.php (excerpt)

function getBucketObjects($s3, $bucket, $prefix = '')
{
  $objects = array();
  $next = '';
  do
  {
    $res = $s3->list_objects($bucket,
      array('marker' => urlencode($next),
            'prefix' => $prefix)
    );
    if (!$res->isOK())
    {
      return null;
    }
    $contents = $res->body->Contents;
    foreach ($contents as $object)
    {
      $objects[] = $object;
    }
    $isTruncated = $res->body->IsTruncated == 'true';
    if ($isTruncated)
    {
      $next = $objects[count($objects) - 1]->Key;
    }
  }
  while ($isTruncated);
  return $objects;
}

This function is more complex than anything you’ve seen so far, but there’s no need to worry. Earlier on I told you that one “list bucket” request to S3 will return at most 1,000 keys, regardless of how many keys are in the bucket. Our getBucketObjects function simply calls list_objects again and again until S3 says that there are no more objects to return:

	Our function accepts three arguments: an `AmazonS3` object, an S3 bucket, and a prefix value that defaults to an empty string.
	We use a `do … while` loop, so that the body of the loop always runs at least once.
	Each time we call `list_objects`, we pass in a value called `$next`. The first time through the loop, `$next` is an empty string, and `list_objects` starts at the beginning (alphabetically speaking) of the bucket. On subsequent loop iterations, `$next` is set to the final key returned on the previous iteration. This tells S3 to start retrieving keys alphabetically following the previous iteration’s final key.
	If the `list_objects` call fails, the function returns null.
	We retrieve the `Contents` array from the body of the response returned by our `list_objects` call, then loop through the values, storing each one in the `$objects` array. This array will eventually be our return value.
	The data returned by a call to `list_objects` includes an element named `IsTruncated`. If this value is the string `"true"`, the listing is incomplete and there are more objects to be found. This condition is also used to control the loop.
	If the list is incomplete, we set the `$next` value ready to begin the next iteration.
	When the loop terminates, the `$objects` array is returned.

Put it together and this function fetches all the objects in the bucket, puts them all into one array, and returns the array.
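If you would like to experiment with the paging logic without making any S3 requests, the following sketch reproduces the same marker-and-truncation loop against an in-memory list of keys. The functions fakeListPage and listAllKeys are hypothetical names invented for this illustration; fakeListPage only imitates the behavior described above (return at most a fixed number of keys after a marker, plus a truncation flag) and is not how S3 or CloudFusion is implemented. It assumes a page size of at least 1.

```php
<?php
// Hypothetical stand-in for one "list bucket" request: returns up to $max keys
// that sort after $marker, plus a flag saying whether more keys remain.
function fakeListPage(array $allKeys, $marker, $max)
{
    sort($allKeys);
    $after = array_values(array_filter($allKeys, function ($key) use ($marker) {
        return $marker === '' || strcmp($key, $marker) > 0;
    }));
    return array(
        'keys'        => array_slice($after, 0, $max),
        'isTruncated' => count($after) > $max,
    );
}

// Same loop shape as getBucketObjects: keep asking until nothing is truncated.
function listAllKeys(array $allKeys, $pageSize)
{
    $keys = array();
    $next = '';
    do {
        $page = fakeListPage($allKeys, $next, $pageSize);
        foreach ($page['keys'] as $key) {
            $keys[] = $key;
        }
        $isTruncated = $page['isTruncated'];
        if ($isTruncated) {
            // The last key seen so far becomes the marker for the next request.
            $next = $keys[count($keys) - 1];
        }
    } while ($isTruncated);
    return $keys;
}

// Five keys fetched two at a time takes three simulated requests.
$sampleKeys = array('c.jpg', 'a.jpg', 'e.jpg', 'b.jpg', 'd.jpg');
print_r(listAllKeys($sampleKeys, 2));
```

A tiny page size like this makes it cheap to confirm that the loop stops when it should before pointing the real function at a large bucket.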

tip: Avoid Going Loopy

I will freely admit that I failed to correctly state the termination condition when I first wrote this code. I knew that this would be tricky, so I used a print statement at the top to ensure that I avoided creating a non-terminating loop that would spin out of control and run up my S3 bill. I advise you to do the same when you’re building and testing any code that costs you money to execute.
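One way to apply this advice is to combine the print statement with a hard cap on the number of requests. The sketch below is only an illustration: fetchAllPages and its $maxRequests parameter are hypothetical names, and the fetcher passed in is a deliberately buggy stand-in that always claims there is more data.

```php
<?php
// Hypothetical helper: call $fetchPage until it reports no more data,
// but never more than $maxRequests times.
function fetchAllPages($fetchPage, $maxRequests = 50)
{
    $pages = array();
    $requests = 0;
    do {
        $requests++;
        // Printing at the top of the loop makes a runaway loop obvious.
        print("Request #{$requests}\n");
        $page = $fetchPage($requests);
        $pages[] = $page['data'];
    } while ($page['more'] && $requests < $maxRequests);
    return $pages;
}

// A deliberately buggy fetcher that always claims there is more to fetch.
$neverDone = function ($n) {
    return array('data' => "page {$n}", 'more' => true);
};

// The cap stops the loop after three requests despite the bug.
$pages = fetchAllPages($neverDone, 3);
print(count($pages) . " pages fetched\n");
```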

With this function in hand, creating a list of the objects in the bucket becomes easy. Here’s all we have to do:

Example 3. chapter_04/list_bucket_objects_page.php (excerpt)

<?php
error_reporting(E_ALL);
require_once('cloudfusion.class.php');
require_once('include/book.inc.php');
$bucket = isset($_GET['bucket']) ? $_GET['bucket'] : BOOK_BUCKET;
$s3 = new AmazonS3();
$objects = getBucketObjects($s3, $bucket);
$fileList = array();
foreach ($objects as $object)
{
  $key = $object->Key;
  $url = $s3->get_object_url($bucket, $key);
  $fileList[] = array('url' => $url, 'name' => $key,
                      'size' => number_format((int)$object->Size));
}
$output_title = 'Chapter 3 Sample - List of S3 Objects in Bucket' .
    " '{$bucket}'";
$output_message = 'A simple HTML table displaying all the' .
    " objects in the '{$bucket}' bucket.";
include 'include/list_bucket_objects.html.php';
exit(0);
?>

This code generates a web page and can accept an optional bucket argument in the URL query string. Let’s rip this one apart and see how it works:

	This code checks to see if the `bucket` argument was supplied. If it was, then it’s used as the value of `$bucket`. Otherwise, the default value, the `BOOK_BUCKET` constant, is used.
	Here we call our custom `getBucketObjects` function that fetches the list of objects in the given bucket and stores them in the `$objects` array.
	The next step is to iterate over the array and process each object.
	We store three values for each object in the `$fileList` array: the object’s URL, its key (which we store as `name`), and its size (cast to an integer and formatted with thousands separators).
	We include our HTML template to output the values in the `$fileList` array.
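The data-shaping steps above can be tried in isolation with sample data. In the sketch below, pickBucket and buildFileList are hypothetical helpers written for this illustration, the objects are plain arrays rather than SimpleXMLElement objects, and the URL is built by simple concatenation as an assumption; the real script asks the library for the URL via get_object_url.

```php
<?php
// Hypothetical helper: choose the bucket from the query string, else a default.
function pickBucket(array $query, $default)
{
    return isset($query['bucket']) ? $query['bucket'] : $default;
}

// Hypothetical helper: turn object records into rows for the template.
function buildFileList(array $objects, $baseUrl)
{
    $fileList = array();
    foreach ($objects as $object) {
        // Encode each path segment but keep the slashes between them.
        $segments = array_map('rawurlencode', explode('/', $object['Key']));
        $fileList[] = array(
            'url'  => $baseUrl . implode('/', $segments),
            'name' => $object['Key'],
            // Cast to int, then format with thousands separators.
            'size' => number_format((int) $object['Size']),
        );
    }
    return $fileList;
}

// No bucket argument supplied, so the default is used.
$bucket = pickBucket(array(), 'example-bucket');

$rows = buildFileList(
    array(array('Key' => 'images/my photo.jpg', 'Size' => '236535')),
    'https://example-bucket.s3.amazonaws.com/'
);
print_r($rows);
```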

Here’s what the list_bucket_objects.html.php HTML template looks like:

Example 4. chapter_04/include/list_bucket_objects.html.php (excerpt)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title><?php echo $output_title ?></title>
</head>
<body>
  <h1><?php echo $output_title ?></h1>
  <p><?php echo $output_message ?></p>
  <table>
    <thead>
      <tr><th>File</th><th>Size</th></tr>
    </thead>
    <tbody>
    <?php foreach($fileList as $file): ?>
      <tr>
          <td><a href="<?php echo $file['url'] ?>">
              <?php echo $file['name'] ?></a>
          </td>
          <td><?php echo $file['size'] ?></td>
      </tr>
    <?php endforeach ?>
    </tbody>
  </table>
</body>
</html>

The template iterates over the $fileList array and creates a table row for each file, placing a link to the file in the first column and the file size in the second column.
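Because the bucket name can arrive from the URL query string, it is worth escaping values before they reach the page. The sketch below shows one common approach using PHP's htmlspecialchars; the helper name h and the sample bucket value are assumptions for illustration and are not part of the book's code.

```php
<?php
// Hypothetical helper: escape a value for output in HTML text or attributes.
function h($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// A bucket name containing markup, as a visitor might supply in the URL.
$bucket = '<b>my-bucket</b>';
$title  = "List of S3 Objects in Bucket {$bucket}";

// The markup is displayed as text instead of being interpreted by the browser.
print('<h1>' . h($title) . "</h1>\n");
```

In a template, this would mean writing `<?php echo h($output_title) ?>` wherever a value is echoed.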

Figure 1, “Listing objects in an S3 bucket”, shows what it looks like (I had already uploaded some files to my bucket).

Figure 1. Listing objects in an S3 bucket

You may have spotted the fact that we now have all the parts needed to make a simple S3 file browser. I’ll leave that as a challenge to you. With just a little bit of work you should be able to connect list_buckets_page.php and list_bucket_objects_page.php.