Simple KVP System with Amazon S3


A few weeks ago, I wrote about getting started with Amazon’s SDK for PHP. This article assumes you’re familiar with the basic concepts of using the SDK. Now we’re going to build on that knowledge to create something cool: a light, but powerful data storage system that can scale forever.

When you think of cloud storage, most of the time you think of media, such as photos, videos, and other content files. But cloud storage is also a great place to store other data.

In this article I’ll describe how to create a high-performance system for storing extraneous item-specific data using cloud storage instead of a database.

For example, let’s say you’re building a user account system. Sure, you’ll have the basics, including email address, password, first name, and last name. You might also have a number of other details about the user, such as address, birthday, and much more. Many of these fields need to be stored in the database because you will want to be able to search and filter your users.

But some data you might want to add doesn’t need to be searchable, or it might be too big to be stored efficiently in a database. For example, perhaps you’d like to allow the user to customize the look and feel of your web app. They will be able to select a few colors, upload a logo, and configure other display options. These options don’t need to be searchable in a database, but you do need a way to easily and reliably access this information when each user signs in.

Amazon’s S3 storage service makes this extremely easy. Here’s how to do it:

Requirements

S3 Service – You’ll need an Amazon AWS account with S3 enabled.

S3 Bucket – You’ll need to create an S3 bucket (discussed in my previous article). In our code, we’ll refer to this bucket as “my-unique-bucket-name”. You’ll use your own bucket name instead.

Web Hosting – You’ll need a regular web host. However, for this particular use case, it is advisable to host your application in the cloud at Amazon on an Amazon EC2 virtual server. The reason is that, when you host your website on an EC2 instance in the same region as your bucket, data transfer between your server and Amazon S3 is free (the S3 requests themselves are still billed). Hosting anywhere else will incur data transfer charges. For testing and setup, these charges are insignificant (and will most likely fall under Amazon’s free usage tiers), but for the long term it’s a good idea to set up on EC2.

Diving into the Code

Time to get our feet wet…

Basic PHP Class

To make this system as easy as possible, we’re going to build a simple PHP class. This will let us use a KVP (key-value pair) concept for almost anything you can imagine. So to start, create a minimal class that 1) accepts an argument called “itemName” and 2) automatically creates an Amazon S3 object with the AWS SDK. Again, if you’re not sure where we are, start by reading my previous article on getting started with the SDK.

class s3Kvp {

    private $itemName = Null;    // Property to store item name throughout the class.
    private $awsS3 = Null;       // Property to store an instance of the S3 Object

    // Construct runs when we create the class.
    public function __construct($itemName) {
        $this->itemName = $itemName;    // Store item name as class property.
        $this->awsS3 = new AmazonS3();  // Create s3 object as class property.
    }

}

// Create a new instance of the class
// Submitting 'user101' as the item name
// will allow us to store KVP data specifically
// for User with ID 101.
$myKvp = new s3Kvp('user101');

// We can create a separate instance of this class
// submitting a different itemName. In this case,
// data will be stored specifically for User with ID 335.
$anotherKvp = new s3Kvp('user335');

// Of course, none of this does anything quite yet...

Loading the Data

Now we’re going to add a few new essentials to our class, including two new properties: one to store the actual data while the class is in use, and one to tell us if the data is dirty (if it has changed since we loaded it). We’re also going to try to load our data from S3. It’s starting to look a little complicated, but you can follow the comments for all the new functionality we’re adding:

class s3Kvp {

    private $itemName = Null;
    private $awsS3 = Null;

    private $data = Null;                        // We'll store our item data here.
    private $isDirty = false;                    // We'll track whether or not anything changed here.
    private $bucket = 'my-unique-bucket-name';   // This is the "bucket", or domain where your data is stored.

    public function __construct($itemName) {
        $this->itemName = $itemName;
        $this->awsS3 = new AmazonS3();

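        // Here we're trying to get an object matching our item name
        // located in the folder 'kvpdata'.
        // Start by creating a pre-signed (temporary, secure) URL to access your data.
        // The third argument sets how long the URL stays valid; since we fetch
        // it right away, a much shorter lifetime than 10 days would also work.
        $url = $this->awsS3->get_object_url($this->bucket, 'kvpdata/'.$this->itemName, '10 Days');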

        // Then grab the contents of the URL
        // The @ will allow the script to continue
        // if the URL cannot be reached.
        $responseData = false;
        $responseData = @file_get_contents($url);

        // If we got something back in our response data,
        // load it into the "data" property of this class.
        if ($responseData) {

            // We're going to store our data as an array encoded as JSON,
            // so if we get anything we know we can
            // decode it back to an array.
            $responseAsArray = json_decode($responseData, true);

            // Check if the data is valid.
            if (is_array($responseAsArray)) {

                // It's valid! Load it up.
                $this->data = $responseAsArray;

            } else {

                // Not valid! Load an empty array.
                // A good app would do some error handling here.
                $this->data = array();

            }

        } else {

            // If we don't get anything back,
            // it means this item is new so we're just going to
            // load 'data' as an empty array.
            $this->data = array();

        }

    }

}
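
The load-and-validate branches above can be tried out without touching S3. Here's a minimal sketch that pulls that logic into a standalone function. The name decodeKvpPayload is my own invention for illustration (it isn't part of the class or the AWS SDK), and it takes the raw string, or false, that file_get_contents would hand back:

```php
<?php
/**
 * Turn the raw body fetched from storage into an array of item data.
 * Hypothetical helper for illustration; it follows the same branches
 * as the constructor above.
 *
 * @param string|false $raw Raw response body, or false if the fetch failed.
 * @return array Decoded item data, or an empty array for new or invalid items.
 */
function decodeKvpPayload($raw)
{
    // Nothing came back: treat the item as new.
    if ($raw === false || $raw === '') {
        return array();
    }

    // Decode to an associative array (second argument true).
    $decoded = json_decode($raw, true);

    // Anything that does not decode to an array is treated as invalid.
    return is_array($decoded) ? $decoded : array();
}

$loaded  = decodeKvpPayload('{"prefsItemsPerPage":"25"}'); // Valid JSON object.
$invalid = decodeKvpPayload('not json');                    // Falls back to array().
$missing = decodeKvpPayload(false);                         // Falls back to array().
```

Note that a failed fetch and a brand-new item look the same here, just as they do in the constructor. If that distinction matters to your app, that's the place to add real error handling.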

Object Overloading

Overloading in PHP provides a means to dynamically “create” properties and methods. It’s really fun, and we’re going to use it to dynamically store item data.

We’re going to create two new methods in our class. One, called “__get”, will allow us to easily retrieve information stored in our item data. The other, “__set”, will allow us to update existing data and store new data.

Here’s how it works:

If you created a new instance of our s3Kvp class called $myKvp and tried to read the property customBackgroundColor (so $myKvp->customBackgroundColor), PHP would normally complain that the property is undefined, because it doesn’t exist on the class.

But if we create a method called “__get” in our class, PHP will call that method instead of raising the error, passing it the name of the property. This allows us to look the value up dynamically. The same thing works with “__set”: if you assign to a property that doesn’t exist (or isn’t accessible from outside the class), PHP calls “__set” with the property name and the new value.
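
To see the mechanism on its own before wiring it into the S3 class, here's a tiny sketch. OverloadDemo is a made-up class for illustration only; it keeps its data in a private array and logs which magic method ran:

```php
<?php
// Minimal stand-in for the overloading idea. OverloadDemo is a made-up
// class for illustration and does not talk to S3.
class OverloadDemo
{
    private $data = array();   // Backing store for the dynamic properties.
    public $calls = array();   // Log of which magic methods ran, in order.

    public function __get($name)
    {
        $this->calls[] = "get:$name";
        return isset($this->data[$name]) ? $this->data[$name] : false;
    }

    public function __set($name, $value)
    {
        $this->calls[] = "set:$name";
        $this->data[$name] = $value;
    }
}

$demo = new OverloadDemo();
$demo->customBackgroundColor = '#336699';   // No such property: __set runs.
$color = $demo->customBackgroundColor;      // No such property: __get runs.
$missing = $demo->neverSet;                 // Never stored: __get returns false.
```

Because $calls is a real public property, reading it goes straight to the property and does not pass through __get. The magic methods only fire for names PHP can't reach directly.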

Here’s the code:

class s3Kvp {

    // This function will be called anytime
    // we try to RETRIEVE properties of our class
    // that do not exist.
    public function __get($propertyName) {

        // Check to see if we have that property in our
        // data array.
        if (isset($this->data[$propertyName])) {

            // If we have it, return it.
            // Simple, right!?
            return $this->data[$propertyName];

        } else {

            // If not, return false;
            return false;

        }
    }

    // This function will be called anytime
    // we try to SET properties of our class
    // that do not exist.
    public function __set($propertyName, $propertyValue) {

        // Simply set the propertyName submitted
        // in our data array to the value submitted.
        $this->data[$propertyName] = $propertyValue;

        // One more thing...
        // We want to know if something is ever "set"
        // so we can mark the data as "dirty".
        // We'll use this later...
        $this->isDirty = true;

    }

    private $itemName = Null;
    private $awsS3 = Null;
    private $data = Null;
    private $isDirty = false;
    private $bucket = 'my-unique-bucket-name';

    public function __construct($itemName) {
        $this->itemName = $itemName;
        $this->awsS3 = new AmazonS3();
        $url = $this->awsS3->get_object_url($this->bucket, 'kvpdata/'.$this->itemName, '10 Days');
        $responseData = false;
        $responseData = @file_get_contents($url);
        if ($responseData) {
            $responseAsArray = json_decode($responseData, true);
            if (is_array($responseAsArray)) {
                $this->data = $responseAsArray;

            } else {
                $this->data = array();
            }
        } else {
            $this->data = array();
        }
    }

}


Save that Baby

Now our class is starting to take shape. We can load data from S3, and we can set and retrieve data values. But something is still missing—saving the data. We need to send it back up to S3.

To do this, we’re going to use the __destruct method built into PHP classes. If it exists, PHP calls it right before an object is destroyed, which happens when the last reference to the object goes away or when the script ends. So if we create the method, it will run once for each instance of the class, as soon as PHP knows we’re done with it.

In the __destruct method, we’re going to check if our class “isDirty”, and if so we’ll save our changes up to S3. Before we save, however, we’re going to convert our data to JSON. This makes it easy to store and, if we ever need to, easy to access by other systems.

class s3Kvp {

    // The __destruct is called by PHP as
    // each instance of this class is destroyed.
    public function __destruct() {

        // Check if "isDirty" is true,
        // which tells us that at some point
        // during the script execution
        // data was changed.
        if ($this->isDirty) {

            // Encode our data as JSON.
            $dataToSave = json_encode($this->data);

            // Use "create_object" to upload the data
            // back to S3. If the object already exists,
            // this will simply replace it.
            $options = array('body' => $dataToSave);
            $this->awsS3->create_object($this->bucket, 'kvpdata/'.$this->itemName, $options);

        }
    }

    public function __get($propertyName) {
        if (isset($this->data[$propertyName])) {
            return $this->data[$propertyName];
        } else {
            return false;
        }
    }

    public function __set($propertyName, $propertyValue) {
        $this->data[$propertyName] = $propertyValue;
        $this->isDirty = true;
    }

    private $itemName = Null;
    private $awsS3 = Null;
    private $data = Null;
    private $isDirty = false;
    private $bucket = 'my-unique-bucket-name';

    public function __construct($itemName) {
        $this->itemName = $itemName;
        $this->awsS3 = new AmazonS3();
        $url = $this->awsS3->get_object_url($this->bucket, 'kvpdata/'.$this->itemName, '10 Days');
        $responseData = false;
        $responseData = @file_get_contents($url);
        if ($responseData) {
            $responseAsArray = json_decode($responseData, true);
            if (is_array($responseAsArray)) {
                $this->data = $responseAsArray;

            } else {
                $this->data = array();
            }
        } else {
            $this->data = array();
        }
    }

}
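
If you'd like to watch the save-on-destruct behavior without touching S3, here's a small sketch with a plain array standing in for the bucket. FakeBucket and DirtyDemo are illustrative names I made up, not SDK classes, and the put method is only an assumption about what a storage wrapper might look like:

```php
<?php
// Stand-in for the bucket: a plain array keyed by object name.
// FakeBucket and DirtyDemo are illustrative names, not SDK classes.
class FakeBucket
{
    public $objects = array();
    public $writes = 0;

    public function put($key, $body)
    {
        $this->objects[$key] = $body;
        $this->writes++;
    }
}

class DirtyDemo
{
    private $store;
    private $itemName;
    private $data = array();
    private $isDirty = false;

    public function __construct(FakeBucket $store, $itemName)
    {
        $this->store = $store;
        $this->itemName = $itemName;
    }

    public function __set($name, $value)
    {
        $this->data[$name] = $value;
        $this->isDirty = true;   // Remember that something changed.
    }

    public function __destruct()
    {
        // Only write when something actually changed.
        if ($this->isDirty) {
            $this->store->put('kvpdata/' . $this->itemName, json_encode($this->data));
        }
    }
}

$bucket = new FakeBucket();

$untouched = new DirtyDemo($bucket, 'user101');
unset($untouched);                  // Destructor runs, nothing dirty, no write.

$changed = new DirtyDemo($bucket, 'user335');
$changed->prefsItemsPerPage = 25;
unset($changed);                    // Last reference gone: destructor saves.
```

Two objects were created, but only the one that changed caused a write. One trade-off to keep in mind: a destructor has nobody to report a failed save to, so an explicit save method is an option if your app needs to react to write errors.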

Using the Class

Here’s the final class:

class s3Kvp {

    private $itemName = Null;
    private $awsS3 = Null;
    private $data = Null;
    private $isDirty = false;
    private $bucket = 'my-unique-bucket-name';

    public function __construct($itemName) {
        $this->itemName = $itemName;
        $this->awsS3 = new AmazonS3();
        $url = $this->awsS3->get_object_url($this->bucket, 'kvpdata/'.$this->itemName, '10 Days');
        $responseData = false;
        $responseData = @file_get_contents($url);
        if ($responseData) {
            $responseAsArray = json_decode($responseData, true);
            if (is_array($responseAsArray)) {
                $this->data = $responseAsArray;

            } else {
                $this->data = array();
            }
        } else {
            $this->data = array();
        }
    }

    public function __get($propertyName) {
        if (isset($this->data[$propertyName])) {
            return $this->data[$propertyName];
        } else {
            return false;
        }
    }

    public function __set($propertyName, $propertyValue) {
        $this->data[$propertyName] = $propertyValue;
        $this->isDirty = true;
    }

    public function __destruct() {
        if ($this->isDirty) {
            $dataToSave = json_encode($this->data);
            $options = array('body' => $dataToSave);
            $this->awsS3->create_object($this->bucket, 'kvpdata/'.$this->itemName, $options);

        }
    }

}

Of course, now that we did all the work, we want to use this thing! Here are a couple of examples:

User Account System

If you’re building a user account system, you might want to store substantial information that is user-specific. In this example, we’re assuming you have a shopping interface with categories. You allow users to decide how many products they want to view per page. When they make a selection, you update their preference using the user-specific KVP class we created:

// Include the class we just created.
require_once 's3kvp.php';

// Create an instance of s3Kvp
// using the current user's ID.
$userId = 12301; // This might be stored in a cookie or session variable.
$userKvp = new s3Kvp('user'.$userId);

// In this example,
// We're setting the "prefsItemsPerPage"
// equal to whatever we received in the
// "prefsItemsPerPage" querystring variable;
$submittedPreference = $_GET['prefsItemsPerPage'];
$userKvp->prefsItemsPerPage = $submittedPreference;
// When the script ends, __destruct saves the change to S3.

// Later, on another request, load the same item
// and read the saved preference back.
require_once 's3kvp.php';

$userId = 12301;
$userKvp = new s3Kvp('user'.$userId);

echo $userKvp->prefsItemsPerPage;
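
The example above stores the querystring value exactly as it arrives. In a real app you'd probably want to check it first. Here's one hedged way to do that with PHP's built-in filter_var. The function name sanitizeItemsPerPage and the limits (5 to 100, default 20) are assumptions I picked for this sketch:

```php
<?php
/**
 * Return a safe items-per-page value from user input.
 * sanitizeItemsPerPage and its limits (5 to 100, default 20) are
 * assumptions for this sketch, not values from the article.
 */
function sanitizeItemsPerPage($input, $default = 20)
{
    $value = filter_var($input, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 5, 'max_range' => 100),
    ));

    // filter_var returns false when the input is not an integer in range.
    return ($value === false) ? $default : $value;
}

// Usage with the querystring from the example above:
// $userKvp->prefsItemsPerPage = sanitizeItemsPerPage(
//     isset($_GET['prefsItemsPerPage']) ? $_GET['prefsItemsPerPage'] : null
// );
```

Validating before the assignment also means a bad value never marks the item as dirty, so nothing is written to S3 for it.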

Shopping Cart System

I’ve built entire shopping cart systems that use only a KVP class like this. With the advent of Google Analytics, I no longer need to track “partial” or “incomplete” orders. Using a system like this allows me to store tons of shopping details, including robust information about cart items, promotions, checkout data, and much more. All of this is stored in cloud storage until the time of checkout, at which point I copy the specific data I need to be searchable into a database. I never have to worry about increasing my database size with anything but finalized orders.

Of course, for this type of example, I need to create a more robust KVP class—one that can handle lists of items and other features. But the idea is the same: Store item-specific data in cloud storage.
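
One reason lists need extra work: with the class as written, something like $cart->items[] = $item goes through __get, which returns a copy of the array, so the stored data never changes (PHP reports this as an indirect modification of an overloaded property). A minimal way around that is a dedicated append method. ListKvp below is a hypothetical in-memory variant for illustration only; it leaves out the S3 loading and saving:

```php
<?php
// ListKvp is a hypothetical in-memory variant for illustration only.
class ListKvp
{
    private $data = array();
    private $isDirty = false;

    public function __get($name)
    {
        return isset($this->data[$name]) ? $this->data[$name] : false;
    }

    public function __set($name, $value)
    {
        $this->data[$name] = $value;
        $this->isDirty = true;
    }

    // Append one entry to the list stored under $name, creating it if needed.
    public function append($name, $value)
    {
        if (!isset($this->data[$name]) || !is_array($this->data[$name])) {
            $this->data[$name] = array();
        }
        $this->data[$name][] = $value;
        $this->isDirty = true;
    }

    public function isDirty()
    {
        return $this->isDirty;
    }
}

$cart = new ListKvp();
$cart->append('items', array('sku' => 'A-100', 'qty' => 2));
$cart->append('items', array('sku' => 'B-200', 'qty' => 1));
$itemCount = count($cart->items);   // Reading through __get works fine.
```

Since json_encode handles nested arrays, a list like this can be saved the same way as the flat data in the class above.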

So Now Where Are We?

In fewer than 50 lines of code, we’ve created a robust system for storing item-specific data; this system is fast and scalable.

Since I’ve been using systems like this, I’ve been able to keep my database tables very light, only storing information that I know needs to be used in searching, filtering, or summary reports. I don’t have to worry about my database tables growing to enormous sizes, and with S3, I never have to worry about running out of storage or backing up my data.

A Few Things to Remember

Without some modifications, I don’t recommend a system like this to store data that will be read by multiple users under heavy load. I do use this concept under heavy load, but I add a caching layer to my KVP class.

Also, you should store your data in an S3 bucket that does not have public access; this is critical for security.

What’s Next?

Now that you have this basic class, you can add on to it to meet your needs. Here are a few things I’ve added to my own applications in the past:

Encryption – For an extra layer of security, I add encryption to this class. I insert it in the __construct and __destruct so that anytime I load or save data, I’m decrypting and encrypting.

Amazon SimpleDB – This type of system works great in combination with SimpleDB. Amazon SimpleDB is a highly available, flexible, and scalable non-relational data store that offloads the work of database administration. Using SimpleDB in combination with this S3 KVP storage concept allows you to create scalable, flexible systems that remove the concerns about performance or data backup.

Default Data – In many cases, you don’t want to just return “false” when you don’t find specific user data. You can easily specify default values for various properties and return them in the __get function. Since they are defaults, you don’t even need to store them in the item-specific data in S3, which keeps your data lighter.

Caching – I use caching all the time. Systems like memcached allow you to store data that needs to be accessed frequently on your web server. Adding a caching layer is very similar to adding an encryption layer. At the time of loading, you simply check to see if the data you’re looking for is in cache, and, if it is, load it from cache; otherwise, load it from S3. When you save, you update both the cached version and S3.

Validation – In many cases you’ll want to add data validation to your code. Sometimes you’ll add it right in this KVP class to ensure that data is at least formatted correctly. On the interface side, you’ll want to validate user input whenever it’s submitted before you save it using this KVP class.

Versioning – S3 provides a powerful way to track versions of your data. This is great for creating roll-back functionality. If you turn versioning on in your bucket, Amazon will store a backup version every time you save. You can then roll back to any previous version, anytime. I often use this simply as a backup: if I accidentally write code that wipes out data, I can get it back if I have versioning on. There are many other uses. Versioning does increase storage requirements, however, so only use it if you really need it.
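
Two of the add-ons above, default data and caching, fit in one short sketch. Everything here is hypothetical: ArrayStore is a plain array standing in for both memcached and S3, LayeredKvp is a read-only cut of the class, and the default values are made up. Treat it as an outline of the idea rather than production code:

```php
<?php
// All names here (ArrayStore, LayeredKvp, the default values) are
// hypothetical. ArrayStore stands in for both memcached and S3.
class ArrayStore
{
    public $items = array();
    public $reads = 0;

    public function get($key)
    {
        $this->reads++;
        return isset($this->items[$key]) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

class LayeredKvp
{
    private $data;
    private $defaults = array('prefsItemsPerPage' => 20, 'theme' => 'light');

    public function __construct(ArrayStore $cache, ArrayStore $s3, $itemName)
    {
        $key = 'kvpdata/' . $itemName;

        // Read-through: try the cache first, then fall back to S3.
        $raw = $cache->get($key);
        if ($raw === false) {
            $raw = $s3->get($key);
            if ($raw !== false) {
                $cache->set($key, $raw);   // Warm the cache for next time.
            }
        }

        $decoded = ($raw === false) ? null : json_decode($raw, true);
        $this->data = is_array($decoded) ? $decoded : array();
    }

    public function __get($name)
    {
        if (isset($this->data[$name])) {
            return $this->data[$name];
        }
        // Not stored for this item: fall back to a default, else false.
        return isset($this->defaults[$name]) ? $this->defaults[$name] : false;
    }
}

$cache = new ArrayStore();
$s3 = new ArrayStore();
$s3->set('kvpdata/user101', json_encode(array('theme' => 'dark')));

$first = new LayeredKvp($cache, $s3, 'user101');    // Cache miss, reads S3.
$second = new LayeredKvp($cache, $s3, 'user101');   // Cache hit, skips S3.
```

The second instance never touches the S3 stand-in, and the user's stored data only holds the one value that differs from the defaults. A full version would also update the cache in __destruct whenever it saves to S3.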

Questions?

If you have any questions or ideas about this concept, please start a conversation below. I’d be happy to respond!


  • Vito Tardia

    Thank you David, great article, I need to try the code right now! :)

    Just a couple of questions:

    1) how do you deal with user-data that is not text? For example profile images and other custom content that a user could upload?

    2) what is a smart way to combine this KVP and SimpleDB?