Multitenancy and Google App Engine (GAE) Java

Vishal Biyani

In a multi-tenant application, one instance serves more than one organization while keeping each tenant's data and application view virtually isolated from the other tenants. Since the hardware, operating system, and in some cases application code are the same for all tenants, the application is easier to maintain and monitor, and incremental changes can be made based on aggregate data from tenants. It also provides economies of scale, so the services can be offered to tenants at lower cost.

The multi-tenancy principle is not new; it has been around for at least a few decades. With the emergence of cloud computing, however, multi-tenant architecture is gaining more ground in cloud applications. When you create an account on a storage service like Dropbox, you are assigned 2GB of space out of the petabytes of storage they have. Your 2GB is exclusively yours, and no one else can access it unless you share it. But the space assigned to each user is potentially on the same physical storage device or set of devices. So you are one of the tenants of Dropbox's multi-tenant storage system!

Levels of Multitenancy

Hardware and Resource multitenancy

This is the simplest form of multitenancy, and it has existed for quite some time. When you get storage on Amazon S3 for your application, you are getting space on shared storage, and you are one of many tenants of that infrastructure. Most IaaS consumers are tenants of the infrastructure they are using.

Data multitenancy

Data multitenancy is achieved at the datastore level. The degree of isolation varies with the organization's security and compliance constraints and its purpose. Organizations that want security and complete isolation of data might choose an isolated database setup of their own. Other variations include one schema per organization within the same database, or a separate set of tables for each organization. A detailed treatment of this subject can be found here, but it is not the main focus of our discussion.

With data multitenancy, the application code accessing the data is the same for every tenant. The application might be hosted on multiple servers for scaling, but the UI screens, logic and customizations remain identical across tenants; only the data differs.

Application multitenancy

Multitenancy woven throughout an application is the most complex type and the hardest to achieve. So how can each tenant have different logic and screens accessing different data while still sharing common resources? Let’s look at a high-level architecture of such a system:

A complete multi-tenant system achieves multitenancy at the data level in the same way as the data multitenancy discussed earlier. At the application level, a run-time engine combines tenant-specific metadata and customization data with the kernel code, which yields a tenant-specific application. The same logic might be applied at the data level when each tenant has a different kind of schema or objects. With object databases, it’s far easier to have different schemas or objects for each tenant, or even different attributes on the same object, to meet the needs of different tenants.
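To make the run-time engine idea concrete, here is a minimal sketch in plain Java, assuming the tenant-specific metadata can be modelled as key/value settings laid over shared defaults. The class name `TenantRuntime`, its methods and the setting keys are hypothetical and only illustrate the overlay; a real engine would also drive screens and logic from such metadata.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: shared "kernel" defaults overlaid with per-tenant metadata.
class TenantRuntime {
    // Defaults shipped with the shared kernel code (illustrative values).
    private final Map<String, String> kernelDefaults = new HashMap<>();
    // Customizations keyed by tenant id, then by setting name.
    private final Map<String, Map<String, String>> tenantMetadata = new HashMap<>();

    TenantRuntime() {
        kernelDefaults.put("theme", "default");
        kernelDefaults.put("invoice.label", "Invoice");
    }

    // Records one customization for one tenant.
    void customize(String tenantId, String key, String value) {
        tenantMetadata.computeIfAbsent(tenantId, id -> new HashMap<>()).put(key, value);
    }

    // Builds the effective configuration for a tenant: defaults first,
    // then the tenant's overrides win on conflicting keys.
    Map<String, String> effectiveConfig(String tenantId) {
        Map<String, String> merged = new HashMap<>(kernelDefaults);
        merged.putAll(tenantMetadata.getOrDefault(tenantId, Collections.emptyMap()));
        return merged;
    }
}
```

A tenant that has customized nothing simply sees the kernel defaults, and one tenant's overrides never leak into another tenant's configuration.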

The complexity of a fully woven multi-tenant application comes from applying tenant filters throughout the application while still delivering on the promise of scale and speed. Various mechanisms, such as metadata caching, are used to achieve that. The best practical examples of multitenancy woven through an application are Salesforce applications.
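As an illustration of metadata caching, the sketch below keeps each tenant's metadata in memory after the first lookup, so later requests for that tenant skip the expensive load. It is a simplified assumption of how such a cache could look, not how any particular platform implements it; `TenantMetadataCache` and the loader function are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical per-tenant metadata cache: load once per tenant, then reuse.
class TenantMetadataCache {
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();
    // Loader that fetches metadata from slower storage (supplied by the caller).
    private final Function<String, Map<String, String>> loader;
    // Counts how often the loader actually ran, to make cache hits visible.
    private final AtomicInteger loads = new AtomicInteger();

    TenantMetadataCache(Function<String, Map<String, String>> loader) {
        this.loader = loader;
    }

    // Returns cached metadata, invoking the loader only on a cache miss.
    Map<String, String> get(String tenantId) {
        return cache.computeIfAbsent(tenantId, id -> {
            loads.incrementAndGet();
            return loader.apply(id);
        });
    }

    // Drops one tenant's entry, for example after that tenant changes a customization.
    void invalidate(String tenantId) {
        cache.remove(tenantId);
    }

    int loadCount() {
        return loads.get();
    }
}
```

Keying the cache by tenant ID means invalidating one tenant's entry leaves every other tenant's cached metadata untouched.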

Multitenancy in Google App Engine

As of today, GAE provides the Namespace API to achieve multitenancy at the data level. The Namespace API is available for the Datastore, Task Queues and Memcache. The Blobstore API does not support namespaces yet, so binary assets have to be compartmentalized with a mechanism designed into the application itself. The Namespace API also supports Google Apps domains, so if your tenants are going to be Google Apps users, you can set their domain name as the namespace for their data. There are various strategies for choosing namespaces, for example per user or per Google Apps domain.

The namespace can be set before a request enters the application by using filters configured in the deployment descriptor. Within a request, the namespace can also be switched temporarily to another namespace to fetch common data, and then unset or set back to the previous namespace.

Sample use cases

  • Setting the namespace to the current Google Apps domain: this splits the data by Google Apps domain. It can be done in a filter, so that every request entering the application is namespace-aware. The following piece of code checks whether the namespace is null and, if it is, sets it to the current Google Apps domain.

[sourcecode language=”java”]
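if (NamespaceManager.get() == null) {
    // No namespace set yet: default to the current Google Apps domain
    NamespaceManager.set(NamespaceManager.getGoogleAppsNamespace());
} [/sourcecode]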

If the data is split per user instead, the user ID can be used as the namespace:

[sourcecode language=”java”]
// Assumes a signed-in user; getCurrentUser() returns null otherwise
NamespaceManager.set(UserServiceFactory.getUserService().getCurrentUser().getUserId()); [/sourcecode]

  • A common namespace can be shared across tenants for data that is the same for all of them. For example, the zip codes of all US cities can be retrieved from the common namespace:

[sourcecode language=”java”]
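String currentNs = NamespaceManager.get();
// Switch to the shared namespace ("common" is an assumed name)
NamespaceManager.set("common");
// Get data which is needed from common Namespace
// Restore the tenant's namespace
NamespaceManager.set(currentNs); [/sourcecode]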

  • Separate namespaces can be used to keep data apart between dev, QA and staging environments, and GAE versions can be used to host the different stages of the codebase for the respective environments.
  • The codebase of various modules of an application can be deployed as different versions, and data specific to each module can be stored in its own namespace.
  • Common data can be stored in common namespace, and namespaces can be switched when needed.
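The switch-and-restore pattern from the common-namespace use case can be wrapped in a small helper, so that the previous namespace is restored even if the work throws an exception. This is a sketch under assumptions: `CurrentNamespace` is a hypothetical stand-in that lets the example run without the App Engine SDK, and `Namespaces.inNamespace` is an illustrative helper, not part of GAE. In an App Engine application you would call `NamespaceManager.get()` and `NamespaceManager.set()` in its place.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for App Engine's NamespaceManager, so the sketch
// compiles and runs without the SDK. It tracks one namespace per thread.
class CurrentNamespace {
    private static final ThreadLocal<String> NS = new ThreadLocal<>();

    static String get() {
        return NS.get();
    }

    static void set(String namespace) {
        NS.set(namespace);
    }
}

// Illustrative helper (not a GAE API): run some work in another namespace.
class Namespaces {
    static <T> T inNamespace(String namespace, Supplier<T> work) {
        String previous = CurrentNamespace.get();
        CurrentNamespace.set(namespace);
        try {
            return work.get();
        } finally {
            // Always restore the caller's namespace, even on exceptions.
            CurrentNamespace.set(previous);
        }
    }
}
```

A call such as `Namespaces.inNamespace("common", () -> loadZipCodes())`, where `loadZipCodes` is a hypothetical lookup, reads the shared data and leaves the tenant's namespace as it was. The `finally` block matters because a request that ends up in the wrong namespace would read or write another tenant's data.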