The first seven fallacies of distributed computing were written down in 1994 by L Peter Deutsch at Sun Microsystems. Around 1997, James Gosling (the father of Java) added the eighth.
These fallacies relate directly to us as PHP developers, since we build distributed applications every day. We build mashups and applications that interact with SOAP and REST services, authenticate users via the Facebook, Google, or Twitter APIs, retrieve information from remote databases and caching services, and so on. Make no mistake: we're building distributed applications.
Given that we are building distributed applications, it’s important that we understand the eight fallacies and how they affect us.
1. The Network is Reliable
It’s fair to say that this is obviously untrue. Though network latency has decreased and bandwidth has increased markedly year after year since 1995, to say that the network is reliable is false.
Let's say we've set up a simple application that doesn't use many services: a basic PHP application with MySQL as its backend. There's arguably not much that could go wrong. Now suppose we later decide to use a hosted MySQL provider, such as Xeround, to meet our database needs. Despite good scalability and high availability, what if something goes wrong at their end? What if their infrastructure suffers a DDoS attack or goes down because of an internal issue?
We hear quite a lot about 99.999% uptime, but even that is never 100%: "five nines" still allows roughly five minutes of downtime a year. With the proliferation of services and the generally high availability of bandwidth today, it can be easy to forget that nothing's ever perfect.
How do you account for a failure of a service within your application?
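One common answer is to retry transient failures with exponential backoff and then degrade gracefully instead of erroring out. The sketch below is in Python for brevity and assumes the remote call signals failure with the built-in `ConnectionError` (with PDO in PHP, you would catch `PDOException` instead); `call_with_retry`, `operation`, and `fallback` are hypothetical names, and the fallback might return cached or default data.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `operation` up to `attempts` times, backing off exponentially
    (with random jitter) between tries; return `fallback()` if all fail."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:  # assumption: the client raises this on failure
            if attempt == attempts - 1:
                break
            # Delays grow 0.1s, 0.2s, 0.4s, ... plus jitter so that many
            # clients don't all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    # Every attempt failed: degrade gracefully rather than crash.
    return fallback()
```

The `sleep` parameter is injectable so the retry logic can be tested without actually waiting.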
2. Latency is Zero
Though latency may be low, indeed lower than it was some years ago, it’s never zero. To quote Arnon Rotem-Gal-Oz in his Fallacies of Distributed Computing Explained post:
At roughly 300,000 kilometers per second (3.6 * 10E12 teraangstrom per fortnight), it will always take at least 30 milliseconds to send a ping from Europe to the US and back, even if the processing would be done in real time.
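The floor on latency is simply the round-trip distance divided by the signal speed. Assuming roughly 5,600 km between London and New York (an approximate great-circle figure, used here only for illustration):

```latex
t_{\text{round trip}} \ge \frac{2d}{v}
  = \frac{2 \times 5{,}600\ \text{km}}{300{,}000\ \text{km/s}}
  \approx 0.037\ \text{s} \approx 37\ \text{ms}
```

Light in optical fiber travels at only about two thirds of that speed, roughly 200,000 km/s, which raises the same bound to about 11,200 / 200,000 = 0.056 s, or 56 ms, before any routing, queuing, or processing time is added.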
Is this a bad thing? Well, yes and no. Depending on how we structure our application and the resources available to us, we can largely mitigate the issue of latency.
Instead of deploying our applications in a single data center, we can host them with a service such as Amazon Web Services and keep data in several regions around the world (for example, S3 buckets in multiple regions, or static content served through the CloudFront CDN), bringing it closer to our end users and reducing network latency. But even though we can reduce latency, we can't remove it. We can employ a range of techniques and architectures to reduce its impact on us, but no matter what we do, it will always be present.
Have you considered this when you designed your application?
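One of the simplest of those techniques is to avoid repeating the same round trip: cache the result of a slow remote lookup locally for a short time. In PHP this role is usually played by a shared cache such as Memcached or APC; the Python sketch below is a minimal in-process version, and the `TTLCache` class and its `get_or_fetch` method are hypothetical names.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Keep results of slow remote lookups locally for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        # key -> (expiry time, cached value)
        self._store: Dict[str, Tuple[float, V]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], V]) -> V:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no network round trip at all
        value = fetch()  # miss or expired: pay the latency once
        self._store[key] = (now + self.ttl, value)
        return value
```

The trade-off is staleness: a longer TTL saves more round trips but serves older data, so it should match how quickly the underlying data actually changes.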
3. Bandwidth is Infinite
Can bandwidth really be infinite? If so, at what price is it infinite? When we consider that the web is increasingly going mobile, everything old is new again.
I'm not suggesting that we're back to dial-up speeds, and the newer 4G networks are notably faster than the earlier 2G and 3G networks. Still, their real-world data rates often fall short of a standard broadband connection.
Also, with the increasing uptake of mobile broadband, the number of potential users of our service (we all want to be popular and have at least some of Facebook's success) is growing at a phenomenal rate. Consider these statistics from mobithinking:
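On a slower mobile link, the cheapest bandwidth is the bytes you never send. Web servers and PHP's zlib output compression can handle this for you; the Python sketch below just makes the idea explicit, assuming a JSON API and a client that has signalled gzip support (for example via an `Accept-Encoding: gzip` request header). The `encode_response` function is a hypothetical name.

```python
import gzip
import json
from typing import Dict, Tuple


def encode_response(payload: dict, accepts_gzip: bool) -> Tuple[bytes, Dict[str, str]]:
    """Serialize `payload` compactly and gzip it when the client allows."""
    # Compact separators drop the spaces json.dumps adds by default.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if accepts_gzip:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers
```

Repetitive JSON such as lists of similar records usually compresses very well, at the cost of a little CPU time on each end.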
- There are 5.9 billion mobile subscribers.
- There are 1.2 billion mobile web users with 3G coverage.
- Mobile devices account for 8.49 percent of global hits.
Given that, it's fair to say that even though available bandwidth and its reach around the world are increasing, the growth in the number of users serves to balance it out. Going further, because mobile broadband lets people connect from almost anywhere, demand for our services becomes far more ad hoc and unpredictable.
Are you prepared for the sheer volume of potential load on your service? Can you handle the spike that this kind of availability can deliver?
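One way to keep a spike from taking the whole service down is to rate-limit clients, so that excess requests are rejected quickly (typically with HTTP 429 Too Many Requests) rather than queuing until everything times out. Below is a minimal token-bucket sketch in Python; the `TokenBucket` class is a hypothetical name, and in production the bucket state would usually live in a shared store rather than in process memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject the request
```

A bucket with capacity 3 and rate 1.0 accepts a burst of three requests, rejects the fourth, and then admits one more request per second.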
4. The Network is Secure
I think it's fair to say, without going into too much detail, that this is, and always will be, false. If you have any doubts, talk to a LinkedIn or eHarmony member; both services had password data leaked in 2012.
When we design and deploy our applications, how much emphasis do we place on security, both in where the application’s hosted, such as Rackspace, PagodaBox or cloudControl, and also in the design of the application itself?
According to SecurityAffairs, Prolexic reported:
- 3,000% quarter-on-quarter increase in malicious packet traffic targeting the financial services sector.
- 19.1TB of data and 14 billion packets of malicious traffic against financial services sector during Q4 2011, increasing during 2012.
- 65TB of data and 1.1 trillion packets that were identified and mitigated in 2012, 80 times greater than in 2011.
Because the network is not secure, we need to use good security practices as a matter of course. With the plethora of good advice from sources such as Chris Shiflett's blog, Essential PHP Security, the PHP Security Consortium, and others, there's little excuse for not knowing how and why to bake security into the core of our applications.
What are your security practices? Do you assess the vendors that you deploy with?
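As one concrete practice: when messages cross a network you don't control (webhooks, callbacks between services), signing them with a shared secret lets the receiver detect tampering. PHP provides the same building blocks as `hash_hmac()` and `hash_equals()`; the Python sketch below assumes a secret shared by both parties, and `SECRET`, `sign`, and `verify` are hypothetical names. HMAC proves integrity and origin, not confidentiality, so TLS is still needed to keep the content private.

```python
import hashlib
import hmac

# Hypothetical placeholder: load the real secret from configuration, never source code.
SECRET = b"replace-with-a-shared-secret"


def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 signature of the message body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature: str, secret: bytes = SECRET) -> bool:
    """Check a received signature against one we compute ourselves."""
    expected = sign(body, secret)
    # compare_digest takes time independent of where the strings differ,
    # so an attacker cannot recover the signature byte by byte via timing.
    return hmac.compare_digest(expected, signature)
```

Any change to the body, or a different secret, produces a signature that fails verification.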
5. Topology doesn’t Change
Doesn’t it? Really? Does it not change, or do we just not know about it? When we host our applications with others, we just don’t know. If the provider reconfigures their data center, upgrades it, adjusts it, for whatever reason, the topology changes.
Given the increase in smartphone usage mentioned earlier, the topology changes frequently. From both a user and a provider perspective, it can change nearly daily!
If the topology changes and an external service our application relies on can no longer be reached, resulting in, say, no database access, then that's clearly an issue. But if things change internally at our provider and the application continues to function, it may not be a problem.
Granted, it’s easy to code an application that’s small and hosted in a simple configuration. But applications change, and those that gain in popularity more so. Do you consider changes in topology in your design? How do you account for or handle failures in the application design and deployment design?
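A small step toward tolerating topology changes is to stop hard-coding a single host: read a list of endpoints from configuration and fail over between them. The Python sketch below assumes the connection function raises an `OSError` subclass (such as `ConnectionRefusedError` or `TimeoutError`) when a host is unreachable; `connect_with_failover` and the host names in the usage note are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def connect_with_failover(endpoints: Sequence[str], connect: Callable[[str], C]) -> C:
    """Try each configured endpoint in order and return the first connection
    that succeeds; raise only if every endpoint is unreachable."""
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except OSError as exc:  # ConnectionError and TimeoutError subclass OSError
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))
```

With endpoints such as `["db-primary.internal", "db-replica.internal"]` loaded from configuration, moving or adding a server becomes a config change rather than a code change.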
6. There is One Administrator
"But my application is hosted with a single service provider. They provide the OS, database, and web server support," you say. Okay, assuming that's your setup, is there really only one administrator? And if there really were only one, would you trust the provider with your application? I'd hate to think what could go wrong if they were sick or on vacation.
Normally, there will be at least a few administrators, though each may not have the same level of training and astuteness, both technically and more broadly. There should be policies in place, such as network intrusion detection and other security policies, but there’s no guarantee they will all follow them with the same level of thoroughness and diligence.
Given the plethora of hosting providers available today and how little time it takes to update DNS records, we have enough choice and flexibility that, if one provider isn't meeting our needs and expectations, we can move to another.
Have you considered how this affects you? What if you’re not in a position where you can easily change vendors? What if you have a high amount of vendor lock-in, or it will be costly to move? What if your application’s architecture is not flexible enough? What can you do to mitigate such risks?
7. Transport Cost is Zero
As with all of the statements so far, this one is highly unlikely to hold. If the servers that support our application sit in the same rack in the same data center, transport cost can be greatly reduced, but only in terms of time. What about the monetary cost?
Yes, we can scale up and down elastically as demand requires, and we can store our application's data across geographically distributed data centers so that it's as physically close to our end users as possible, but at what cost?
What's the architectural makeup of your application or service? Is it approaching zero with respect to either cost or time? If you could reduce one, would it increase the other?
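Batching is a good illustration of that trade-off. Assuming a hypothetical 50 ms round trip, fetching 250 records one at a time spends about 250 × 0.05 = 12.5 s waiting on the network, while three batched calls spend about 0.15 s, at the price of larger individual payloads and more memory per call. The Python sketch below assumes the remote side offers a multi-key lookup (as Memcached's `getMulti()` or an SQL `WHERE id IN (...)` query do); `fetch_in_batches` and `fetch_many` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def fetch_in_batches(
    keys: Iterable[K],
    fetch_many: Callable[[List[K]], Dict[K, V]],
    batch_size: int = 100,
) -> Dict[K, V]:
    """Fetch many keys with one remote call per batch instead of one per key."""
    results: Dict[K, V] = {}
    batch: List[K] = []
    for key in keys:
        batch.append(key)
        if len(batch) == batch_size:
            results.update(fetch_many(batch))  # one round trip per full batch
            batch = []
    if batch:
        results.update(fetch_many(batch))  # final partial batch
    return results
```

Tuning `batch_size` is exactly the cost-versus-time balance the questions above ask about.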
8. The Network is Homogeneous
Unlike the other fallacies, I think this is one that we as PHP developers inherently understand. We host our applications on Windows, Linux, Solaris, BSD, and Mac OS X servers. We use MySQL, SQL Server, SQLite, PostgreSQL, MongoDB, Hadoop, and Oracle for storing data. We consume external services via XML or JSON, each requiring different interfaces. As a multi-operating-system and multi-service community, arguably right from the early days, we've never expected a homogeneous network.
But the question still needs to be asked: are you flexible in your approach? Can you work with multiple databases and data sources? Do you use relevant design patterns, such as abstract factory, to consume data from a variety of sources and types through a transparent code interface? Or does your code break if you need to do something as simple as switch from XML to JSON?
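Here is a minimal sketch of that idea in Python, using a simple factory function (a lighter cousin of the full abstract factory pattern) that picks a parser from the response's content type, so calling code never cares whether the data arrived as XML or JSON. The payload shapes, class names, and `parser_for` are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Dict, List


class UserParser(ABC):
    """Common interface: every parser returns the same list-of-dicts shape."""

    @abstractmethod
    def parse(self, raw: str) -> List[Dict[str, str]]: ...


class JsonUserParser(UserParser):
    def parse(self, raw: str) -> List[Dict[str, str]]:
        # Assumed shape: {"users": [{"id": 1, "name": "Ada"}, ...]}
        return [{"id": str(u["id"]), "name": u["name"]} for u in json.loads(raw)["users"]]


class XmlUserParser(UserParser):
    def parse(self, raw: str) -> List[Dict[str, str]]:
        # Assumed shape: <users><user id="1"><name>Ada</name></user></users>
        root = ET.fromstring(raw)
        return [{"id": u.get("id"), "name": u.findtext("name")} for u in root.iter("user")]


_PARSERS = {
    "application/json": JsonUserParser,
    "application/xml": XmlUserParser,
    "text/xml": XmlUserParser,
}


def parser_for(content_type: str) -> UserParser:
    """Choose a parser from a Content-Type header, ignoring parameters like charset."""
    media_type = content_type.split(";")[0].strip().lower()
    try:
        return _PARSERS[media_type]()
    except KeyError:
        raise ValueError(f"unsupported content type: {content_type}") from None
```

Supporting a new format then means adding one class and one registry entry, without touching the code that consumes the parsed users.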
I think that, for PHP developers, the eight fallacies of distributed computing are as relevant as they ever were. With the plethora of information and resources available, we are in a great position to understand them and mitigate the risks that stem from believing in them.
What do you think? Do you account for them when developing your applications and services? How do you think the eight fallacies affect your application?
Image via Perrush / Shutterstock