Turing Machine and REST
We see it all the time. “RESTful” this, “REST” protocol that, etc. However, a lot of us don’t really understand exactly what it means. We’ll fix exactly that gap in this article!
State
In computer science (and mathematics, to some extent), there is a concept of state. A certain system can be in state A or it can be in state B or it can be a bunch of other states (usually, a finite list of states).
As an example, let us say that you write a program which turns the screen red if the temperature is more than 80 degreees farenheit or turns it blue if the temperature is less than 80 degrees farenheit.
We can call the first state (temp > 80 degrees) state “A” and the second state (temp < 80 degrees) state “B”. We’ll call the middling state (temp = 80 degrees) state “C”. As we can see, we have defined behaviours of the programs at state A and state B, but not at state C.
That is the general idea of state. Here’s the definition given by the all-knowing Wikipedia:
“In computer science and automata theory, the state of a digital logic circuit or computer program is a technical term for all the stored information, at a given point in time, which is used by the circuit or program.”
In short, it is a sort of “combination” of all the information that the program takes into account.
Now, we’re going to make a leap into something that’s considered largely theoretical and useless and then somehow connect to the very practical world of REST.
Turing machines
There was this man named Alan Turing, and he was quite a smart mathematician with an interest in the workings of computers. He conceived an imaginary computer (which, as we will see, is actually impossible to actually build) which he used to reason about things that happen in real computers.
The computer consists of a tape of infinite length, with a head through which the tape passes. The tape has “cells”, to which information can be written (numbers, colors, etc.) The tape moves through this machine, left or right, one cell at a time. The machine scans in whatever is already on the tape and, depending on what state it is in, it writes something back onto the tape then, subsequently, changes its state.
You may want to read that definition again for it to sink in completely. The essence of the idea is that it makes operations on memory based on state and states changes according to operations on memory.
This seems like a fairly useless theoretical fantasy (although, there is absolutely nothing wrong with that, read A Mathematician’s Apology if you’re wondering why people care to take an interest in theoretical stuff). However, it is quite possibly the most fundamental concept of computer science.
Using the Turing Machine, computer scientists are able to reason about any algorithm that can be run on a computer. In fact, the Turing machine has led to many advances in computer science.
So, the Turing Machine shows us that today (note: I’m not considering graph reduction processors), literally everything in computer science is based off the idea of state.
The Network
Then came along the concept of the internet. This is a place where packets can go haywire and are resent until they reach their destination. In general, there is never a complete transmission of full data.
Clients come in quickly and leave twice as fast. As such, holding onto client state doesn’t make much sense when working on a network system.
Also, in general, holding too much state in a system has been a cause for major pain (which also leads to funcitonal programming languages, which do not allow for the holding of state in a large portion of your code).
Now, people needed a network protocol for the internet which was simple, fast and handled management of state well in an extremely dynamic environment.
That protocol was HTTP, and the theory that grew out of the work on HTTP was labelled REST.
REST
REST stands for: REpresentational State Transfer. It is a system built around the “client-server” concept that networks are built on top of (well, ther are also peer to peer type networks, but, client-sever is arguably the simplest and most tested architecture). The name itself is slightly misleading, since the server is completely state-free!
There are a few constraints that all systems that claim to be “RESTful” must follow.
First of all, it must be a client-server system. This constraint has been modified in the past, but in the formal definition (and, for the theory of REST to work properly), we have servers to which clients can connect.
The server is fully stateless. This means that for each client request to the server, no state is reserved on the server. For example, if a client (using HTTP), requests the index page and subsequently requests the /user/home page, the two requests are completely independent of each other. The client holds all of the state of the system (hence, we have the back button).
Responses from the server must be cacheable, which means that there must be a system on the server which identifies respones as cacheable or not, so that clients (e.g. web browsers) can take advantage of the cache.
Finally, we must have a simple, clean, and uniform interface. If you have had some experience with how HTTP works, the request forms are very simple. They are largely verbs and nouns with a very easy to parse and human-readable format. SOAP is another form of a network protocol which absolutely obliterates this requirement and, therefore, is often very difficult to work with.
Now, what do all of these properties, when taking together, entail?
Implications of REST
They let us build systems that are cleanly seperated, scalable, and debugged easily.
Let us consider the restriction of statelessness on the server first, it may well be the most important (and, also, the most often violated by so-called RESTful architectures) bit.
As previously mentioned, in a client-server architecture the clients are very nimble; they fire requests and suddenly kill all communication. In this kind of a situation, it is much cleaner to not save state about the client on the server. This lets us reason about HTTP servers very easily; each client request may be treated as if it is a completely new client, and it wouldn’t make a penny of a difference.
Secondly, cacheable responses mean that the clients are able to get data much faster, because often times, the data can be retrieved from the client’s memory. This immediately increases client performance and, in some systems, server scalability.
The uniform interface may just be the most important. Having worked with SOAP, I can tell you that HTTP’s simple format is a blessing when trying to debug server code and the same applies to other RESTful systems.
Designing a RESTful System
Let’s say we need to write a server that returns stock quotes, as well as retaining a list of stock quotes to monitor (and, the user can either add to or clear the list).
This is an excellent case for a RESTful system, because it is inherently independent of identity of client. Therefore, the server need not hold any state about the client.
We first start by defining how the protocol language will work (a bit like the GET and POST verbs of HTTP):
GETQUOTE ticker – gives the price for a particular stock
ADDTICKER ticker – adds the given stock to the list
GETLIST – gets a comma seperated list of stocks in the list
That’s a fairly simple protocol, and, doesn’t hold any state on the server. Now, as for caching, we may say that we update the prices every hour, so caches more than one hour old may be thrown away.
And, that’s all there is to it! Of course, we still have to implement this, but, the general idea of the system is quite simple and clean!
Conclusion
Hopefully, this article gave you a solid understanding of REST.
Also, I hope that you will now be able to call out people who throw around the term “RESTful” too much – I can tell you, there’s a lot of them!
Please ask any questions in the comments, or tweet them out to me at @dhaivatpoincare :)