Anatomy of a Web App: How I Built RedditLater in Clojure
I made RedditLater last year to allow people to post to Reddit at some pre-scheduled time. It sees modest usage: a few hundred visitors per day, some fraction of whom schedule posts. In this article, I’ll describe how RedditLater works and why I made some of the decisions I did.
I wrote the project about a year ago. It’s hosted on a single Heroku dyno. I chose Heroku because, hey, free hosting. That choice shaped the technical decisions: even though usage is modest, the hosting environment is quite limited. The app had to deliver its scheduled posts (approximately) on time and survive the occasional traffic surge from being posted on Reddit itself.
RedditLater works by spinning up a separate worker thread, running alongside the web server, to monitor the queue of posts destined for Reddit. The worker thread just goes through the list of queued posts and sends each one to Reddit once its scheduled time has arrived.
RedditLater was written using Clojure. I chose Clojure mostly because I was way into Clojure at the time. I still am, but I was then too. In retrospect, though, I can say it was a fine decision. Clojure is a simple, functional language with top-notch support for concurrency. It’s not exactly mainstream, but it’s popular enough to have first-class support on Heroku. To run parallel tasks on that one Heroku instance, RedditLater relies on the concurrency that Clojure makes so accessible, and especially on the Lamina library and its excellent queue structures.
For persistence, posts and user login data are stored in a Mongo database hosted on MongoHQ, which plays nicely with Heroku. I used Mongo because the app isn’t database-intensive, and because Mongo is easy to use, especially from a language with a hash-map literal as Clojure has.
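To make the hash-map point concrete, here is a rough sketch of a scheduled post written as a plain Clojure map. The field names are hypothetical, not RedditLater's actual schema, and the `fake-collection` atom is only an in-memory stand-in for a Mongo collection rather than a real driver call. The idea is that a document is already shaped like the data the language works with natively.

```clojure
;; A scheduled post as a plain map. All field names are hypothetical.
(def example-post
  {:id       "post-1"
   :user     "some-redditor"
   :title    "Hello, Reddit"
   :url      "http://example.com"
   :schedule 1400000000000}) ; scheduled time in epoch milliseconds

;; In-memory stand-in for a Mongo collection: a map of id -> document.
(def fake-collection (atom {}))

(defn save-post!
  "Store (or overwrite) a post document, keyed by its :id."
  [post]
  (swap! fake-collection assoc (:id post) post)
  post)

(defn find-post
  "Look up the stored document for an id, or nil if none exists."
  [id]
  (get @fake-collection id))
```

With a real driver, the same map would be handed to an insert call more or less unchanged, which is most of what makes Mongo convenient from Clojure.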
(I’ve come to think of Mongo as the datastore you use until you need to make a decision about which datastore to use.)
The application requires a few bits of functionality that can be separated into independent modules. The main functionality is divided between the request handler and the worker. The request handler is the UI and frontend of the app, accepting user input. The worker takes care of actually posting things to Reddit.
Here’s an overview of the tasks the request handler performs in a typical workflow, where a user logs in and schedules a post:
- Serve a rendered template routed by URL
- Authenticate user with Reddit via Reddit’s OAuth support
- Store the user’s authentication credentials in Mongo for future Reddit API calls
- Store the desired post (including scheduled time) in Mongo
- Put the post into the worker’s queue for posting
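The storing and queueing steps above can be sketched as a Ring-style handler, which is just a function from a request map to a response map. This is a simplified illustration, not RedditLater's code: the routes, the `stored-posts` and `queued-posts` atoms, and `schedule-post!` are hypothetical stand-ins for Mongo and the worker queue. The OAuth step is left out, and the sketch assumes middleware has already parsed the form into keyword-keyed `:params`.

```clojure
(def stored-posts (atom []))  ; hypothetical stand-in for the Mongo collection
(def queued-posts (atom []))  ; hypothetical stand-in for the worker's queue

(defn schedule-post!
  "Persist the post, then hand it to the worker's queue."
  [post]
  (swap! stored-posts conj post)
  (swap! queued-posts conj post)
  post)

(defn handler
  "A Ring-style handler: takes a request map, returns a response map."
  [{:keys [request-method uri params]}]
  (cond
    ;; Serve the (here hard-coded) page for the front page URL.
    (and (= request-method :get) (= uri "/"))
    {:status  200
     :headers {"Content-Type" "text/html"}
     :body    "<h1>Schedule a post</h1>"}

    ;; Accept a scheduled post, keep only the expected fields, redirect home.
    (and (= request-method :post) (= uri "/posts"))
    (do (schedule-post! (select-keys params [:title :url :schedule]))
        {:status 302 :headers {"Location" "/"} :body ""})

    :else
    {:status 404 :headers {"Content-Type" "text/plain"} :body "Not found"}))
```

Because the handler is an ordinary function, it can be called directly with a hand-built request map, for example `(handler {:request-method :get :uri "/"})`, without starting a server.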
The request handler is written using the Ring library, the de facto standard for Clojure web apps, with Compojure handling routing and Enlive taking care of template rendering. I also used Middleman to mock out the UI and generate HTML templates to be used with Enlive.
There’s nothing too interesting here, just the Clojure equivalents of some really mundane tasks that any developer could relate to. The more interesting part is the post queue. One bonus aspect of using Clojure is that it’s both easy and performant to start your request-serving machinery from your application. This is because Clojure’s concurrency is thread-based (in-process), while typical deployments of Python, Ruby, PHP, etc. rely on multi-process concurrency.
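As a minimal sketch of the in-process approach, the snippet below starts a background worker thread inside the same JVM that would also run the web server. The names `running?`, `ticks`, `worker-loop`, and `start-worker-thread` are made up for illustration, and the loop body is a placeholder for real work.

```clojure
(def running? (atom true)) ; flip to false to stop the worker
(def ticks    (atom 0))    ; counts loop iterations, for demonstration only

(defn worker-loop
  "Placeholder worker body: do one unit of work, sleep, repeat until stopped."
  []
  (while @running?
    (swap! ticks inc)
    (Thread/sleep 10)))

(defn start-worker-thread
  "Start worker-loop on a daemon thread and return the Thread.
  A real app would go on to start its web server in this same process."
  []
  (let [^Runnable body worker-loop] ; Clojure functions implement Runnable
    (doto (Thread. body)
      (.setDaemon true)             ; don't keep the JVM alive just for the worker
      (.start))))
```

Since both the worker and the request handlers live in one process, they can share data structures such as a queue directly, with no separate process or message broker between them.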
On the worker side, the situation is much simpler:
- Take a post from the queue.
- Fetch the latest version of that post from Mongo.
- Check if it’s time to submit this post.
- If not, put the post at the end of the queue.
- If so, attempt to post. If the attempt fails, add the post back to the queue.
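The steps above can be sketched as a single function that performs one pass of the worker. This is an illustration under stated assumptions rather than RedditLater's code: it uses `java.util.concurrent.LinkedBlockingQueue` in place of a Lamina queue, and `fetch-latest` and `submit!` are hypothetical functions passed in by the caller, with `submit!` assumed to return a truthy value on success.

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

(defn due?
  "True when the post's :schedule (epoch ms) is at or before now."
  [post now]
  (<= (:schedule post) now))

(defn process-one
  "Run one pass of the worker steps and return what happened.
  fetch-latest and submit! are hypothetical caller-supplied functions."
  [^LinkedBlockingQueue queue fetch-latest submit! now]
  (let [queued (.take queue)           ; take a post (blocks while the queue is empty)
        post   (fetch-latest queued)]  ; re-read it, in case the user edited it
    (cond
      ;; Not time yet: put it at the end of the queue.
      (not (due? post now))
      (do (.put queue post) :requeued)

      ;; Time to post: a thrown exception counts as a failed attempt.
      (try (submit! post) (catch Exception _ false))
      :submitted

      ;; The attempt failed: add the post back to the queue to retry later.
      :else
      (do (.put queue post) :retry))))
```

Passing `now` and the two functions as arguments keeps the step free of global state, so it can be exercised with a fake clock and a fake submitter.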
Here’s an annotated example of how this all looks:
(If you haven’t read Lisp syntax before: myfunction(x, y) in Algol-derived languages is (myfunction x y) in Clojure.)
    ;; Define a post queue
    (def upcoming-post-queue (lamina/queue))

    (defn enqueue-post
      "Enqueue a post in the post queue."
      [post]
      (lamina/enqueue upcoming-post-queue post))

    (defn time-to-post?
      "Is schedule <= now?"
      [post]
      (<= (get post :schedule) (helpers/now)))

    (defn process-post
      "Grab a post from the queue. If it's time to post it, post it. If not, requeue."
      []
      (let [post @upcoming-post-queue]   ; Blocks until a post is in the queue
        (if (time-to-post? post)
          (reddit-api/submit post)       ; If so, submit the post with the reddit-api module
          (enqueue-post post)))          ; Otherwise, add the post back to the queue
      (Thread/sleep 1000))               ; Sleep for a second

    ;; Called by main on startup
    (defn start-worker
      []
      ;; dorun rather than doall, so the endless sequence of results is not kept in memory
      (dorun (repeatedly process-post)))
If you can get over the parentheses, you can see how much simpler this process is than comparable solutions in languages like Python (Celery) or Ruby (Resque). Both of these require you to run and manage another process (for another $30 per month, on Heroku), and neither has quite as simple an API.
Of course, there is a downside: scaling this architecture across many servers would require that the post queue implement some sort of sharding. But this method would scale vertically on one server pretty far before it became necessary to distribute processing. After all, since the queue handles locking, nothing but server specs stops you from starting as many worker threads as you like.
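As a rough sketch of that vertical scaling, the snippet below starts several worker threads that all pull from one shared thread-safe queue. It again assumes `java.util.concurrent.LinkedBlockingQueue` as a stand-in for the Lamina queue. `start-workers`, `handle!`, and `running?` are hypothetical names, with `handle!` standing for whatever each worker does with an item.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

(defn start-workers
  "Start n daemon threads that each take items from queue and pass them
  to handle!. Returns the threads. running? is an atom; set it to false to stop."
  [n ^LinkedBlockingQueue queue handle! running?]
  (mapv
    (fn [i]
      (let [^Runnable body
            (fn []
              (while @running?
                ;; Poll with a timeout so the thread notices running? turning false.
                (when-let [item (.poll queue 50 TimeUnit/MILLISECONDS)]
                  (handle! item))))]
        (doto (Thread. body (str "worker-" i))
          (.setDaemon true)
          (.start))))
    (range n)))
```

The queue does the locking, so each item goes to exactly one worker, and raising `n` is the only change needed to add capacity on a single machine.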
And that’s all there is to it! Using only these simple tools, RedditLater has been running happily and continuously for over a year (with the occasional bugfix). Of course, there are many other ways to design such an application, but I hope you’ve learned a bit today from the design and the tools I chose. For more on how RedditLater itself works, here’s some more detail.