Ruby Benchmarking Lessons Learned
About a month ago, I put the finishing touches on the first edition of The Ruby Web Benchmark Report, the most comprehensive “Hello World” web app benchmark ever done for Ruby. It was a significant undertaking, and along the way I learned a few things about performance, benchmarking, and Ruby that I didn’t know when I started. Those lessons went far beyond what I expected from the project.
You Never Really Know Until You Test
The first surprise was that you never really know what kind of performance you have until you run your own tests against your own code. That might sound obvious, but it’s true.
For example, I always thought Sinatra was the fastest Ruby web framework, but it isn’t. While it is faster than Rails, it’s not nearly as fast as plain old Rack, or as frameworks that feel like Sinatra but perform better, such as Cuba.
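To make the comparison concrete, here is a minimal sketch of the kind of bare Rack “Hello World” app such a benchmark hits. The names here are my own illustration, not the exact code from the report:

```ruby
# A bare Rack application is any object that responds to #call and
# returns a [status, headers, body] triple. There is no framework
# layer at all, which is why plain Rack sets the throughput ceiling.
hello_app = lambda do |env|
  [200, { "Content-Type" => "text/plain" }, ["Hello World"]]
end

# In a config.ru you would boot it with: run hello_app
# Calling it directly shows the raw response triple:
status, _headers, body = hello_app.call({})
puts "#{status} #{body.first}"  # prints "200 Hello World"
```

Frameworks like Cuba add a thin routing layer on top of exactly this interface, which is why they stay close to raw Rack speed.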
I was also surprised by how much performance potential there is in JRuby. Compared to MRI, the standard Ruby runtime, JRuby was much faster in almost every case once it warmed up. Until now, I had considered JRuby useful mainly in the context of enterprise Java shops that are moving away from Java EE. I now realize that JRuby provides a high-performance alternative that could delay or negate the need to move to a faster language like Java, Scala, or Go.
I would not have known any of this without running my own tests.
When you look at my numbers, do not take them at face value. Do your own tests. Whether you prove them right or wrong doesn’t matter. You will learn something simply by experiencing the testing process for yourself. If nothing else, the experience will open your eyes to the benefits of benchmarking your code more often.
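Getting started doesn’t require a big harness, either; Ruby’s standard library ships with Benchmark. Here is a hedged sketch using Benchmark.realtime, with toy workloads standing in for code you actually ship:

```ruby
require "benchmark"

# Compare two ways of building the same string. The workloads are
# invented stand-ins; substitute the code paths you care about.
n = 50_000
t_interp = Benchmark.realtime { n.times { |i| "row-#{i}" } }
t_concat = Benchmark.realtime { n.times { |i| "row-" + i.to_s } }

# Benchmark.realtime returns wall-clock seconds as a Float.
puts format("interpolation: %.4fs, concatenation: %.4fs", t_interp, t_concat)
```

A few lines like this, run before and after a change, is often all it takes to replace a hunch with data.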
Having my own data is great because I can use it to make decisions about what should change and what should stay the same. Without my own data, I feel like I’m flying blind.
The Tools Change The Results Sometimes
Originally, my plan was simply to use Apache Bench to run all of my tests. However, some of the test cases showed that certain servers, like WEBrick, don’t work well with Apache Bench. So I also used a tool called wrk, which is used in the TechEmpower Web Framework Benchmarks as well.
Because the two tools work differently, you can’t expect identical requests-per-second numbers from each. What you can do is look for trends that hold across both tools. That matters because it helps you spot when a measuring tool is inaccurate enough to skew the performance numbers.
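One way to compare trends across tools is to normalize each tool’s report down to a single requests-per-second figure. This sketch assumes typical output lines from wrk and Apache Bench; the requests_per_sec helper and the sample numbers are my own illustration:

```ruby
# wrk reports "Requests/sec:", Apache Bench reports "Requests per second:".
# Extract either form as a Float so the two runs can be charted together.
def requests_per_sec(output)
  output[/Requests(?:\/sec:| per second:)\s+([\d.]+)/, 1]&.to_f
end

# Sample lines in the shape each tool typically prints:
wrk_sample = "Requests/sec:  11203.44"
ab_sample  = "Requests per second:    4482.47 [#/sec] (mean)"

puts requests_per_sec(wrk_sample)  # 11203.44
puts requests_per_sec(ab_sample)   # 4482.47
```

The absolute numbers will differ between tools; what you want is for the ranking of servers to agree. When it doesn’t, something is worth investigating.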
Just like you calibrate a scale with a known weight, you need to verify that your measurement tools are accurate and sensible.
There were some servers or frameworks that just didn’t like either Apache Bench or wrk. Other times, one server would work poorly with only one of the tools. In that case, do you blame the tool, the framework, the server, or even the runtime? When you take the time to run tests with multiple benchmarking tools, you can see the cracks. Sometimes they are tiny glitches in a larger system; in other cases the numbers never really reconcile with reality.
So, my advice, when it comes to benchmarking tools, is not to get too hung up on one particular tool. Use two or three or four in your own tests. You don’t have to go overboard, but even if you just run multiple test tools against a smaller testing subset, you will have better data to work with.
What I hope you take away from this section is that you can’t be content to rely on one tool or one set of data, because that will eventually lead you astray. If you verify your data across multiple tools, you will have a better view of true performance.
Mistakes Lead To Good Conversations
One of the most fascinating parts of doing the benchmarks was that results in particular scenarios led to some very good conversations with framework, runtime, and server creators.
In some cases, a framework needed a patch because the default settings didn’t lead to decent benchmarking results. In other cases, there were patches that could speed up a framework significantly. It was good to hear from the creators of various projects about what I could do to speed up their results.
Another conversation that spilled out of this project was about methodology itself. What is the best way to actually run these benchmarks? Should they be run on a local machine? Should they be run on remote machines to more appropriately simulate internet conditions? Should they be run in the cloud? Those are good conversations to have, because they lead to better data and better reasoning.
The most exciting conversation, I think, was around how much of an impact these benchmarks have in the real world. Considering I was testing a simple “Hello World” web app, the benchmark isn’t designed to show anything other than theoretical peak throughput. My thinking was, you will never have an app less complicated than “Hello World”, so everything else you create will be slower than that.
However, other people countered that we are talking about a couple of milliseconds of difference, and for most apps that doesn’t matter. The database will almost always be slower. Even just reducing image file sizes will have more impact on page load speed most of the time.
I actually agree. We aren’t talking about performance differences that will make or break most apps in the real world. That being said, there is a real-world perception that Rails is slow, which has led many teams away from Ruby once their application needs to scale. My hope is that this benchmarking effort will show that Ruby can be faster than the way most people deploy it.
It’s good that my benchmarking effort has at least restarted the conversation around Ruby performance and making things faster. I hope the conversation continues.
We Can Always Test More
The last lesson I learned is simply that there is always something else to test or some other variation to try. As soon as people saw the Hello World benchmark, they wanted to see database benchmarks, they wanted more frameworks, different configurations, etc. That is really great, and it brings to light that we probably aren’t doing enough testing of our work.
There isn’t a lot of great data out there about the performance of the various parts of our systems and gems. What framework is fastest? Well, it looks like Cuba right now. What ORM is fastest? I have no idea. I don’t have any data on ORMs yet. What about caching? What about JSON parsing? You get the idea.
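As a small example of the kind of data we could start collecting, here is a hedged sketch of a JSON micro-benchmark using only the standard library. The payload and iteration count are invented for illustration:

```ruby
require "json"
require "benchmark"

# An invented payload: 100 small records, round-tripped through JSON.
payload  = { "users" => (1..100).map { |i| { "id" => i, "name" => "user-#{i}" } } }
document = JSON.generate(payload)

# Time parsing vs. generating the same document n times.
n = 2_000
parse_time    = Benchmark.realtime { n.times { JSON.parse(document) } }
generate_time = Benchmark.realtime { n.times { JSON.generate(payload) } }

puts format("parse: %.3fs, generate: %.3fs over %d iterations",
            parse_time, generate_time, n)
```

Swap in Oj, or an ORM query, or a cache read, and the same few lines start answering the questions above with data instead of guesses.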
There is a ton of performance data that we just don’t have. However, if we build a culture around performance and benchmarking, I think we can improve and make better decisions around our tooling.
Benchmarking is fun, if a bit time-consuming. The Ruby community would be better off if we all spent a little more time benchmarking our code, so that we have a better understanding of, and more visibility into, the software we create. Performance isn’t the most important thing for every application, but being aware of how our applications perform will help us create better software, and that is a better outcome for everybody involved.
The Ruby Web Benchmark is a starting point for exploring performance, not an end point. I plan to continue exploring the world of high-performance Ruby, and if you want to follow along, I’ll be posting more articles, benchmarks, and related projects at my high performance ruby page and here on SitePoint as well.