Testing Cedar-14 Memory Use
Last updated 08 June 2016
As part of our work on the Cedar-14 stack, we benchmarked apps to learn how changes in system libraries between Cedar and Cedar-14 translate into real-world differences in app performance. One thing we found was that memory behavior varied between stacks for some apps. We dug deeper into those variations, and this article presents our findings.
For details on how to tune and optimize memory use, see the Tuning glibc memory behavior article.
Memory usage can be higher on Cedar-14 than on Cedar because of an underlying change in glibc’s malloc implementation. The change generally improves performance when apps allocate memory from multiple threads, because more memory arenas are available to the app. As a result, apps that create many threads (and use glibc’s malloc implementation internally) can have different performance characteristics on Cedar-14.
The way that glibc manages memory arenas can be fine-tuned by setting the environment variable MALLOC_ARENA_MAX.
For the testing presented in this article, we used Cedar as the baseline and compared it to Cedar-14 with various values of MALLOC_ARENA_MAX. The test candidate was a Ruby app configured to use the Puma web server.
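glibc reads MALLOC_ARENA_MAX from the environment when a process starts, so the value has to be in place before the app server launches. The following is a minimal sketch of that idea, not the setup used in the tests above: the helper name run_with_arena_cap is hypothetical, and the child command only echoes the variable back instead of starting a real server.

```python
import os
import subprocess
import sys


def run_with_arena_cap(command, arena_max):
    """Run `command` with MALLOC_ARENA_MAX set in the child's environment.

    Hypothetical helper for illustration; the parent's environment is untouched.
    """
    env = dict(os.environ)  # copy, so only the child sees the new value
    env["MALLOC_ARENA_MAX"] = str(arena_max)
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # Stand-in child process: a real run would launch the web server instead.
    child = [sys.executable, "-c", "import os; print(os.environ['MALLOC_ARENA_MAX'])"]
    print(run_with_arena_cap(child, 2).stdout.strip())
```

On Heroku the same effect is normally achieved with a config var (for example, heroku config:set MALLOC_ARENA_MAX=2), which is injected into the dyno's environment before the process starts.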
Memory usage (RSS) averages (base is Cedar)
It’s worth noting that, while memory consumption on Cedar-14 was higher than on Cedar, this app never used more than the 512 MB available in the dyno.
Median response time averages (base is Cedar)
95th percentile response time averages (base is Cedar)
The glibc default number of memory arenas on 64-bit systems is 8 times the number of CPU cores (the number of CPU cores seen by dynos on Heroku varies with dyno type).
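To make the arithmetic concrete, here is a rough sketch that estimates the default arena limit for the machine it runs on. The function name default_arena_limit is hypothetical. The factor of 2 for 32-bit systems comes from glibc's source rather than from this article, and os.cpu_count() may not match the core count glibc itself detects inside a container, so treat the result as an approximation.

```python
import os
import struct


def default_arena_limit(cpu_cores=None, pointer_bits=None):
    """Estimate glibc's default arena cap when MALLOC_ARENA_MAX is unset.

    Assumption: 8 arenas per core on 64-bit systems, 2 per core on 32-bit.
    """
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1  # cpu_count() can return None
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8  # pointer width of this interpreter
    per_core = 8 if pointer_bits == 64 else 2
    return per_core * cpu_cores


if __name__ == "__main__":
    print("Estimated default arena limit:", default_arena_limit())
    print("With 4 cores on a 64-bit system:", default_arena_limit(4, 64))
```

With 4 visible cores on a 64-bit system the estimate is 32 arenas, which is why capping the value at 1 or 2 can change memory use noticeably.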
Limiting the number of arenas from this default generally reduces memory usage. We tested with values of “1” and “2”, and both resulted in similar memory reductions. The performance differences between arena values of “1” and “2”, however, are measurable. In particular, “1” has worse performance than “2”. Not setting a value for MALLOC_ARENA_MAX gives default glibc behavior and has the best performance but also consumes the most memory. For details on how to optimize app performance with MALLOC_ARENA_MAX, see the Tuning glibc memory behavior article.
How we tested
The Bundler API, the Ruby gem repository, runs on Heroku and we contribute to its maintenance. This let us test it on multiple stacks with different glibc configurations.
We built a service that reads through the logs of the production app and replays all eligible requests on a number of staging apps. Each staging app ran the latest release of Bundler API but was configured independently and with fully isolated resources.
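The replay approach can be sketched roughly as follows. This is not the service we built: the log pattern, the eligibility rule (GET requests only), the host names, and every function name here are assumptions for illustration. The send callable is injected so the sketch runs without touching the network.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Assumed log shape: ... method=GET path="/some/path" ...
LOG_PATTERN = re.compile(r'method=(?P<method>[A-Z]+)\s+path="(?P<path>[^"]+)"')


@dataclass(frozen=True)
class Request:
    method: str
    path: str


def parse_request(line: str) -> Optional[Request]:
    """Extract method and path from one log line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return Request(match["method"], match["path"])


def is_eligible(request: Request) -> bool:
    # Assumption: only side-effect-free GET requests are safe to replay.
    return request.method == "GET"


def replay(
    lines: Iterable[str],
    staging_hosts: Iterable[str],
    send: Callable[[str, Request], None],
) -> int:
    """Send every eligible logged request to each staging host; return the send count."""
    hosts = list(staging_hosts)
    sent = 0
    for line in lines:
        request = parse_request(line)
        if request is None or not is_eligible(request):
            continue
        for host in hosts:
            send(host, request)
            sent += 1
    return sent


if __name__ == "__main__":
    sample = [
        'at=info method=GET path="/example?gems=rack" status=200',
        'at=info method=POST path="/example/update" status=200',
    ]
    hosts = ["staging-a.example", "staging-b.example"]  # hypothetical staging apps
    replay(sample, hosts, lambda host, req: print(host, req.method, req.path))
```

Fanning each request out to every staging app keeps the workload identical across configurations, so differences in memory and response time can be attributed to the stack and the MALLOC_ARENA_MAX setting rather than to traffic.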
The test ran for a total of 24 hours.