The gist of the talk is that going from a synchronous to a concurrent program in Python requires a significant amount of legwork. The talk takes a simple socket program that calculates Fibonacci numbers synchronously and tries to make it concurrent. It compares and contrasts various approaches: threads, multiple processes, and coroutines.
My takeaway was that there are a zillion ways of doing it in Python, but none of them is great at taking advantage of multiple cores. While typing out the code used in the demo, I decided, for the fun of it, to port it to Go.
The first surprise for me was how similar the synchronous versions are in both languages. As always, the code and the micro-benchmarks that follow should be taken with a grain of salt.
Synchronous
The Go version requires a bit more typing and type ceremony, but the structure is very similar.
Synchronous Micro-Benchmark
The benchmark consists of running one instance of perf2.py, which simulates a client hammering on our microservice.
Python ~17,000 req/s
PyPy ~22,000 req/s
Go ~21,000 req/s
PyPy is faster than Go by a small margin, but as far as I am concerned the three solutions are within the same order of magnitude.
Concurrency
The beauty of Go is that it takes only two letters to move from a synchronous to a concurrent version. Simply add the go keyword in front of the call to fibHandler(conn). Not only is it simple, but, unlike in Python, there is one obvious way to do it.
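To make the two-letter change concrete, here is a minimal sketch of what such a server could look like. It is not the listing from the original post: only fibHandler(conn) is named in the text, while reply, serve, the port number, and the "one number per read" wire format are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// fib is a deliberately slow recursive Fibonacci, standing in for CPU-bound work.
func fib(n int) int {
	if n <= 2 {
		return 1
	}
	return fib(n-1) + fib(n-2)
}

// reply turns one request ("10") into one response ("55").
// Hypothetical helper, split out so the logic is easy to test.
func reply(req string) (string, error) {
	n, err := strconv.Atoi(strings.TrimSpace(req))
	if err != nil {
		return "", err
	}
	return strconv.Itoa(fib(n)), nil
}

// fibHandler serves a single client until it disconnects or sends garbage.
func fibHandler(conn net.Conn) {
	defer conn.Close()
	buf := make([]byte, 100)
	for {
		n, err := conn.Read(buf) // assumption: each read carries one whole request
		if err != nil {
			return // client went away
		}
		resp, err := reply(string(buf[:n]))
		if err != nil {
			return // malformed request: drop the connection
		}
		fmt.Fprintln(conn, resp)
	}
}

// serve accepts clients until the listener fails or is closed.
func serve(ln net.Listener) error {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return err
		}
		go fibHandler(conn) // without "go", clients are served one at a time
	}
}

func main() {
	ln, err := net.Listen("tcp", ":25000") // port number is an assumption
	if err != nil {
		panic(err)
	}
	fmt.Println(serve(ln))
}
```

Deleting the go keyword on the marked line gives back the synchronous behaviour: a second client then waits until the first one disconnects.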
The Python equivalent is much harder to pull off; one could argue that it is out of reach for a large portion of experienced Python developers. David Beazley illustrates very well the phenomenal diversity of approaches that could be taken, all broken to some extent. I am sure some other candidates come to mind: asyncio, Twisted, Tornado, etc.
Below you can see the coroutines version with a zest of ProcessPoolExecutor.
The interesting part is that, even with all this work, the Python version can’t take advantage of all the cores, whereas the Go equivalent is controlled by an environment variable called GOMAXPROCS that determines how many cores you want to allocate to your program. The performance characteristics also differ by more than an order of magnitude:
Concurrent Micro-Benchmark
This micro-benchmark does not include PyPy because some of the features used in concurrency.py are not currently supported, specifically the concurrent.futures module.
fib(30)
A single iteration with 30 as the argument to fib.
Python 231ms
Go 5ms
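A single-call timing like the one above could be taken in Go with something along these lines. This is a sketch, not the harness used for the numbers in this post: timeFib and report are hypothetical helpers, and the fib definition (returning 1 for n <= 2) is assumed to match the demo. Absolute timings will vary by machine.

```go
package main

import (
	"fmt"
	"time"
)

// fib is the naive recursive Fibonacci: exponential time, ideal for burning CPU.
func fib(n int) int {
	if n <= 2 {
		return 1
	}
	return fib(n-1) + fib(n-2)
}

// timeFib runs fib(n) once and returns the result with the wall-clock time taken.
func timeFib(n int) (int, time.Duration) {
	start := time.Now()
	result := fib(n)
	return result, time.Since(start)
}

// report formats one measurement for printing.
func report(n int) string {
	result, elapsed := timeFib(n)
	return fmt.Sprintf("fib(%d) = %d in %v", n, result, elapsed)
}

func main() {
	fmt.Println(report(30))
}
```

A single iteration is noisy; averaging over several runs would give a steadier figure.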
Requests per second
3 clients running perf2.py
Python: 275 req/s per perf2.py instance; concurrency.py takes 188 MB of RAM
Go (GOMAXPROCS=3): 12,500 req/s per perf2.py instance; concurrency.go takes 120 MB of RAM
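The GOMAXPROCS setting used above can also be read and changed from inside a program through runtime.GOMAXPROCS, which caps how many OS threads execute Go code at the same time (since Go 1.5 it defaults to the number of logical CPUs). The sketch below is not part of the benchmark: fibAll, timeFibAll, and describe are hypothetical helpers that run three CPU-bound goroutines under different limits so the effect can be observed locally.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// fib is the naive recursive Fibonacci used as CPU-bound work.
func fib(n int) int {
	if n <= 2 {
		return 1
	}
	return fib(n-1) + fib(n-2)
}

// fibAll computes fib(n) in `workers` goroutines at once and collects the results.
func fibAll(n, workers int) []int {
	results := make([]int, workers)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = fib(n) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return results
}

// timeFibAll runs fibAll under the given GOMAXPROCS limit, then restores the old one.
func timeFibAll(procs, n, workers int) time.Duration {
	prev := runtime.GOMAXPROCS(procs) // sets the limit, returns the previous value
	defer runtime.GOMAXPROCS(prev)
	start := time.Now()
	fibAll(n, workers)
	return time.Since(start)
}

// describe formats one measurement for printing.
func describe(procs int, d time.Duration) string {
	return fmt.Sprintf("GOMAXPROCS=%d: %v", procs, d)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0)) // 0 only queries
	for _, procs := range []int{1, 3} {
		fmt.Println(describe(procs, timeFibAll(procs, 30, 3)))
	}
}
```

On a machine with at least three cores, the GOMAXPROCS=3 run would be expected to finish noticeably faster than the GOMAXPROCS=1 run, since the three goroutines can then execute in parallel.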
Go is significantly faster than Python; this is fine and expected. What I find more striking is how much easier it is in Go to morph a synchronous program into its concurrent equivalent. In addition, the resulting piece of Go code is more readable and easier to reason about. Not all problems require a concurrent solution, but for the ones that do, Go has a lot to offer.
About the author
Yann Malet
Yann builds and architects performant digital platforms for publishers. In 2015, Yann co-authored High-Performance Django with Peter Baumgartner.
Prior to his involvement with Lincoln Loop, Yann focused on Product Lifecycle Management systems (PLM) for several large …