⚠️ Warning: This book was published in 2014. Some of the details and code samples may be outdated.

The Big Picture


It’s not uncommon to hear people say “Django doesn’t scale”. Depending on how you look at it, the statement is either completely true or patently false. Django, on its own, doesn’t scale. The same can be said of Ruby on Rails, Flask, PHP, or any other language or framework used by a database-driven dynamic website. The good news, however, is that Django interacts beautifully with a suite of caching and load balancing tools that will allow it to scale to as much traffic as you can throw at it. Contrary to what you may have read online, it can do so without replacing core components often labeled as “too slow” such as the database ORM or the template layer.

Django’s scaling success stories are almost too numerous to list at this point. It backs Disqus, Instagram, and Pinterest. Want some more proof? Instagram was able to sustain over 30 million users on Django with only 3 engineers (2 of whom had no back-end development experience). Back in 2013, Disqus was serving 8 billion page views per month. You can be certain that by the time you’re reading this, the bigger players are serving many multiples of that. Those are some huge numbers. These teams have proven Django most certainly does scale. Our experience here at Lincoln Loop backs it up. We’ve built big Django sites capable of spending the day on the Reddit homepage without breaking a sweat.

Every site has unique needs and different pain points requiring extra attention to operate at scale. You may be surprised, however, to learn that their general approaches all look very similar. Perhaps even more surprising is that many parts of this infrastructure aren’t even unique to Django applications. The techniques we’ll describe are widely used across high traffic sites of many frameworks and languages.

Our point is this: Django scales, and the tactics described in this book will help you build sites capable of withstanding millions of page views per day and hundreds, if not thousands, of concurrent users. We have years of experience applying these tactics on heavily trafficked production sites. They work for us and we’re confident they will work for you too.


Simplicity is a prerequisite for reliability.

—Edsger W. Dijkstra

For our team at Lincoln Loop, the guiding philosophy in designing high-traffic Django sites is simplicity. Unfortunately, undisciplined developers will always trend towards complexity. Without making a conscious effort to fight complexity at every turn, it is too easy to waste time building complex, unmaintainable monstrosities that will bite you down the road.

Simplicity means:

  1. Using as few moving parts as possible to make it all work. “Moving parts” may be servers, services or third-party software.

  2. Choosing proven and dependable moving parts instead of the new hotness.

  3. Using a proven and dependable architecture instead of blazing your own trail.

  4. Deflecting traffic away from complex parts and toward fast, scalable, and simple parts.

Simple systems are easier to scale, easier to understand, and easier to develop. Of course, any non-trivial web application will bring its own unique set of complex problems to solve but by keeping the rest of the stack simple, you and your team can spend more time focusing on the product rather than on scaling and infrastructure.

The Pain Points

Django apps, and for that matter most web applications, share many common performance characteristics. Here are the most frequent pain points we encounter building performant web applications; they should look familiar to you.


A relational database (e.g., Postgres, MySQL) is usually the slowest and most complex beast in the stack. One option is to replace it with a faster and less complex “NoSQL” database, but in many cases, that pushes the complexity into your application and squarely into the hands of your developers. We’ve found it simpler to keep it down in a proven RDBMS and handle the pain via caching.


Templates get complex quickly. To make matters worse, Django’s template engine has made a trade-off for simplicity and usability over speed. We could replace it with a faster templating engine like Jinja2, but it will still be the second slowest part of our stack. We can avoid the pain via caching.


Python is “fast enough” for many workloads and the trade-off it provides by having mature developer tools and a mature ecosystem is well worth it. The same can be said of just about every other mature dynamic scripting language. But we can serve requests faster from a web accelerator (e.g., Varnish) that can serve cached responses before a request even gets to the Python layer.

Cache All the Things

By now you probably see where we’re headed. The simplest general approach is to cache all the way down the stack. No matter how fast and how well tuned your stack is, it will never be as fast as a dedicated cache.

Serving the entire HTTP response directly out of cache is ideal. When it isn’t possible, as many parts of the response as possible should come from cache. Calls to the database can be kept to a bare minimum by implementing a caching layer there as well.

All this caching might sound like a nightmare for readers who know Phil Karlton’s famous quote,

There are only two hard things in Computer Science: cache invalidation and naming things.

In the following chapters, we’ll teach you safe caching techniques to ensure your users never see stale content (unintentionally). Additionally, we’ll show you how to tune your stack so it is as fast as possible, even on a cache miss.

Why the rush to cache?

Multi-layer caching lets us push the bulk of our traffic away from the more complex and custom built software onto battle-tested, high performance, open source software.


At each layer, load may be distributed horizontally across multiple systems. But the farther down the stack any given request travels, the slower and more taxing it will be on the infrastructure. Your goal, therefore, is to serve as much of your traffic from as high up the stack as possible.
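To make the payoff concrete, the following sketch works through the arithmetic of stacked caches. The hit rates are made-up numbers chosen for illustration, not measurements from any real site, and the layer names are just labels.

```python
def requests_reaching_each_layer(total, hit_rates):
    """Return how many requests arrive at each layer of the stack.

    hit_rates is an ordered list of (layer_name, hit_rate) pairs, top of the
    stack first. Each layer answers hit_rate of what reaches it from its own
    cache and passes the remainder down to the next layer.
    """
    arriving = []
    remaining = float(total)
    for name, hit_rate in hit_rates:
        arriving.append((name, remaining))
        remaining *= (1.0 - hit_rate)
    # Whatever is left has to do the full, uncached work.
    arriving.append(("database-backed view", remaining))
    return arriving


# Hypothetical hit rates: 75% at the web accelerator, 50% at the per-site cache.
layers = [("web accelerator", 0.75), ("per-site cache", 0.5)]
for name, count in requests_reaching_each_layer(1000, layers):
    print(f"{name}: {count:.0f} requests")
```

Under these assumed rates, only 125 of every 1,000 requests travel all the way down to a database-backed view; the other 875 are answered higher up the stack.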

The common players in this stack are:

  • Load Balancer:

    • Open Source: Traefik, HAProxy, Nginx, Varnish

    • Managed: All major cloud providers offer hosted load balancing solutions

  • Web Accelerator:

    • Open Source: Varnish, Nginx + Memcached

    • Managed: Fastly, Cloudflare

  • App Server: uWSGI, Gunicorn, Apache/mod_wsgi

  • Cache: Memcached, Redis

  • Database: Postgres, MySQL/MariaDB

The Journey of a Request

At first glance, all these different pieces of software can be daunting. In our consulting practice, we’ve seen sites that get the fundamentals of these functional elements wrong and end up with a fragile infrastructure held together with baling wire and duct tape. It’s critical to understand the purpose of each one and how they interact with each other before moving forward.

Use your imagination and pretend you are a passenger in a magical vehicle that’s taken the form of an HTTP request and is traversing the layers of the web stack. The journey starts in the browser where an unassuming user sends you on your way by typing the domain of your website in the address bar.

A DNS lookup will happen (unless you’ve set a high TTL and the lookup is already cached). The lookup will point your vehicle to the IP address of a load balancer and send you rocketing off across the information superhighway toward your first stop.

Load Balancer

Your first stop is the load balancer whose main responsibility is to dispatch traffic to the underlying infrastructure. It acts as a single proxy point that receives requests from the internet and dispatches them to healthy application servers (aka, the pool). It also does health checks and removes app servers from the pool if they are determined to be misbehaving.

Most load balancers let you choose an algorithm (e.g., round robin, least connections) for distributing requests to the application servers. It may also be possible to specify weights to force some servers to receive more traffic than others.

For most cases, round robin is a safe default. Routing traffic to the server with the fewest connections sounds like an amazing idea, but it can be problematic in some scenarios. Take, for example, adding application servers to the pool during a traffic spike. The new server will go from zero connections to a flood of connections as soon as it joins the pool. This can lead to an undesirable result: the new server is overwhelmed, declared unhealthy, and taken out of the rotation.
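The difference between the two algorithms during a spike can be shown with a small simulation. This is a toy model rather than how any particular load balancer is implemented: the server names and connection counts are invented, and it assumes no connection closes while the burst of requests arrives.

```python
from collections import Counter
from itertools import cycle


def round_robin(servers, n_requests):
    """Hand out requests to servers in a fixed rotation."""
    assigned = Counter()
    rotation = cycle(sorted(servers))
    for _ in range(n_requests):
        assigned[next(rotation)] += 1
    return assigned


def least_connections(servers, n_requests):
    """Send each request to whichever server has the fewest open connections."""
    open_conns = dict(servers)  # copy so the caller's dict is not mutated
    assigned = Counter()
    for _ in range(n_requests):
        target = min(sorted(open_conns), key=open_conns.get)
        open_conns[target] += 1
        assigned[target] += 1
    return assigned


# Two busy servers plus one that has just joined the pool with no connections.
pool = {"app1": 100, "app2": 100, "app3-new": 0}
print(round_robin(pool, 60))        # 20 requests to each server
print(least_connections(pool, 60))  # all 60 requests land on app3-new
```

With round robin the newcomer takes an equal share of the burst. With least connections it receives every new request until it catches up with its neighbors, which is the flood described above.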

The load balancer is a good place to do TLS termination. This is the act of decrypting a request coming in via HTTPS and passing it down the stack as HTTP. It’s good to do this early on in the stack. Speaking HTTP is easier and the load balancer usually has the spare CPU cycles to handle this task.

Depending on your choice of software, the load balancer may also have some overlapping functionality with the next layer on our journey, the web accelerator.

Web Accelerator

As your vehicle passed through the load balancer, it directed you to one of possibly many web accelerators at the next level of the stack. The web accelerator (aka, caching HTTP reverse proxy) is the first line of defense for your application servers farther down the stack. (In this book, we’ll focus on Varnish, our preferred web accelerator solution.)

One of the first tasks for the web accelerator is to determine if this is a request for a resource where the response varies with each user. For many applications it might seem like every request varies per user. There are some tricks we’ll show you later to work around this, but the basic question at the web accelerator is this: is this page unique to you or the same for everyone?

If the response is user-specific, it will wave your vehicle on to the next layer in the stack. If not, it will see if it already has a copy of the response in its internal cache. If it’s found, your vehicle’s journey stops here and you’re sent back to the browser with the cached response. If it isn’t found in cache, it will send you down the stack, but take a snapshot of the response on your way back so it can store it in the cache.
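The decision flow just described can be sketched in a few lines of Python. This is a deliberately simplified model, not Varnish's actual logic: it treats the presence of a session cookie as the only signal that a response is user-specific, it ignores expiry and HTTP caching headers, and the `backend` callable is a hypothetical stand-in for everything farther down the stack.

```python
class WebAccelerator:
    """Toy caching reverse proxy: pass, hit, or miss-and-store."""

    def __init__(self, backend):
        self.backend = backend  # callable taking a path and returning a response
        self.cache = {}
        self.stats = {"pass": 0, "hit": 0, "miss": 0}

    def handle(self, path, has_session_cookie=False):
        if has_session_cookie:
            # User-specific response: wave the request on and never store it.
            self.stats["pass"] += 1
            return self.backend(path)
        if path in self.cache:
            # Same for everyone and already stored: the journey ends here.
            self.stats["hit"] += 1
            return self.cache[path]
        # Same for everyone but not stored yet: fetch it, keep a snapshot.
        self.stats["miss"] += 1
        response = self.backend(path)
        self.cache[path] = response
        return response


def slow_backend(path):
    """Hypothetical application stack behind the accelerator."""
    return f"<html>rendered {path}</html>"


accelerator = WebAccelerator(slow_backend)
accelerator.handle("/news/")                           # miss, then stored
accelerator.handle("/news/")                           # hit
accelerator.handle("/account/", has_session_cookie=True)  # pass
print(accelerator.stats)
```

Only the first anonymous request for `/news/` and the logged-in request for `/account/` reach the backend; the repeat visit is answered from the accelerator's own cache.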

Ideally most requests’ journeys end here. The web accelerator absorbs traffic spikes generated by a marketing campaign or viral content on sites like Reddit or Facebook.

Your journey is going to keep going, however. Next stop: the application server!



Application Server

Up to now, you’ve been zooming along the high-speed interstate highway but as you start to pull into the application server, the road gets a little more winding and your pace starts to slow down.

The application server has a simple task: it turns your HTTP request into a WSGI request that Python can understand. (Our preferred application server is uWSGI.) There are lots of lanes of cars passing through the application (aka WSGI) server and on the other side you catch sight of Django. The winding road now becomes city streets complete with turns and stop signs.
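For readers who have never looked under the hood, this is roughly what a "WSGI request" amounts to. The sketch uses only the standard library's `wsgiref` helpers. The `application` function is a bare WSGI callable (a Django project exposes one of the same shape in its `wsgi.py`), and `call_app` is a hypothetical helper that plays the part of the application server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable: the interface the app server invokes per request."""
    path = environ.get("PATH_INFO", "/")
    body = f"You asked for {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path):
    """Play the app server's role: build environ, call the app, collect output."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application, "/about/")
print(status, body)
```

A real application server does the same translation for every incoming HTTP request, only with many worker processes or threads running side by side.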




The Django streets should look familiar to you. You go through the middleware, hand off your URL to the dispatcher who points you towards one of the views in the application. You notice, however, that there are a few differences between this Django application and the ones you hack on on your laptop.

Some requests are getting responses and zipping home in the middleware zone. That’s Django’s per-site cache in action. As you enter the view you notice that instead of having to stop and wait for every database query, some return out of cache almost immediately (a database query cache). Rather than twisting and turning through the template level, you notice some of the blocks simply fly by (template caching).
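As a hedged sketch of how the per-site cache is typically switched on, here is a settings fragment. It assumes a recent Django release (3.2 or later, where the `PyMemcacheCache` backend is available) and a Memcached instance on localhost; the timeout value is arbitrary and the middleware list is abbreviated.

```python
# settings.py (fragment): enabling Django's per-site cache

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # assumed local Memcached instance
    }
}

MIDDLEWARE = [
    # Must be first: stores responses in the cache on the way out.
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your project's middleware goes here ...
    # Must be last: answers from the cache on the way in.
    "django.middleware.cache.FetchFromCacheMiddleware",
]

CACHE_MIDDLEWARE_SECONDS = 300  # illustrative lifetime for cached pages
```

The ordering matters because middleware runs top to bottom on the request and bottom to top on the response. Template fragment caching is a separate mechanism, enabled per block with Django's `{% cache %}` template tag.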

While slow compared to the highway you were on earlier, the trip through Django was pretty fast. You’ve now got the full response in tow and need to head home to the browser. On your way, you’ll pass back through each layer, checking in a copy of your response with the per-site cache in Django and again with the web accelerator on your way out to the internet.

How long did your little request journey take? Surprisingly, all this happens in just a fraction of a second.

When you start looking at your own application’s response times, here are some rough numbers you can shoot for. If your application is five times slower, there’s going to be a lot of room for improvement.

Estimated Response Times


Varnish cache hit


Django per-site cache hit


Django with warm cache


Django with cold cache

Breaking it down further, for requests passing through to Django the total time spent should not be overwhelmingly dominated by a single component (database, cache, etc.). The majority of time should be spent working in Python with no more than 30% or so spent in any given component.
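One way to apply that rule of thumb is to take per-component timings from whatever profiler you use and flag anything above the threshold. The function and the timings below are illustrative assumptions; only the 30% figure comes from the text.

```python
def over_budget(timings_ms, limit=0.30, exempt=("python",)):
    """Return the share of total time for each component exceeding the limit.

    timings_ms maps a component name to milliseconds spent in it during one
    request. Components listed in exempt (Python itself by default) are
    expected to dominate and are never flagged.
    """
    total = sum(timings_ms.values())
    if total == 0:
        return {}
    return {
        name: ms / total
        for name, ms in timings_ms.items()
        if name not in exempt and ms / total > limit
    }


# Hypothetical breakdown of a single 150 ms request.
request_timings = {"python": 60, "database": 60, "cache": 10, "other services": 20}
for name, share in over_budget(request_timings).items():
    print(f"{name} takes {share:.0%} of the request, above the 30% guideline")
```

In this made-up request the database accounts for 40% of the total, so it is the first place to look for missing indexes, redundant queries, or results that could be cached.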

How does your Django app compare? Is there room for improvement? In the next chapter we’ll explore the development process and show you how and where to optimize your application as you build it.