In this long overdue follow-up to Part 1, I'll be discussing the infrastructure issues associated with creating and serving image thumbnails at scale. The naive solution to generating thumbnails is to declare the image sizes you want in your templates (in Django, via template tags). When a response is rendered, the thumbnailing code will check for the existence of the thumbnail on the local filesystem and create it on-the-fly if it does not exist. This approach gives template designers total flexibility to create templates with any size thumbnail they want and requires no extra machinery or services on the server. It's great for small-scale sites, but breaks down quickly on larger scale sites.
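As a rough illustration of the naive approach, here is a minimal sketch using only the standard library. The names `thumbnail_path` and `get_or_create_thumbnail`, and the injected `resize` callable, are hypothetical and are not taken from any particular thumbnailing package.

```python
import os
from typing import Callable, Tuple

Size = Tuple[int, int]


def thumbnail_path(source_path: str, size: Size, thumb_dir: str) -> str:
    """Build a deterministic thumbnail filename, e.g. photo.200x100.jpg."""
    name, ext = os.path.splitext(os.path.basename(source_path))
    return os.path.join(thumb_dir, f"{name}.{size[0]}x{size[1]}{ext}")


def get_or_create_thumbnail(
    source_path: str,
    size: Size,
    thumb_dir: str,
    resize: Callable[[str, str, Size], None],  # hypothetical: resize(src, dst, size)
) -> str:
    """Return the thumbnail path, generating the file during the request if missing."""
    target = thumbnail_path(source_path, size, thumb_dir)
    if not os.path.exists(target):  # cheap on a local disk, slow over a network
        os.makedirs(thumb_dir, exist_ok=True)
        resize(source_path, target, size)  # blocks the response until it finishes
    return target
```

The existence check and the blocking `resize` call are the two steps that cause trouble in the problems described below.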

Problem 1: Multiple App Servers

One of the most common approaches to scaling a website to handle more traffic is to scale it horizontally (add more servers). In order to balance a site's traffic across multiple servers, the servers' state (databases, uploaded files, etc.) must be shared across all the servers. This is an easy problem to solve these days with AWS services like S3 or EFS, or with self-hosted alternatives such as Minio or NFS.

While {E,N}FS can still be accessed like a filesystem, storage services like S3 or Minio are accessed via an HTTP API. This requires a thumbnailer that can speak to those services. Django has us covered in this instance thanks to django-storages, but it's something to consider for other frameworks.

When a filesystem moves from a local disk to the network, expect a significant drop in speed. Operations that were near instantaneous locally (like checking if a file exists) can take tens or even hundreds of milliseconds, especially when they happen tens of times per request. The thumbnailing solution will need to take this into account by caching or some other approach to avoid those slow requests. In the case of Django's popular easy-thumbnails package and its successor easy-images (both created and maintained by our very own Chris Beaven), database caching is used to know which thumbnails exist and which ones need to be created.
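To make the caching idea concrete, here is a minimal sketch. It is not how easy-thumbnails or easy-images implement it. The class `CachedExistence` and the `remote_exists` callable are hypothetical, and an in-memory set stands in for the database table.

```python
from typing import Callable, Set


class CachedExistence:
    """Remember which thumbnails exist so remote storage is asked at most once per name."""

    def __init__(self, remote_exists: Callable[[str], bool]) -> None:
        self._remote_exists = remote_exists  # slow call, e.g. an HTTP request to storage
        self._known: Set[str] = set()        # stand-in for a database table or cache

    def exists(self, name: str) -> bool:
        if name in self._known:
            return True  # answered without touching the network
        if self._remote_exists(name):
            self._known.add(name)
            return True
        return False  # misses are not cached, so a later upload is still noticed

    def mark_created(self, name: str) -> None:
        """Record a thumbnail right after generating it, skipping the remote check."""
        self._known.add(name)
```

Only positive results are cached here. Caching misses as well would save more requests, at the cost of needing to invalidate the entry whenever a thumbnail is created.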

Problem 2: Sitting on the Critical Path

On high-traffic websites, a slow page can turn regular traffic into a DDoS by tying up all the site's web workers. Pushing a page live where thumbnails have not been generated yet blocks the rendering of the page while thumbnails are generated. The slow-down is exacerbated by the fact that our files are no longer local: each thumbnail now requires network requests to download the original and upload the result, with the resize in between. In the worst case, these operations don't complete before the server's timeout limit and requests start failing until all the thumbnails are generated. To avoid this problem, a logical next step is to move thumbnail generation outside the request/response cycle, ensuring that pages are never blocked waiting on generation.

Problem 3: Pre-Generation vs. On-Demand

Removing thumbnail generation from the request/response cycle means either pre-generating thumbnails on upload or using a service which generates the thumbnails on-demand when they are requested. There's no right option here. What you choose depends on your project's specific needs.

Pre-Generation

Thumbnail pre-generation is done via a worker queue. When a new image is uploaded, one or more jobs are placed on the queue to generate the necessary thumbnails. The app servers then assume the thumbnails exist when the page is rendered.
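The flow can be sketched with the standard library's `queue` and `threading` modules standing in for a real job queue. In production this would more likely be a dedicated task queue. `THUMBNAIL_SIZES`, `enqueue_thumbnails`, `worker`, and the `generate` callable are all hypothetical names chosen for illustration.

```python
import queue
import threading
from typing import Callable, Sequence, Tuple

Size = Tuple[int, int]

# Hypothetical sizes; with pre-generation these must be decided ahead of time.
THUMBNAIL_SIZES: Sequence[Size] = ((100, 100), (400, 300))


def enqueue_thumbnails(
    jobs: queue.Queue, image_name: str, sizes: Sequence[Size] = THUMBNAIL_SIZES
) -> None:
    """Called from the upload handler: one job per required thumbnail size."""
    for size in sizes:
        jobs.put((image_name, size))


def worker(jobs: queue.Queue, generate: Callable[[str, Size], None]) -> None:
    """Runs outside the request/response cycle; a None job tells it to stop."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            image_name, size = job
            generate(image_name, size)  # download, resize, upload
        finally:
            jobs.task_done()


if __name__ == "__main__":
    jobs: queue.Queue = queue.Queue()
    thread = threading.Thread(
        target=worker, args=(jobs, lambda name, size: print("generated", name, size))
    )
    thread.start()
    enqueue_thumbnails(jobs, "photo.jpg")
    jobs.put(None)  # shut the worker down once the queue drains
    thread.join()
```

The upload handler returns as soon as the jobs are queued, so the cost of resizing never lands on a web worker.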

Pros

  • With the right sized queue, thumbnails are immediately available
  • Well integrated into the Django toolchain
  • If you're already utilizing a job queue, it does not require new dependencies

Cons

  • Lack of flexibility: thumbnail sizes must be defined up front
  • Lots of storage is needed
  • Thumbnails are generated that may never be used
  • Adding/removing thumbnail sizes requires iterating over every existing image
  • If the queue backs up, pages may be served before thumbnails are generated

On-Demand

In contrast to pre-generating thumbnails, creating thumbnails on-demand uses an external service, like the self-hosted Thumbor or a hosted service like Imgix. The app servers craft a unique URL which specifies the source image and the necessary thumbnail dimensions and properties. The generated thumbnails can be stored indefinitely, but are usually ephemeral and re-generated later if necessary. With heavy caching at the CDN level, it's possible to not store them at all.
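As an illustration of URL crafting, here is a sketch of a signed thumbnail URL. The path layout, the HMAC-SHA256 signature, and the names `sign`, `thumbnail_url`, and `is_valid` are assumptions made for this example; they are not the actual URL format of Thumbor or Imgix, so check the service's documentation for the real scheme.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign(path: str, secret: bytes) -> str:
    """Return a URL-safe HMAC-SHA256 signature for the given path."""
    digest = hmac.new(secret, path.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def thumbnail_url(base_url: str, source: str, width: int, height: int, secret: bytes) -> str:
    """Build <base>/<signature>/<width>x<height>/<percent-encoded source>."""
    path = f"{width}x{height}/{quote(source, safe='')}"
    return f"{base_url.rstrip('/')}/{sign(path, secret)}/{path}"


def is_valid(signature: str, path: str, secret: bytes) -> bool:
    """Service-side check: reject URLs whose size or source was tampered with."""
    expected = sign(path, secret)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    # Placeholder host and key, for demonstration only.
    print(thumbnail_url("https://thumbs.example.com", "uploads/cat.jpg", 300, 200, b"not-a-real-key"))
```

Signing the path means only the app servers can mint valid thumbnail URLs, which limits how many distinct sizes an outsider can ask the service to render.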

Pros

  • Flexibility for designers to use any size image
  • Less storage than pre-generation

Cons

  • Requires lots of CPU which can get expensive quickly
  • Additional service to install, configure, maintain and monitor (keep this in mind for local development as well)
  • First request for a thumbnail is slow, possibly a few hundred milliseconds as the source image is downloaded and thumbnailed
  • Traffic spikes on un-cached (or expired) thumbnails can overwhelm the service, opening an attack vector for DDoS

Go Forth and Thumbnail!

While none of these problems are insurmountable, they are definitely things to consider and plan around. They not only increase the complexity of the project, but also the points of possible failure.

If you have any thumbnail issues or tips I missed here, let me know in the comments.