Unlocking Wagtail’s StreamField: Fast Cross-Page Queries with JSONField

If you work with Wagtail, you already know and love StreamField. It gives content editors a beautiful, block-based authoring experience, allowing them to build complex layouts without breaking the site’s design system.

But as developers, we know the dark side of StreamField: it is notoriously expensive to query.

Because Wagtail stores StreamField content as a deeply nested JSON structure, querying across pages based on what’s inside those blocks is painful.

Imagine you run a food and recipe publication. Your editors use a RecipePage with a StreamField for the body. Inside that StreamField, they build out the page using instruction blocks, video_embed blocks, and most importantly, an ingredient_list block where they type out exactly what’s needed for the dish.

Then, the ad sales team asks for a seemingly simple feature: “We just sold a targeted ad campaign to a dairy brand. Can we inject their banner ad onto every recipe page where ‘milk’ is listed as an ingredient?”

For a product manager, this is a logical request. For a developer, it’s a performance trap. Because of how StreamField data is stored, a standard database query isn’t an option. The naive approach usually looks something like this:

recipes_with_milk =[]

for recipe in RecipePage.objects.all():
    for block in recipe.body:
        if block.block_type == "ingredient_list":
            # Extract ingredients and lowercase them for comparison
            ingredients = [item.lower() for item in block.value]
            if "milk" in ingredients:
                recipes_with_milk.append(recipe)

This requires loading every recipe into memory, deserializing the StreamField, and iterating through every single block on the site. On a mature publication, this will bring your server to its knees.

We need a better way.

The Solution: Separate the Authoring from the Querying

The core realization is that StreamField is built for editors, not for databases.

To solve our query problem, we can let the editors keep their flexible blocks, but extract the data we actually care about into a flat, indexable structure that PostgreSQL can search efficiently.

We do this by adding a hidden JSONField to our base page model. Every time a page is saved, we parse the StreamField once, compute the metadata, and stash it in the JSON field.

1. The Abstract Base Page

First, let’s create a foundation. We’ll add a computed JSONField to a base class and override the save() method.

from django.db import models
from wagtail.models import Page

class BasePage(Page):
    # Default to an empty dict so it's always valid JSON
    # blank=True keeps it out of the Wagtail admin interface
    computed = models.JSONField(default=dict, blank=True)

    class Meta:
        abstract = True

    def save(self, clean=True, user=None, log_action=False, **kwargs):
        # If the child class knows how to compute metadata, run it.
        if compute_func := getattr(self, "compute_metadata", None):
            self.computed = compute_func()

        return super().save(clean, user, log_action, **kwargs)

By using getattr with a fallback, we keep BasePage generic. Subclasses can opt-in simply by defining a compute_metadata method. No signals, no Celery tasks, just standard Django model behavior.

2. Computing the Data

Next, let’s look at our RecipePage. We want to know two things: does this recipe have a video, and what is the exact, lowercased list of ingredients used?

from myapp.base.models import BasePage

class RecipePage(BasePage):
    body = StreamField(...)

    def compute_metadata(self) -> dict:
        ingredients =[]

        for block in self.body:
            if block.block_type == "ingredient_list":
                # Grab the raw strings, lowercase them, and add to our list
                ingredients.extend([item.lower() for item in block.value])

        has_video = any(
            block.block_type == "video_embed" 
            for block in self.body
        )

        return {
            "has_video": has_video,
            "ingredients": list(set(ingredients)), 
        }

Now, every time an editor hits “Publish” in Wagtail, our RecipePage quietly scans its own blocks and saves a flat dictionary to the computed field.

3. The Query

What used to be a N+1 nightmare is now an elegant, single-query lookup using Django’s JSONField lookups:

# Find all recipes with a video
RecipePage.objects.filter(computed__has_video=True)

# Find all recipes containing milk
RecipePage.objects.filter(computed__ingredients__contains=["milk"])

The Secret Sauce: PostgreSQL GIN Indexes

This pattern works well out of the box, but if you want it to be blazing fast, you need to leverage PostgreSQL’s GIN (Generalized Inverted Index).

A GIN index is Postgres’s answer to searching inside JSON documents. Instead of doing a sequential scan (reading every row to check the JSON), a GIN index maintains a map of key/value pairs to the rows that contain them.

When Django sees __contains=["milk"] on a JSONField, it translates that to PostgreSQL’s @> containment operator. If a GIN index is present, Postgres can jump instantly to the matching rows.

You can add this index directly to your model’s Meta class:

from django.contrib.postgres.indexes import GinIndex

class RecipePage(BasePage):
    # ...

    class Meta:
        indexes = [
            GinIndex(fields=["computed"], name="recipe_computed_gin"),
        ]

Note: There is a known gotcha here. If you define an index with a specific name on an abstract base class, Django will attempt to use that exact same name for every subclass, resulting in a database collision and a migration crash.

While Django 2.2+ allows you to work around this using string interpolation (e.g., name=”%(app_label)s_%(class)s_computed_gin”), Wagtail’s complex multi-table inheritance (RecipePage -> BasePage -> Wagtail Page) can still make Meta inheritance unpredictable. The safest, most explicit approach is to declare the GinIndex directly on every concrete subclass rather than putting it on `BasePage.

Unlocking Wagtail’s StreamField: Fast Cross-Page Queries with JSONField

The Solution: Separate the Authoring from the Querying

1. The Abstract Base Page

2. Computing the Data

3. The Query

The Secret Sauce: PostgreSQL GIN Indexes

Martin Mahner

Recent Insights

Ensuring Essential Data Exists in Your Django App on Startup

Siteprobe. A concise command-line audit for every URL in your sitemap

Ensuring View Consistency with abc.ABC in Django