The database is the source of truth for most Django projects, so it is important that you can rely on its data representing the world as you expect. Expectations such as "this field will never be null for this type of object", "a user always has a profile", or "finalized orders cannot be modified" are part of your project's business logic.
When these expectations are enforced in the database, you can be confident about the structure of your data, and avoid filling your code with tedious edge case handling.
Databases have long provided tools and concepts for enforcing these expectations, and Django has increasingly exposed them as part of its API rather than requiring raw SQL. Understanding these tools, and avoiding their common pitfalls, is essential to safe and effective data handling.
Transactions
Transactions are one of the building blocks of databases. All statements within a transaction succeed or fail as a group. Django exposes this functionality with transaction.atomic, which rolls back any database changes if an unhandled exception occurs within its code block.
Use transactions to group related database writes that must occur together.
For example, creating a one-to-one UserProfile when creating a User object:
from django.contrib.auth.models import User
from django.db import transaction
from myapp.models import UserProfile
with transaction.atomic():
    user = User.objects.create_user(...)
    UserProfile.objects.create(user=user, ...)
Using this everywhere a User is created will ensure each user always has a corresponding profile.
As a blanket safety harness, Django can wrap every HTTP request in a transaction via the ATOMIC_REQUESTS setting (it is disabled by default). Opening a transaction comes with some overhead, and holding a long-running transaction open can cause performance and other issues. ATOMIC_REQUESTS also needlessly slows down GET requests that make no database writes, so it is not appropriate for high-traffic sites. Keep in mind that ATOMIC_REQUESTS does not apply to code that runs outside of the request/response cycle, such as Celery tasks and management commands.
To help avoid long transactions, use Django's on_commit to do expensive work (e.g. an API call or email send) outside of a transaction. on_commit runs a callback function after the transaction has successfully committed. Ideally, expensive actions are implemented as background tasks, but even those tasks should be triggered from an on_commit callback.
Non-repeatable reads
Because Django uses the "read committed" isolation level for database transactions by default, a transaction can see the results of other concurrent transactions that commit during its run. This means that, in highly concurrent environments, innocuous-looking code can behave unexpectedly, even within transactions.
Consider a simple product ordering system. A user can order a product only if it is in stock. Example code:
from django.db import models, transaction


class Product(models.Model):
    in_stock_count = models.IntegerField(default=0)


class Order(models.Model):
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)


@transaction.atomic
def order_product(user, product_id):
    """
    Returns whether an order was created for the user.

    If an order is created, the product's stock count is updated.
    Orders cannot be made for out-of-stock products.
    """
    product = Product.objects.get(pk=product_id)
    if product.in_stock_count == 0:
        return False
    Order.objects.create(user=user, product=product)
    product.in_stock_count = models.F("in_stock_count") - 1
    product.save()
    return True
The code correctly uses transaction.atomic to ensure the Product stock count is updated only when an Order is created. If the transaction fails, the Product is not updated and no Order is created.
However, the assumption that
Orders cannot be made for out-of-stock products.
would be violated in the following concurrent scenario:
>>> product_id = Product.objects.create(in_stock_count=1).pk
Two transactions (A and B) both call
>>> order_product(user, product_id)
at the same time for the same product, with different users.
Sequence diagram for the two transactions:
A -> Database: SELECT Product (in_stock_count = 1)
B -> Database: SELECT Product (in_stock_count = 1)
A: sees in_stock_count = 1
B: sees in_stock_count = 1
A -> Database: INSERT Order
B -> Database: INSERT Order
A -> Database: UPDATE Product SET in_stock_count = in_stock_count - 1
Database: product in_stock_count becomes 0
B -> Database: UPDATE Product SET in_stock_count = in_stock_count - 1
Database: product in_stock_count becomes -1
A: returns True
B: returns True
After both transactions commit to the database, the product's stock count is -1! And someone ordered a non-existent product!
This is known as the "non-repeatable read" phenomenon. To handle such a case correctly, you need to use locks...
Locks
Django's mechanism for locking database rows in transactions is QuerySet.select_for_update. When the queryset is evaluated, the selected rows are locked, preventing other transactions from updating or locking them until the end of the transaction.
Use select_for_update to lock rows that are sensitive to the order of operations, such as an object at a workflow state transition. Other examples:
- Checking whether an object is read-only before modifying it (enforce read-only status).
- Sending a welcome notification to the user on login (prevent duplicate notifications).
- Verifying a token to create a user (handle double submission).
In the above Product and Order example, we can enforce the
Orders cannot be made for out-of-stock products.
business logic by applying select_for_update to the retrieved Product:
@transaction.atomic
def order_product(user, product_id):
    """
    Returns whether an order was created for the user.

    If an order is created, the product's stock count is updated.
    Orders cannot be made for out-of-stock products.
    """
    product = Product.objects.select_for_update().get(pk=product_id)
    if product.in_stock_count == 0:
        return False
    Order.objects.create(user=user, product=product)
    product.in_stock_count = models.F("in_stock_count") - 1
    product.save()
    return True
Now, the sequence diagram for the previous example with concurrent transactions A and B would look like:
A -> Database: BEGIN TRANSACTION
B -> Database: BEGIN TRANSACTION
A -> Database: SELECT Product (in_stock_count = 1) FOR UPDATE
Database: lock obtained on product
B -> Database: SELECT Product (in_stock_count = 1) FOR UPDATE
Database: BLOCKED (waiting for lock)
A: sees in_stock_count = 1
A -> Database: INSERT Order
A -> Database: UPDATE Product SET in_stock_count = in_stock_count - 1
Database: product in_stock_count becomes 0
A: returns True
A -> Database: COMMIT
Database: releases lock on product
B -> Database: resumes SELECT, returns Product
Database: lock obtained on product
B: sees in_stock_count = 0
B: returns False
B -> Database: COMMIT
Database: releases lock on product
The business rule is now enforced.
For locking to be effective, you must use select_for_update at every "use" and every update of decision factors such as Product.in_stock_count. That is, whenever the field is read in order to decide whether to make a change (e.g. a database write, API call, or email send), the object holding that field needs to be locked. And whenever the field on that object is updated, the object needs to be locked as well.
In the above example, this happens in the same transaction, but the use and update of decision factors are commonly in separate transactions. Consider a manual stock adjustment to a product: it would also need to use select_for_update on the product.
Much of the cross-object business logic your project relies on can be enforced with the proper and consistent use of locks.
(Note: if you need to lock on code not directly related to database rows, you may want to look into distributed locking.)
Keep them short!
Minimize the length of transactions which hold locks on rows. Other transactions will be blocked from updating the row until the locking transaction ends. Excessive blocking will lead to performance issues.
Constraints
Django's Constraint classes prevent invalid data from entering the database in the first place. They act on the database table like unique constraints, but allow far more diverse rules than uniqueness alone. Most conditions you could express in a single-table query can be enforced as a constraint.
They can be used to absolutely enforce:
- maximum and minimum values on fields,
CheckConstraint(condition=Q(age__gte=18), name="age_gte_18")
- mutually exclusive fields,
CheckConstraint(
    condition=~Q(internal_user__isnull=False, external_user__isnull=False),
    name="cannot_have_both_internal_and_external_user",
)
- unique constraints on only certain rows,
UniqueConstraint(
    condition=Q(valid=True),
    fields=["email"],
    name="unique_email_for_valid_results",
)
- and all kinds of other rules.
The best part: they are automatically validated in form and model cleaning!
When creating constraints, think about the query you would make to find invalid data (i.e. a row which violates the constraint), and negate it.
For example, in a reservation booking system which supports both in-person and virtual reservations, you can safely assume every in-person reservation has a room code with the following constraint:
from django.db import models


class Reservation(models.Model):
    in_person = models.BooleanField(default=True)
    room_code = models.CharField(
        max_length=20,  # CharField requires a max_length
        blank=True,
        help_text="Room the guest has booked.",
    )

    class Meta:
        constraints = [
            models.CheckConstraint(
                condition=models.Q(in_person=False) | ~models.Q(room_code=""),
                name="in_person_reservation_requires_room_code",
                violation_error_message="Room code is required.",
            ),
        ]
The query for invalid data is in_person=True AND room_code="". Its negation is in_person=False OR room_code!="" (by De Morgan's laws).
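You can sanity-check that negation exhaustively with a quick truth table in plain Python (no database needed; the sample room code is arbitrary):

```python
from itertools import product

cases_checked = 0
# The "invalid row" query: in_person=True AND room_code="".
# The constraint is its negation: in_person=False OR room_code != "".
for in_person, room_code in product([True, False], ["", "A101"]):
    invalid = in_person and room_code == ""
    allowed = (not in_person) or (room_code != "")
    # By De Morgan's laws, the constraint admits exactly the non-invalid rows.
    assert allowed == (not invalid)
    cases_checked += 1
```

The same approach works for any CheckConstraint: enumerate the cases your "invalid data" query matches, and confirm the negated condition rejects exactly those.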
By enforcing these rules in the database, and relying on Django's constraint cleaning, you do not need custom validation or extra edge case checks in your application for violations of these rules. Such violations are just not possible. Declarative constraints have let me do away with many tedious custom clean() implementations.
For example, date field order is a common assumption which must be enforced:
from django.core.exceptions import ValidationError
from django.db import models


class Conference(models.Model):
    start_date = models.DateField()
    end_date = models.DateField()

    def clean(self):
        if self.end_date < self.start_date:
            raise ValidationError("Start date must be before End date.")
Rather than implementing the validation in clean(), it can be part of the constraint which enforces the field order directly in the database:
from django.db import models


class Conference(models.Model):
    start_date = models.DateField()
    end_date = models.DateField()

    class Meta:
        constraints = (
            models.CheckConstraint(
                condition=models.Q(start_date__lte=models.F("end_date")),
                name="conference_start_date_before_end_date",
                violation_error_message="Start date must be before End date.",
            ),
        )
Now the validation lives as a Meta attribute rather than a custom method, and any Conference retrieved from the database is guaranteed to have its start_date before its end_date - no violations can hide in stale data.
You can also enforce the "single falsy value" on nullable string fields. Typically, Django recommends string fields be non-null. But it can be useful to allow a string field to be null when it is also unique, making it an optional unique field. However, you run the risk of empty string values being saved in the field, which can mask uniqueness failures (e.g. a null value unintentionally saved as an empty string) and allows for two falsy values for the field: None and "".
This can be easily solved with a constraint:
class SyncedObject(models.Model):
    external_id = models.CharField(max_length=256, null=True, unique=True)

    class Meta:
        constraints = [
            models.CheckConstraint(
                condition=~models.Q(external_id=""),
                name="synced_object_external_id_not_empty",
                violation_error_message="External ID cannot be empty.",
            ),
        ]
Alternatively, you could use a UniqueConstraint that makes the field unique when it is not the empty string. So the field no longer needs to be nullable but will still be unique, and still has only one falsy value: "" (instead of None).
The model would instead look like:
class SyncedObject(models.Model):
    external_id = models.CharField(max_length=256, blank=True)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["external_id"],
                condition=~models.Q(external_id=""),
                name="synced_object_external_id_unique_not_blank",
                violation_error_message="Object with this external ID already exists.",
            ),
        ]
The result is the same: an optional unique string field with a single falsy value.
Note: always give constraints a useful violation_error_message. Without a custom message, Django falls back to an opaque, generic error when a constraint fails validation.
These database tools and concepts (transactions, locks, and constraints) are critical to maintaining data integrity and business logic. Their use can also lead to cleaner code by avoiding redundant edge case checks. And, in the case of constraints, validation can be done in a declarative manner rather than with custom methods.