@rrei
Last active April 12, 2025 18:33
Different attempts at solving memory usage problems caused by large Django querysets
import django.db  # needed for `django.db.reset_queries()` below


def do_stuff(obj):
    """Performs some simple processing on `obj`."""
    obj.y = obj.related_obj.x ** 2 + 1  # DB query to fetch `.related_obj`
    obj.save()  # DB query to update `obj`

# Obtain a large queryset for this example.
qs = MyFurstModel.objects.all()

# Attempt no. 1: NO GOOD. This direct/naive approach has the advantage of describing
# in the simplest form possible what we're trying to achieve: simply call a function
# on each member of the queryset. The script quickly goes into swap because the whole
# queryset is being loaded into memory. Furthermore, since `do_stuff()` is also
# making a couple of queries for each object (and we're running with `DEBUG`
# enabled), Django's query log (`connection.queries`) just keeps growing.
for obj in qs:
    do_stuff(obj)

# Attempt no. 2: NO GOOD. Periodically resetting the query log stops it from growing
# (this only matters with `DEBUG` enabled), but the full queryset is still being
# loaded into memory.
for i, obj in enumerate(qs):
    if i % 10000 == 0:
        django.db.reset_queries()
    do_stuff(obj)

# Attempt no. 3: NO GOOD. Use `.iterator()` to avoid caching the queryset, but the
# problem persists because we're using the MySQL driver, which doesn't support
# server-side cursors... damn you MySQL!
for i, obj in enumerate(qs.iterator()):
    if i % 10000 == 0:
        django.db.reset_queries()
    do_stuff(obj)

# Attempt no. 4: pheeeew, we're finally there (with some caveats). We use nested for
# loops because `chunked_queryset()` returns a generator of querysets: the outer loop
# iterates over chunks, and the inner loop iterates over the actual results of the
# original queryset.
for chunk in chunked_queryset(qs):
    django.db.reset_queries()
    for obj in chunk.iterator():
        do_stuff(obj)
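
Attempt no. 4 relies on `chunked_queryset()`, which is not defined in the snippet above. The block below is a minimal sketch of one way such a helper could work, not necessarily the implementation used here. It assumes the model has a unique, orderable primary key, and the `chunk_size` parameter and its default value are hypothetical. It pages through primary keys and yields one sub-queryset per key range, so no more than `chunk_size` rows are fetched at a time.

```python
def chunked_queryset(qs, chunk_size=1000):
    """Yield sub-querysets of `qs`, each matching at most `chunk_size` rows.

    Hypothetical sketch: assumes a unique, orderable primary key. Any
    ordering already set on `qs` is replaced by ordering on `pk`.
    """
    qs = qs.order_by("pk")
    last_pk = None
    while True:
        # Restrict to rows after the previous chunk (keyset pagination).
        page = qs if last_pk is None else qs.filter(pk__gt=last_pk)
        # Fetch only the primary keys of the next chunk: one small query.
        pks = list(page.values_list("pk", flat=True)[:chunk_size])
        if not pks:
            return
        # Yield a lazy queryset bounded by the first and last key of the chunk.
        yield qs.filter(pk__gte=pks[0], pk__lte=pks[-1])
        last_pk = pks[-1]
```

Some caveats of this sketch: it costs one extra query per chunk to look up the key range, and if `do_stuff()` changes a field that the original queryset filters on, rows may drop out of later chunks.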