Gist rrei/9306f01dd671013714ed24d06c1378cb, last active April 12, 2025 18:33
Different attempts at solving memory usage problems caused by large Django querysets
import django.db


def do_stuff(obj):
    """Performs some simple processing on `obj`."""
    obj.y = obj.related_obj.x ** 2 + 1  # DB query to fetch `.related_obj`
    obj.save()  # DB query to update `obj`


# Obtain a large queryset for this example (`MyFurstModel` is an example model
# that is not defined in this file).
qs = MyFurstModel.objects.all()

# Attempt no. 1: NO GOOD. This direct/naive approach has the advantage of describing,
# in the simplest possible form, what we're trying to achieve: call a function on
# each member of the queryset. The script quickly goes into swap because the whole
# queryset is loaded into memory. Furthermore, since `do_stuff()` makes a couple of
# queries for each object (and we're running with `DEBUG` enabled), Django's query
# log (`django.db.connection.queries`) just keeps growing.
for obj in qs:
    do_stuff(obj)

# Attempt no. 2: NO GOOD. Clearing the query log every 10,000 objects stops it from
# growing, but the full queryset is still loaded into memory.
for i, obj in enumerate(qs):
    if i % 10000 == 0:
        django.db.reset_queries()
    do_stuff(obj)

# Attempt no. 3: NO GOOD. Use `.iterator()` so the queryset doesn't cache its
# results, but the problem persists because we're using MySQL: Django's MySQL
# backend doesn't use server-side cursors, so the driver still buffers the entire
# result set in memory... damn you MySQL!
for i, obj in enumerate(qs.iterator()):
    if i % 10000 == 0:
        django.db.reset_queries()
    do_stuff(obj)

# Attempt no. 4: pheeeew, we're finally there (with some caveats). We use nested for
# loops because `chunked_queryset()` (a helper not included in this file) returns a
# generator of querysets: the outer loop iterates over chunks, and the inner loop
# iterates over the actual results of the original queryset. Only one chunk's worth
# of rows is fetched at a time, and the query log is cleared once per chunk.
for chunk in chunked_queryset(qs):
    django.db.reset_queries()
    for obj in chunk.iterator():
        do_stuff(obj)
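`chunked_queryset()` itself is not part of this file. What follows is a hypothetical sketch of such a helper, not the original implementation. It assumes the model has an orderable primary key and that rows are not inserted or deleted while the loop runs. It pages through the queryset in primary-key order (keyset pagination): each step loads only the next `chunk_size` primary keys, then yields a lazy queryset bounded by the first and last of those keys. It replaces any existing ordering with `pk` order and uses only standard queryset methods (`order_by()`, `filter()`, `values_list()`, slicing), so it needs no Django import of its own.

```python
def chunked_queryset(qs, chunk_size=1000):
    """Yield querysets that together cover `qs`, each with at most `chunk_size` rows.

    Hypothetical helper (a sketch, not the original implementation): it pages
    through `qs` in primary-key order, so each step is a `pk > last_pk` range
    query rather than an ever-growing OFFSET.
    """
    qs = qs.order_by("pk")
    last_pk = None
    while True:
        remaining = qs if last_pk is None else qs.filter(pk__gt=last_pk)
        # Fetch only the primary keys of the next chunk (slicing becomes a LIMIT).
        pks = list(remaining.values_list("pk", flat=True)[:chunk_size])
        if not pks:
            return
        # Lazy queryset covering exactly this chunk's rows; iterating it with
        # `.iterator()` loads at most `chunk_size` objects.
        yield qs.filter(pk__gte=pks[0], pk__lte=pks[-1])
        last_pk = pks[-1]
```

A larger `chunk_size` means fewer queries but more objects in memory per chunk; the default of 1000 is an arbitrary choice for this sketch. Because each chunk is bounded by fixed primary keys, updates made by `do_stuff()` cannot make rows shift between chunks or be skipped.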