Skip to content

Instantly share code, notes, and snippets.

@ianlini
Created October 26, 2020 10:36
Show Gist options
  • Save ianlini/a3f9437bdca367f994ecc342d88a2b26 to your computer and use it in GitHub Desktop.
Save ianlini/a3f9437bdca367f994ecc342d88a2b26 to your computer and use it in GitHub Desktop.
Python dict.setdefault() performance
In [1]: def a(keys):
...: d = {}
...: for key in keys:
...: inner_d = d.setdefault(key, {})
...: return d
...:
In [2]: def b(keys):
...: d = {}
...: for key in keys:
...: if key not in d:
...: inner_d = {}
...: d[key] = inner_d
...: else:
...: inner_d = d[key]
...: return d
...:
In [3]: import random
In [4]: keys = [random.randint(0, 99) for i in range(10000)]
In [5]: keys[:20]
Out[5]: [90, 47, 99, 21, 71, 36, 31, 11, 3, 97, 96, 45, 99, 20, 32, 68, 41, 8, 30, 98]
In [6]: sorted(a(keys).keys()) == list(range(100))
Out[6]: True
In [7]: sorted(b(keys).keys()) == list(range(100))
Out[7]: True
In [8]: %timeit a(keys)
706 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [9]: %timeit a(keys)
733 µs ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit b(keys)
480 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [11]: %timeit b(keys)
494 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
@ianlini
Copy link
Author

ianlini commented Oct 26, 2020

dict.setdefault() is slow because:

  1. it involves a function call, while the in and [] operations for dict are optimized and don't involve function call (CPython)
  2. it creates an empty dict for each iteration even if the key exists

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment