Created
November 27, 2024 02:53
The time complexity of `pandas.DataFrame.index.is_unique` appears to be constant for a unique index and linear for a non-unique index.
from timeit import timeit

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(range(10), columns=['A'])
times = []
lengths = []
for i in range(26):
    # Double the frame on every iteration.
    # try changing ignore_index param to False and compare the result
    df = pd.concat([df, df], ignore_index=True)
    # Time a single evaluation on the freshly built index.
    times.append(timeit('df.index.is_unique', number=1, globals=globals()))
    lengths.append(len(df))

# Elapsed time (y-axis) against the number of rows (x-axis).
plot = pd.DataFrame(times, index=lengths).plot()
plt.show()
Typical shape of the plot when `ignore_index` is `True`: (plot image not included)

Typical shape of the plot when `ignore_index` is `False`: (plot image not included)
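One plausible reading of the two shapes is that the two settings produce different kinds of index, not just different values. The sketch below repeats the gist's doubling step a few times and prints the resulting index type and `is_unique` for each setting. The helper name `doubled_index` and its `doublings` parameter are made up for illustration. The link between index type and timing is an assumption about pandas internals that this benchmark does not prove on its own.

```python
import pandas as pd


def doubled_index(ignore_index: bool, doublings: int = 3) -> pd.Index:
    """Hypothetical helper: repeat the gist's concat step and return the index."""
    df = pd.DataFrame(range(10), columns=['A'])
    for _ in range(doublings):
        df = pd.concat([df, df], ignore_index=ignore_index)
    return df.index


# ignore_index=True discards the old labels and builds a fresh default index.
fresh = doubled_index(ignore_index=True)
# ignore_index=False keeps the old labels, so every label is repeated.
kept = doubled_index(ignore_index=False)

print(type(fresh).__name__, len(fresh), fresh.is_unique)
print(type(kept).__name__, len(kept), kept.is_unique)
```

If the first line reports a `RangeIndex`, the constant-time curve may say more about that index type than about unique indexes in general. A unique index that is not a `RangeIndex` would need its own measurement before drawing that conclusion.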