@5j9
Created November 27, 2024 02:53
Time complexity of `pandas.DataFrame.index.is_unique` looks to be constant for a unique index and linear for a non-unique index
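from timeit import timeit

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(range(10), columns=['A'])
times = []
lengths = []
# Double the frame 26 times; the last frame has 10 * 2**26 rows, so this needs several GB of RAM.
for i in range(26):
    # try changing ignore_index param to False and compare the result
    df = pd.concat([df, df], ignore_index=True)
    # concat builds a new index each time, so is_unique is computed fresh on every iteration
    times.append(timeit('df.index.is_unique', number=1, globals=globals()))
    lengths.append(len(df))
# x-axis: number of rows, y-axis: seconds for a single is_unique lookup
plot = pd.DataFrame(times, index=lengths).plot()
plt.show()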

5j9 commented Nov 27, 2024

Typical shape of the plot when ignore_index is True:
image

Typical shape of the plot when ignore_index is False:
image
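
For anyone who wants to compare the two settings without plotting, here is a smaller sketch of the same measurement. The helper name `time_is_unique` and the `doublings` parameter are my own, not part of the gist. It also records the index type, since (as far as I can tell) `ignore_index=True` yields a `RangeIndex`, so the flat curve may say more about that index type than about unique indexes in general.

```python
from timeit import timeit

import pandas as pd


def time_is_unique(ignore_index, doublings=12):
    """Hypothetical helper: time df.index.is_unique after each doubling.

    Returns a list of (length, seconds, is_unique, index_type) tuples.
    """
    df = pd.DataFrame(range(10), columns=['A'])
    rows = []
    for _ in range(doublings):
        df = pd.concat([df, df], ignore_index=ignore_index)
        # Time the first lookup on the freshly built index.
        seconds = timeit(lambda: df.index.is_unique, number=1)
        rows.append(
            (len(df), seconds, bool(df.index.is_unique), type(df.index).__name__)
        )
    return rows


if __name__ == '__main__':
    for flag in (True, False):
        print(f'ignore_index={flag}')
        for length, seconds, unique, kind in time_is_unique(flag):
            print(f'  {length:>8} rows  {seconds:.6f}s  unique={unique}  {kind}')
```

Timings at these small sizes are noisy, so treat the printed numbers as a rough indication only; the full-size loop in the gist is what produced the plots above.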
