Last active
August 15, 2023 12:28
-
-
Save ktprezes/22d96ee39cc2e9654a7cc13846dc88d3 to your computer and use it in GitHub Desktop.
Python - Pandas - MultiIndex DataFrame selections
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas as pd | |
# suppose we have a simple 'stocks' dataframe: | |
stocks = pd.read_csv('http://bit.ly/smallstocks') | |
"""# stocks | |
Date Close Volume Symbol | |
0 2016-10-03 31.50 14070500 CSCO | |
1 2016-10-03 112.52 21701800 AAPL | |
2 2016-10-03 57.42 19189500 MSFT | |
3 2016-10-04 113.00 29736800 AAPL | |
4 2016-10-04 57.24 20085900 MSFT | |
5 2016-10-04 31.35 18460400 CSCO | |
6 2016-10-05 57.64 16726400 MSFT | |
7 2016-10-05 31.59 11808600 CSCO | |
8 2016-10-05 113.05 21453100 AAPL | |
""" | |
# and we set MultiIndex: | |
stocks_mi = stocks.set_index(['Symbol','Date']) | |
"""# stocks_mi | |
Close Volume | |
Symbol Date | |
CSCO 2016-10-03 31.50 14070500 | |
AAPL 2016-10-03 112.52 21701800 | |
MSFT 2016-10-03 57.42 19189500 | |
AAPL 2016-10-04 113.00 29736800 | |
MSFT 2016-10-04 57.24 20085900 | |
CSCO 2016-10-04 31.35 18460400 | |
MSFT 2016-10-05 57.64 16726400 | |
CSCO 2016-10-05 31.59 11808600 | |
AAPL 2016-10-05 113.05 21453100 | |
""" | |
# selecting rows according to the first (0th actually) MultiIndex level values is easy: | |
stocks_mi.loc["CSCO"] | |
""" | |
Close Volume | |
Date | |
2016-10-03 31.50 14070500 | |
2016-10-04 31.35 18460400 | |
2016-10-05 31.59 11808600 | |
""" | |
# but when it comes to selecting rows according to values of deeper levels of MultiIndex, | |
# it doesn't work :( | |
# these don't work either: | |
# stocks_mi.loc[(,'2016-10-04'),:] | |
# SyntaxError: invalid syntax | |
# stocks_mi.loc[(:,'2016-10-04'),:] | |
# SyntaxError: invalid syntax | |
# stocks_mi.loc[(::,'2016-10-04'),:] | |
# SyntaxError: invalid syntax | |
# we can do it that way: | |
stocks_mi.loc[(slice(None),'2016-10-04'),:] | |
""" | |
Close Volume | |
Symbol Date | |
AAPL 2016-10-04 113.00 29736800 | |
MSFT 2016-10-04 57.24 20085900 | |
CSCO 2016-10-04 31.35 18460400 | |
""" | |
# but we can also do it that way: | |
# selection by one value: | |
stocks_mi[stocks_mi.index.get_level_values(1) == '2016-10-04'] | |
# and selection by values from list: | |
stocks_mi[stocks_mi.index.get_level_values(1).isin(['2016-10-03', '2016-10-04'])] | |
# and when the MultiIndex levels have their names (like here: 'Symbol', 'Date'), | |
# we can use these names in the 'get_level_values' method instead of numbers, e.g.: | |
stocks.index.get_level_values('Date") | |
# with this approach, we can create logical expressions, | |
# and select rows according to values from different levels of MultiIndex | |
# for example: | |
stocks_mi[ | |
stocks_mi.index.get_level_values('Date').isin(['2016-10-04','2016-10-03']) & | |
stocks_mi.index.get_level_values('Symbol').isin(["CSCO", "AAPL"]) | |
].sort_index() | |
""" | |
Close Volume | |
Symbol Date | |
AAPL 2016-10-03 112.52 21701800 | |
2016-10-04 113.00 29736800 | |
CSCO 2016-10-03 31.50 14070500 | |
2016-10-04 31.35 18460400 | |
""" | |
# we can also use the '.xs' method: | |
stocks_mi.xs('2016-10-04', level='Date') | |
""" | |
Close Volume | |
Symbol | |
AAPL 113.00 29736800 | |
MSFT 57.24 20085900 | |
CSCO 31.35 18460400 | |
""" | |
# and for tuples of values from different levels we can use: | |
stocks_mi.xs(('AAPL', '2016-10-04'), level=['Symbol', 'Date']) | |
""" | |
Close Volume | |
Symbol Date | |
AAPL 2016-10-04 113.0 29736800 | |
""" |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment