Last active
November 9, 2017 02:55
Keep only certain rows #pandas
# Select rows whose column value equals a scalar:
df.loc[df['column_name'] == some_value]
# Select rows whose column value is in an iterable:
df.loc[df['column_name'].isin(some_values)]
# Combine multiple conditions with & (parenthesize each comparison, because & binds more tightly than ==):
df.loc[(df['column_name'] == some_value) & df['other_column'].isin(some_values)]
# To select rows whose column value does not equal some_value, use !=:
df.loc[df['column_name'] != some_value]
# isin returns a boolean Series, so to select rows whose value is not in some_values, negate it with ~:
df.loc[~df['column_name'].isin(some_values)]
# In this dataframe:
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14
# To include several values, put them in a list (or, more generally, any iterable) and use isin:
print(df.loc[df['B'].isin(['one', 'three'])])
# If you run many such lookups on the same column, it can be more efficient to make that column the index first and then select by label with df.loc:
df = df.set_index(['B'])
print(df.loc['one'])
# To select several index values, use df.index.isin:
df.loc[df.index.isin(['one', 'two'])]
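
The selections above can be tried end to end with the following minimal sketch, assuming pandas is installed. The DataFrame is rebuilt from the commented table. The result variable names (`foo_rows`, `one_or_three`, and so on) are illustrative and not part of the original snippet.

```python
import pandas as pd

# Rebuild the example frame shown in the comment table.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": list(range(8)),
    "D": list(range(0, 16, 2)),
})

# Scalar comparison: rows where A equals "foo".
foo_rows = df.loc[df["A"] == "foo"]

# Membership test: rows where B is "one" or "three".
one_or_three = df.loc[df["B"].isin(["one", "three"])]

# Combined conditions: the comparison is parenthesized before applying &.
foo_and_one_or_three = df.loc[(df["A"] == "foo") & df["B"].isin(["one", "three"])]

# Negated membership: rows where B is neither "one" nor "three".
not_one_or_three = df.loc[~df["B"].isin(["one", "three"])]

# Index-based selection: B becomes the index and is no longer a column.
indexed = df.set_index("B")
by_label = indexed.loc["one"]
by_index_mask = indexed.loc[indexed.index.isin(["one", "two"])]

print(foo_and_one_or_three)
print(by_label)
```

Here the indexed frame is kept in a separate variable (`indexed`) so that the original `df`, with `B` still available as a column, can be reused for the boolean-mask selections.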