Created
February 6, 2019 18:06
Explode array values in columns to multiple rows
from typing import List

import numpy as np
import pandas as pd


def explode(frame: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """
    Explode array-valued columns so that each array element
    gets its own row.

    If more than one column is exploded, the arrays must have
    the same length within each row.
    (Adapted from a SE code snippet)

    Args:
        frame: the input dataframe
        columns: the columns with arrays to explode

    Returns:
        transformed dataframe
    """
    # all columns that are not arrays of values
    idx_cols = frame.columns.difference(columns)

    # per-row array lengths, taken from the first exploded column
    lens = frame[columns[0]].str.len()

    # repeat scalar columns once per array element, flatten the array
    # columns, then restore the original column order
    return pd.DataFrame({
        col: np.repeat(frame[col].values, lens)
        for col in idx_cols
    }).assign(**{
        col: np.concatenate(frame[col].values)
        for col in columns
    }).loc[:, frame.columns]
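
To make the two NumPy calls in the function concrete, here is a minimal sketch that walks through the same steps by hand on a small frame. The sample data and the column names `id`, `tags`, and `scores` are made up for illustration and are not part of the original gist.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "tags" and "scores" hold equal-length lists per row.
frame = pd.DataFrame({
    "id": [1, 2],
    "tags": [["a", "b"], ["c"]],
    "scores": [[0.1, 0.2], [0.3]],
})

# Per-row list lengths: 2 for the first row, 1 for the second.
lens = frame["tags"].str.len()

# Scalar column: repeat each value once per list element.
ids = np.repeat(frame["id"].values, lens)

# Array columns: flatten the per-row lists into one long array each.
tags = np.concatenate(frame["tags"].values)
scores = np.concatenate(frame["scores"].values)

result = pd.DataFrame({"id": ids, "tags": tags, "scores": scores})
print(result)
```

The result has three rows: `id` becomes 1, 1, 2, while `tags` and `scores` become a, b, c and 0.1, 0.2, 0.3. Calling `explode(frame, ["tags", "scores"])` should give the same values. pandas 0.25 and later also provide a built-in `DataFrame.explode`, with support for a list of columns added in 1.3, which may be preferable where available.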