Say we have a dataframe with many columns, and we want a new id column built by concatenating two or more of its existing columns.
This is useful when a table has no "natural" id, for example a sales table with a client_id column and a purchase_datetime column.
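This function concatenates the chosen columns, casting each one to a string first so that non-string columns such as an integer client_id also work:
import pandas as pd
def concatenated_column(df: pd.DataFrame, id_keys: list[str], separator: str) -> pd.Series:
    # Cast the first column to str as well; the .str accessor fails on non-string columns.
    return df[id_keys[0]].astype(str).str.cat(df[id_keys[1:]].astype(str), sep=separator)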
It can be used like this:
df['concatenated_id'] = concatenated_column(df=df, id_keys=['client_id', 'purchase_datetime'], separator='+')
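To make the call above concrete, here is a minimal, self-contained sketch on a small made-up sales table. The data values, the `sales` variable name and the `amount` column are hypothetical, and the function is repeated with every key column cast to a string first, which is an assumption needed because `client_id` is an integer here.

```python
import pandas as pd


def concatenated_column(df: pd.DataFrame, id_keys: list[str], separator: str) -> pd.Series:
    # Cast every key column to str so integer and datetime columns can be joined.
    keys = df[id_keys].astype(str)
    return keys[id_keys[0]].str.cat(keys[id_keys[1:]], sep=separator)


# Hypothetical sales data: two purchases by client 1, one by client 2.
sales = pd.DataFrame({
    "client_id": [1, 1, 2],
    "purchase_datetime": pd.to_datetime(
        ["2024-01-05 10:00:00", "2024-01-06 11:30:00", "2024-01-05 10:00:00"]
    ),
    "amount": [9.99, 20.00, 5.50],
})

sales["concatenated_id"] = concatenated_column(
    df=sales, id_keys=["client_id", "purchase_datetime"], separator="+"
)

print(sales["concatenated_id"].tolist())
# A concatenated id is only usable as a key if the combination is unique per row.
print(sales["concatenated_id"].is_unique)
```

Neither column is unique on its own in this table (client 1 appears twice, and so does one timestamp), but the combination is, so the `is_unique` check is a cheap way to confirm that the chosen columns really identify a row.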
Reference:
How to concatenate multiple column values into a single column in Pandas dataframe