@sudarshan85
Created November 13, 2020 22:29
Plots
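from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# `args` comes from the surrounding script's argument parser (not shown here).
df = pd.read_parquet(Path(args.data_dir).parent/'dataset_with_splits.parquet')
df['char_len'] = df['text'].apply(len)  # length in characters
df['word_len'] = df['text'].apply(lambda x: len(x.split()))  # whitespace-separated words

# Histogram of words per email, x-axis clipped to 0-200.
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
df['word_len'].plot.hist(bins=100, alpha=0.5, ax=ax)
ax.set_xlim(0, 200)
ax.set_ylabel('# emails')

# Histogram of characters per email, x-axis clipped to 0-1500.
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
df['char_len'].plot.hist(bins=100, alpha=0.5, ax=ax)
ax.set_xlim(0, 1500)
ax.set_ylabel('# emails')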
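
The two length columns can be sanity-checked without the parquet file. The sketch below is only an illustration: the helper name `add_length_columns` and the four sample strings are made up here and are not part of the original snippet. It applies the same `len` and `split` logic to a toy DataFrame.

```python
import pandas as pd


def add_length_columns(df: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Return a copy of df with char_len and word_len columns added.

    Hypothetical helper; it mirrors the two .apply() calls in the snippet above.
    """
    out = df.copy()
    out["char_len"] = out[text_col].apply(len)
    out["word_len"] = out[text_col].apply(lambda x: len(x.split()))
    return out


# Made-up sample standing in for the email dataset.
sample = pd.DataFrame({"text": ["hello world", "a b c", "single", ""]})
lengths = add_length_columns(sample)
print(lengths[["char_len", "word_len"]])
```

Note that `split()` with no argument splits on any run of whitespace, so an empty string counts as zero words, and `len` would raise a `TypeError` on missing values (NaN), so the text column is assumed to contain only strings.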