@pyliaorachel
Last active July 20, 2023 06:13
Some regular expressions that may be useful for data cleaning.
Punctuation, US-ASCII (the apostrophe ' and backslash \ are not included)

/[!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/
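
A minimal sketch of how this pattern might be used to strip ASCII punctuation from a string. The helper name `stripAsciiPunct` and the whitespace clean-up step are assumptions added for illustration, not part of the original snippet. Because this character class leaves out the apostrophe, contractions such as "It's" survive.

```javascript
// Same character class as above, plus the global flag so every match is replaced.
const asciiPunct = /[!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/g;

// Hypothetical helper: drop punctuation, then collapse leftover whitespace.
function stripAsciiPunct(text) {
  return text.replace(asciiPunct, "").replace(/\s+/g, " ").trim();
}

console.log(stripAsciiPunct("Hello, world! (It's 100% fine.)"));
// Hello world It's 100 fine
```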

Punctuation, including Unicode (\u2000-\u206F: General Punctuation block, \u2E00-\u2E7F: Supplemental Punctuation block)

/[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/
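
As a rough illustration, the pattern above could be used to remove typographic punctuation such as curly quotes and ellipses. The helper `stripAllPunct` is a hypothetical name, and replacing matches with a space rather than an empty string is a design choice made here so that adjacent words do not get glued together. Note that the General Punctuation block also holds the fixed-width spaces and invisible formatting characters (for example \u200B, the zero-width space), so those are matched as well.

```javascript
// Character class from above, plus the global flag.
const anyPunct = /[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/g;

// Hypothetical helper: replace each match with a space so words stay separated,
// then collapse runs of whitespace.
function stripAllPunct(text) {
  return text.replace(anyPunct, " ").replace(/\s+/g, " ").trim();
}

// Curly quotes (\u201C, \u201D, \u2018, \u2019), ellipsis (\u2026), em dash (\u2014).
const sample = "\u201CWait\u2026\u201D she said \u2014 \u2018no\u2019.";
console.log(stripAllPunct(sample)); // Wait she said no
```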

Chinese characters (CJK Unified Ideographs block)

/[\u4E00-\u9FFF]/
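
One possible use, sketched below, is pulling runs of Chinese characters out of mixed-language text. The helper `extractChinese` and the added `+` quantifier are assumptions for this example. The range covers only the main CJK Unified Ideographs block; characters in the extension blocks (for example Extension A at \u3400-\u4DBF) fall outside it.

```javascript
// Same range as above; "+" groups consecutive characters, "g" finds every run.
const chineseRun = /[\u4E00-\u9FFF]+/g;

// Hypothetical helper: return every run of Chinese characters in the text.
function extractChinese(text) {
  return text.match(chineseRun) || [];
}

console.log(extractChinese("我爱 NLP 和正则")); // [ '我爱', '和正则' ]
console.log(extractChinese("no match here"));  // []
```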

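Non-English letters (approximate: the ranges also contain some non-letter symbols, such as \u00D7, the multiplication sign)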

/[\u00C0-\u1FFF\u2C00-\uD7FF]/
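
A small sketch of using this pattern to flag strings that contain at least one character from these ranges, for example to route them to a separate cleaning step. The helper name `hasNonEnglishLetter` is hypothetical. The regex is deliberately created without the `g` flag, because a global regex keeps state between `test()` calls.

```javascript
// Same ranges as above, no global flag so test() stays stateless.
const nonEnglishLetter = /[\u00C0-\u1FFF\u2C00-\uD7FF]/;

// Hypothetical helper: true if the string has any character in those ranges.
function hasNonEnglishLetter(text) {
  return nonEnglishLetter.test(text);
}

const words = ["cafe", "café", "naïve", "plain"];
console.log(words.filter(hasNonEnglishLetter)); // [ 'café', 'naïve' ]
```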

All letters (note that \w also matches digits and the underscore)

/[\u00C0-\u1FFF\u2C00-\uD7FF\w]/
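
As a sketch, this character class can serve as a simple word tokenizer by matching runs of it. The helper `tokenize` and the `+` quantifier are assumptions for illustration. The example also shows that digits and underscores end up inside tokens, since `\w` covers them.

```javascript
// Character class from above; "+" matches whole words, "g" finds all of them.
const wordRun = /[\u00C0-\u1FFF\u2C00-\uD7FF\w]+/g;

// Hypothetical helper: split text into tokens made of "letter" characters.
function tokenize(text) {
  return text.match(wordRun) || [];
}

console.log(tokenize("Crème brûlée costs 5€, señor_x!"));
// [ 'Crème', 'brûlée', 'costs', '5', 'señor_x' ]
```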
