@SandieIJ
Created April 23, 2020 22:36
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**REMOVING NEWLINE CHARACTERS AND NON-LETTER CHARACTERS**\n",
"\n",
"When processing text, newline characters and non-letter characters (digits, punctuation, symbols) rarely carry useful information for word-based analysis, but they do add to the size of the data. It is therefore common practice to remove them before further processing."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
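"import re\n",
"\n",
"# Collapse newlines, tabs, and repeated spaces into a single space\n",
"no_new_lines = [re.sub(r'\\s+', ' ', sent) for sent in descriptions]\n",
"\n",
"# Replace non-letter characters with a space, keeping apostrophes for the next step\n",
"non_letters = [re.sub(r\"[^a-zA-Z']\", ' ', no_new_line) for no_new_line in no_new_lines]\n",
"\n",
"# Remove single quotes so contractions stay one token (don't becomes dont)\n",
"no_quotes = [re.sub(\"'\", '', non_letter) for non_letter in non_letters]"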
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
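
The cleaning steps in the notebook can also be sketched as one standalone function, which makes them easy to try on a small input. This is only an illustration: the name `clean_descriptions` and the sample sentence are hypothetical and do not come from the notebook. The sketch removes apostrophes before the other non-letter characters so that contractions stay a single token, and it adds a final pass that collapses the runs of spaces left behind by the substitutions.

```python
import re


def clean_descriptions(descriptions):
    """Return a cleaned copy of each description (hypothetical helper)."""
    cleaned = []
    for text in descriptions:
        text = re.sub(r"\s+", " ", text)         # newlines and tabs become one space
        text = re.sub(r"'", "", text)            # drop apostrophes: don't -> dont
        text = re.sub(r"[^a-zA-Z]", " ", text)   # digits and punctuation become spaces
        text = re.sub(r" +", " ", text).strip()  # collapse the leftover runs of spaces
        cleaned.append(text)
    return cleaned


# Made-up sample input, used only to show the effect of each step
sample = ["Don't miss\nthis 2-bedroom flat!"]
print(clean_descriptions(sample))  # ['Dont miss this bedroom flat']
```

Note that this drops digits entirely, so "2-bedroom" becomes "bedroom"; if numbers matter for the analysis, the character class would need to keep `0-9`.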