Created April 23, 2020 22:36 by SandieIJ (gist 6ab7872355c085cdd3534c7477164977)
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
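    "**REMOVING NEWLINE CHARACTERS AND NON-LETTER CHARACTERS**\n",
    "\n",
    "For many text-processing tasks, newline characters and non-letter characters (digits, punctuation, symbols) add little useful information; however, they do increase the size of the text and can inflate the vocabulary (for example, `data` and `data,` would otherwise be counted as different tokens). It is therefore common practice to collapse whitespace and replace non-letter characters with spaces before further processing, although tasks that rely on numbers or punctuation may need to keep them."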
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
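    "import re\n",
    "\n",
    "# Collapse newlines, tabs and runs of whitespace into a single space\n",
    "no_new_lines = [re.sub(r'\\s+', ' ', sent) for sent in descriptions]\n",
    "\n",
    "# Remove single quotes first so contractions stay whole (don't -> dont);\n",
    "# after the non-letter step this would be a no-op, since [^a-zA-Z] already turns ' into a space\n",
    "no_quotes = [re.sub(\"'\", '', no_new_line) for no_new_line in no_new_lines]\n",
    "\n",
    "# Replace every remaining non-letter character with a space\n",
    "non_letters = [re.sub('[^a-zA-Z]', ' ', no_quote) for no_quote in no_quotes]"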
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
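The notebook cell depends on a `descriptions` list defined elsewhere, so here is a minimal self-contained sketch of the same cleanup that can run on its own. The helper names `clean_text` and `original_order` and the sample sentence are hypothetical, not part of the original notebook. The final whitespace collapse in `clean_text` is an extra step not in the original cell, added because replacing non-letters with spaces reintroduces runs of spaces. `original_order` reproduces the cell's original step order so the effect of the quote-removal ordering is visible.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical helper: apply the notebook's cleanup steps to a single string."""
    # Collapse newlines, tabs and repeated spaces into a single space
    text = re.sub(r"\s+", " ", text)
    # Drop single quotes before the non-letter step so contractions stay whole
    text = re.sub("'", "", text)
    # Replace every remaining non-letter character with a space
    text = re.sub("[^a-zA-Z]", " ", text)
    # Extra step (not in the original cell): collapse the spaces introduced above
    return re.sub(r"\s+", " ", text).strip()


def original_order(text: str) -> str:
    """Hypothetical helper reproducing the original step order, for comparison."""
    text = re.sub(r"\s+", " ", text)
    text = re.sub("[^a-zA-Z]", " ", text)  # the apostrophe becomes a space here...
    return re.sub("'", "", text)  # ...so this substitution finds nothing to remove


# Hypothetical sample input standing in for the notebook's `descriptions`
descriptions = ["Don't panic!\nThe 2nd line\thas tabs."]
cleaned = [clean_text(d) for d in descriptions]
print(cleaned)  # ['Dont panic The nd line has tabs']
print(repr(original_order("Don't panic!")))  # 'Don t panic '
```

Removing quotes before the non-letter substitution is what turns `Don't` into `Dont`; with the original order the apostrophe has already been replaced by a space, leaving the fragments `Don t`. Note also that digits disappear entirely (`2nd` becomes `nd`), which is the information loss the `[^a-zA-Z]` pattern accepts.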