@SandieIJ
Last active April 23, 2020 22:31
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# import all required packages\n",
"import spacy\n",
"import re\n",
"import pandas as pd\n",
"import numpy as np\n",
"import gensim\n",
"from gensim.utils import simple_preprocess\n",
"import gensim.corpora as corpora\n",
"from pprint import pprint"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**CLEANING AND FORMATTING THE DATA**\n",
"\n",
"Before the preprocessing steps, we load the data and extract the game descriptions into a list, with each item in the list holding a single description."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>description</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>Legend Fire Squad survival: Free Fire Battlegr...</td>\n",
" <td>Ready to play an amazing and exciting best sho...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>Ambulance Game</td>\n",
" <td>You must be a fan of the driving games. We ass...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>Beam Drive NG Death Stair Car Crash Simulator</td>\n",
" <td>Beam Drive NG Death Stair Car Crash Accidents ...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>Kelime İncileri</td>\n",
" <td>Yeni Kelime Bulmaca Oyununuz! Kelime Arama ve ...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>Word Blocks</td>\n",
" <td>Word Blocks is a new kind of word search puzzl...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>5</td>\n",
" <td>Free Fire Commando - Counter Attack FPS 2019</td>\n",
" <td>Free Fire Commando - Counter Attack FPS 2019 i...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>6</td>\n",
" <td>Fall Race 3D</td>\n",
" <td>The most exciting sky race!Run through the sky...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>7</td>\n",
" <td>Math School Game Basic: Crazy Principal</td>\n",
" <td>Your school principal went crazy and locked yo...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>8</td>\n",
" <td>Jump Cube</td>\n",
" <td>Jump Cube is an addictive game, tap the right ...</td>\n",
" </tr>\n",
" <tr>\n",
" <td>9</td>\n",
" <td>Tien Len Offline</td>\n",
" <td>Một tựa game cũng như cách chơi ko thể quen th...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name \\\n",
"0 Legend Fire Squad survival: Free Fire Battlegr... \n",
"1 Ambulance Game \n",
"2 Beam Drive NG Death Stair Car Crash Simulator \n",
"3 Kelime İncileri \n",
"4 Word Blocks \n",
"5 Free Fire Commando - Counter Attack FPS 2019 \n",
"6 Fall Race 3D \n",
"7 Math School Game Basic: Crazy Principal \n",
"8 Jump Cube \n",
"9 Tien Len Offline \n",
"\n",
" description \n",
"0 Ready to play an amazing and exciting best sho... \n",
"1 You must be a fan of the driving games. We ass... \n",
"2 Beam Drive NG Death Stair Car Crash Accidents ... \n",
"3 Yeni Kelime Bulmaca Oyununuz! Kelime Arama ve ... \n",
"4 Word Blocks is a new kind of word search puzzl... \n",
"5 Free Fire Commando - Counter Attack FPS 2019 i... \n",
"6 The most exciting sky race!Run through the sky... \n",
"7 Your school principal went crazy and locked yo... \n",
"8 Jump Cube is an addictive game, tap the right ... \n",
"9 Một tựa game cũng như cách chơi ko thể quen th... "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the data\n",
"data = pd.read_csv(\"https://raw.githubusercontent.com/SandieIJ/Capstone/master/data/sandra_csv_results-20190723-155508.csv\")\n",
"\n",
"# Preview the first 10 rows\n",
"data.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Brick Breaker 3D is a single-tap hyper casual game that will keep you hooked for hours!Hold the screen to aim, swipe the ball to the brick and break all the bricks easily!The game features unlimited levels and 20 beautiful color balls.\n"
]
}
],
"source": [
"# Convert the description column of the DataFrame into a list\n",
"descriptions = data.description.values.tolist()\n",
"\n",
"# Print one description as a sample\n",
"print(descriptions[10])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
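The list built in the notebook above still holds the raw text. For example, the sample printed from `descriptions[10]` shows sentences run together after punctuation (`hours!Hold`, `easily!The`). Also, if any descriptions are missing from the CSV, `tolist()` carries them through as float `NaN`, which is how pandas represents empty cells by default. The sketch below assumes both issues are worth fixing before tokenizing with `simple_preprocess`. `clean_descriptions` is a hypothetical helper, not part of the original notebook. The regular expression is an ASCII-only heuristic: it ignores non-ASCII capitals such as `İ`, and it would also split abbreviations such as `U.S.`.

```python
import re

# Heuristic (an assumption, not part of the original notebook): a sentence-ending
# mark immediately followed by an ASCII capital letter is treated as a missing space.
_MISSING_SPACE = re.compile(r"([.!?])(?=[A-Z])")


def clean_descriptions(descriptions):
    """Return a new list of cleaned description strings.

    Entries that are not strings (for example float NaN, which pandas uses for
    empty CSV cells) or that are blank after stripping are dropped.
    """
    cleaned = []
    for text in descriptions:
        if not isinstance(text, str):
            continue  # skip missing values such as NaN
        # Insert a space after ".", "!" or "?" when a capital letter follows directly
        text = _MISSING_SPACE.sub(r"\1 ", text).strip()
        if text:
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = [
        # descriptions[10] as printed in the notebook output
        "Brick Breaker 3D is a single-tap hyper casual game that will keep you "
        "hooked for hours!Hold the screen to aim, swipe the ball to the brick and "
        "break all the bricks easily!The game features unlimited levels and 20 "
        "beautiful color balls.",
        float("nan"),  # stands in for a missing description
        "   ",         # blank description
    ]
    for line in clean_descriptions(sample):
        print(line)
```

In the notebook, this could be applied right after the list is built, for example `descriptions = clean_descriptions(descriptions)`. Dropping entries shifts list positions, so an index such as `descriptions[10]` may then point at a different game.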