Last active
September 15, 2024 22:15
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Fantasy Football Pick 'Em Analysis\n", | |
"\n", | |
"This notebook demonstrates how to scrape data from Yahoo Fantasy Football's pick 'em distribution page, process the data, and analyze it in a structured format using pandas." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [] | |
} | |
], | |
"source": [ | |
"import pandas as pd\n", | |
"import requests\n", | |
"from bs4 import BeautifulSoup" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Pull raw data from web\n", | |
"\n", | |
"We fetch the Yahoo Fantasy Football pick distribution page with `requests.get`, then parse the returned HTML with `BeautifulSoup` for further processing."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"url = \"https://football.fantasysports.yahoo.com/pickem/pickdistribution\"\n", | |
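"response = requests.get(url, timeout=10)  # timeout value is an arbitrary choice\n",
"response.raise_for_status()  # stop on an HTTP error instead of parsing an error page\n",
"soup = BeautifulSoup(response.content, \"html.parser\")"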
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Examine HTML, extract, and transform data\n", | |
"\n", | |
"Next, we extract the relevant data from the page: the team names and their pick distribution percentages.\n", | |
"\n", | |
"To pinpoint the data you want, right-click the page and inspect its HTML. In our case, the relevant CSS selectors are `.favorite .team a`, `.favorite dd.percent`, `.underdog .team a`, and `.underdog dd.percent`.\n",
"\n",
"A helper function, `clean_percentage`, converts percentage strings such as `60%` to integers.\n",
"\n", | |
"<hr>\n", | |
"\n", | |
"```html\n", | |
"<div class=\"bd\" id=\"yui_3_18_\">\n",
"    <dl class=\"favorite pick-preferred\" id=\"yui_3_18_\">\n",
"        <dt class=\"team\">Favorite</dt>\n",
"        <dd class=\"team\" id=\"yui_3_18_\">@ <a href=\"https://sports.yahoo.com/nfl/teams/miami/\" target=\"sports\" id=\"yui_3_18_\">Miami</a> </dd>\n",
"        <dt class=\"percent\"><span style=\"width:60%\">Favorite Pick Percentage</span></dt>\n",
"        <dd class=\"percent\">60%</dd>\n",
"    </dl>\n",
"    <dl class=\"underdog pick-loser\" id=\"yui_3_18_\">\n",
"        <dt class=\"team\">Underdog</dt>\n",
"        <dd class=\"team\" id=\"yui_3_18_\"><a href=\"https://sports.yahoo.com/nfl/teams/buffalo/\" target=\"sports\">Buffalo</a> </dd>\n",
"        <dt class=\"percent\" id=\"yui_3_18_\"><span style=\"width:40%\">Underdog Pick Percentage</span></dt>\n",
"        <dd class=\"percent\">40%</dd>\n",
"    </dl>\n",
"</div>\n",
"```" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def clean_percentage(percentage_str):\n",
"    return int(percentage_str.replace(\"%\", \"\").strip())\n",
"\n",
"\n",
"matchups = []\n",
"matchup_elements = soup.select(\"div.bd\")\n",
"for element in matchup_elements:\n",
"    try:\n",
"        favorite_team = element.select_one(\".favorite .team a\").text.strip()\n",
"        favorite_percentage = element.select_one(\".favorite dd.percent\").text.strip()\n",
"\n",
"        underdog_team = element.select_one(\".underdog .team a\").text.strip()\n",
"        underdog_percentage = element.select_one(\".underdog dd.percent\").text.strip()\n",
"\n",
"        matchup_info = {\n",
"            \"favorite\": favorite_team,\n",
"            \"favorite_percent\": clean_percentage(favorite_percentage),\n",
"            \"underdog\": underdog_team,\n",
"            \"underdog_percent\": clean_percentage(underdog_percentage),\n",
"        }\n",
"        matchups.append(matchup_info)\n",
"    except AttributeError:\n",
"        # select_one returned None, so this div.bd is not a complete matchup block; skip it\n",
"        continue"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Analyze and display data using a pandas DataFrame\n",
"\n", | |
"- Remove any duplicate matchups (if present).\n", | |
"- Create a new column, `max_percent`, that holds the higher of the favorite and underdog percentages.\n",
"- Sort the DataFrame by this new column in descending order, which is useful for confidence points if your league uses them.\n",
"- Assign a `confidence` rank that counts down from the number of matchups to 1, so the most lopsided matchup gets the highest value.\n",
"- Add a `note` column that flags matchups where the public prefers the underdog."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>favorite</th>\n", | |
" <th>favorite_percent</th>\n", | |
" <th>underdog</th>\n", | |
" <th>underdog_percent</th>\n", | |
" <th>max_percent</th>\n", | |
" <th>confidence</th>\n", | |
" <th>note</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>Baltimore</td>\n", | |
" <td>98</td>\n", | |
" <td>Las Vegas</td>\n", | |
" <td>2</td>\n", | |
" <td>98</td>\n", | |
" <td>16</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Philadelphia</td>\n", | |
" <td>98</td>\n", | |
" <td>Atlanta</td>\n", | |
" <td>2</td>\n", | |
" <td>98</td>\n", | |
" <td>15</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Los Angeles (LAC)</td>\n", | |
" <td>97</td>\n", | |
" <td>Carolina</td>\n", | |
" <td>3</td>\n", | |
" <td>97</td>\n", | |
" <td>14</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>Kansas City</td>\n", | |
" <td>96</td>\n", | |
" <td>Cincinnati</td>\n", | |
" <td>4</td>\n", | |
" <td>96</td>\n", | |
" <td>13</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>San Francisco</td>\n", | |
" <td>94</td>\n", | |
" <td>Minnesota</td>\n", | |
" <td>6</td>\n", | |
" <td>94</td>\n", | |
" <td>12</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>Houston</td>\n", | |
" <td>93</td>\n", | |
" <td>Chicago</td>\n", | |
" <td>7</td>\n", | |
" <td>93</td>\n", | |
" <td>11</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>Dallas</td>\n", | |
" <td>92</td>\n", | |
" <td>New Orleans</td>\n", | |
" <td>8</td>\n", | |
" <td>92</td>\n", | |
" <td>10</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>Detroit</td>\n", | |
" <td>92</td>\n", | |
" <td>Tampa Bay</td>\n", | |
" <td>8</td>\n", | |
" <td>92</td>\n", | |
" <td>9</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>Washington</td>\n", | |
" <td>89</td>\n", | |
" <td>New York (NYG)</td>\n", | |
" <td>11</td>\n", | |
" <td>89</td>\n", | |
" <td>8</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>New York (NYJ)</td>\n", | |
" <td>85</td>\n", | |
" <td>Tennessee</td>\n", | |
" <td>15</td>\n", | |
" <td>85</td>\n", | |
" <td>7</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>Jacksonville</td>\n", | |
" <td>85</td>\n", | |
" <td>Cleveland</td>\n", | |
" <td>15</td>\n", | |
" <td>85</td>\n", | |
" <td>6</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>Pittsburgh</td>\n", | |
" <td>83</td>\n", | |
" <td>Denver</td>\n", | |
" <td>17</td>\n", | |
" <td>83</td>\n", | |
" <td>5</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>Seattle</td>\n", | |
" <td>77</td>\n", | |
" <td>New England</td>\n", | |
" <td>23</td>\n", | |
" <td>77</td>\n", | |
" <td>4</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>Arizona</td>\n", | |
" <td>24</td>\n", | |
" <td>Los Angeles (LAR)</td>\n", | |
" <td>76</td>\n", | |
" <td>76</td>\n", | |
" <td>3</td>\n", | |
" <td>Public prefers underdog</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>Indianapolis</td>\n", | |
" <td>71</td>\n", | |
" <td>Green Bay</td>\n", | |
" <td>29</td>\n", | |
" <td>71</td>\n", | |
" <td>2</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>Miami</td>\n", | |
" <td>60</td>\n", | |
" <td>Buffalo</td>\n", | |
" <td>40</td>\n", | |
" <td>60</td>\n", | |
" <td>1</td>\n", | |
" <td></td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" favorite favorite_percent underdog underdog_percent \\\n", | |
"0 Baltimore 98 Las Vegas 2 \n", | |
"1 Philadelphia 98 Atlanta 2 \n", | |
"2 Los Angeles (LAC) 97 Carolina 3 \n", | |
"3 Kansas City 96 Cincinnati 4 \n", | |
"4 San Francisco 94 Minnesota 6 \n", | |
"5 Houston 93 Chicago 7 \n", | |
"6 Dallas 92 New Orleans 8 \n", | |
"7 Detroit 92 Tampa Bay 8 \n", | |
"8 Washington 89 New York (NYG) 11 \n", | |
"9 New York (NYJ) 85 Tennessee 15 \n", | |
"10 Jacksonville 85 Cleveland 15 \n", | |
"11 Pittsburgh 83 Denver 17 \n", | |
"12 Seattle 77 New England 23 \n", | |
"13 Arizona 24 Los Angeles (LAR) 76 \n", | |
"14 Indianapolis 71 Green Bay 29 \n", | |
"15 Miami 60 Buffalo 40 \n", | |
"\n", | |
" max_percent confidence note \n", | |
"0 98 16 \n", | |
"1 98 15 \n", | |
"2 97 14 \n", | |
"3 96 13 \n", | |
"4 94 12 \n", | |
"5 93 11 \n", | |
"6 92 10 \n", | |
"7 92 9 \n", | |
"8 89 8 \n", | |
"9 85 7 \n", | |
"10 85 6 \n", | |
"11 83 5 \n", | |
"12 77 4 \n", | |
"13 76 3 Public prefers underdog \n", | |
"14 71 2 \n", | |
"15 60 1 " | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df = pd.DataFrame(matchups)\n", | |
"df = df.drop_duplicates(subset=[\"favorite\", \"underdog\"])\n", | |
"df[\"max_percent\"] = df[[\"favorite_percent\", \"underdog_percent\"]].max(axis=1)\n", | |
"df = df.sort_values(by=\"max_percent\", ascending=False)\n", | |
"df[\"confidence\"] = range(len(df), 0, -1)\n", | |
"df[\"note\"] = df.apply(\n",
"    lambda row: (\n",
"        \"Public prefers underdog\"\n",
"        if row[\"underdog_percent\"] > row[\"favorite_percent\"]\n",
"        else \"\"\n",
"    ),\n",
"    axis=1,\n",
")\n",
"\n", | |
"# reset the index so the row labels follow the sorted order\n",
"df.reset_index(drop=True, inplace=True)\n", | |
"df" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "fantasy_football", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.12.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
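
The ranking steps in the last notebook cell can also be sketched without pandas, which may help when checking the logic by hand. This is a minimal sketch using only the standard library, not the notebook's own method: `rank_matchups` and `sample` are hypothetical names introduced here, and the sample rows are copied from the output table above, with one row duplicated on purpose to exercise the de-duplication step.

```python
def clean_percentage(percentage_str):
    """Convert a string such as ' 60% ' to the integer 60."""
    return int(percentage_str.replace("%", "").strip())


def rank_matchups(matchups):
    """Return new dicts sorted by the larger pick share, with confidence points.

    Follows the same steps as the notebook's pandas cell: de-duplicate,
    compute max_percent, sort descending, count confidence down from N to 1.
    """
    seen = set()
    unique = []
    for m in matchups:
        key = (m["favorite"], m["underdog"])
        if key not in seen:  # keep only the first occurrence of each pairing
            seen.add(key)
            unique.append(m)

    # sorted() is stable, so tied matchups keep their input order.
    ranked = sorted(
        unique,
        key=lambda m: max(m["favorite_percent"], m["underdog_percent"]),
        reverse=True,
    )

    result = []
    for confidence, m in zip(range(len(ranked), 0, -1), ranked):
        row = dict(m)  # copy so the caller's dicts are left untouched
        row["max_percent"] = max(m["favorite_percent"], m["underdog_percent"])
        row["confidence"] = confidence
        row["note"] = (
            "Public prefers underdog"
            if m["underdog_percent"] > m["favorite_percent"]
            else ""
        )
        result.append(row)
    return result


# Hypothetical sample: three rows from the output above plus one duplicate.
sample = [
    {"favorite": "Miami", "favorite_percent": clean_percentage("60%"),
     "underdog": "Buffalo", "underdog_percent": clean_percentage("40%")},
    {"favorite": "Arizona", "favorite_percent": 24,
     "underdog": "Los Angeles (LAR)", "underdog_percent": 76},
    {"favorite": "Baltimore", "favorite_percent": 98,
     "underdog": "Las Vegas", "underdog_percent": 2},
    {"favorite": "Miami", "favorite_percent": 60,
     "underdog": "Buffalo", "underdog_percent": 40},
]

for row in rank_matchups(sample):
    print(row["confidence"], row["favorite"], row["max_percent"], row["note"])
```

Unlike the pandas cell, which relies on the default `sort_values` algorithm, this version breaks ties by input order because Python's `sorted` is stable.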