Last active July 13, 2016 11:07
Gist: rjw57/704b02d49f691d000d7f357e2af89dba
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Downloading a list of schools in Cambridgeshire\n",
"\n",
"Problem: we want to download the list of all schools shown on the website at http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx\n",
"in machine-readable form."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we make sure that the Python interpreter has some of the more modern language features enabled. These `__future__` imports only change behaviour under Python 2; under Python 3 they are harmless no-ops:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Make sure this Python acts in a modern way\n",
"from __future__ import (\n",
"    unicode_literals, division, print_function,\n",
"    with_statement, absolute_import\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dependencies\n",
"\n",
"This notebook uses some third-party Python packages. This cell uses `pip` to ensure that all of them are installed."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied (use --upgrade to upgrade): requests in /home/zelda/rjw57/.local/lib/python3.5/site-packages\n",
"Requirement already satisfied (use --upgrade to upgrade): beautifulsoup4 in /home/zelda/rjw57/.local/lib/python3.5/site-packages\n",
"Requirement already satisfied (use --upgrade to upgrade): html5lib in /home/zelda/rjw57/.local/lib/python3.5/site-packages\n",
"Requirement already satisfied (use --upgrade to upgrade): six in /home/zelda/rjw57/.local/lib/python3.5/site-packages (from html5lib)\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import subprocess\n",
"import sys\n",
"\n",
"requirements = 'requests beautifulsoup4 html5lib'.split()\n",
"# Run pip as a module of the current interpreter: pip.main() is not a public API\n",
"# and was removed in pip 10. check_call raises if pip fails and returns 0 otherwise.\n",
"subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--user'] + requirements)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Getting a list of schools\n",
"\n",
"Let's write a function that downloads a page and parses its HTML. We use the Beautiful Soup library, with the `html5lib` parser, to do the parsing."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import requests\n",
"import bs4\n",
"\n",
"def download_html(url):\n",
"    \"\"\"Download the HTML page at a URL and return a parsed Beautiful Soup document.\n",
"    Raises requests.HTTPError if the server returns an HTTP error status.\n",
"    \n",
"    \"\"\"\n",
"    print('Downloading:', url)\n",
"    # A timeout stops a stalled connection from hanging the notebook forever\n",
"    r = requests.get(url, timeout=30)\n",
"    r.raise_for_status()\n",
"    return bs4.BeautifulSoup(r.content, 'html5lib')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two pages we're interested in are the paginated list of schools at http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=XXX and the information page for an individual school at http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=XXX."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parsing the index page\n",
"\n",
"Having used \"view source\" on the HTML page in a web browser, I know that we're looking for `<li>` elements within a `<ul>` with class `school-left`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from collections import namedtuple\n",
"\n",
"try:\n",
"    import urlparse\n",
"except ImportError:\n",
"    import urllib.parse as urlparse\n",
"\n",
"School = namedtuple('School', 'name address url type id')\n",
"\n",
"def fetch_schools(page_num):\n",
"    \"\"\"Download a page of search results and return a list of School objects.\n",
"    \n",
"    \"\"\"\n",
"    # Download HTML document\n",
"    url = 'http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page={:d}'.format(page_num)\n",
"    document = download_html(url)\n",
"    \n",
"    # Get list of school elements\n",
"    ul = document.find('ul', class_='school-left')\n",
"    if ul is None:\n",
"        return []\n",
"    school_elements = ul.find_all('li')\n",
"    \n",
"    # A function to convert a school element into a School object\n",
"    def school_from_element(element):\n",
"        address = element.find(class_='school-address')\n",
"        heading = address.find('h3')\n",
"        school_url = urlparse.urljoin(url, heading.find('a')['href'].strip())\n",
"        url_qs = urlparse.parse_qs(urlparse.urlsplit(school_url).query)\n",
"        return School(\n",
"            name=heading.text.strip(), address=address.find('p').text.strip(),\n",
"            type=element.find(class_='school-type').text.strip(), url=school_url,\n",
"            id=int(url_qs['baseID'][0]),\n",
"        )\n",
"    \n",
"    return [school_from_element(e) for e in school_elements]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=1\n",
"School(name='Abbey College Ramsey', address='Abbey College, Abbey Road, Ramsey, PE26 1DG', url='http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600', type='Secondary with 6th', id=1600)\n"
]
}
],
"source": [
"# Test the above code\n",
"print(fetch_schools(1)[0])"
]
},
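{
"cell_type": "markdown",
"metadata": {},
"source": [
"The id in each `School` object comes from the `baseID` query parameter of the school's URL, which `fetch_schools` extracts with `urlsplit` and `parse_qs`. As a quick offline check, here is a minimal sketch of just that step, applied to the URL printed above. It assumes Python 3 (it imports `urllib.parse` directly), and `school_id_from_url` is a hypothetical helper name that is not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.parse import parse_qs, urlsplit\n",
"\n",
"def school_id_from_url(school_url):\n",
"    \"\"\"Return the baseID query parameter of a school URL as an int (hypothetical helper).\"\"\"\n",
"    query = parse_qs(urlsplit(school_url).query)\n",
"    # parse_qs maps each parameter name to a list of values; take the first\n",
"    return int(query['baseID'][0])\n",
"\n",
"print(school_id_from_url(\n",
"    'http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600'))"
]
},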
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Download all the schools\n",
"\n",
"We can now build the full list of schools by downloading the index pages in turn until we get one with no schools."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=1\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=2\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=3\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=4\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=5\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=6\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=7\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=8\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=9\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=10\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=11\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=12\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=13\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=14\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=15\n",
"Number of schools: 269\n"
]
}
],
"source": [
"import itertools\n",
"\n",
"schools = []\n",
"for pagenum in itertools.count(1):\n",
"    page_schools = fetch_schools(pagenum)\n",
"    if len(page_schools) == 0:\n",
"        break\n",
"    schools.extend(page_schools)\n",
"\n",
"print('Number of schools:', len(schools))"
]
},
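{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above relies on the site eventually returning an empty page, and it requests pages back to back. A slightly more defensive variant is sketched below. It assumes we want a hard upper bound on the number of pages and a pause between requests to go easy on the server. `collect_pages`, `max_pages` and `delay` are hypothetical names. The page-fetching function is passed in so the logic can be tried without network access; with the notebook's own function it would be called as `collect_pages(fetch_schools)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"def collect_pages(fetch_page, max_pages=100, delay=1.0):\n",
"    \"\"\"Concatenate fetch_page(1), fetch_page(2), ... up to the first empty page.\n",
"\n",
"    Fetches at most max_pages pages, sleeping `delay` seconds between\n",
"    requests, and raises RuntimeError if none of those pages is empty.\n",
"    \"\"\"\n",
"    items = []\n",
"    for page_num in range(1, max_pages + 1):\n",
"        if page_num > 1:\n",
"            time.sleep(delay)\n",
"        page_items = fetch_page(page_num)\n",
"        if not page_items:\n",
"            return items\n",
"        items.extend(page_items)\n",
"    raise RuntimeError('No empty page found within {:d} pages'.format(max_pages))\n",
"\n",
"# Offline demo with a fake fetcher: pages 1-3 have items, page 4 is empty\n",
"fake_pages = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e']}\n",
"print(collect_pages(lambda n: fake_pages.get(n, []), delay=0))"
]
},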
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting information for a school\n",
"\n",
"Each school has an information page, whose address is stored in the `url` field of the corresponding `School` object. The information of interest is within a `<div>` tag with an id of `content`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600\n",
"<div id=\"content\">\n",
" \n",
" <h1>Find a school or college</h1>\n",
" \n",
" \n",
"\n",
"\n",
"\t\t<h2>\n",
"\t\t\tAbbey College Ramsey\n",
"\t\t</h2>\t\t\n",
"\t\t <div class=\"school_blue_box\">\n",
"\t\t <div class=\"school_blue_box_left\">\n",
"\t\t<h3>Contact details</h3>\n",
"\t\t<p><strong>Headteacher / Principal:</strong> Mr Andrew Christoforou</p>\n",
"\t\t<p><strong>Telephone:</strong> 01487 812352</p>\n",
"\t\t<p><strong>Fax:</strong> 01487 813839</p>\n",
"\t\t<p><strong>Email</strong> <a class=\"email\" href=\"mailto:[email protected]\n"
]
}
],
"source": [
"document = download_html(schools[0].url)\n",
"print(str(document.find(id='content'))[:500])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can write a function to extract the details of a school from its information page, given that page's URL (the `url` field of a `School` object)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import re\n",
"\n",
"def get_school_details(url):\n",
"    \"\"\"Download a school's information page and return its details as a dict.\n",
"    \n",
"    \"\"\"\n",
"    document = download_html(url)\n",
"    content = document.find(id='content')\n",
"    \n",
"    # Convert contact details headings to CSV-friendly names\n",
"    heading_map = {'Headteacher / Principal': 'headteacher_principal'}\n",
"    def key_to_heading(key):\n",
"        key = heading_map.get(key, key)\n",
"        key = key.lower()\n",
"        key = re.sub(r'\\s', '_', key)\n",
"        return key\n",
"    \n",
"    # Find all tags of the form <p><strong>key:</strong> value</p>\n",
"    detail_dict = {}\n",
"    for p in content.find_all('p'):\n",
"        children = list(p)\n",
"        # Skip <p> tags which don't start with a <strong> element\n",
"        if len(children) == 0 or children[0].name != 'strong':\n",
"            continue\n",
"        \n",
"        key = children[0].text.strip().rstrip(':')\n",
"        value = ' '.join(c.string.strip() if c.string is not None else '' for c in children[1:])\n",
"        detail_dict[key_to_heading(key)] = value\n",
"    \n",
"    return detail_dict"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600\n",
"{'type': 'Secondary with 6th', 'catchment_area': ' Map of Abbey College Ramsey catchment area', 'fax': '01487 813839', 'email': ' [email protected]', 'classification': 'Academy', 'address': 'Abbey College, Abbey Road, Ramsey, PE26 1DG', 'ofsted': ' See latest Ofsted report(s)', 'district': 'Huntingdonshire', 'headteacher_principal': 'Mr Andrew Christoforou', 'telephone': '01487 812352', 'age_range': '11-19'}\n"
]
}
],
"source": [
"print(get_school_details(schools[0].url))"
]
},
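{
"cell_type": "markdown",
"metadata": {},
"source": [
"The nested `key_to_heading` helper turns each label into a CSV-friendly column name. The standalone copy below, under the hypothetical name `normalise_heading`, repeats the same three steps so they can be checked offline. It also shows why `heading_map` is needed: without it, `Headteacher / Principal` would become a column name containing a slash. The example labels other than `Headteacher / Principal` are illustrative guesses based on the keys printed above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Same explicit mapping as in get_school_details\n",
"HEADING_MAP = {'Headteacher / Principal': 'headteacher_principal'}\n",
"\n",
"def normalise_heading(label):\n",
"    \"\"\"Map a contact-details label to a lower-case, underscore-separated key.\"\"\"\n",
"    key = HEADING_MAP.get(label, label)\n",
"    key = key.lower()\n",
"    # Replace every whitespace character with an underscore\n",
"    return re.sub(r'\\s', '_', key)\n",
"\n",
"for label in ['Headteacher / Principal', 'Age range', 'Telephone']:\n",
"    print(label, '->', normalise_heading(label))\n",
"\n",
"# Without the mapping, the slash survives in the column name\n",
"print(re.sub(r'\\s', '_', 'Headteacher / Principal'.lower()))"
]
},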
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fetch and write results to a CSV"
]
},
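{
"cell_type": "markdown",
"metadata": {},
"source": [
"If different schools' pages list different fields, the CSV needs a header that covers all of them. Here is a minimal sketch of one way to handle that. It assumes Python 3 and that each school's details are a plain dict like the one printed above. It takes the union of the keys as the column names and lets `csv.DictWriter` write an empty string for missing fields. `write_details_csv` is a hypothetical name, and the demo writes two made-up rows to an in-memory buffer. For a real file, the `csv` documentation recommends opening it with `newline=''`, e.g. `open('schools.csv', 'w', newline='')` (the filename is only an example)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"import io\n",
"\n",
"def write_details_csv(rows, out_file):\n",
"    \"\"\"Write a list of dicts to out_file as CSV, one column per distinct key.\"\"\"\n",
"    # set().union(*rows) collects the keys of every dict; sort for a stable order\n",
"    fieldnames = sorted(set().union(*rows))\n",
"    # restval='' is written for any column a particular row lacks\n",
"    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval='')\n",
"    writer.writeheader()\n",
"    writer.writerows(rows)\n",
"\n",
"buf = io.StringIO()\n",
"write_details_csv([{'name': 'School A', 'telephone': '0123'},\n",
"                   {'name': 'School B', 'fax': '0456'}], buf)\n",
"print(buf.getvalue())"
]
},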
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=143\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=3\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=137\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=5\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=7\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=9\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=10\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=11\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=12\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=208\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=13\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=14\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=31\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=15\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=209\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2133\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=248\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=20\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=22\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=23\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=24\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=25\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=26\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=27\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=10786\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1338\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=28\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=553\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=29\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=30\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=210\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=12331\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=32\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=33\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=34\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=212\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=249\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=35\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=213\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=36\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=37\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=214\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=261\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=215\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=38\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=39\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=40\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=41\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=42\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=43\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=44\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=45\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=46\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=47\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=48\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=211\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=50\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=51\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19286\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=216\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=52\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=53\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=54\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=55\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=56\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19604\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=57\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=58\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=59\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=60\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=61\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=62\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=63\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=64\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=65\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=218\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=66\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=67\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=18473\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=68\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=70\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=552\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=71\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=77\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=72\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=74\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=73\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=75\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=78\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=79\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=80\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=81\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=83\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=82\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=84\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=85\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=86\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=87\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=238\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1473\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=217\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=88\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=89\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=251\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=90\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=91\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=252\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=92\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=253\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=3345\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1474\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=219\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=14835\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=96\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4465\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=99\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=100\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=134\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=101\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8416\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=103\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=310\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=104\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=106\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=105\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=220\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=107\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=108\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=109\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=110\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1472\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=221\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=111\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=112\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=113\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=114\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=242\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=115\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=223\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=116\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16176\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=118\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=119\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=120\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=121\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=406\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=124\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=125\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=126\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=128\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=129\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=222\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=131\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=132\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2094\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=309\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=133\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=136\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=226\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19283\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=138\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=139\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=140\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=142\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=144\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=145\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=7190\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=147\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=148\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=262\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=149\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=244\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=228\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=151\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=152\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=229\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=154\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=230\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=231\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=155\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=156\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=158\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=245\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=159\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=160\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=162\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=163\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=232\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=165\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=233\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=166\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=167\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=168\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=169\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=170\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=171\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16177\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=234\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=254\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=173\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=174\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=175\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=176\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=177\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=178\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=179\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=180\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=181\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=235\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=182\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=6\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2569\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4003\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1339\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4705\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4706\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8387\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=49\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8454\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=76\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=308\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=95\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=372\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=224\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=127\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=20048\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=130\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=135\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=146\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=3304\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=12332\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=157\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=545\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2642\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=183\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=291\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8432\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=188\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=189\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19606\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8632\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=15696\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8725\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=15995\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=190\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=14898\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=191\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=192\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=193\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=194\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=195\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19225\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1601\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=198\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16178\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=199\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=201\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=202\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=203\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=236\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=204\n", | |
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=205\n" | |
]
}
],
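"source": [
"import csv\n",
"\n",
"# Columns taken from each entry of the `schools` index built earlier\n",
"school_index_headings = 'id url'.split()\n",
"# Columns scraped from each school's detail page by get_school_details()\n",
"school_detail_headings = (\n",
"    'headteacher_principal telephone fax email address type '\n",
"    'age_range classification district'\n",
").split()\n",
"all_headings = school_index_headings + school_detail_headings\n",
"\n",
"# The csv docs require newline='': without it, platforms with \\r\\n line\n",
"# endings get an extra carriage return per row (blank rows in spreadsheets).\n",
"# UTF-8 keeps non-ASCII characters in names and addresses intact.\n",
"with open('cambs-schools.csv', 'w', newline='', encoding='utf-8') as outfile:\n",
"    w = csv.writer(outfile)\n",
"    w.writerow(all_headings)\n",
"    for school in schools:\n",
"        details = get_school_details(school.url)\n",
"        # Use an empty string for any missing index attribute or detail field\n",
"        rowdict = {}\n",
"        for k in school_index_headings:\n",
"            rowdict[k] = getattr(school, k, '')\n",
"        for k in school_detail_headings:\n",
"            rowdict[k] = details.get(k, '')\n",
"        w.writerow([rowdict[k] for k in all_headings])"
]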
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1+"
}
},
"nbformat": 4,
"nbformat_minor": 0
} |