@rjw57
Last active July 13, 2016 11:07
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Downloading a list of schools in Cambridgeshire\n",
"\n",
"Problem: we want to download all the schools from the website at http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx\n",
"in machine-readable form."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Firstly, we need to make sure that the Python interpreter has some of the more modern features enabled:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Make sure this Python acts in a modern way\n",
"from __future__ import (\n",
" unicode_literals, division, print_function,\n",
" with_statement, absolute_import\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dependencies\n",
"\n",
"This notebook uses some Python packages. The following cell uses `pip` to ensure that all the required packages are installed."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied (use --upgrade to upgrade): requests in /home/zelda/rjw57/.local/lib/python3.5/site-packages\n",
"Requirement already satisfied (use --upgrade to upgrade): beautifulsoup4 in /home/zelda/rjw57/.local/lib/python3.5/site-packages\n",
"Requirement already satisfied (use --upgrade to upgrade): html5lib in /home/zelda/rjw57/.local/lib/python3.5/site-packages\n",
"Requirement already satisfied (use --upgrade to upgrade): six in /home/zelda/rjw57/.local/lib/python3.5/site-packages (from html5lib)\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
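],
"source": [
"import pip\n",
"requirements = 'requests beautifulsoup4 html5lib'.split()\n",
"# NB: pip.main() is only available in pip versions before 10. With newer versions,\n",
"# run `python -m pip install --user requests beautifulsoup4 html5lib` in a shell instead.\n",
"pip.main(['install', '--user'] + requirements)"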
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Getting a list of schools\n",
"\n",
"Let's write a function to download a single page of results as HTML. We can use the BeautifulSoup library to parse the HTML."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import requests\n",
"import bs4\n",
"\n",
"def download_html(url):\n",
"    \"\"\"Download an HTML page at a URL and return a parsed Beautiful Soup document.\n",
"    Raises on HTTP error.\n",
"    \n",
"    \"\"\"\n",
"    print('Downloading:', url)\n",
"    r = requests.get(url)\n",
"    r.raise_for_status()\n",
"    return bs4.BeautifulSoup(r.content, 'html5lib')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two pages we're interested in are the list of schools at http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=XXX and an individual school at http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=XXX."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parsing the index page\n",
"\n",
"Having \"view source\"-ed the HTML page in a web browser, I know that we're looking for the `<li>` elements within a `<ul>` with class `school-left`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from collections import namedtuple\n",
"\n",
"try:\n",
"    import urlparse\n",
"except ImportError:\n",
"    import urllib.parse as urlparse\n",
"\n",
"School = namedtuple('School', 'name address url type id')\n",
"\n",
"def fetch_schools(page_num):\n",
"    \"\"\"Download a page of search results and return a list of School objects.\n",
"    \n",
"    \"\"\"\n",
"    # Download HTML document\n",
"    url = 'http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page={:d}'.format(page_num)\n",
"    document = download_html(url)\n",
"    \n",
"    # Get list of school elements\n",
"    ul = document.find('ul', class_='school-left')\n",
"    if ul is None:\n",
"        return []\n",
"    school_elements = ul.find_all('li')\n",
"    \n",
"    # A function to convert a school element into a School object\n",
"    def school_from_element(element):\n",
"        address = element.find(class_='school-address')\n",
"        heading = address.find('h3')\n",
"        school_url = urlparse.urljoin(url, heading.find('a')['href'].strip())\n",
"        url_qs = urlparse.parse_qs(urlparse.urlsplit(school_url).query)\n",
"        return School(\n",
"            name=heading.text.strip(), address=address.find('p').text.strip(),\n",
"            type=element.find(class_='school-type').text.strip(), url=school_url,\n",
"            id=int(url_qs['baseID'][0]),\n",
"        )\n",
"    \n",
"    return [school_from_element(e) for e in school_elements]"
]
},
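{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check that needs no network access, the sketch below runs the same selectors over a small hand-written HTML fragment. The fragment is an assumption modelled on the markup described above, not a copy of the live page, and the sketch uses Python's built-in `html.parser` backend rather than `html5lib`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import bs4\n",
"\n",
"try:\n",
"    import urlparse\n",
"except ImportError:\n",
"    import urllib.parse as urlparse\n",
"\n",
"# Hypothetical fragment imitating one entry of the results list\n",
"SAMPLE_HTML = '''\n",
"<ul class=\"school-left\">\n",
"  <li>\n",
"    <div class=\"school-address\">\n",
"      <h3><a href=\"school.aspx?baseID=42\">Example School</a></h3>\n",
"      <p>1 Example Road, Exampleton</p>\n",
"    </div>\n",
"    <div class=\"school-type\">Primary</div>\n",
"  </li>\n",
"</ul>\n",
"'''\n",
"\n",
"sample = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')\n",
"for li in sample.find('ul', class_='school-left').find_all('li'):\n",
"    link = li.find(class_='school-address').find('h3').find('a')\n",
"    # The school id is the baseID parameter of the link's query string\n",
"    query = urlparse.parse_qs(urlparse.urlsplit(link['href']).query)\n",
"    print(link.text.strip(), int(query['baseID'][0]))"
]
},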
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=1\n",
"School(name='Abbey College Ramsey', address='Abbey College, Abbey Road, Ramsey, PE26 1DG', url='http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600', type='Secondary with 6th', id=1600)\n"
]
}
],
"source": [
"# Test the above code\n",
"print(fetch_schools(1)[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Download all the schools\n",
"\n",
"We can now download a list of schools by downloading all of the index pages until we get one with no schools."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=1\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=2\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=3\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=4\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=5\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=6\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=7\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=8\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=9\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=10\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=11\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=12\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=13\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=14\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school_results.aspx?page=15\n",
"Number of schools: 269\n"
]
}
],
"source": [
"import itertools\n",
"\n",
"schools = []\n",
"for pagenum in itertools.count(1):\n",
"    page_schools = fetch_schools(pagenum)\n",
"    if len(page_schools) == 0:\n",
"        break\n",
"    schools.extend(page_schools)\n",
"\n",
"print('Number of schools:', len(schools))"
]
},
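{
"cell_type": "markdown",
"metadata": {},
"source": [
"The stopping rule can be tried out without touching the website. In this sketch `fake_fetch` is a hypothetical stand-in for `fetch_schools` which returns three pages of made-up names and then an empty list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import itertools\n",
"\n",
"# Hypothetical stand-in for fetch_schools(): three pages, then nothing\n",
"FAKE_PAGES = {1: ['A', 'B'], 2: ['C', 'D'], 3: ['E']}\n",
"\n",
"def fake_fetch(page_num):\n",
"    return FAKE_PAGES.get(page_num, [])\n",
"\n",
"collected = []\n",
"for page_num in itertools.count(1):\n",
"    page = fake_fetch(page_num)\n",
"    if not page:\n",
"        # An empty page marks the end of the results\n",
"        break\n",
"    collected.extend(page)\n",
"\n",
"print(collected)"
]
},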
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting information for a school\n",
"\n",
"Each school has an information page which is pointed to by the `url` field in the `School` object. The information of interest is within a `<div>` tag with an id of `content`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600\n",
"<div id=\"content\">\n",
" \n",
" <h1>Find a school or college</h1>\n",
" \n",
" \n",
"\n",
"\n",
"\t\t<h2>\n",
"\t\t\tAbbey College Ramsey\n",
"\t\t</h2>\t\t\n",
"\t\t <div class=\"school_blue_box\">\n",
"\t\t <div class=\"school_blue_box_left\">\n",
"\t\t<h3>Contact details</h3>\n",
"\t\t<p><strong>Headteacher / Principal:</strong> Mr Andrew Christoforou</p>\n",
"\t\t<p><strong>Telephone:</strong> 01487 812352</p>\n",
"\t\t<p><strong>Fax:</strong> 01487 813839</p>\n",
"\t\t<p><strong>Email</strong> <a class=\"email\" href=\"mailto:[email protected]\n"
]
}
],
"source": [
"document = download_html(schools[0].url)\n",
"print(str(document.find(id='content'))[:500])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can write a function which downloads a school's information page, given the `url` from its `School` object, and extracts the details into a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import re\n",
"\n",
"def get_school_details(url):\n",
"    \"\"\"Download a school's information page and return its details as a dict.\n",
"    \n",
"    \"\"\"\n",
"    document = download_html(url)\n",
"    content = document.find(id='content')\n",
"    \n",
"    # Convert contact details headings to CSV-friendly names\n",
"    heading_map = {'Headteacher / Principal': 'headteacher_principal'}\n",
"    def key_to_heading(key):\n",
"        key = heading_map.get(key, key)\n",
"        key = key.lower()\n",
"        key = re.sub(r'\\s', '_', key)\n",
"        return key\n",
"    \n",
"    # Find all tags of the form <p><strong>key:</strong> value</p>\n",
"    detail_dict = {}\n",
"    for p in content.find_all('p'):\n",
"        children = list(p)\n",
"        # Skip <p> tags which don't start with a <strong> element\n",
"        if len(children) == 0 or children[0].name != 'strong':\n",
"            continue\n",
"        \n",
"        key = children[0].text.strip().rstrip(':')\n",
"        value = ' '.join(c.string.strip() if c.string is not None else '' for c in children[1:])\n",
"        detail_dict[key_to_heading(key)] = value\n",
"    \n",
"    return detail_dict"
]
},
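{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the `<p><strong>key:</strong> value</p>` pattern in isolation, the sketch below applies the same steps to a hand-written fragment. The fragment and its values are made up for illustration, and the sketch uses Python's built-in `html.parser` backend rather than `html5lib`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import re\n",
"import bs4\n",
"\n",
"# Hypothetical fragment imitating the contact details block\n",
"SAMPLE_DETAILS = '''\n",
"<div id=\"content\">\n",
"  <p><strong>Telephone:</strong> 00000 000000</p>\n",
"  <p><strong>Age range:</strong> 4-11</p>\n",
"  <p>No leading strong element, so this paragraph is skipped.</p>\n",
"</div>\n",
"'''\n",
"\n",
"sample_content = bs4.BeautifulSoup(SAMPLE_DETAILS, 'html.parser').find(id='content')\n",
"details = {}\n",
"for p in sample_content.find_all('p'):\n",
"    children = list(p)\n",
"    # Plain text nodes have a name of None, so they fail this test\n",
"    if len(children) == 0 or children[0].name != 'strong':\n",
"        continue\n",
"    key = children[0].text.strip().rstrip(':')\n",
"    key = re.sub(r'\\s', '_', key.lower())\n",
"    details[key] = ' '.join(c.string.strip() for c in children[1:] if c.string is not None)\n",
"\n",
"print(details)"
]
},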
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600\n",
"{'type': 'Secondary with 6th', 'catchment_area': ' Map of Abbey College Ramsey catchment area', 'fax': '01487 813839', 'email': ' [email protected]', 'classification': 'Academy', 'address': 'Abbey College, Abbey Road, Ramsey, PE26 1DG', 'ofsted': ' See latest Ofsted report(s)', 'district': 'Huntingdonshire', 'headteacher_principal': 'Mr Andrew Christoforou', 'telephone': '01487 812352', 'age_range': '11-19'}\n"
]
}
],
"source": [
"print(get_school_details(schools[0].url))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fetch and write results to a CSV"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1600\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=143\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=3\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=137\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=5\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=7\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=9\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=10\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=11\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=12\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=208\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=13\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=14\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=31\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=15\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=209\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2133\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=248\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=20\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=22\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=23\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=24\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=25\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=26\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=27\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=10786\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1338\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=28\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=553\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=29\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=30\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=210\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=12331\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=32\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=33\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=34\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=212\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=249\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=35\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=213\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=36\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=37\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=214\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=261\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=215\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=38\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=39\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=40\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=41\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=42\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=43\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=44\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=45\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=46\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=47\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=48\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=211\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=50\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=51\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19286\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=216\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=52\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=53\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=54\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=55\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=56\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19604\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=57\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=58\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=59\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=60\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=61\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=62\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=63\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=64\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=65\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=218\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=66\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=67\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=18473\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=68\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=70\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=552\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=71\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=77\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=72\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=74\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=73\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=75\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=78\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=79\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=80\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=81\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=83\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=82\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=84\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=85\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=86\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=87\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=238\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1473\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=217\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=88\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=89\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=251\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=90\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=91\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=252\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=92\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=253\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=3345\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1474\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=219\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=14835\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=96\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4465\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=99\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=100\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=134\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=101\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8416\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=103\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=310\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=104\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=106\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=105\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=220\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=107\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=108\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=109\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=110\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1472\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=221\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=111\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=112\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=113\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=114\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=242\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=115\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=223\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=116\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16176\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=118\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=119\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=120\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=121\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=406\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=124\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=125\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=126\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=128\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=129\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=222\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=131\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=132\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2094\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=309\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=133\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=136\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=226\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19283\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=138\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=139\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=140\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=142\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=144\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=145\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=7190\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=147\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=148\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=262\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=149\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=244\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=228\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=151\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=152\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=229\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=154\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=230\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=231\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=155\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=156\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=158\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=245\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=159\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=160\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=162\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=163\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=232\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=165\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=233\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=166\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=167\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=168\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=169\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=170\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=171\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16177\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=234\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=254\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=173\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=174\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=175\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=176\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=177\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=178\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=179\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=180\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=181\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=235\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=182\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=6\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2569\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4003\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1339\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4705\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=4706\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8387\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=49\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8454\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=76\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=308\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=95\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=372\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=224\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=127\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=20048\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=130\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=135\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=146\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=3304\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=12332\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=157\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=545\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=2642\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=183\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=291\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8432\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=188\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=189\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19606\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8632\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=15696\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=8725\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=15995\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=190\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=14898\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=191\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=192\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=193\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=194\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=195\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=19225\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=1601\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=198\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=16178\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=199\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=201\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=202\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=203\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=236\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=204\n",
"Downloading: http://www4.cambridgeshire.gov.uk/site/custom_scripts/school.aspx?baseID=205\n"
]
}
],
"source": [
"import csv\n",
"\n",
"# Columns taken from the index page entry for each school\n",
"school_index_headings = 'id url'.split()\n",
"# Columns taken from each school's detail page\n",
"school_detail_headings = (\n",
"    'headteacher_principal telephone fax email address type '\n",
"    'age_range classification district'\n",
").split()\n",
"all_headings = school_index_headings + school_detail_headings\n",
"\n",
"with open('cambs-schools.csv', 'w') as outfile:\n",
"    w = csv.writer(outfile)\n",
"    w.writerow(all_headings)\n",
"    for school in schools:\n",
"        # Fetch the detail page; any missing field is written as an empty string\n",
"        details = get_school_details(school.url)\n",
"        rowdict = {}\n",
"        for k in school_index_headings:\n",
"            rowdict[k] = getattr(school, k, '')\n",
"        for k in school_detail_headings:\n",
"            rowdict[k] = details.get(k, '')\n",
"        w.writerow([rowdict[k] for k in all_headings])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1+"
}
},
"nbformat": 4,
"nbformat_minor": 0
}