Scrape the table of LLC LPAs into data
Last active: June 3, 2024 19:49
Create Local Land Charges Programme project markdown
```
.venv
organisation.csv
llc.html
```
The MIT License (MIT)

Copyright (c) 2024 Crown Copyright HM Government

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
`llc.csv`

organisation | name | start-date
---|---|---
local-authority:BAB | Babergh District Council | 2022-01-20
local-authority:BAE | Bassetlaw District Council | 2023-09-21
local-authority:BDF | Bedford Borough Council | 2024-03-18
local-authority:BLA | Blaby District Council | 2023-10-16
local-authority:BBD | Blackburn with Darwen Borough Council | 2021-10-28
local-authority:BPL | Blackpool Council | 2018-11-20
local-authority:BOT | Boston Borough Council | 2024-01-08
local-authority:BST | Bristol City Council | 2023-07-20
local-authority:BRM | Bromsgrove District Council | 2021-10-13
local-authority:BRT | Broxtowe Borough Council | 2024-04-25
local-authority:BUN | Burnley Borough Council | 2023-04-27
local-authority:BUR | Bury Metropolitan Borough Council | 2022-07-14
local-authority:CAR | Carlisle City Council | 2020-04-21
local-authority:CHL | Chelmsford City Council | 2023-03-28
local-authority:CHT | Cheltenham Borough Council | 2022-09-01
local-authority:LIC | City of Lincoln Council | 2022-04-20
local-authority:LND | City of London Corporation | 2018-10-08
local-authority:WLV | City of Wolverhampton Council | 2023-01-18
local-authority:IOS | Council of the Isles of Scilly | 2019-01-17
local-authority:CRA | Craven District Council | 2023-03-28
local-authority:DUD | Dudley Metropolitan Borough Council | 2021-07-19
local-authority:ECA | East Cambridgeshire District Council | 2022-12-05
local-authority:EDE | East Devon District Council | 2024-01-04
local-authority:ELI | East Lindsey District Council | 2020-06-26
local-authority:EPP | Epping Forest District Council | 2023-04-25
local-authority:FEN | Fenland District Council | 2022-09-06
local-authority:GED | Gedling Borough Council | 2024-03-27
local-authority:HAL | Halton Borough Council | 2023-03-16
local-authority:HAE | Hambleton District Council | 2021-10-28
local-authority:HAO | Harborough District Council | 2022-04-29
local-authority:HRY | Haringey Council | 2021-12-20
local-authority:HAS | Hastings Borough Council | 2024-03-19
local-authority:HAG | Harrogate Borough Council | 2023-03-28
local-authority:HIG | High Peak Borough Council | 2023-04-25
local-authority:IOW | Isle of Wight Council | 2022-04-27
local-authority:KHL | Kingston upon Hull City Council | 2023-04-24
local-authority:KWL | Knowsley Metropolitan Borough Council | 2022-10-12
local-authority:LBH | Lambeth Council | 2019-10-01
local-authority:LDS | Leeds City Council | 2024-01-08
local-authority:LIV | Liverpool City Council | 2018-09-03
local-authority:BEX | London Borough of Bexley | 2024-04-17
local-authority:ENF | London Borough of Enfield | 2022-08-15
local-authority:WND | London Borough of Wandsworth | 2022-11-15
local-authority:MAV | Malvern Hills District Council | 2023-01-12
local-authority:MDE | Mid Devon District Council | 2023-10-10
local-authority:MSU | Mid Suffolk District Council | 2022-01-20
local-authority:MSS | Mid Sussex District Council | 2022-03-21
local-authority:MIK | Milton Keynes Council | 2020-08-27
local-authority:NEA | Newark and Sherwood District Council | 2021-10-26
local-authority:NEC | Newcastle-under-Lyme Borough Council | 2022-05-02
local-authority:NHE | North Hertfordshire District Council | 2024-01-22
local-authority:NKE | North Kesteven District Council | 2021-12-16
local-authority:NLN | North Lincolnshire Council | 2024-05-31
local-authority:NWL | North West Leicestershire District Council | 2022-07-27
local-authority:NOW | Norwich City Council | 2019-07-11
local-authority:PEN | Pendle Borough Council | 2021-11-15
local-authority:PTE | Peterborough City Council | 2020-01-31
local-authority:PLY | Plymouth City Council | 2022-01-07
local-authority:POR | Portsmouth City Council | 2022-04-28
local-authority:RED | Redditch Borough Council | 2021-10-13
local-authority:RIH | Richmondshire District Council | 2022-10-27
local-authority:RUT | Rutland County Council | 2022-12-12
local-authority:RYE | Ryedale District Council | 2023-03-27
local-authority:SLF | Salford City Council | 2023-03-31
local-authority:SAW | Sandwell Metropolitan Borough Council | 2023-10-19
local-authority:SCE | Scarborough Borough Council | 2021-11-29
local-authority:SFT | Sefton Metropolitan Borough Council | 2023-01-16
local-authority:SEL | Selby District Council | 2023-03-14
local-authority:SEV | Sevenoaks District Council | 2021-04-28
local-authority:SOL | Solihull Metropolitan Borough Council | 2022-04-25
local-authority:SGC | South Gloucestershire Council | 2024-07-01
local-authority:SNO | South Norfolk District Council | 2022-06-22
local-authority:SST | South Staffordshire Council | 2022-08-11
local-authority:SPE | Spelthorne Borough Council | 2021-04-23
local-authority:SKP | Stockport Metropolitan Borough Council | 2023-01-26
local-authority:STT | Stockton-on-Tees Borough Council | 2021-04-07
local-authority:STR | Stratford-on-Avon District Council | 2021-05-10
local-authority:STN | Sutton Council | 2022-01-07
local-authority:TAM | Tameside Metropolitan Borough Council | 2021-10-18
local-authority:TON | Tonbridge and Malling Borough Council | 2024-04-25
local-authority:TRF | Trafford Metropolitan Borough Council | 2023-04-24
local-authority:WKF | Wakefield Metropolitan District Council | 2024-04-22
local-authority:WAW | Warwick District Council | 2018-07-11
local-authority:WAT | Watford Borough Council | 2020-02-06
local-authority:WEW | Welwyn Hatfield Borough Council | 2021-05-26
local-authority:WBK | West Berkshire Council | 2024-04-25
local-authority:WLI | West Lindsey District Council | 2023-04-18
local-authority:WOX | West Oxfordshire District Council | 2023-04-13
local-authority:WSK | West Suffolk Council | 2022-07-21
local-authority:WGN | Wigan Metropolitan Borough Council | 2024-04-24
local-authority:WOI | Woking Borough Council | 2024-03-21
local-authority:WYE | Wyre Forest District Council | 2024-04-19
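A minimal sketch of consuming this table, assuming it is read as CSV text with the `organisation,name,start-date` header that the parser writes. It checks that every `start-date` is an ISO 8601 date and orders councils by go-live date; the function name `load_go_live_dates` and the two sample rows are illustrative only.

```python
import csv
import io
from datetime import date


def load_go_live_dates(lines):
    """Return (date, organisation, name) tuples sorted by go-live date.

    `lines` is any iterable of CSV text lines, such as a file opened with newline="".
    date.fromisoformat raises ValueError if a start-date is not YYYY-MM-DD.
    """
    rows = []
    for row in csv.DictReader(lines):
        go_live = date.fromisoformat(row["start-date"])
        rows.append((go_live, row["organisation"], row["name"]))
    return sorted(rows)


if __name__ == "__main__":
    # Hypothetical two-row sample in the same layout as llc.csv
    sample = io.StringIO(
        "organisation,name,start-date\n"
        "local-authority:BAB,Babergh District Council,2022-01-20\n"
        "local-authority:WAW,Warwick District Council,2018-07-11\n"
    )
    for go_live, organisation, name in load_go_live_dates(sample):
        print(go_live, organisation, name)
```

Sorting tuples puts the `date` first, so ties on the same go-live date fall back to ordering by organisation identifier.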
`local-land-charges.md`

| description | end-date | entry-date | name | parent-project | project | project-status | provision-reason | documentation-url | start-date | specifications | organisations |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  | Local Land Charges Programme |  | local-land-charges | in-progress |  |  |  |  |  |
```make
# Generate the project markdown from the scraped table
local-land-charges.md: llc.csv project.py
	python3 project.py > $@

# Scrape the GOV.UK table and reconcile council names to organisation identifiers
llc.csv: llc.html organisation.csv parse.py
	python3 parse.py llc.html $@

# Download the Local Land Charges Programme page
llc.html:
	curl -L 'https://www.gov.uk/government/publications/hm-land-registry-local-land-charges-programme/local-land-charges-programme' > $@

# Download the organisation register
organisation.csv:
	curl -qfs "https://files.planning.data.gov.uk/organisation-collection/dataset/organisation.csv" > $@
```
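make rebuilds a target when it is missing or older than any of its prerequisites. So editing `parse.py` regenerates `llc.csv` and then `local-land-charges.md`, while `llc.html` and `organisation.csv`, whose rules have no prerequisites, are only downloaded when the files are absent. The Python below is a simplified sketch of that check, not how make is implemented: it ignores make's recursive rebuilding of prerequisites, phony targets and timestamp-resolution details, and `needs_rebuild` is a hypothetical name.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Simplified make rule: rebuild if the target is missing or any prerequisite is newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A missing prerequisite also forces a rebuild (make would try to build it first).
    return any(
        not os.path.exists(p) or os.path.getmtime(p) > target_mtime
        for p in prerequisites
    )


if __name__ == "__main__":
    # Throwaway files named after the llc.html -> llc.csv rule
    with tempfile.TemporaryDirectory() as d:
        html = os.path.join(d, "llc.html")
        out = os.path.join(d, "llc.csv")
        print(needs_rebuild(out, [html]))  # True: target missing
        for path, mtime in [(html, 1000), (out, 2000)]:
            open(path, "w").close()
            os.utime(path, (mtime, mtime))
        print(needs_rebuild(out, [html]))  # False: target newer than prerequisite
        os.utime(html, (3000, 3000))
        print(needs_rebuild(out, [html]))  # True: prerequisite changed
```

A consequence worth noting: because `llc.html` has no prerequisites, a stale copy of the GOV.UK page is never refetched until it is deleted.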
`parse.py`

```python
import csv
import sys
from datetime import datetime

from pyquery import PyQuery

# Index the organisation register by organisation name
organisations = {}
with open("organisation.csv", newline="") as f:
    for row in csv.DictReader(f):
        organisations[row["name"]] = row

# Skip Welsh councils: unless already in the register, give each an empty
# organisation so its row is dropped below rather than raising NameError
for o in [
    "Blaenau Gwent County Borough Council",
    "Caerphilly County Borough Council",
    "City and County of Swansea Council",
    "Merthyr Tydfil County Borough Council",
    "Torfaen County Borough Council",
    "Pembrokeshire County Council",
]:
    organisations.setdefault(o, {"organisation": ""})

# TBD: create and use a reconciliation dataset
# Until then, alias names used on the GOV.UK page to names in organisation.csv
for o, n in [
    ("Blackpool Council", "Blackpool Borough Council"),
    ("Haringey Council", "London Borough of Haringey"),
    ("Kingston upon Hull City Council", "Hull City Council"),
    ("Lambeth Council", "London Borough of Lambeth"),
    ("Sutton Council", "London Borough of Sutton"),
]:
    organisations[o] = organisations[n]


def find_organisation(name):
    """Return the organisation identifier for a council name; raise if unknown."""
    name = name.strip()
    if name in organisations:
        return organisations[name]["organisation"]
    raise NameError(name)


pq = PyQuery(filename=sys.argv[1])

fieldnames = ["organisation", "name", "start-date"]
with open(sys.argv[2], "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames)
    w.writeheader()
    for tr in pq("tr").items():
        cols = list(tr("td"))
        # Rows without <td> cells, such as a header row of <th> cells, are ignored
        if cols:
            row = {}
            row["name"] = cols[0].text
            # Convert a date such as "20 January 2022" to ISO 8601 "2022-01-20"
            row["start-date"] = datetime.strptime(cols[1].text, "%d %B %Y").strftime(
                "%Y-%m-%d"
            )
            row["organisation"] = find_organisation(row["name"])
            # An empty organisation marks a skipped (Welsh) council
            if row["organisation"]:
                w.writerow(row)
```
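The two transformations in `parse.py` can be exercised without fetching the page. This minimal sketch isolates them as pure functions, assuming the page gives dates in the form "20 January 2022": `to_iso_date` applies the same `%d %B %Y` to `%Y-%m-%d` conversion, and `build_index` applies the same skip-list and alias steps to an in-memory list of organisation rows. Both function names and the sample rows are hypothetical. Note that `%B` matches full month names in the current locale, which is English under Python's default C locale.

```python
from datetime import datetime


def to_iso_date(text):
    """Convert a date such as '20 January 2022' to '2022-01-20'."""
    return datetime.strptime(text.strip(), "%d %B %Y").strftime("%Y-%m-%d")


def build_index(rows, skip=(), aliases=()):
    """Index organisation rows by name, mirroring the steps in parse.py.

    Names in `skip` that are not already indexed map to an empty organisation,
    so callers can drop them; each (alias, name) pair reuses an existing row.
    """
    index = {row["name"]: row for row in rows}
    for name in skip:
        index.setdefault(name, {"organisation": ""})
    for alias, name in aliases:
        index[alias] = index[name]  # KeyError if the canonical name is missing
    return index


if __name__ == "__main__":
    index = build_index(
        [{"name": "London Borough of Sutton", "organisation": "local-authority:STN"}],
        skip=["Torfaen County Borough Council"],
        aliases=[("Sutton Council", "London Borough of Sutton")],
    )
    print(index["Sutton Council"]["organisation"])  # local-authority:STN
    print(repr(index["Torfaen County Borough Council"]["organisation"]))  # ''
    print(to_iso_date("20 January 2022"))  # 2022-01-20
```

Keeping the lookup data separate from the HTML scraping is also a natural first step towards the reconciliation dataset the script's TBD comment mentions.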
`project.py`

```python
import csv

# Emit the project record as YAML front matter, with one organisations entry per row of llc.csv
print("""---
description: ''
end-date: ''
entry-date: ''
name: Local Land Charges Programme
parent-project: ''
project: local-land-charges
project-status: in-progress
provision-reason:
documentation-url: https://www.gov.uk/government/publications/hm-land-registry-local-land-charges-programme/local-land-charges-programme
start-date: ''
specifications:
organisations:""")

with open("llc.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(f'- organisation: {row["organisation"]}')
        # Indent two spaces so start-date belongs to the same list item as organisation
        print(f'  start-date: {row["start-date"]}')

print("---")
```
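Because `project.py` prints directly to standard output, its YAML layout is awkward to check in isolation. A minimal sketch of one alternative, assuming the same `llc.csv` columns: build the `organisations:` list as a string so it can be tested. The function name `render_organisations` is hypothetical, and the sketch covers only the list, not the fixed fields above it.

```python
import csv
import io


def render_organisations(lines):
    """Render llc.csv rows as the YAML `organisations:` list used in the front matter."""
    out = ["organisations:"]
    for row in csv.DictReader(lines):
        out.append(f'- organisation: {row["organisation"]}')
        # Two spaces align this key with `organisation` inside the same list item.
        out.append(f'  start-date: {row["start-date"]}')
    return "\n".join(out)


if __name__ == "__main__":
    sample = io.StringIO(
        "organisation,name,start-date\n"
        "local-authority:BAB,Babergh District Council,2022-01-20\n"
    )
    print(render_organisations(sample))
```

Returning a string keeps the formatting logic free of I/O; the script can still `print()` the result between the `---` delimiters.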