Last active
April 18, 2018 00:55
Save mhbeals/7d64a32a6e1ce7ebf33b9e26011b8f3f to your computer and use it in GitHub Desktop.
A script to download all (as of April 2018) newspaper articles from Papers Past (NZ)
import os
import requests

TOTAL_RECORDS = 26369857  # record count as of April 2018
LAST_PAGE = 1318493       # 20 records per page, rounded up
os.makedirs('ppnewspapers', exist_ok=True)  # the output folder must exist

number = 1
# Note, this will take a long time!
while number <= LAST_PAGE:  # <= so the final, partial page is fetched too
    # Make sure to change your API key in the URL (replace the # characters)
    urltext = "http://api.digitalnz.org/v3/records.xml?api_key=################&and[collection][]=Papers+Past&sort=date&text=+&and[category][]=Newspapers&direction=asc&page=" + str(number)
    response = requests.get(urltext)
    newtext = response.text
    # Drop any non-ASCII characters before saving (this is lossy)
    data = newtext.encode('ascii', 'ignore').decode('ascii')
    with open(os.path.join('ppnewspapers', str(number) + '.xml'), 'w') as f:
        f.write(data)
    print(str((number*20)-19) + "-" + str(min(number*20, TOTAL_RECORDS)) + " out of " + str(TOTAL_RECORDS) + " collected\n")
    number = number+1
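
Because the script above takes a long time, a run may be interrupted partway through. The following is a minimal sketch of the page arithmetic and of skipping pages that are already saved. It is not part of the original gist. The helper names `page_count`, `record_range` and `pages_to_fetch` are hypothetical, and the page size of 20 is an assumption taken from the script's progress message.

```python
import math
import os

PER_PAGE = 20  # assumed page size, implied by the script's progress message


def page_count(total_records, per_page=PER_PAGE):
    """Number of pages needed to cover total_records."""
    return math.ceil(total_records / per_page)


def record_range(page, total_records, per_page=PER_PAGE):
    """First and last record number on a 1-indexed page."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total_records)
    return first, last


def pages_to_fetch(folder, last_page):
    """Yield page numbers that have no saved <page>.xml file in folder yet."""
    for page in range(1, last_page + 1):
        if not os.path.exists(os.path.join(folder, str(page) + '.xml')):
            yield page
```

Looping over `pages_to_fetch('ppnewspapers', page_count(26369857))` instead of counting up from 1 would let a restarted run pick up where the previous one stopped, assuming every file already in the folder was written completely.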