"""This script traverses all narrower terms of a http://id.loc.gov/ thesaurus
(or all terms of a term list) starting at a given term within the tree (replace
seedterm in the main code block with your URI of choice) and adds the URI and
label to a list. Outputs in CSV and JSON as well as JSONL as patterns for use in
rule-based NER with the NLP tool SpaCy.
(More info at: https://spacy.io/usage/rule-based-matching#entityruler)
NOTE the 5-second rate limit courtesy to the LC servers working hard for your
controlled vocabulary needs (see queryTerms() function). You might get away with
less, but don't be a jerk about it.
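The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `traverse_narrower`, `to_entityruler_jsonl`, the `fetch_term` callable, and the `urn:example:` identifiers are all hypothetical names, and the real script's queryTerms() function is not reproduced here. The sketch assumes each lookup returns a label plus a list of narrower-term URIs, pauses between lookups, and writes one spaCy EntityRuler pattern per line.

```python
import json
import time
from collections import deque


def traverse_narrower(seed_uri, fetch_term, delay=5.0):
    """Breadth-first walk from seed_uri; returns a list of (uri, label) pairs.

    fetch_term is a hypothetical callable: uri -> (label, [narrower_uris]).
    In a real run it would query the remote service; here it is injected
    so the traversal can be tried offline.
    """
    seen = {seed_uri}
    queue = deque([seed_uri])
    terms = []
    while queue:
        uri = queue.popleft()
        label, narrower = fetch_term(uri)
        terms.append((uri, label))
        for child in narrower:
            if child not in seen:  # guard against terms reachable twice
                seen.add(child)
                queue.append(child)
        if queue:
            time.sleep(delay)  # courtesy pause before the next lookup
    return terms


def to_entityruler_jsonl(terms, ent_label):
    """Serialize (uri, label) pairs as EntityRuler phrase patterns, one JSON object per line."""
    return "\n".join(
        json.dumps({"label": ent_label, "pattern": label, "id": uri})
        for uri, label in terms
    )


# Made-up in-memory tree standing in for the remote thesaurus.
FAKE_TREE = {
    "urn:example:a": ("Animals", ["urn:example:b", "urn:example:c"]),
    "urn:example:b": ("Birds", ["urn:example:d"]),
    "urn:example:c": ("Cats", []),
    "urn:example:d": ("Ducks", []),
}

terms = traverse_narrower("urn:example:a", FAKE_TREE.__getitem__, delay=0)
patterns_jsonl = to_entityruler_jsonl(terms, "TOPIC")
```

Passing the lookup in as a callable keeps the delay and the traversal logic separate from the network code, so the 5-second default only matters once a real query function is plugged in.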
Alias,CDM_page_id,CDM_field,Value
byington,3160,transc,"Thursday October 25th 1900 Did two churnings this morning and finished putting the tomato pickle away. I put in all the afternoon doing mending Will busy about the place and Mrs Evans helped Leonard with the corn the new man went back to town tonight. Friday October 26th 1900 Will took the butter to town. I was busy with the work till noon I did a lot of baking. I cut out shirts and sewed all afternoon. Saturday October 27th 1900 I sewed a little and got the dinner be a little after eleven. Will and I went to town in the afternoon to have my teeth finished. It has been a warm week, today was like summer. I had a letter from Mother. Sunday October 28th 1900 I was busy about the house most of the forenoon Leonard and his wife were away all day. It rained some in the forenoon. We were up to Stevens in the afternoon. I spent the evening reading. Monday October 29th 1900 It rained in the morning so Leonard could not husk corn. Will was bus
from suds.client import Client

class Catcher(object):
    """A CONTENTdm Catcher session."""
    def __init__(self, url=url, user=user, password=password, license=license):
        self.transactions = []
        self.client = Client('https://worldcat.org/webservices/contentdm/catcher/6.0/CatcherService.wsdl')
        self.url = url
        self.user = user
        self.password = password
Erdrich, Louise	Louise Erdrich
Eugenides, Jeffrey	Jeffrey Eugenides
Farrakhan, Louis	Louis Farrakhan
Fatunde, Tunde	Tunde Fatunde
Ames, Jonathan	Jonathan Ames
Anshaw, Carol, 1946-	Carol Anshaw
Julavits, Heidi	Heidi Julavits
Mailer, Norman	Norman Mailer
Nissen, Thisbe, 1972-	Thisbe Nissen
Solnit, Rebecca	Rebecca Solnit
import codecs
import csv
import datetime
import pycdm
from HTMLParser import HTMLParser

# get input: alias + items to retrieve
alias = raw_input('collection alias: ')
items = raw_input('item identifiers (separate by commas): ')
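Since the prompt asks for identifiers separated by commas, the raw string presumably has to be split before use. A minimal sketch of one way to do that, assuming stray whitespace and empty entries should be dropped (`parse_item_ids` is a hypothetical helper, not part of the original script or of pycdm):

```python
def parse_item_ids(raw):
    """Split a comma-separated string into a list of trimmed, non-empty identifiers."""
    return [part.strip() for part in raw.split(',') if part.strip()]


# Example with made-up identifiers, including extra spaces and an empty entry.
example_ids = parse_item_ids(' 12, 34,,56 ')
```

The identifiers stay as strings here; whether they should be converted to integers depends on what the rest of the script expects.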