Ed Summers (edsu)

edsu / table.md
Created November 19, 2025 18:38
| column | before | after |
| --- | --- | --- |
| title | 891351 (93%) | 804262 (99.9%) |
| pub_year | 893340 (99.8%) | 803930 (99.9%) |
| open_access | 589300 (65.8%) | 618291 (76.8%) |
| apc | 265024 (29.6%) | 274282 (34.0%) |
| types | 894525 (99.9%) | 804498 (100%) |
| publisher | 638311 (71.3%) | 588595 (73.15%) |
| doi | 553661 (61.8%) | 563544 (70.0%) |
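Each cell appears to combine a count with a percentage of all records. A minimal sketch of how a summary like this could be produced, assuming a field counts as present when it is neither missing, None nor an empty string (the table itself does not define this). `coverage` is a hypothetical helper and the sample records are invented:

```python
from typing import Iterable


def coverage(records: list[dict], columns: Iterable[str]) -> dict[str, str]:
    """Summarize each column as "count (percent%)" of all records."""
    total = len(records)
    summary = {}
    for column in columns:
        # Assumption: a field is present when it is neither missing, None nor "".
        filled = sum(1 for record in records if record.get(column) not in (None, ""))
        summary[column] = f"{filled} ({100 * filled / total:.1f}%)"
    return summary


# Hypothetical sample records, not OpenAlex data.
works = [
    {"title": "A", "doi": "10.1/x"},
    {"title": "B", "doi": None},
    {"title": "", "doi": None},
    {"title": "D"},
]
print(coverage(works, ["title", "doi"]))
```

Running it once over the "before" records and once over the "after" records would give the two columns of the table.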
edsu / authorship-counts.csv
Last active November 18, 2025 17:49
OpenAlex changes in number of authorships pre/post the switch to Walden.
walden,pre-walden,walden-authors,pre-walden-authors
https://api.openalex.org/W105472354,https://api.openalex.org/W105472354?data-version=1,1,0
https://api.openalex.org/W107410459,https://api.openalex.org/W107410459?data-version=1,10,0
https://api.openalex.org/W108850305,https://api.openalex.org/W108850305?data-version=1,3,0
https://api.openalex.org/W110028034,https://api.openalex.org/W110028034?data-version=1,3,0
https://api.openalex.org/W111570961,https://api.openalex.org/W111570961?data-version=1,4,0
https://api.openalex.org/W113490272,https://api.openalex.org/W113490272?data-version=1,7,0
https://api.openalex.org/W1139073521,https://api.openalex.org/W1139073521?data-version=1,4,0
https://api.openalex.org/W115477810,https://api.openalex.org/W115477810?data-version=1,9,0
https://api.openalex.org/W1161001478,https://api.openalex.org/W1161001478?data-version=1,36,0
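A sketch for summarizing the change per work, assuming the file is comma-separated with the header columns walden, pre-walden, walden-authors and pre-walden-authors. `authorship_changes` is a hypothetical helper; the sample holds two rows from the file:

```python
import csv
import io


def authorship_changes(csv_text: str) -> list[tuple[str, int, int, int]]:
    """Return (Walden URL, Walden count, pre-Walden count, difference) per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        walden = int(row["walden-authors"])
        pre_walden = int(row["pre-walden-authors"])
        rows.append((row["walden"], walden, pre_walden, walden - pre_walden))
    return rows


# Two rows from the file above.
sample = """walden,pre-walden,walden-authors,pre-walden-authors
https://api.openalex.org/W105472354,https://api.openalex.org/W105472354?data-version=1,1,0
https://api.openalex.org/W1161001478,https://api.openalex.org/W1161001478?data-version=1,36,0
"""
for url, walden, pre_walden, diff in authorship_changes(sample):
    print(url, walden, pre_walden, diff)
```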
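import requests

# Compare how many authorships OpenAlex returns for one work in the current
# (Walden) data versus the pre-Walden data requested with data-version=1.
work_url = "https://api.openalex.org/w4223476900"
print(len(requests.get(work_url).json()["authorships"]))
print(len(requests.get(work_url, params={"data-version": "1"}).json()["authorships"]))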
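The same comparison wrapped in a reusable function. This is a sketch: `authorship_counts` and `fetch_json` are hypothetical names. It uses the /works/{id} endpoint and passes data-version=1 as a query parameter, matching how the URLs above request the pre-Walden data. The fetch function is injectable so the logic can be exercised without network access:

```python
from typing import Callable, Optional


def fetch_json(url: str, params: Optional[dict] = None) -> dict:
    """GET a URL and decode the JSON body (requests is imported lazily)."""
    import requests

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def authorship_counts(
    work_id: str, fetch: Callable[..., dict] = fetch_json
) -> tuple[int, int]:
    """Return (current, data-version=1) authorship counts for one OpenAlex work."""
    url = f"https://api.openalex.org/works/{work_id}"
    current = len(fetch(url)["authorships"])
    pre_walden = len(fetch(url, params={"data-version": "1"})["authorships"])
    return current, pre_walden
```

Calling `authorship_counts("W4223476900")` with the default fetcher makes two live requests to the OpenAlex API and needs `requests` installed.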
#!/usr/bin/env python3
#
# This program inspects the results of an OpenAlex API filter call and tries
# to determine which record in the cursor-paged result set causes a problem.
#
# Example (quote the URL so the shell does not interpret the "&"):
#
# ./openalexbug.py 'https://api.openalex.org/works?filter=author.id:https://openalex.org/A5003671931&cursor=&per-page=200'
# problem record: https://openalex.org/W3200281942
# 121 records
#
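The rest of openalexbug.py isn't shown here. As a hedged sketch of one possible approach, the code below pages through a filter with OpenAlex cursor paging (start with cursor=*, then follow meta.next_cursor until it is null) and, if a page request fails, reports how many records came through and the last good work ID. That narrows the problem to the page after that record; it is not necessarily how the original script works, and `iter_works` and `find_failure` are hypothetical names:

```python
from typing import Callable, Iterator, Optional, Tuple


def fetch_json(url: str, params: dict) -> dict:
    """GET a URL and decode the JSON body (requests is imported lazily)."""
    import requests

    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()
    return response.json()


def iter_works(
    filter_expr: str, per_page: int = 200, fetch: Callable[[str, dict], dict] = fetch_json
) -> Iterator[dict]:
    """Yield every work matching an OpenAlex filter, one cursor page at a time."""
    cursor: Optional[str] = "*"  # "*" requests the first page of a cursor-paged result set
    while cursor:
        page = fetch(
            "https://api.openalex.org/works",
            {"filter": filter_expr, "per-page": per_page, "cursor": cursor},
        )
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")  # null (None) after the last page


def find_failure(
    filter_expr: str, fetch: Callable[[str, dict], dict] = fetch_json
) -> Tuple[int, Optional[str], Optional[Exception]]:
    """Return (records seen, last good work ID, error or None)."""
    count, last_id = 0, None
    try:
        for work in iter_works(filter_expr, fetch=fetch):
            count += 1
            last_id = work["id"]
    except Exception as error:  # broad on purpose: any failure marks the bad page
        return count, last_id, error
    return count, last_id, None
```

For example, `find_failure("author.id:https://openalex.org/A5003671931")` would page through that author's works with live requests.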

openpgp4fpr:DD11F92F1E44644183C06961D012FF557AFFF80A

#!/usr/bin/env python3
#
# This program demonstrates using the Tableau REST API to print out our embedding settings.
# To run it you will need to create a Personal Access Token by:
#
# 1. visiting https://tableau-uat.stanford.edu/
# 2. clicking on your user name in the top right
# 3. selecting "My Settings"
# 4. scrolling to the "Personal Access Tokens" section
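For reference, a sketch of the personal access token sign-in step, based on the Tableau REST API's auth/signin endpoint. The API version you pass is an assumption to check against your server, and `signin_payload` and `sign_in` are hypothetical helper names. The returned token goes in the X-Tableau-Auth header of later requests:

```python
def signin_payload(token_name: str, token_secret: str, site_content_url: str = "") -> dict:
    """Build the JSON body for a personal access token sign-in."""
    return {
        "credentials": {
            "personalAccessTokenName": token_name,
            "personalAccessTokenSecret": token_secret,
            # An empty contentUrl selects the Default site.
            "site": {"contentUrl": site_content_url},
        }
    }


def sign_in(
    server: str, api_version: str, token_name: str, token_secret: str, site_content_url: str = ""
) -> tuple[str, str]:
    """Sign in and return (auth token, site id); requests is imported lazily."""
    import requests

    response = requests.post(
        f"{server}/api/{api_version}/auth/signin",
        json=signin_payload(token_name, token_secret, site_content_url),
        headers={"Accept": "application/json"},  # ask for JSON instead of the default XML
        timeout=30,
    )
    response.raise_for_status()
    credentials = response.json()["credentials"]
    return credentials["token"], credentials["site"]["id"]
```

A call might look like `sign_in("https://tableau-uat.stanford.edu", "3.22", name, secret)`, where "3.22" is an assumed API version.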
edsu / getall.py
Last active October 3, 2025 19:28
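import dotenv
from podbucket import oai
from podbucket.oai import XML_NS

# Load settings (e.g. credentials) from a .env file into the environment.
dotenv.load_dotenv()

# Print the datestamp of each record returned by oai.list_records("503"),
# followed by its zero-based position in the result set.
for count, rec in enumerate(oai.list_records("503")):
    ds = rec.find(".//oai:header/oai:datestamp", namespaces=XML_NS).text
    print(ds, count)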
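The datestamp lookup above relies on the OAI-PMH 2.0 XML namespace. A self-contained sketch of the same extraction with the standard library, assuming podbucket's XML_NS maps the prefix oai to http://www.openarchives.org/OAI/2.0/. It works on a single ListRecords response; real harvests follow resumptionToken paging, which podbucket's list_records presumably handles. `datestamps` is a hypothetical helper and the sample response is invented:

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def datestamps(list_records_xml: str) -> list[str]:
    """Return the header datestamp of every record in a ListRecords response."""
    root = ET.fromstring(list_records_xml)
    return [
        stamp.text
        for stamp in root.findall(".//oai:record/oai:header/oai:datestamp", OAI_NS)
    ]


sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier>
      <datestamp>2025-09-30</datestamp></header></record>
    <record><header><identifier>oai:example:2</identifier>
      <datestamp>2025-10-01</datestamp></header></record>
  </ListRecords>
</OAI-PMH>"""
print(datestamps(sample))
```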
edsu / README.md
Last active October 1, 2025 21:49

Running `./run.sh` starts browsertrix-crawler to crawl https://www.trm.dk/nyheder. During the crawl, a behaviour fetches every page of results and feeds all the discovered URLs into the crawl queue.
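Browsertrix-crawler behaviours are JavaScript that runs in the browser, so the following is only a Python illustration of the idea: collect links from each page of results and add the ones not seen before to a queue. `LinkParser` and `enqueue_links` are hypothetical, and the example pages are invented:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the absolute URL of every <a href> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def enqueue_links(pages: dict[str, str], queue: deque, seen: set) -> None:
    """Append each link found on the result pages to the queue, skipping duplicates."""
    for page_url, html in pages.items():
        parser = LinkParser(page_url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)


# Invented result pages standing in for a paginated listing.
pages = {
    "https://example.org/news?page=1": '<a href="/news/a">A</a> <a href="/news/b">B</a>',
    "https://example.org/news?page=2": '<a href="/news/b">B</a> <a href="/news/c">C</a>',
}
queue: deque = deque()
enqueue_links(pages, queue, set())
print(list(queue))
```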

edsu / flotilla_df.py
Last active June 6, 2025 14:45
Track the progress of the Freedom Flotilla in a DataFrame https://freedomflotilla.org/ffc-tracker/
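import requests
import pandas

# Vessel positions from the tracker's API; "232057367" is the vessel's key in the response.
url = "https://flotilla-orpin.vercel.app/api/vessel"
response = requests.get(url, timeout=30)
response.raise_for_status()
positions = response.json()["vessels"]["232057367"]["positions"]
df = pandas.DataFrame.from_records(positions)
# Parse the timestamp strings so the column can be sorted and compared as datetimes.
df["last_position_UTC"] = pandas.to_datetime(df["last_position_UTC"])
print(df)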
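A possible follow-on, assuming the timestamps parse cleanly and treating them as UTC: sort the positions by time and add the interval since the previous report. `position_intervals` is a hypothetical helper and the sample records are invented:

```python
import pandas


def position_intervals(positions: list[dict]) -> pandas.DataFrame:
    """Sort position reports by time and add the gap since the previous report."""
    df = pandas.DataFrame.from_records(positions)
    # utc=True returns timezone-aware UTC timestamps; naive strings are treated as UTC.
    df["last_position_UTC"] = pandas.to_datetime(df["last_position_UTC"], utc=True)
    df = df.sort_values("last_position_UTC").reset_index(drop=True)
    df["since_previous"] = df["last_position_UTC"].diff()
    return df


# Invented sample records with only the timestamp column.
sample = [
    {"last_position_UTC": "2025-06-05T12:00:00Z"},
    {"last_position_UTC": "2025-06-05T10:00:00Z"},
]
print(position_intervals(sample))
```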
docker run \
--publish 9037:9037 \
-v "$PWD/crawls:/crawls/" \
webrecorder/browsertrix-crawler crawl \
--seeds https://www.womenonweb.org/af/ \
--seeds https://www.womenonweb.org/ar/ \
--seeds https://www.womenonweb.org/de/ \
--seeds https://www.womenonweb.org/en/ \
--seeds https://www.womenonweb.org/es/ \
--seeds https://www.womenonweb.org/fa/ \
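Since each seed differs only in its language code, the argument list could be generated instead of typed. A sketch that reuses only the flags shown in the command and the seeds visible before the listing is cut off; `crawl_command` is a hypothetical helper, and passing the list to `subprocess.run` would start the crawl:

```python
import os
import shlex


def crawl_command(seeds: list[str], crawls_dir: str) -> list[str]:
    """Build the docker run argument list with one --seeds flag per URL."""
    args = [
        "docker", "run",
        "--publish", "9037:9037",
        "-v", f"{crawls_dir}:/crawls/",
        "webrecorder/browsertrix-crawler", "crawl",
    ]
    for seed in seeds:
        args += ["--seeds", seed]
    return args


# Only the language codes visible in the command; the full list is cut off.
languages = ["af", "ar", "de", "en", "es", "fa"]
seeds = [f"https://www.womenonweb.org/{lang}/" for lang in languages]
command = crawl_command(seeds, os.path.join(os.getcwd(), "crawls"))
print(shlex.join(command))
```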