
@jeremyboggs
Created February 15, 2025 14:54
Composting DH Links

In the process of doing this review, I began to see the benefits of possible shifts in how we come to terms with—and reconsider terms for—brokenness, rot, and decay: in Digital Humanities scholarship specifically, but also in human creations more broadly.

Method

  • Download the Simple CSV data available at the Index of Digital Humanities Conferences.
  • Create a set of Python scripts to complete a few tasks:
    1. Parse the conference works for any potential URLs mentioned.
    • The first pass over the data relies on the URLExtract library to find addresses.
    • We are only interested (at this point!) in unique URLs per work, primarily to speed up the review process.
    2. Attempt to request each URL and determine whether it works (200), returns a specific HTTP error, or is otherwise unavailable.
    3. Review the results, and determine whether further data cleanup is needed.
    • In this case we rely on custom regex searches to find and replace problematic patterns.
    • We then rerun the script to check statuses.
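The steps above can be sketched roughly as follows. The original scripts rely on the URLExtract library for extraction; since that is a third-party package, a simple stdlib regex stands in for it here, and the cleanup pattern (stripping trailing punctuation) is only a hypothetical example of the kind of fix the review step might apply. This is a sketch of the approach, not the author's actual code.

```python
# Sketch of the extract-and-check pipeline: find unique URLs per work,
# clean obvious extraction artifacts, then request each URL and record
# whether it works (200), errors (e.g. 404/500), or is unavailable.
import re
import urllib.request
import urllib.error

# Stand-in for the URLExtract library used in the original scripts.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_unique_urls(text):
    """Find unique URLs in a work's text, preserving first-seen order."""
    seen, urls = set(), []
    for url in URL_RE.findall(text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

def clean_url(url):
    """Hypothetical cleanup pass: strip trailing punctuation that often
    clings to URLs extracted from running prose."""
    return re.sub(r"[.,;:]+$", "", url)

def check_url(url, timeout=10):
    """Request a URL; return its HTTP status code, or None if it times
    out or fails in some other way."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status          # e.g. 200
    except urllib.error.HTTPError as e:
        return e.code                   # e.g. 404, 500
    except Exception:
        return None                     # timeout, DNS failure, unknown
```

Deduplicating per work before checking keeps the number of requests (and the later manual review) manageable, which matches the stated goal of speeding up the review.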

Results

There are approximately 20,500 URLs in the DH Conferences index dataset. Of that number:

  • XXXX work! Or at least, we get an HTTP 200 response when requested;
  • XXXX don't work! More specifically:
    • 404
    • 500 or similar
    • Just times out, or unknown.
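A sketch of how per-URL check results might be tallied into the buckets above, assuming each result is an HTTP status code (an int) or None for timeouts and unknown failures. The bucket labels and the sample data are illustrative, not from the dataset.

```python
# Tally URL check results into the reporting buckets described above.
from collections import Counter

def bucket(status):
    """Map a status code (or None) to a reporting bucket."""
    if status is None:
        return "timeout/unknown"
    if status == 200:
        return "works (200)"
    if 400 <= status < 500:
        return "404 or similar"
    return "500 or similar"

# Hypothetical sample of per-URL results.
statuses = [200, 200, 404, 500, None, 200, 404]
tally = Counter(bucket(s) for s in statuses)
```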