In the process of doing this review, I personally began to see benefits in possible shifts for how we come to terms with—and reconsider terms for—brokenness, rot, and decay in Digital Humanities scholarship specifically but also in human creations more broadly.
- Download the Simple CSV data available at the Index of Digital Humanities Conferences.
- Create a set of Python scripts to complete a few tasks:
- Parse the conference works for any potential URLs mentioned.
- The first pass over the data relies on the URLExtract library to find addresses.
- We are only interested (at this point!) in unique URLs per work, primarily to speed up the review process.
- Attempt to request each URL and determine whether that URL works (200), returns a specific HTTP error, or is otherwise unavailable.
- Review the results, and determine whether further data cleanup is needed.
- In this case, we rely on custom regex searches to find and replace problematic patterns. Some examples:
- We then rerun the script to check statuses.
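The extraction and checking steps above can be sketched as follows. Note that the post uses the URLExtract library to find addresses; a simple regex stands in for it here so the example runs without that dependency, and all patterns and function names are illustrative, not the project's actual code.

```python
import re
import requests

# Stand-in for URLExtract: a deliberately simple URL pattern.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_unique_urls(text):
    """Find URLs mentioned in a work's text, keeping one copy of each."""
    return set(URL_PATTERN.findall(text))

def check_url(url, timeout=10):
    """Request a URL and classify it: a status code, 'timeout', or 'unknown'."""
    try:
        response = requests.get(url, timeout=timeout, allow_redirects=True)
        return response.status_code  # 200 means the link works
    except requests.exceptions.Timeout:
        return "timeout"
    except requests.exceptions.RequestException:
        return "unknown"
```

Deduplicating per work before requesting anything keeps the number of slow network calls down, which matters at the scale of roughly 20,500 URLs.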
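The cleanup pass might look like this. These rules are hypothetical examples of the kind of find-and-replace fixes involved; the project's actual regex patterns are not reproduced here.

```python
import re

def clean_url(url):
    """Repair common extraction artifacts before re-checking a URL (illustrative rules)."""
    url = url.rstrip(".,;:)]")  # trailing punctuation captured along with the address
    url = re.sub(r"^(https?://)(?:https?://)+", r"\1", url)  # collapse a doubled scheme
    return url
```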
There are approximately 20,500 URLs in the DH Conferences index dataset. Of that number:
- XXXX work! Or at least, we get an HTTP 200 response when requested;
- XXXX don't work! But, more specifically:
  - 404
  - 500 or similar
  - Just times out, or unknown.
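A breakdown like the one above can be produced by bucketing the raw check results; the bucket names and logic here are an illustrative sketch, not the project's code.

```python
from collections import Counter

def summarize(statuses):
    """Bucket raw status results into the categories reported above."""
    buckets = Counter()
    for s in statuses:
        if s == 200:
            buckets["works (HTTP 200)"] += 1
        elif isinstance(s, int) and 400 <= s < 500:
            buckets["HTTP 404 or similar"] += 1
        elif isinstance(s, int) and s >= 500:
            buckets["HTTP 500 or similar"] += 1
        else:
            buckets["timeout or unknown"] += 1
    return buckets
```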