@cmoscardi
Created September 1, 2017 15:30
Link Checking Code -- comments below.

cmoscardi commented Sep 1, 2017

This is messy, but it is the worst part of the whole process.

  1. Scraping the links is fairly straightforward (just search for all http/https URLs in your notebook JSON).
  2. Once you have the links, you can run this code to check every URL in web_links. Set web_links up as a defaultdict(int) keyed by URL, so each URL starts with a value of 0. Its contents would look like this:
     web_links = {"https://www.google.com": 0}
  3. Last but not least, FuturesSession comes from the requests-futures package.
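The scraping step in item 1 can be sketched with a regular expression run over the raw .ipynb text. This is a minimal sketch, not the gist's own code: `URL_PATTERN`, `extract_links`, and `extract_links_from_file` are hypothetical names, and the pattern is a rough assumption that will cut off URLs containing parentheses or quotes.

```python
import re
from collections import defaultdict
from pathlib import Path

# Rough, assumed pattern: an http/https scheme followed by characters that are
# not whitespace, quotes, angle/square brackets, parentheses, or backslashes
# (backslashes appear in the raw JSON as escapes such as "\n").
URL_PATTERN = re.compile(r"https?://[^\s\"'<>()\[\]\\]+")


def extract_links(notebook_text):
    """Collect every http/https URL in a notebook's raw JSON text.

    Returns a defaultdict(int) keyed by URL, with every value set to 0
    (used here to mean "not checked yet").
    """
    web_links = defaultdict(int)
    for match in URL_PATTERN.findall(notebook_text):
        # Drop trailing punctuation that usually ends a sentence, not the URL.
        url = match.rstrip(".,;:!?")
        web_links[url] = 0
    return web_links


def extract_links_from_file(path):
    """Read an .ipynb file as text and extract its links."""
    return extract_links(Path(path).read_text(encoding="utf-8"))
```

Searching the raw JSON rather than parsed cell sources also picks up URLs in cell outputs, which matches the "search your notebook JSON" approach above.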
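Items 2 and 3 can be sketched roughly as follows. This is an assumed reconstruction, not the code from the gist: `check_links` and `broken_links` are hypothetical names, storing the HTTP status code (or -1 on a network error) in each `web_links` value is an assumed convention, and the `session` parameter exists only so a stand-in object can be passed for offline testing. It relies on `FuturesSession` from requests-futures, whose request methods return `concurrent.futures.Future` objects wrapping a `requests.Response`.

```python
import concurrent.futures


def check_links(web_links, session=None, timeout=10):
    """Request every URL concurrently and store its HTTP status code.

    Each web_links[url] ends up holding the response status code, or -1
    if the request raised (connection error, timeout, invalid URL, ...).
    """
    owns_session = session is None
    if owns_session:
        # requests-futures: request methods return concurrent.futures.Future
        # objects whose result() is a requests.Response.
        from requests_futures.sessions import FuturesSession
        session = FuturesSession(max_workers=8)
    try:
        # Fire off all requests first so they run in parallel.
        pending = {session.get(url, timeout=timeout): url for url in web_links}
        for future in concurrent.futures.as_completed(pending):
            url = pending[future]
            try:
                web_links[url] = future.result().status_code
            except Exception:
                web_links[url] = -1
    finally:
        if owns_session:
            session.close()
    return web_links


def broken_links(web_links):
    """Return the URLs whose check failed or returned a 4xx/5xx status."""
    return sorted(url for url, status in web_links.items()
                  if status == -1 or status >= 400)
```

After `check_links(web_links)`, calling `broken_links(web_links)` gives the list of URLs worth fixing. The sketch uses GET rather than HEAD because some servers answer HEAD with 405 even when the page exists; switching to `session.head(url, allow_redirects=True, timeout=timeout)` would save bandwidth at that cost.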
