Monkeypatch for urllib3 + Python requests dealing with extra trailing white space in HTTP header names
# Python requests and urllib3 (and PyQuery, and other tools that build on requests)
# will hang if they encounter malformed HTTP headers with trailing whitespace
# between the header field name and the colon.
# this is a deliberate choice on the authors' part,
# in line with the RFC https://greenbytes.de/tech/webdav/rfc7230.html#header.fields
# presumably because extra janky white space can create a security risk.
# the right thing to do in such a situation is to contact the administrator
# of the web site and get them to fix this condition,
# but that isn't always a practical option if the site isn't under your control,
# or if the site's owners are otherwise opposed. browsers generally don't care, fwiw. ¯\_(ツ)_/¯
# so instead, what follows below is a very janky workaround
# to just accept extra white space and roll with it.
# I make *no* warranties of this approach,
# it might open up all kinds of terrible stuff, and the sky may well fall on your head,
# but I hope it saves somebody from having a bad day.
import email.feedparser
import re
# from pyquery import PyQuery
# import requests
# adds `\s?` so that a single whitespace character between the header name
# and the colon is accepted (the stock pattern allows none)
email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]*\s?:|[\t ])')
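# a quick way to see what the patch changes, as a rough sketch rather than a definitive test:
# the block below parses a made-up header block (RAW_HEADERS is hypothetical) once with
# the stock pattern and once with the lenient one. STOCK_HEADER_RE is copied from CPython's
# email/feedparser.py as I understand it, so treat it as an assumption for your Python version.
# parse_with is a helper invented for this example; it swaps the pattern in temporarily
# and restores whatever was there before.

```python
import email.feedparser
import email.parser
import re

# Assumption: this matches the stock pattern in your Python's email/feedparser.py.
STOCK_HEADER_RE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')
# The lenient pattern from the monkeypatch above.
LENIENT_HEADER_RE = re.compile(r'^(From |[\041-\071\073-\176]*\s?:|[\t ])')

# Hypothetical response headers; note the stray space before the first colon.
RAW_HEADERS = "Content-Type : text/html\r\nContent-Length: 5\r\n\r\n"


def parse_with(header_re, raw=RAW_HEADERS):
    """Parse raw headers with header_re temporarily installed, then restore."""
    saved = email.feedparser.headerRE
    email.feedparser.headerRE = header_re
    try:
        return email.parser.Parser().parsestr(raw, headersonly=True)
    finally:
        email.feedparser.headerRE = saved


strict = parse_with(STOCK_HEADER_RE)
lenient = parse_with(LENIENT_HEADER_RE)

# With the stock pattern the first line is not recognised as a header,
# so header parsing stops right there and later headers are lost too.
print("stock:  ", strict.keys())

# With the lenient pattern both lines are parsed as headers.
print("lenient:", lenient.keys())
print("Content-Length:", lenient["Content-Length"])
```

# one thing to watch for: the regex only decides whether a line counts as a header,
# it doesn't clean the name up. the offending header's name seems to keep its trailing
# space, so strip the keys yourself if you need to look that particular header up.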