Gist bahoo/bdfedfc47cb971840cae489a844a2408
Created August 30, 2019 22:24
Monkeypatch for urllib3 and Python requests to tolerate trailing whitespace in HTTP header names
# Python requests and urllib3 (and PyQuery, and other tools that build on requests)
# can hang if they encounter poorly formed HTTP headers with trailing whitespace
# in the header field names, e.g. `Content-Type : text/html`.
# Header parsing is delegated to the standard library: http.client hands the raw
# header block to the email package, and a line with whitespace between the name
# and the colon does not match email.feedparser.headerRE, so that header and every
# header after it are left unparsed.
# This appears to be a deliberate choice, in line with RFC 7230
# https://greenbytes.de/tech/webdav/rfc7230.html#header.fields
# (section 3.2.4 forbids whitespace between a header field name and the colon,
# noting that differences in handling it have led to security vulnerabilities).
# The right thing to do in such a situation is to contact the administrator
# of the web site and get them to fix this condition,
# but that isn't always a practical option if the site isn't under your control,
# or if its operators are otherwise opposed. Browsers generally don't care, FWIW. ¯\_(ツ)_/¯
# So instead, what follows below is a very janky workaround
# to just accept extra whitespace and roll with it.
# I make *no* warranties about this approach;
# it might open up all kinds of terrible stuff, and the sky may well fall on your head,
# but I hope it saves somebody from having a bad day.
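import email.feedparser
import re

# from pyquery import PyQuery
# import requests

# Add `\s?` to the stdlib pattern so that one whitespace character is allowed
# between a header field name and its colon. This changes header parsing for the
# whole process (including any other use of the `email` package) and must be in
# place before the affected responses are parsed.
email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]*\s?:|[\t ])')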
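To see what the patch changes, the sketch below feeds one raw header block through `http.client.parse_headers` (an undocumented module-level helper that `http.client` uses internally) twice: once with the stock pattern and once with the patched one. The header values and the `parse_with` helper are hypothetical, made up for illustration; the stock pattern is assumed to match the one in CPython's `email/feedparser.py`, and the patch is only swapped in temporarily rather than left in place.

```python
import email.feedparser
import http.client
import io
import re

# Stock pattern, assumed to match CPython's email/feedparser.py. The character
# class is RFC 2822 "ftext": printable ASCII except the colon (33-57, 59-126).
STOCK_HEADER_RE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')
# Patched pattern from this gist: one optional whitespace character before ':'.
PATCHED_HEADER_RE = re.compile(r'^(From |[\041-\071\073-\176]*\s?:|[\t ])')

# Hypothetical raw response headers; note the space before the colon on line 2.
RAW_HEADERS = (
    b"Server: example\r\n"
    b"Content-Type : text/html\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
)


def parse_with(pattern, raw):
    """Parse a raw header block the way http.client does, temporarily using
    `pattern` as email.feedparser.headerRE, then restore the previous value."""
    saved = email.feedparser.headerRE
    email.feedparser.headerRE = pattern
    try:
        return http.client.parse_headers(io.BytesIO(raw))
    finally:
        email.feedparser.headerRE = saved


before = parse_with(STOCK_HEADER_RE, RAW_HEADERS)
after = parse_with(PATCHED_HEADER_RE, RAW_HEADERS)

print("stock:  ", before.keys(), before["Content-Length"])
print("patched:", after.keys(), after["Content-Length"])
```

With the stock pattern only `Server` is parsed: the malformed line and everything after it, including `Content-Length`, end up as unparsed body text, which is one plausible way a client ends up waiting for a body of unknown length. With the patched pattern `Content-Length` comes back, but the malformed header keeps its trailing space in the name (`'Content-Type '`), so a lookup for `Content-Type` still misses it. Also note that `\s?` tolerates exactly one whitespace character; a name followed by two spaces still fails to match.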