@winder
Created April 10, 2018 19:01
Instagram Scraper April 10 2018
#!/usr/bin/env python3
import requests
import urllib.parse
import hashlib
import json
#CHROME_UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'
CHROME_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'

def getSession(rhx_gis, csrf_token, variables):
    """ Get session preconfigured with required headers & cookies. """
    # The signature is the MD5 of "rhx_gis:csrf_token:user_agent:variables".
    print(variables)
    values = "%s:%s:%s:%s" % (
        rhx_gis,
        csrf_token,
        CHROME_UA,
        variables)
    x_instagram_gis = hashlib.md5(values.encode()).hexdigest()

    session = requests.Session()
    session.headers = {
        'user-agent': CHROME_UA,
        'x-instagram-gis': x_instagram_gis
    }
    print(x_instagram_gis)
    session.cookies.set('ig_pr', '2')
    session.cookies.set('csrftoken', csrf_token)
    return session

if __name__ == '__main__':
    # Load the profile page to obtain the csrf token and the rhx_gis value.
    session = requests.Session()
    session.headers = { 'user-agent': CHROME_UA }
    response = session.get("https://www.instagram.com/selenagomez")
    # The page embeds its state as JSON assigned to window._sharedData.
    data = json.loads(response.text.split("window._sharedData = ")[1].split(";</script>")[0])
    csrf = data['config']['csrf_token']
    rhx_gis = data['rhx_gis']

    # This exact string is hashed into the signature and also sent, URL-encoded, in the query.
    variables = '{"id":"460563723","first":10,"after":"AQBf8puhlt8nU2JzmYdMMTuH0FbMgUM1fnIOZIH7n94DM4VLWkVILUAKVB-5dqvxQEI-Wd0ttlEDzimaaqwC98jccQaDQT4tSF56c_NlWi_shg"}'
    session = getSession(rhx_gis, csrf, variables)

    query_hash = '42323d64886122307be10013ad2dcc44'
    encoded_vars = urllib.parse.quote(variables, safe='"')
    url = 'https://www.instagram.com/graphql/query/?query_hash=%s&variables=%s' % (query_hash, encoded_vars)
    print(url)
    print(session.get(url).text)
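
The script pulls `window._sharedData` out of the page with two chained `split` calls, which raises an `IndexError` if the marker is missing. A slightly more defensive sketch of the same extraction is shown below. The helper name `extract_shared_data` and the sample HTML are hypothetical, and the sketch assumes the page still embeds the JSON in the form the script above expects.

```python
import json


def extract_shared_data(html):
    """Return the JSON object assigned to window._sharedData, or None if it is absent."""
    marker = "window._sharedData = "
    start = html.find(marker)
    if start == -1:
        return None
    start += len(marker)
    # The assignment is assumed to end at the closing script tag, as in the script above.
    end = html.find(";</script>", start)
    if end == -1:
        return None
    return json.loads(html[start:end])


# Made-up sample page, not real Instagram output.
sample_html = (
    '<html><script>window._sharedData = '
    '{"config": {"csrf_token": "abc"}, "rhx_gis": "xyz"};</script></html>'
)
shared = extract_shared_data(sample_html)
print(shared["config"]["csrf_token"], shared["rhx_gis"])
```

Returning `None` instead of raising lets the caller decide how to handle a page whose layout has changed.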

deter3 commented Apr 12, 2018

It seems that x-instagram-gis no longer includes the user-agent string.


sweetmoniker commented Apr 23, 2018

Thanks for posting. In my experience, cookie management is also unnecessary, although it does no harm, and the csrf token is not required anywhere. The only code you should need is the following:

def getSession(rhx_gis, variables):
    """ Get session preconfigured with the required x-instagram-gis header. """
    # The signature is now the MD5 of "rhx_gis:variables".
    values = "%s:%s" % (
            rhx_gis,
            variables)
    x_instagram_gis = hashlib.md5(values.encode()).hexdigest()

    session = requests.Session()
    session.headers = {
            'x-instagram-gis': x_instagram_gis
            }

    return session

Thanks again though. I was using your code as a base to update my own. It was helpful!
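
For comparison, here is a minimal sketch that computes both signature variants discussed in this thread with a single function: the original `rhx_gis:csrf_token:user_agent:variables` form and the shorter `rhx_gis:variables` form. The function name `instagram_gis` and the sample inputs are hypothetical, and which form the server accepts is an assumption based only on the reports above.

```python
import hashlib


def instagram_gis(rhx_gis, variables, csrf_token=None, user_agent=None):
    """MD5 hex digest of the colon-joined parts; optional parts are skipped when None."""
    parts = [rhx_gis]
    if csrf_token is not None:
        parts.append(csrf_token)
    if user_agent is not None:
        parts.append(user_agent)
    parts.append(variables)
    return hashlib.md5(":".join(parts).encode()).hexdigest()


# Placeholder inputs, not real Instagram values.
variables = '{"id":"1","first":10}'
print(instagram_gis("example_rhx_gis", variables))
print(instagram_gis("example_rhx_gis", variables,
                    csrf_token="example_token", user_agent="ExampleUA/1.0"))
```

The `variables` string passed here has to be byte-for-byte the one sent in the query URL, since any difference changes the digest.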

jasonray716 commented

It doesn't work now. Does anyone know the solution?
