<?php
//returns a big old hunk of JSON from a non-private IG account page.
function scrape_insta($username) {
	$insta_source = file_get_contents('http://instagram.com/'.$username);
	$shards = explode('window._sharedData = ', $insta_source);
	$insta_json = explode(';</script>', $shards[1]);
	$insta_array = json_decode($insta_json[0], TRUE);
	return $insta_array;
}
//Supply a username
$my_account = 'cosmocatalano';
//Do the deed
$results_array = scrape_insta($my_account);
//An example of where to go from there
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
//Heck, let's compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
*/
?>
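The gist's scrape_insta() indexes $shards[1] directly, so it raises a notice as soon as the page no longer contains the window._sharedData marker. Below is a minimal sketch of the same extraction with guards added. extract_shared_data() is a hypothetical helper name, not part of the gist, and it takes the HTML as a string so it can be tried without a live request.

```php
<?php
// Sketch only: pull the JSON assigned to window._sharedData out of page HTML.
// extract_shared_data() is a hypothetical helper name, not part of the gist.
function extract_shared_data(string $html): ?array {
    $marker = 'window._sharedData = ';
    $start = strpos($html, $marker);
    if ($start === false) {
        return null; // marker missing: layout changed or a different page came back
    }
    $start += strlen($marker);
    $end = strpos($html, ';</script>', $start);
    if ($end === false) {
        return null; // closing delimiter missing
    }
    $decoded = json_decode(substr($html, $start, $end - $start), true);
    return is_array($decoded) ? $decoded : null;
}

// Usage with an inline sample instead of a live request.
$sample = '<script>window._sharedData = {"entry_data":{"ProfilePage":[]}};</script>';
var_dump(extract_shared_data($sample) !== null);    // bool(true)
var_dump(extract_shared_data('<html>login</html>')); // NULL
```

Returning null instead of letting the array access fail makes it easier to tell "Instagram changed the page" apart from "the account has no posts".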
@restyler thanks for replying, really appreciated. Can you tell me a little more about how you keep from getting blocked by Instagram? Are you using a third-party API or anything else that provides a new IP on each request? Looking at your code, it seems like you're just asking the user for proxy credentials and connecting to that proxy server, if I'm not wrong. Please let me know your comments. Thanks.
hey really enjoyed this post. i made a quick lil mockup on the break down of scraping user tags without login.
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020
By any chance does anyone happen to have a way to collect followers without logging in?
Page not found
updated link
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020
Looks like Instagram is blocking scraping via file_get_contents/curl. Anyone got a solution? I wonder how online web scraping tools keep working without getting blocked.
Hi 'Cosmocatalano' [nomen est omen?] :),
this is a very interesting solution. I have only tried it on localhost, so I have no problem with CORS. But the array names seem to have changed completely. The only one that is still the same seems to be 'entry_data'. Is the changed response still usable with the alternative array 'names'? That would be very interesting.
Best regards and thanks
Axel Arnold Bangert
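When the key names change like this, it can help to look the path up defensively rather than chaining array indexes. A rough sketch follows; array_path() is a hypothetical helper, and the sample array is a stand-in, not a real Instagram response.

```php
<?php
// Sketch only: array_path() is a hypothetical helper, not part of the gist.
// Walks a list of keys and returns $default as soon as one is missing.
function array_path(array $data, array $keys, $default = null) {
    $node = $data;
    foreach ($keys as $key) {
        if (!is_array($node) || !array_key_exists($key, $node)) {
            return $default;
        }
        $node = $node[$key];
    }
    return $node;
}

// Stand-in for the decoded page data; real responses will have other keys.
$results_array = ['entry_data' => ['SomeOtherPage' => [['x' => 1]]]];

// See which keys actually exist before hard-coding a path.
print_r(array_keys($results_array['entry_data']));

// The gist's original path, which does not resolve in this sample.
$latest = array_path($results_array, ['entry_data', 'ProfilePage', 0, 'user', 'media', 'nodes', 0]);
var_dump($latest); // NULL
```

Printing array_keys() at each level is a quick way to find out what the renamed keys are before adjusting the path.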
I guess it comes down to having enough good proxies. I am now using https://rapidapi.com/neotank/api/simple-instagram-api to avoid dealing with proxies, because they fail all the time (for Instagram) and get a 302 redirect to login.
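For anyone who wants to spot that 302-to-login case in their own script, here is a small sketch. is_login_redirect() is a hypothetical helper; it takes raw header lines in the shape PHP exposes via $http_response_header, and it assumes redirects are not followed (follow_location set to 0 in the HTTP context options). The "/accounts/login" path check is also an assumption about where the redirect points.

```php
<?php
// Sketch only: is_login_redirect() is a hypothetical helper.
// $headers is a list of raw header lines, the shape PHP exposes in
// $http_response_header after file_get_contents() on an http(s) URL.
function is_login_redirect(array $headers): bool {
    $status = 0;
    $location = '';
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        } elseif (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    // Assumption: the login page lives under a path containing "/accounts/login".
    return $status >= 300 && $status < 400
        && strpos($location, '/accounts/login') !== false;
}

// Usage with sample headers instead of a live request.
$blocked = ['HTTP/1.1 302 Found', 'Location: https://www.instagram.com/accounts/login/'];
$ok = ['HTTP/1.1 200 OK', 'Content-Type: text/html'];
var_dump(is_login_redirect($blocked)); // bool(true)
var_dump(is_login_redirect($ok));      // bool(false)
```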
Yes. Technically there is a proxy method in the API which allows you to submit any instagram.com* link and get the raw HTML/JSON response, and there are helper endpoints like getMediaByUrl that you've mentioned, if you don't need the raw response. I'd recommend using the helpers when feasible, because this approach uses more optimisations on the API side.
To mitigate Instagram IP detection (on the API side) I use proxies which are usually not located in popular data center IP ranges.
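For the gist's own file_get_contents() call, routing the request through a proxy can be sketched with PHP's HTTP stream context options. This is not the setup described in the comment above, only an illustration: proxy_context_options() is a hypothetical helper, and the proxy address is a placeholder from a documentation IP range.

```php
<?php
// Sketch only: proxy_context_options() is a hypothetical helper name,
// and the proxy host/port used below are placeholders.
function proxy_context_options(string $proxy_host, int $proxy_port): array {
    return [
        'http' => [
            'proxy' => "tcp://{$proxy_host}:{$proxy_port}", // route the request via the proxy
            'request_fulluri' => true, // send the full URI in the request line
            'timeout' => 10,           // seconds
            'follow_location' => 0,    // surface redirects instead of following them
        ],
    ];
}

$options = proxy_context_options('203.0.113.10', 8080);
$context = stream_context_create($options);

// Left commented out because it needs a working proxy:
// $insta_source = file_get_contents('http://instagram.com/cosmocatalano', false, $context);
```

A proxy that requires credentials would additionally need a Proxy-Authorization header in the 'header' option.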