Created July 31, 2012 21:31 (gist tmcw/3220747)
Archive Tweets
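import glob
import json
import os

import requests

you = 'tmcw'     # screen name whose timeline is archived
data = 'tweets'  # directory holding one JSON file per tweet
os.makedirs(data, exist_ok=True)

# Unauthenticated Twitter API v1 endpoint (v1 was retired in 2013).
URL = ('http://api.twitter.com/1/statuses/user_timeline.json'
       '?screen_name=%s&include_rts=true&count=200')


def run(max_id=None):
    """Fetch up to 200 tweets, save unseen ones, then page further back."""
    already = set(glob.glob(os.path.join(data, '*.json')))
    start = URL % you
    if max_id:
        # max_id is inclusive, so the oldest tweet of the last page repeats.
        start = '%s&max_id=%s' % (start, max_id)
    r = requests.get(start)
    r.raise_for_status()
    tweets = r.json()  # current requests: json() is a method, not an attribute
    has_new = False
    for t in tweets:
        path = os.path.join(data, '%s.json' % t['id'])
        if path not in already:
            with open(path, 'w') as f:
                json.dump(t, f)
            has_new = True
    # Keep paging back only while this page contained something new.
    if has_new:
        run(tweets[-1]['id'])


print('starting twitter archive of @%s' % you)
run()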
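For reference, the max_id paging in `run()` can also be written as a loop instead of recursion. This is a minimal sketch, not the gist's code: the HTTP request is swapped for a hypothetical `fetch_page(max_id)` callable so the stop condition can be exercised offline, and `archive` and `fake_timeline` are illustrative names. The fake timeline assumes the API's inclusive max_id behavior.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Optional

Tweet = Dict[str, Any]


def archive(fetch_page: Callable[[Optional[int]], List[Tweet]], out_dir: str) -> int:
    """Save tweets page by page, newest first, until a page adds nothing new.

    fetch_page(max_id) is a hypothetical stand-in for the HTTP request: it
    returns tweets newest-first with ids <= max_id (all tweets if None).
    Returns the number of files newly written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    max_id: Optional[int] = None
    while True:
        tweets = fetch_page(max_id)
        has_new = False
        for t in tweets:
            path = os.path.join(out_dir, '%s.json' % t['id'])
            if not os.path.exists(path):  # skip tweets saved earlier
                with open(path, 'w') as f:
                    json.dump(t, f)
                written += 1
                has_new = True
        if not has_new:
            return written
        # max_id is inclusive, so the next page starts with this same tweet.
        max_id = tweets[-1]['id']


def fake_timeline(ids, page_size=3):
    """Hypothetical in-memory timeline that pages like the API's max_id."""
    newest_first = sorted(ids, reverse=True)

    def fetch(max_id):
        eligible = [i for i in newest_first if max_id is None or i <= max_id]
        return [{'id': i, 'text': 'tweet %d' % i} for i in eligible[:page_size]]

    return fetch


with tempfile.TemporaryDirectory() as d:
    print(archive(fake_timeline(range(1, 8)), d))  # 7: pages 7-5, 5-3, 3-1, then 1
    print(archive(fake_timeline(range(1, 8)), d))  # 0: first page is all saved
```

The repeated boundary tweet is harmless because saved files are skipped, and the same check makes a rerun stop at the first page with nothing new. The flip side is that a page made up only of saved tweets ends the crawl, so an archive interrupted partway through is not resumed past its newest saved tweets.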
Thanks for referring me to this, Tom.
A minor note: this will not save all of your data, e.g. your favorites, the users you follow, the users who follow you, or avatars, bios, etc. More interestingly, it won't do any spidering to save the data eventually needed to meaningfully reconstruct conversations (others' tweets) or embedded media (twitpics in the discussion), or even just to preserve links. Are you aware of any other scripts that take this more elaborate route? Any interest in extending this one?
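A first step toward the spidering described in the comment above could be to scan the saved archive for what is missing. This is a minimal sketch under stated assumptions: each `*.json` file holds one tweet object as the script saves it, and tweets carry the API's `in_reply_to_status_id` field and, when requested, an `entities` object whose `urls` entries have `url` and `expanded_url`. `missing_context` and the sample tweets are hypothetical; nothing here fetches anything.

```python
import glob
import json
import os
import tempfile
from typing import Any, Dict, List, Set, Tuple


def missing_context(data_dir: str) -> Tuple[Set[int], List[str]]:
    """Return (reply parents missing from the archive, links found in tweets).

    Assumes one tweet object per *.json file and the API's
    'in_reply_to_status_id' and 'entities' -> 'urls' fields; entities may be
    absent depending on how the timeline was requested.
    """
    tweets: List[Dict[str, Any]] = []
    for path in glob.glob(os.path.join(data_dir, '*.json')):
        with open(path) as f:
            tweets.append(json.load(f))
    have = {t['id'] for t in tweets}
    # Replies whose parent is not archived: usually someone else's tweet.
    parents = {t['in_reply_to_status_id'] for t in tweets
               if t.get('in_reply_to_status_id')
               and t['in_reply_to_status_id'] not in have}
    # Prefer 'expanded_url' over the shortened 'url' when it is present.
    links = sorted({u.get('expanded_url') or u['url']
                    for t in tweets
                    for u in t.get('entities', {}).get('urls', [])})
    return parents, links


# Made-up sample: 3 replies to 2 (archived), 2 replies to 1 (not archived).
SAMPLE = [
    {'id': 2, 'in_reply_to_status_id': 1, 'entities': {'urls': []}},
    {'id': 3, 'in_reply_to_status_id': 2,
     'entities': {'urls': [{'url': 'http://t.co/abc',
                            'expanded_url': 'http://example.com/post'}]}},
]
with tempfile.TemporaryDirectory() as d:
    for t in SAMPLE:
        with open(os.path.join(d, '%s.json' % t['id']), 'w') as f:
            json.dump(t, f)
    print(missing_context(d))  # ({1}, ['http://example.com/post'])
```

The returned parent IDs are the tweets a crawler would still have to fetch to rebuild the threads, and the link list is what it would have to resolve or mirror to preserve embedded media.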