jesstess / wget_spider_https
Created November 27, 2010 20:02
Use wget to spider a site as a logged-in user.
http://addictivecode.org/FrequentlyAskedQuestions
To spider a site as a logged-in user:
1. post the form data (_every_ input with a name in the form, even if it doesn't have a value) required to log in (--post-data).
2. save the cookies that get generated (--save-cookies), including session cookies (--keep-session-cookies), which are not saved when --save-cookies alone is specified.
3. load the cookies, continue saving the session cookies, and recursively (-r) spider (--spider) the site, ignoring (-R) /logout.
# log in and save the cookies
wget --post-data='username=my_username&password=my_password&next=' --save-cookies=cookies.txt --keep-session-cookies https://foobar.com/login
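Step 3 can be sketched as a small shell function. This is only a sketch under assumptions: the function name spider_logged_in is made up for illustration, the reject pattern "logout" is a guess at how to express "ignoring /logout" (how -R matches can vary between wget versions, so check it against your own), and cookies.txt and https://foobar.com/ are carried over from the login command above.

```shell
# Hypothetical helper: spider a site using cookies saved by the login step.
#   $1 = start URL (e.g. https://foobar.com/)
#   $2 = cookie file written by the login command (e.g. cookies.txt)
spider_logged_in() {
    # --load-cookies           reuse the cookies from the login request
    # --save-cookies           keep writing cookies back to the same file
    # --keep-session-cookies   include session cookies when saving
    # -r --spider              follow links recursively without saving pages
    # -R logout                assumed pattern: skip the logout link so the
    #                          session is not ended partway through the crawl
    wget --load-cookies="$2" \
         --save-cookies="$2" \
         --keep-session-cookies \
         -r --spider \
         -R logout \
         "$1"
}
```

After the login command has written cookies.txt, it would be called as: spider_logged_in https://foobar.com/ cookies.txt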