@dailenspencer
Created May 14, 2018 23:15
# -- OVERVIEW --
# This script runs the Craigslist Jobs Scrapy crawler, which gathers job
# listing contents, and then uploads the results to an S3 bucket.
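# stop at the first failed command or unset variable, so a failed crawl or
# upload never reaches the rm step below
set -eu
# name the results file with a timestamp prefix
timestamp=$(date +%Y-%m-%d_%H-%M-%S)
filename="${timestamp}_results.json"
# run the crawler and store the results in the JSON file
scrapy crawl jobs -o "$filename"
# copy the results to the S3 bucket
aws s3 cp "$filename" "s3://craigslist-jobs-app/scrapy-results/$filename"
# remove the local JSON file
rm "$filename"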
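
One possible variant of the same crawl, upload, and clean-up sequence is sketched below. It wraps the steps in a function that runs in a subshell and registers the clean-up with `trap`, so the local file is removed whether the crawl and upload succeed or fail. The names `crawl_and_upload`, `SCRAPY_BIN`, and `AWS_BIN` are hypothetical and do not appear in the original script. The two variables are only an assumed convenience for pointing the function at stand-in commands during a dry run.

```shell
# Sketch only: SCRAPY_BIN, AWS_BIN and crawl_and_upload are hypothetical names,
# not part of the original script. By default they resolve to the real CLIs.
SCRAPY_BIN="${SCRAPY_BIN:-scrapy}"
AWS_BIN="${AWS_BIN:-aws}"

# The body is wrapped in ( ... ) so it runs in a subshell: set -eu and the
# EXIT trap stay local to each call instead of affecting the calling shell.
crawl_and_upload() (
  set -eu
  timestamp=$(date +%Y-%m-%d_%H-%M-%S)
  filename="${timestamp}_results.json"
  # remove the local file when the subshell exits, on success or on failure
  trap 'rm -f "$filename"' EXIT
  # run the crawler and store the results in the JSON file
  "$SCRAPY_BIN" crawl jobs -o "$filename"
  # copy the results to the same bucket and prefix the original script uses
  "$AWS_BIN" s3 cp "$filename" "s3://craigslist-jobs-app/scrapy-results/$filename"
)
```

Calling `crawl_and_upload` with no overrides should behave like the original three commands. The function returns a non-zero status if either the crawl or the upload fails.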