Skip to content

Instantly share code, notes, and snippets.

@shedd
Created February 13, 2025 18:34
Show Gist options
  • Save shedd/54df7cce79a059aedeee2aae5120525d to your computer and use it in GitHub Desktop.
Save shedd/54df7cce79a059aedeee2aae5120525d to your computer and use it in GitHub Desktop.
Archive your Dropbox Sign / HelloSign signed documents

Dropbox Sign PDF Download Script

Dropbox Sign (fka HelloSign) unfortunately makes it so that you must use a premium plan to export all of your completed documents to a cloud storage account.

This Ruby script uses the Dropbox Sign API to:

  1. Fetch all completed Signature Requests for an account in a paginated manner (up to the max of 100 per page).
  2. Cache the list of Signature Requests locally.
  3. Download each Signature Request's PDF file in batches of 25.
  4. Cache results of successfully downloaded and failed downloads to make it easy to resume a long-running process.

Requirements

  • Ruby 2.6+ (Script tested on Ruby 3.x but should work for any reasonably recent Ruby version).
  • The dropbox-sign gem (HelloSign's official Ruby SDK).
  • The fileutils and json standard libraries (already part of the default Ruby installation).

Getting Started

  1. Clone or download this repository to your local machine.
  2. Install Dependencies:
    bundle install
  3. Set Your Dropbox Sign API Key:
    • In the script, locate the following block and replace "API KEY" with your actual Dropbox Sign API Key:
      Dropbox::Sign.configure do |config|
        config.username = "API KEY"  # <--- Replace this
      end
    • Optionally, you can manage this via environment variables (if you prefer not to hardcode the key) by modifying the code:
      config.username = ENV['DROPBOX_SIGN_API_KEY']
  4. Run the Script:
    bundle exec ruby download_dropbox_sign_pdfs.rb
    • By default, it will fetch and store up to 100 requests per page, iterating through all pages.
    • It downloads PDFs into the ./files directory.

What the Script Does

  1. Retrieve Signature Requests

    • Calls the API via SignatureRequestApi#signature_request_list using pagination.
    • Only completed Signature Requests (those with at least one signed document) are fetched (due to complete: true in the code).
    • If a local cache (signature_requests_cache.json) exists, it will be used to avoid refetching everything from the API.
  2. Download Each PDF

    • Processes downloads in batches of 25 requests to respect rate limits.
    • Each PDF is saved with a filename pattern of <requester_email>_<recipient_email>_<title>_<signature_request_id>.pdf in the ./files directory.
    • If a PDF fails to download (e.g., rate limit errors after multiple retries), it is recorded in failed_downloads.json.
  3. Progress & Resuming

    • The script logs every successful download in completed_downloads.json.
    • If you stop or restart the script, it will skip already completed downloads and only attempt the remaining ones.

Configuration Files

  • signature_requests_cache.json
    Caches the complete list of Signature Requests.

    • If present, the script won’t call the API again for signature requests (unless you delete or rename this file).
    • This allows you to resume a failed execution.
  • completed_downloads.json
    Keeps track of Signature Request IDs that have been successfully downloaded.

    • The script won’t redownload existing entries in this file.
    • This allows you to resume a failed execution.
  • failed_downloads.json
    Keeps track of any Signature Requests that couldn’t be downloaded (after all retries).

    • This lets you investigate or retry them later.

You can safely delete or rename these files if you want to force a full re-fetch or re-download process.


Rate Limiting & Retries

  • The Dropbox Sign API has strict rate limits.
  • The script handles:
    • 429 (Too Many Requests) responses by automatically sleeping for 20 seconds and retrying (up to 3 times).
    • A forced sleep at the end of each 25-request batch to ensure no more than 25 requests/minute are made.

License

This script is provided as-is under the MIT License.

require 'dropbox-sign'
require 'fileutils'
require 'json'
# File paths for caching and tracking progress.
CACHE_FILE = "./signature_requests_cache.json"
COMPLETED_FILE = "./completed_downloads.json"
FAILED_FILE = "./failed_downloads.json"
# Configure your HelloSign API credentials.
Dropbox::Sign.configure do |config|
# Configure HTTP basic authorization: api_key
config.username = "API KEY"
end
# Helper method to sanitize strings for file names.
def sanitize_filename(str)
# Replace any character that is not alphanumeric, dot, underscore, or hyphen with an underscore.
str.gsub(/[^0-9A-Za-z.\-_]/, '_')
end
# Helper method to fetch a page with retries on errors (especially 429 errors).
def fetch_signature_request_page(signature_request_api, page_size, page_number, retries = 3)
begin
result = signature_request_api.signature_request_list({
account_id: 'all',
page_size: page_size,
page: page_number,
complete: true
})
# Debug: print the page info from the API response if available.
if result.list_info.respond_to?(:page)
puts "DEBUG: Received page #{result.list_info.page} of #{result.list_info.num_pages}."
else
puts "DEBUG: Fetched page #{page_number} (API did not return explicit page info)."
end
return result
rescue StandardError => e
if e.message.include?("429") && retries > 0
wait_time = 20
puts "Received 429 error on page #{page_number}. Retrying in #{wait_time} seconds..."
sleep(wait_time)
return fetch_signature_request_page(signature_request_api, page_size, page_number, retries - 1)
else
puts "Error fetching page #{page_number}: #{e.message}"
raise e
end
end
end
# Retrieves all complete signature requests using pagination.
# By including complete: true in the query params, only requests with a signed document are returned.
def get_all_signature_requests
if File.exist?(CACHE_FILE)
puts "Loading cached signature requests from #{CACHE_FILE}..."
data = JSON.parse(File.read(CACHE_FILE))
puts "Loaded #{data.size} signature request(s) from cache."
return data
end
signature_request_api = Dropbox::Sign::SignatureRequestApi.new
page_size = 100
signature_requests = []
# Initial call to get pagination info.
initial_result = fetch_signature_request_page(signature_request_api, page_size, 1)
total_pages = initial_result.list_info.num_pages
puts "Total pages to process: #{total_pages}"
(1..total_pages).each do |page_number|
puts "\nProcessing page #{page_number} of #{total_pages}..."
result = fetch_signature_request_page(signature_request_api, page_size, page_number)
current_page_count = result.signature_requests.size
puts "DEBUG: Processing page #{page_number}, received #{current_page_count} requests."
result.signature_requests.each do |req|
current_req = {
"signature_request_id" => req.signature_request_id,
"title" => req.title,
"requester_email_address" => req.requester_email_address,
"recipient_email_address" => (req.respond_to?(:signatures) && req.signatures && !req.signatures.empty?) ?
req.signatures.first.signer_email_address : "unknown"
}
if signature_requests.any? { |r| r["signature_request_id"] == current_req["signature_request_id"] }
puts "DEBUG: Duplicate encountered on page #{page_number}: #{current_req["signature_request_id"]} (title: #{current_req["title"]}, requester: #{current_req["requester_email_address"]}, recipient: #{current_req["recipient_email_address"]})"
else
signature_requests << current_req
end
end
puts " Retrieved #{current_page_count} signature request(s) from page #{page_number}."
puts " Running total (unique complete requests so far): #{signature_requests.size}"
end
# Option 1: Deduplicate by signature_request_id (one file per unique signature request)
puts "\nFinished retrieving signature requests."
puts "Total unique complete signature requests retrieved: #{signature_requests.size}"
File.write(CACHE_FILE, JSON.pretty_generate(signature_requests))
puts "Cached signature requests to #{CACHE_FILE}."
signature_requests
# Option 2: Comment out the deduplication if you want to process every record (even duplicates)
# puts "\nFinished retrieving signature requests."
# puts "Total complete signature requests retrieved: #{signature_requests.size}"
# File.write(CACHE_FILE, JSON.pretty_generate(signature_requests))
# puts "Cached signature requests to #{CACHE_FILE}."
# return signature_requests
end
# Helper function to download a file with retry logic.
#
# This method attempts to download the PDF for a signature request.
# If a 429 error (Too Many Requests) is encountered, it will wait (20 seconds)
# and retry up to max_attempts.
#
# After copying the file from the temporary location to the destination,
# it explicitly closes and removes the temporary file.
def download_with_retry(signature_request_api, req, file_path, attempt=1, max_attempts=3)
begin
file_bin = signature_request_api.signature_request_files(req["signature_request_id"], { file_type: "pdf" })
# Copy the file from the temporary location to the destination.
FileUtils.cp(file_bin.path, file_path)
# Explicitly close and remove the temporary file.
file_bin.close if file_bin.respond_to?(:close)
if file_bin.respond_to?(:unlink)
file_bin.unlink
elsif file_bin.respond_to?(:delete)
file_bin.delete
end
return true
rescue StandardError => e
if e.message.include?("HTTP status code: 429")
if attempt < max_attempts
wait_time = 20 # Reduced wait time as per your configuration.
puts "Received 429 for request #{req["signature_request_id"]} (attempt #{attempt} of #{max_attempts}). Retrying in #{wait_time} seconds..."
sleep(wait_time)
return download_with_retry(signature_request_api, req, file_path, attempt + 1, max_attempts)
else
puts "Exceeded maximum retry attempts for request #{req["signature_request_id"]}."
return false
end
else
puts "Error downloading request #{req["signature_request_id"]}: #{e.message}"
return false
end
end
end
# Downloads the PDF files for each signature request in batches of 25.
# Tracks and caches completed downloads so that the process can be resumed.
def download_requests(requests)
signature_request_api = Dropbox::Sign::SignatureRequestApi.new
batch_size = 25
# Ensure the output directory exists.
FileUtils.mkdir_p('./files')
# Load the list of completed downloads.
completed = if File.exist?(COMPLETED_FILE)
JSON.parse(File.read(COMPLETED_FILE))
else
[]
end
# Load the list of previous failures (so we don't add duplicates).
failures = if File.exist?(FAILED_FILE)
JSON.parse(File.read(FAILED_FILE))
else
[]
end
# Filter out requests that have already been processed OR are missing a valid signature_request_id.
requests_to_download = requests.reject do |req|
id = req["signature_request_id"].to_s.strip
id.empty? || completed.include?(id)
end
total_requests = requests_to_download.size
total_batches = (total_requests / batch_size.to_f).ceil
processed_files_count = 0
puts "\nStarting downloads..."
puts "Total requests to download (skipping completed ones): #{total_requests}"
requests_to_download.each_slice(batch_size).with_index do |batch, batch_index|
current_batch_number = batch_index + 1
start_time = Time.now
puts "\nProcessing batch #{current_batch_number} of #{total_batches}..."
batch.each do |req|
# Build a descriptive file name using the sender's email, recipient's email, and request title.
sender_email = req["requester_email_address"] || "unknown"
recipient_email = req["recipient_email_address"] || "unknown"
request_title = req["title"] || "untitled"
sanitized_sender = sanitize_filename(sender_email)
sanitized_recipient = sanitize_filename(recipient_email)
sanitized_title = sanitize_filename(request_title)
file_path = "./files/#{sanitized_sender}_#{sanitized_recipient}_#{sanitized_title}_#{req["signature_request_id"]}.pdf"
if download_with_retry(signature_request_api, req, file_path)
processed_files_count += 1
puts " Successfully downloaded request #{req["signature_request_id"]} to:"
puts " #{file_path} (Total downloaded in this run: #{processed_files_count})"
completed << req["signature_request_id"]
File.write(COMPLETED_FILE, JSON.pretty_generate(completed))
else
puts " Failed to download request #{req["signature_request_id"]} after retries."
# Check if we already have this failure recorded to avoid duplicates.
unless failures.any? { |f| f["signature_request_id"] == req["signature_request_id"] }
failures << {
"signature_request_id" => req["signature_request_id"],
"title" => req["title"],
"requester_email_address" => req["requester_email_address"],
"recipient_email_address" => req["recipient_email_address"]
}
File.write(FAILED_FILE, JSON.pretty_generate(failures))
end
end
end
files_remaining = total_requests - processed_files_count
batches_remaining = total_batches - current_batch_number
puts "Completed batch #{current_batch_number}/#{total_batches}: #{batch.size} file(s) processed in this batch."
puts "Total files downloaded so far: #{processed_files_count} of #{total_requests}."
puts "Batches remaining: #{batches_remaining} | Files remaining: #{files_remaining}"
# Enforce the rate limit for high-tier endpoints (25 requests per minute).
elapsed = Time.now - start_time
sleep_time = 60 - elapsed
if sleep_time > 0 && current_batch_number < total_batches
puts "Sleeping for #{sleep_time.round} seconds to respect rate limit..."
sleep(sleep_time)
end
end
puts "\nDownload process completed."
puts "Failed requests are logged in '#{FAILED_FILE}' (#{failures.size} failures recorded)."
end
# Main execution flow.
# Load (or retrieve) signature requests and then download the pending ones.
requests = get_all_signature_requests
download_requests(requests)
source "https://rubygems.org"
gem 'dropbox-sign'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment