Last active February 4, 2017 13:43
Oyster journey history scraping script
require 'rubygems'
require 'date' # needed for Date.parse when scraping date lines
require 'capybara'
require 'capybara/dsl'
require 'capybara/poltergeist'
require 'awesome_print'

Capybara.run_server = false
Capybara.current_driver = :poltergeist

class Oyster
  include Capybara::DSL

  def get_results
    puts "Enter username"
    username = gets.chomp
    puts "Enter password"
    password = gets.chomp
    puts "Enter Card number"
    card_number = gets.chomp
    puts "Start date dd/mm/yyyy"
    start_date = gets.chomp
    puts "End date dd/mm/yyyy"
    end_date = gets.chomp

    # Log in
    visit "https://oyster.tfl.gov.uk/oyster/entry.do"
    fill_in('UserName', :with => username)
    fill_in('Password', :with => password)
    click_button('Sign in')
    sleep 5

    # Select Oyster card number
    select(card_number, :from => 'cardId')
    click_button('Go')
    sleep 5
    click_link 'Journey history'
    sleep 10

    # Select date range
    page.execute_script("$('.hidden-range').fadeIn();
      $('#date-range').val('custom date range');
      $('#date-range-button').hide().delay('200').fadeIn();
      $('#from').val('#{start_date}');
      $('#to').val('#{end_date}');")
    click_button('date-range-button')
    sleep 10

    @scraped_journeys = {}
    pagination = all('.pagination')
    if pagination.count == 1
      # The journeys are displayed on several pages
      page_links = get_page_links
      number_pages = page_links.count
      page_number = 1
      scrape_journeys_from_page
      while page_number < number_pages
        go_to_next_page(page_number)
        page_number += 1
        sleep 10
        scrape_journeys_from_page
      end
    else
      # The journeys are only displayed on one page
      scrape_journeys_from_page
    end
    ap @scraped_journeys
  end

  def get_page_links
    pagination = all('.pagination')
    pagination.first.all('a')
  end

  # The index of the next-page link differs between the first page
  # and later pages.
  def go_to_next_page(page_number)
    page_links = get_page_links
    if page_number == 1
      page_links[page_number - 1].click
    else
      page_links[page_number].click
    end
  end

  def scrape_journeys_from_page
    all('.journeyhistory').each do |table|
      date = nil
      table.all('tr').each do |row|
        columns = row.all('td')
        if row[:class] == "reveal-table-row"
          # Tube or train journey
          add_to_scraped_journeys(date.to_s, columns[0].text, columns[1].text)
        elsif columns.size == 2
          # Date line
          date = Date.parse(columns[0].text)
        elsif columns.size == 4
          # Bus journey
          add_to_scraped_journeys(date.to_s, columns[0].text, columns[1].text)
        end
      end
    end
  end

  def add_to_scraped_journeys(date, time, description)
    @scraped_journeys[date] ||= []
    @scraped_journeys[date] << {
      :hour => time,
      :description => description,
    }
  end
end

Oyster.new.get_results
Limitations
My personal journey history only contains train, Underground and bus journeys. Other modes of travel may be displayed differently in the journey history: feel free to comment if you notice errors with other types of journey.
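To make that limitation concrete, here is a rough sketch of the row rules the script relies on, written as plain Ruby so it can be run without a browser. The `Row` struct, `classify_row` and `group_journeys` are hypothetical names used only for illustration, and the sample rows are invented, not real TfL markup. Any row that does not match a known shape is collected instead of being silently skipped, which might help spot other journey types.

```ruby
require 'date'

# Hypothetical stand-in for a scraped <tr>: its CSS class and its <td> texts.
Row = Struct.new(:css_class, :cells)

# Same rules as the script: a "reveal-table-row" is a tube or train journey,
# a two-cell row is a date line, a four-cell row is a bus journey.
def classify_row(row)
  return :rail_journey if row.css_class == 'reveal-table-row'

  case row.cells.size
  when 2 then :date_line
  when 4 then :bus_journey
  else :unknown
  end
end

# Groups journeys by date and returns the unrecognised rows alongside them.
def group_journeys(rows)
  journeys = Hash.new { |hash, key| hash[key] = [] }
  unknown = []
  date = nil
  rows.each do |row|
    case classify_row(row)
    when :date_line
      date = Date.parse(row.cells[0])
    when :rail_journey, :bus_journey
      journeys[date.to_s] << { hour: row.cells[0], description: row.cells[1] }
    else
      unknown << row
    end
  end
  [journeys, unknown]
end

# Invented sample rows for illustration only.
sample_rows = [
  Row.new(nil, ['4 February 2017', 'daily total']),
  Row.new('reveal-table-row', ['08:15', 'Station A to Station B']),
  Row.new(nil, ['09:30', 'Bus journey, route X', 'charge', 'balance']),
  Row.new(nil, ['10:00', 'some other mode', 'charge'])
]
sample_journeys, sample_unknown = group_journeys(sample_rows)
puts "#{sample_journeys.values.flatten.size} journeys, #{sample_unknown.size} unrecognised row(s)"
```

Note that rows without any `td` cells, such as header rows, would also end up in the unrecognised list in this sketch.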
This script assumes that your Oyster account has several cards registered on it, which might not be the case for you. I do not know how the TfL website behaves when only one card is registered: feel free to comment, ideally with some sample HTML, if you want the script adapted to that case.
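One possible defensive approach, untested against the TfL site, is to select a card only when the `cardId` dropdown is actually on the page. The sketch below assumes that a single-card account simply skips the selection step, which is a guess. `select_card_if_present` is a hypothetical helper; `has_select?`, `select` and `click_button` are standard Capybara methods, and `session` can be any object that responds to them, such as an instance of a class that includes `Capybara::DSL`.

```ruby
# Selects the card and clicks 'Go' only if the 'cardId' dropdown exists.
# Returns true when a selection was made, false otherwise.
# Assumption: with one registered card the site shows no dropdown at all.
def select_card_if_present(session, card_number)
  return false unless session.has_select?('cardId')

  session.select(card_number, from: 'cardId')
  session.click_button('Go')
  true
end
```

Inside the `Oyster` class this could be called as `select_card_if_present(self, card_number)` in place of the two card selection lines. With a real Capybara session, `has_select?` waits up to `Capybara.default_max_wait_time` before returning false, so the single-card path would be a little slower.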