#!/usr/bin/env ruby
# Script that fetches an HTML page and selects elements from it via CSS selectors
# Created 2009-10-01 by Jesper Rønn-Jensen, www.justaddwater.dk
#
# For usage, run parsepage.rb without arguments.
#
# Feel free to modify, fork and improve as long as you commit your changes back to me :)
def usage
  <<-EOF
=== USAGE ===
parsepage.rb [uri] [css_selector] [options]
=== EXAMPLES ===
parsepage.rb "http://www.smashingmagazine.com/2009/09/24/10-useful-usability-findings-and-guidelines/" "h3"
U=http://www.smashingmagazine.com/2009/09/24/10-useful-usability-findings-and-guidelines/
parsepage.rb $U "h1,h2,h3"
=== OPTIONS ===
--text-only   lists only the text of matching elements
--list-html   lists all matches as HTML <li> elements
--count       counts the number of matches
  EOF
end
# Both a URI and a CSS selector are required; otherwise show usage and stop
abort usage if ARGV.size < 2
require "rubygems"
require "nokogiri"
require "open-uri"
# URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs as of Ruby 3.0
doc = Nokogiri::HTML(URI.open(ARGV[0]))
matches = doc.css(ARGV[1])
result = []
result << "no matches found" if matches.empty?
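if ARGV.include?('--text-only')
  # Plain text of each matching element, one per line
  result.concat(matches.map(&:text))
elsif ARGV.include?('--list-html')
  # Flag name matches the usage text; each item gets a proper closing </li>
  result.concat(matches.map { |element| "<li>#{element.text}</li>" })
else
  # Default: the matching elements as raw HTML
  result << matches.to_s
end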
result << ''
if ARGV.include?('--count')
  result << "TOTAL (#{matches.size} match#{'es' if matches.size != 1} found for '#{ARGV[1]}')"
end
puts result.join("\n")
# TODO: features that would be great to support
#
# * support for local files
# * refactor so this could be used to return a Nokogiri object with selection via irb
#
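The first TODO item, support for local files, could be handled roughly as in the sketch below. It assumes a hypothetical `read_source` helper (not part of the original script) that treats any existing file on disk as local input and hands everything else to open-uri; the script would then call `Nokogiri::HTML(read_source(ARGV[0]))`.

```ruby
require "open-uri"

# Hypothetical helper for the "support for local files" TODO (an assumption,
# not the author's code): read a local file if the path exists on disk,
# otherwise fetch the source as a URL with open-uri.
def read_source(source)
  if File.file?(source)
    File.read(source)
  else
    # URI.open (Ruby 2.5+) is open-uri's entry point for URLs;
    # the block receives the IO and its return value is returned here.
    URI.open(source) { |io| io.read }
  end
end
```

Because the check uses `File.file?` rather than inspecting a URL scheme, a relative path such as `saved_page.html` works without a `file://` prefix.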
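Because the script detects flags with `ARGV.include?`, each flag name is written twice, once in the usage text and once in the check, and the two can drift apart. A sketch of an alternative, assuming Ruby's standard `optparse` library and a hypothetical `parse_args` helper that returns the URI, the selector and an options hash, declares each flag once and lets flags appear before or after the positional arguments:

```ruby
require "optparse"

# Hypothetical replacement for the ARGV.include? checks (an assumption, not
# the author's code). Each flag is declared once; OptionParser builds the
# help text from these declarations.
def parse_args(argv)
  options = { mode: :html, count: false }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: parsepage.rb [options] URI CSS_SELECTOR"
    opts.on("--text-only", "List only the text of matching elements") { options[:mode] = :text }
    opts.on("--list-html", "List matches as HTML <li> elements") { options[:mode] = :list }
    opts.on("--count", "Count the number of matches") { options[:count] = true }
  end
  # parse works on a copy of argv and returns the non-option arguments
  uri, selector = parser.parse(argv)
  raise ArgumentError, parser.help if uri.nil? || selector.nil?
  [uri, selector, options]
end
```

In the script, `uri, selector, options = parse_args(ARGV)` inside a `begin ... rescue ArgumentError => e; abort e.message; end` block could replace the manual argument checks, and `options[:mode]` would select the output format.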