@MaherSaif
Forked from peterc/grab.rb
Created July 7, 2025 09:30
Grab all text visible on a Web page with Ruby and Ferrum
# Notice the cheating technique: select all (and copy
# to the clipboard), then read the selected text
# back via JavaScript(!!)
# There's also some code to rip content out of
# shadow roots, which can be useful if a page
# is doing dynamic rendering.
#
# MIT licensed, (c) 2022 Peter Cooper
require 'ferrum'

# Start a browser and open a page in a fresh context.
def get_browser
  browser = Ferrum::Browser.new(slowmo: 0.1)
  context = browser.contexts.create
  [browser, context, context.create_page]
end

# Dispose of the context, then quit the browser.
def close_browser(browser, context, page)
  context.dispose
  browser.quit
end
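
def get_page(url)
  browser, context, page = get_browser
  page.go_to(url)
  # Move the mouse on the page we just opened, not the browser's default page.
  page.mouse.move(x: rand(400), y: rand(400))

  # Collect the innerHTML of every element that has an open shadow root.
  shadows = page.evaluate_func <<~JS
    function() {
      let res = '';
      for (let el of document.getElementsByTagName('*'))
        if (el.shadowRoot) res += el.shadowRoot.innerHTML + " \\n";
      return res;
    }
  JS

  # Select everything, copy it, then read the selection back as text.
  page.keyboard.type([:Ctrl, 'a'], [:Ctrl, 'c'])
  contents = page.evaluate %{window.getSelection().toString()}
  contents + shadows
ensure
  # Shut the browser down even if navigation or evaluation raised.
  close_browser(browser, context, page) if browser
end
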
p get_page(.... whatever URL you like here ....)
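
One thing to be aware of: the selection comes back as plain text, but the shadow-root part is `innerHTML`, so it still contains markup. If only the text is wanted, something like the following rough sketch could be applied to the `shadows` string before it is appended. `shadow_html_to_text` is a hypothetical helper, not part of the original gist. It uses plain regexes rather than a real HTML parser and does not decode entities such as `&amp;`, so treat it as an approximation.

```ruby
# Hypothetical helper (not in the original gist): crude HTML-to-text
# conversion for the shadow-root markup, using only regexes.
def shadow_html_to_text(html)
  # Drop <script> and <style> blocks entirely; their bodies are not visible text.
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, ' ')
  # Replace every remaining tag with a space so adjacent words do not fuse.
  text = text.gsub(/<[^>]+>/, ' ')
  # Collapse runs of spaces and tabs, then tidy whitespace around newlines.
  text = text.gsub(/[ \t]+/, ' ').gsub(/\s*\n\s*/, "\n")
  text.strip
end

puts shadow_html_to_text("<style>p{color:red}</style><p>Hello <b>world</b></p>\n<p>Bye</p>")
```

Inside `get_page` this would amount to returning `contents + shadow_html_to_text(shadows)`. For anything beyond a quick scrape, a proper parser is probably the safer choice.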