@jnv
Last active November 20, 2022 16:11

Revisions

  1. jnv revised this gist Dec 1, 2013. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion srt2txt.rb
    @@ -15,7 +15,7 @@
       txt = line.text.reject do |l|
         if l =~ /^OK/ # OK. et al.
           false
    -    elsif l =~ /(\?\.!)$/ # keeps screaming, includes some sounds
    +    elsif l =~ /[\?\.!]$/ # keeps screaming, includes some sounds
           false
         else
           l =~ /^[^a-z]*$/ # Reject all-upcase lines to remove sound descriptions
  2. jnv created this gist Nov 30, 2013.
    34 changes: 34 additions & 0 deletions srt2txt.rb
    @@ -0,0 +1,34 @@
    #!/usr/bin/env ruby
    require "srt"
    require "sanitize"

    REJECT_LINES = [/Best watched using Open Subtitles MKV Player/,
    /Subtitles downloaded from www.OpenSubtitles.org/, /^Subtitles by/,
    /www.tvsubtitles.net/, /[email protected]/, /addic7ed/, /allsubs.org/,
    /www.seriessub.com/, /www.transcripts.subtitle.me.uk/, /~ Bad Wolf Team/,
    /^Transcript by/, /^Update by /, /UKsubtitles.ru/
    ]

    fname = ARGV[0]
    file = SRT::File.parse(File.new(fname))
    file.lines.each do |line|
      txt = line.text.reject do |l|
        if l =~ /^OK/ # OK. et al.
          false
        elsif l =~ /(\?\.!)$/ # keeps screaming, includes some sounds
          false
        else
          l =~ /^[^a-z]*$/ # Reject all-upcase lines to remove sound descriptions
        end
      end
      txt = txt.join(" ").encode("UTF-8")
      txt.gsub!(" .", ".") # . . . ellipsis
      txt.gsub!("\u0092", "'") # RIGHT SINGLE QUOTATION MARK apostrophe
      txt = txt.strip.squeeze(" ")

      next if txt.empty?
      next if REJECT_LINES.any? { |expr| expr =~ txt }

      puts Sanitize.clean(txt)
    end
    puts
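
The one-line change in the Dec 1, 2013 revision is easy to miss, so here is a rough sketch of what it appears to fix. The original pattern `/(\?\.!)$/` is a group and only matches a line ending in the literal sequence `?.!`, while the revised `/[\?\.!]$/` is a character class and matches a line ending in any one of `?`, `.`, or `!`. The helper `sound_description?`, the constants `OLD_PUNCT` and `NEW_PUNCT`, and the sample lines are hypothetical and not part of srt2txt.rb; the sketch deliberately avoids the `srt` and `sanitize` gems so it runs on plain Ruby.

```ruby
# Hypothetical stand-alone sketch; names and sample data are assumptions,
# only the three regular expressions are taken from srt2txt.rb.
OLD_PUNCT = /(\?\.!)$/ # group: the literal sequence "?.!" at end of line
NEW_PUNCT = /[\?\.!]$/ # character class: any one of ?, . or ! at end of line

# Returns true when a subtitle line looks like a sound description
# and would be dropped by the reject block.
def sound_description?(line, punct = NEW_PUNCT)
  return false if line =~ /^OK/ # keep "OK." and similar
  return false if line =~ punct # keep shouted dialogue ending in punctuation
  !!(line =~ /^[^a-z]*$/)       # drop lines without any lowercase letter
end

samples = ["DOOR SLAMS", "NO!", "OK.", "Hello there."]

kept_new = samples.reject { |l| sound_description?(l) }
kept_old = samples.reject { |l| sound_description?(l, OLD_PUNCT) }

puts "revised pattern keeps:  #{kept_new.inspect}"
puts "original pattern keeps: #{kept_old.inspect}"
```

With the original pattern, an all-caps shout such as `NO!` falls through to the all-upcase check and is dropped along with the sound descriptions; with the revised pattern it is kept.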