Gist Bajena/8412fdc8e0613938a652cd4c78fd31b2, created January 29, 2020 06:46
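```ruby
require "open-uri" # adds URI::FTP#open
require "uri"
require "zlib"

class Loader
  # Returns a lazy Enumerator; nothing is downloaded until it is iterated.
  def load
    Enumerator.new { |main_enum| stream(main_enum) }
  end

  private

  # Decompresses the .gz file line by line and pushes each data row to the enumerator.
  # Note: open-uri buffers the whole download (StringIO or Tempfile) before
  # yielding it, so only decompression and row processing are streamed.
  def stream(main_enum)
    reader = nil
    file_uri.open do |file|
      reader = Zlib::GzipReader.new(file)
      reader.each_line.lazy.drop(1).each do |line| # drop(1) skips the header row
        main_enum << preprocess_row(line)
      end
    end
  ensure
    reader&.close
  end

  def file_uri
    # Hypothetical placeholder credentials and host.
    URI.parse("ftp://user:password@ftp.example.com/file.csv.gz")
  end

  # Naive row split: removes every double quote and splits on commas, so quoted
  # fields containing commas, quotes, or newlines are not handled.
  def preprocess_row(row)
    row.chomp.gsub('"', "").split(",")
  end
end
```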
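A minimal offline sketch of the pattern `Loader#load` relies on, assuming made-up data gzipped in memory in place of the FTP download; `stream_rows` is a hypothetical helper, not part of the gist. `Enumerator.new` runs its block only when the enumerator is iterated, so `first(2)` stops pulling lines after the second data row, and the `ensure` still closes the reader.

```ruby
require "stringio"
require "zlib"

# Same shape as Loader#load: the block runs only when the Enumerator is
# iterated, and each parsed row is handed to the consumer via the yielder.
def stream_rows(io)
  Enumerator.new do |yielder|
    reader = Zlib::GzipReader.new(io)
    begin
      # drop(1) skips the header row, as in the gist
      reader.each_line.lazy.drop(1).each do |line|
        yielder << line.chomp.split(",")
      end
    ensure
      reader.close # runs even when the consumer stops early, e.g. first(n)
    end
  end
end

# Made-up sample data, gzipped in memory instead of fetched over FTP.
io = StringIO.new(Zlib.gzip("id,name\n1,alice\n2,bob\n3,carol\n"))
rows = stream_rows(io)

p rows.first(2) # => [["1", "alice"], ["2", "bob"]]
p io.closed?    # => true (GzipReader#close also closes the underlying IO)
```

One caveat: with external iteration (`Enumerator#next`) the block runs inside a Fiber, so if the consumer stops calling `next` partway through, the `ensure` is not guaranteed to run.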
To build a proper streaming CSV parser, you would actually open an IO object, pass it into CSV.foreach, and then feed each line into the IO.
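A sketch of that idea using Ruby's standard csv library. Rather than creating a separate IO and feeding lines into it, this variant swaps in a simpler approach: `Zlib::GzipReader` is already an IO, so it is handed straight to `CSV.new`, which reads from it incrementally and handles quoting itself. `stream_csv` and the sample data are hypothetical.

```ruby
require "csv"
require "stringio"
require "zlib"

# Streams rows from a gzipped CSV. The GzipReader is itself an IO, so
# CSV.new can read from it incrementally; quoting is handled by the csv library.
def stream_csv(io)
  Enumerator.new do |yielder|
    Zlib::GzipReader.wrap(io) do |gz| # closes gz (and io) when the block exits
      CSV.new(gz, headers: true).each { |row| yielder << row.fields }
    end
  end
end

# Made-up sample: a quoted comma, a quoted newline, and a doubled quote.
data = %(id,comment\n1,"hello, world"\n2,"line one\nline two"\n3,"say ""hi"""\n)
stream_csv(StringIO.new(Zlib.gzip(data))).each { |fields| p fields }
# ["1", "hello, world"]
# ["2", "line one\nline two"]
# ["3", "say \"hi\""]
```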
What about CSVs containing quoted newlines, nested quotes, etc.?
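To make the problem concrete, a made-up record with a quoted comma and a quoted newline, read line by line the way the gist does and then with `CSV.parse` from Ruby's standard library:

```ruby
require "csv"

# Made-up record whose comment field contains a comma and a newline.
data = %(id,comment\n1,"hello, world\nsecond line"\n)

# Line at a time, as in preprocess_row: one record turns into two broken rows.
naive = data.each_line.drop(1).map { |line| line.chomp.gsub('"', "").split(",") }
p naive
# => [["1", "hello", " world"], ["second line"]]

# A real CSV parser keeps the quoted comma and newline inside one field.
p CSV.parse(data, headers: true).map(&:fields)
# => [["1", "hello, world\nsecond line"]]
```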
Why are you stripping out the quote character for CSVs? This is absolutely the wrong way to parse CSV data for anything but the most basic input.
is going to be parsed completely incorrectly by your `preprocess_row` function; the correct output would be:
What you're going to get:
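For illustration, a hypothetical input line containing a quoted comma and an escaped (doubled) double quote, passed through the gist's `preprocess_row` and through `CSV.parse_line` from Ruby's standard library:

```ruby
require "csv"

# preprocess_row as defined in the gist
def preprocess_row(row)
  row.chomp.gsub('"', "").split(",")
end

# Hypothetical line: a quoted comma and an escaped (doubled) quote.
line = %("Smith, John","6'2"" tall",42\n)

p preprocess_row(line) # => ["Smith", " John", "6'2 tall", "42"]
p CSV.parse_line(line) # => ["Smith, John", "6'2\" tall", "42"]
```

The naive version both splits the quoted name into two fields and loses the literal quote inside the second field.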