@juanger
Created September 18, 2013 19:00
CSV Splitter
# This script splits a csv file with header into multiple csv files
# Usage: ruby csv_splitter.rb FILENAME [NUM_CHUNKS=10]
# Example: ruby csv_splitter.rb wmeco_export.csv 20
# Change this method to skip malformed lines
def malformed?(line)
  false
  ## This could be:
  # line.start_with?(',')
end
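
# nil.to_i is 0, and 0 is truthy in Ruby, so `ARGV[1].to_i || 10` never fell back to 10
TOTAL_CHUNKS = (ARGV[1] || 10).to_i
abort "Usage: ruby csv_splitter.rb FILENAME [NUM_CHUNKS=10]" if ARGV[0].nil? || TOTAL_CHUNKS < 1

# Count data lines in Ruby (header excluded) instead of parsing `wc` output,
# whose column layout differs between platforms
data_count = File.foreach(ARGV[0]).count - 1
lines = File.foreach(ARGV[0])
header = lines.next

# Split off only the last extension, so names like "my.export.csv" work
file_extension = File.extname(ARGV[0])
file_basename = ARGV[0].chomp(file_extension)

# Spread the remainder over the first chunks so no data line is dropped
base_size, remainder = data_count.divmod(TOTAL_CHUNKS)

TOTAL_CHUNKS.times do |chunk_number|
  chunk_size = base_size + (chunk_number < remainder ? 1 : 0)
  File.open("#{file_basename}_#{chunk_number}#{file_extension}", "w") do |file|
    file.write(header)
    chunk_size.times do
      line = lines.next
      file.write(line) unless malformed?(line)
    end
  end
end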
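
The `malformed?` hook above only suggests a prefix check. As a minimal sketch of a stricter test, the variant below uses Ruby's `csv` library to reject lines that fail to parse or whose field count differs from the header's. The name `malformed_row?` and its `expected_fields` parameter are hypothetical and not part of the original script. Because the script reads line by line, this sketch would also flag quoted fields that contain embedded newlines.

```ruby
require "csv"

# Hypothetical stricter variant of malformed?: a line is rejected when it
# cannot be parsed as CSV or when its field count differs from the header's.
def malformed_row?(line, expected_fields)
  row = CSV.parse_line(line)
  row.nil? || row.size != expected_fields
rescue CSV::MalformedCSVError
  # Raised for syntax problems such as an unclosed quoted field
  true
end

# Sample data, invented for illustration only
header = "id,name,city\n"
expected = CSV.parse_line(header).size

puts malformed_row?("1,Ann,Boston\n", expected)
puts malformed_row?("2,Bob\n", expected)
puts malformed_row?("3,\"Cy,Denver\n", expected)
```

To use it in the script, the field count would be computed once from `header` and passed in wherever `malformed?(line)` is called.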