Last active
September 6, 2020 13:41
Accepts an input CSV file and shuffles its rows using a pandas DataFrame
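'''
Title  : Pandas Row Shuffler
Author : Arkadeep
'''
import os
import sys

import numpy as np
import pandas as pd


def shuffler(filename):
    # Read every column as a string and leave empty fields untouched,
    # so values are written back exactly as they were read.
    df = pd.read_csv(filename, header=0, dtype=object, na_filter=False)
    # Return the DataFrame with its rows in a random order.
    return df.reindex(np.random.permutation(df.index))


def main(inputfilename, outputfilename):
    shuffler(inputfilename).to_csv(outputfilename, sep=',', encoding='utf-8', index=False)


if __name__ == '__main__':
    input_path = sys.argv[1]
    # os.path.splitext removes only the extension. str.strip('.csv') strips any
    # leading or trailing '.', 'c', 's' or 'v' characters, which mangles names
    # such as 'cars.csv'.
    base, _ = os.path.splitext(input_path)
    main(input_path, base + '-shuffled.csv')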
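The `shuffler` function above shuffles by reindexing with a random permutation of the row labels. As a point of comparison, here is a minimal sketch of the same idea using `DataFrame.sample`, which is a common pandas idiom for this. The helper name `shuffle_rows`, the `seed` parameter, and the sample data are illustrative assumptions and are not part of the original script.

```python
import pandas as pd


def shuffle_rows(df, seed=None):
    # Hypothetical helper, not from the original gist.
    # sample(frac=1) draws every row exactly once, in random order;
    # reset_index(drop=True) discards the old row labels.
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)


# Made-up data, stored as strings to mirror dtype=object in the script.
df = pd.DataFrame({"id": ["1", "2", "3", "4"], "name": ["a", "b", "c", "d"]})
shuffled = shuffle_rows(df, seed=0)
print(shuffled)
```

Passing a fixed `seed` makes the order reproducible, which can help when the shuffled file is used for a train/test split.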
Hi Arkadeep, how would you shuffle the file when it is 34 GB and you have only 16 MiB of RAM? Will this work?
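The script loads the whole file with `pd.read_csv`, so it needs at least enough memory to hold the entire file, and a 34 GB input would not fit in that setting. One possible workaround, sketched below under stated assumptions, is a two-pass shuffle on disk using only the standard library: scatter each row into a randomly chosen bucket file, then shuffle each bucket in memory and append it to the output. The function name `shuffle_large_csv` and the `num_buckets` and `seed` parameters are hypothetical. The sketch assumes the file has a header row and that each bucket is small enough to fit in memory. With very little RAM a single level of buckets may not suffice, and oversized buckets would need to be split again in the same way.

```python
import csv
import os
import random
import tempfile


def shuffle_large_csv(src, dst, num_buckets=64, seed=None):
    # Hypothetical helper, not from the original gist.
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket{i}.csv") for i in range(num_buckets)]
        files = [open(p, "w", newline="", encoding="utf-8") for p in paths]
        try:
            writers = [csv.writer(f) for f in files]
            with open(src, newline="", encoding="utf-8") as fin:
                reader = csv.reader(fin)
                header = next(reader)  # assumes a header row is present
                # Pass 1: stream the input, sending each row to a random bucket.
                for row in reader:
                    writers[rng.randrange(num_buckets)].writerow(row)
        finally:
            for f in files:
                f.close()
        # Pass 2: load one bucket at a time, shuffle it, append it to the output.
        with open(dst, "w", newline="", encoding="utf-8") as fout:
            out = csv.writer(fout)
            out.writerow(header)
            for p in paths:
                with open(p, newline="", encoding="utf-8") as fb:
                    rows = list(csv.reader(fb))
                rng.shuffle(rows)
                out.writerows(rows)
```

A call might look like `shuffle_large_csv("data.csv", "data-shuffled.csv", num_buckets=256)`. Peak memory is roughly the size of the largest bucket, so `num_buckets` trades memory for the number of simultaneously open files, which the operating system may cap.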