Created
December 16, 2010 12:02
Example code for using LSF job arrays
### Generate a large random file to sort: 10 million lines, one number per line
perl -e 'for (1..1E7){printf("%.0f\n",rand()*1E7)};' > bigFile
### Split the file into chunks of 10,000 lines each (split000 ... split999)
split -a 3 -d -l 10000 bigFile split
### Rename the chunks from split000..split999 to Split1..Split1000, to match the job array indices 1-1000 used below
for f in split*; do mv "$f" "$(echo "$f" | perl -ne 'm/split0*(\d+)/; print "Split", $1+1, "\n";')"; done
### Submit a job array, allowing at most 50 jobs to run at any one time, and capture the job ID from the bsub output
ID=$(bsub -J "sort[1-1000]%50" "sort -n Split\$LSB_JOBINDEX >Split\$LSB_JOBINDEX.sorted" |perl -ne 'm/<(\d+)>/;print "$1"')
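### Merge the sorted chunks once all the array jobs have finished, using the -w dependency
### (the glob is Split*.sorted rather than *.sorted so that a bigFile.sorted left over from an earlier run is not picked up as input)
ID2=$(bsub -w "done($ID)" "sort -n -m Split*.sorted >bigFile.sorted" |perl -ne 'm/<(\d+)>/;print "$1"')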
### Delete the temporary files; waits for the merge to finish first
bsub -w "done($ID2)" "rm -f Split*"