Created
June 17, 2014 01:43
NUCmer parallelization and --prefix
tl;dr: If parallelizing NUCmer by dividing a query into pieces, use the --prefix
flag to ensure separate mgaps files are created rather than the single out.mgaps.

### Fuller explanation ###
While trying to parallelize NUCmer alignment of assembly contigs to PacBio reads,
I repeatedly ran into the error:

    ERROR: Could not parse input from 'Query File'.
    Please check the filename and format, or file a bug report

The go-to answers from Google suggest bad FASTA files, particularly when Windows
EOL encoding (\r\n) is present. The contig FASTA files were just fine, and after
digging into the source code, I found the following comment in postnuc.cc:

    //-- If a B sequence not seen yet, read it in
    //-- IMPORTANT: The B sequences in the synteny object are assumed to be
    //      ordered as output by mgaps, if they are not in order the program
    //      will fail. (All like tags must be adjacent and in the same order
    //      as the query file)

My parallel processing approach involves just splitting the query into 16 parts
and running each search in a separate process. However, by default NUCmer writes
its output files with the prefix 'out' (so out.mgaps, out.ntref), which means
every piece of the query is written to the same out.* files. I think that the
hits end up interleaved, which is what the above comment is warning against.
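
For illustration, here is a minimal Python sketch of one way to split a query
FASTA into pieces without cutting a record in half. It is not the script used
for the run described here. The function names, the contiguous chunking, and
the `<stem>_NN.fasta` naming scheme are all assumptions made for the example.

```python
from pathlib import Path


def read_fasta_records(path):
    """Return the FASTA records in `path`, each as a list of its lines."""
    records = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A header line starts a new record.
                records.append([line])
            elif records:
                # Sequence lines belong to the most recent header.
                records[-1].append(line)
    return records


def split_fasta(path, n_parts, out_dir="."):
    """Write the records to at most `n_parts` files, keeping their order.

    The chunk naming (<stem>_00.fasta, <stem>_01.fasta, ...) is a
    hypothetical convention chosen for this sketch.
    """
    records = read_fasta_records(path)
    # Ceiling division so that every record lands in some chunk.
    per_chunk = max(1, -(-len(records) // n_parts))
    stem = Path(path).stem
    chunk_paths = []
    for start in range(0, len(records), per_chunk):
        chunk_path = Path(out_dir) / f"{stem}_{start // per_chunk:02d}.fasta"
        with open(chunk_path, "w") as out:
            for record in records[start:start + per_chunk]:
                out.writelines(record)
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Because the chunks are contiguous, each piece keeps its records in the same
order as the original query file.
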
The solution is to use the --prefix flag, with a unique prefix for each query
file.
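
As a rough sketch of that fix (not the exact script used for this run), the
Python below builds one nucmer command per query piece and derives each
prefix from the piece's file name. The function names, the example file
names, and the choice of a thread pool are assumptions. The only NUCmer
features relied on are the `--prefix=<string>` option and the
`nucmer [options] <reference> <query>` argument order.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_nucmer_commands(reference, query_chunks):
    """Return one nucmer command per query chunk, each with its own --prefix."""
    # Use each chunk's file name (minus extension) as its output prefix.
    prefixes = [Path(chunk).stem for chunk in query_chunks]
    if len(set(prefixes)) != len(prefixes):
        # Two chunks sharing a prefix would write to the same *.mgaps file,
        # which is exactly the problem being avoided.
        raise ValueError("query chunk names must give unique prefixes")
    return [
        ["nucmer", f"--prefix={prefix}", str(reference), str(chunk)]
        for prefix, chunk in zip(prefixes, query_chunks)
    ]


def run_all(commands, workers=16):
    """Run the commands concurrently, one nucmer process per command.

    Raises subprocess.CalledProcessError if any run exits with an error.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so a failed run raises here.
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
```

With hypothetical file names, `build_nucmer_commands("reads.fasta",
["contigs_00.fasta", "contigs_01.fasta"])` gives commands that write
contigs_00.* and contigs_01.* rather than a shared out.*. Passing the result to
`run_all` then starts the alignments.
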
I may have posted a little too quickly...not certain that the current run will finish correctly, but it's looking OK so far.