trim.seqs() with "processors" >1 produces corrupt files

Dear mothur team,

I noticed that in mothur v1.29.2 win64 the trim.seqs() command with option “processors” >1 produces corrupt files. Specifically, group names for many sequences are missing from the group file, and the name file contains a large white space “header” and is missing sequence names that appear in the fasta file.
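
For anyone who wants to check their own output for the symptom described above, here is a minimal Python sketch. It assumes the usual mothur layouts (sequence name is the first token after ">" in the fasta file, and the first column of the name and group files); the function names are made up for illustration and are not part of mothur.

```python
def fasta_ids(path):
    """Collect the sequence name (first token after '>') from each fasta header."""
    ids = set()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def first_column_ids(path):
    """Collect the first field of every non-blank line of a name or group file."""
    ids = set()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # NUL padding is dropped here so that it is not mistaken
            # for part of a sequence name.
            fields = line.replace("\x00", " ").split()
            if fields:
                ids.add(fields[0])
    return ids


def missing_ids(fasta_path, table_path):
    """Return fasta sequence names that never appear in the name/group file."""
    return sorted(fasta_ids(fasta_path) - first_column_ids(table_path))
```

Calling `missing_ids` with the trimmed fasta file and the matching name or group file returns an empty list when the files are consistent.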

I observed file corruption with processors={4;8}. Running the trim.seqs() command with processors=1 gives proper results, so I assume the error occurs generally with parallelization.

Please let me know if I can be of further help.

Best regards, Sven

Thanks for reporting this issue. I have uploaded a new version of 1.29.2 that resolves the problem.

I’ve noticed a similar problem with the pairwise.seqs() command. When you use more than 1 processor under Windows, the distances obtained are somewhat higher than they should be, which eventually leads to incorrectly defined OTUs.

Can you look into it? If you need any files or more information, I am happy to provide them.
I am using pairwise.seqs() when working with non-16S genes.
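
One way to confirm this kind of discrepancy is to run pairwise.seqs() once with processors=1 and once with more processors, then compare the two distance files. The Python sketch below assumes column-format output (three whitespace-separated fields per line: two sequence names and a distance); the function names and the tolerance are illustrative choices, not anything mothur provides.

```python
def read_column_distances(path):
    """Read 'seqA seqB distance' lines into a dict keyed by the sorted pair."""
    distances = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            name_a, name_b, value = fields
            distances[tuple(sorted((name_a, name_b)))] = float(value)
    return distances


def compare_runs(single_path, multi_path, tolerance=1e-6):
    """Compare a processors=1 run with a multi-processor run."""
    single = read_column_distances(single_path)
    multi = read_column_distances(multi_path)
    shared = single.keys() & multi.keys()
    # Pairs present in both runs whose distances disagree.
    differing = {
        pair: (single[pair], multi[pair])
        for pair in shared
        if abs(single[pair] - multi[pair]) > tolerance
    }
    # Pairs reported by only one run, e.g. if a cutoff was applied differently.
    only_single = sorted(single.keys() - multi.keys())
    only_multi = sorted(multi.keys() - single.keys())
    return differing, only_single, only_multi
```

Pairs that show up in only one of the two files are worth a look as well, since a cutoff that is applied in one run but not the other changes which pairs get written.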

Thanks for reporting this issue. The pairwise.seqs command was not passing the cutoff to the thread properly. I have fixed this issue and posted a new Windows version of 1.30.1.

mothur v.1.30.2
Last updated: 4/19/2013

The earlier reported “large white space” issue for the multiprocessor trim.seqs() command is also apparent in this version.

The same is true for version 1.31.1!

Could it be that it has something to do with running shhh.flows() separately on every sample, followed by merging the fasta, names and groups files before the trim.seqs() command?

I am not seeing that in our current version. Are you sure you are using 1.31.1, and not an older copy of mothur’s executable somewhere in your path?
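
A quick way to rule out a stale executable is to list every copy of mothur that the PATH can reach. The helper below is a hypothetical Python sketch that only looks for files named `mothur` or `mothur.exe`; it does not report their versions.

```python
import os


def find_copies(name="mothur", search_path=None):
    """List every file called name or name.exe in the PATH directories, in order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        for candidate in (name, name + ".exe"):
            full = os.path.join(directory, candidate)
            if os.path.isfile(full) and full not in hits:
                hits.append(full)
    return hits
```

If more than one path comes back, the first entry is normally the one the shell picks up, so that is the copy whose version should be checked.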

I appear to be running into the same problem.

Win 8 Pro, 64 bit
mothur 1.31.2

I was working through the Schloss SOP. After shhh.flows() I ran trim.seqs(). Both executed fine, but when I ran summary.seqs() with both the fasta and name files explicitly given, I received an error:
Using 2 processors.
[ERROR]: IEUJ3KL01AKU92 is not in your name or count file, please correct.
[ERROR]: process 0 only processed 1 or 8831 sequences assigned to it, quitting.

I pinpointed the name file as the source of the problem (summary.seqs works without the name file but not with it). When I opened the file, there were a lot of NUL values before the data started. After reading this thread I reran trim.seqs() with processors=1 and everything now works.
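
The NUL padding described above can be detected without opening the file in an editor. This is a minimal Python sketch, with a made-up function name, that counts the NUL and whitespace bytes sitting in front of the first real byte of a file.

```python
def count_leading_padding(path, chunk_size=65536):
    """Count NUL and whitespace bytes that precede the first real byte."""
    padding = b"\x00 \t\r\n"
    count = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return count  # the whole file was padding (or empty)
            stripped = chunk.lstrip(padding)
            count += len(chunk) - len(stripped)
            if stripped:
                return count  # reached the first real byte
```

A healthy name file should give 0. A non-zero count only flags the padding; it does not show whether entries are also missing, so rerunning trim.seqs() with processors=1 remains the safer fix.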

I am not able to reproduce the error you are having with version 1.31.2 on Windows 7 with the example dataset in the Schloss SOP. I don’t think it’s an issue related to your data since you are able to run trim.seqs with processors=1. I am trying to narrow down whether this is an issue with trim.seqs on Windows 8 or an issue with all of mothur’s parallelized commands on Windows 8. Are you having trouble with mothur’s other parallelized commands? For example, can you run align.seqs with processors=2?

I’ve been keeping a close eye on all of the parallelized commands since I had this problem. I’ve run through the entire quality control and trimming portion of the 454 SOP except for the mock community related steps with 4 different SFF files since I posted the problem. All other commands work with processors=2 except for trim.seqs.

I know I’m probably in the minority working in Win8 (hey, it’s a significant upgrade from the Vista that came on the machine!), so I understand this isn’t a priority for you, but if I can provide any files or logs to help out, just let me know.

Thanks for the extra information. I would like to resolve the issue for you. Perhaps your data is hitting something in the parallelization that the example data from the SOP is not. Could you send your input files and log file so I can dig deeper into the issue?

Thanks for sending your files. I am not able to reproduce the error you are getting on my Windows 7 computer with your files. It appears to be a bug isolated to Windows 8. We haven’t tested mothur’s parallelization on Windows 8, so for now I would recommend using processors=1. Sorry for the inconvenience.