Following this issue further, it seems that there are output differences between Mothur versions.
I ran the following command using raw fastq files (not fastq.gz files):
make.contigs(file=dio.files, checkorient=T)
When I run this command in v. 1.39.0, the number of sequences in the dio.trim.contigs.fasta and dio.contigs.groups files is the same (in my case: 8,131,113). Now, if I run the same command in v. 1.41.1, the number of sequences in the dio.trim.contigs.fasta file is 8,131,113, whereas it outputs 8,129,290 sequences in the dio.contigs.groups file. Clearly, there is something wrong with this command in v. 1.41.1.
I went into the SOP and next ran screen.seqs with the two Mothur versions, using the outputs of v. 1.39.0 above (because those of v. 1.41.1 were wrong), as follows:
screen.seqs(fasta=dio.trim.contigs.fasta, group=dio.contigs.groups, summary=dio.trim.contigs.summary, maxambig=0, maxlength=301, maxhomop=8)
However, both v. 1.39.0 and v. 1.41.1 gave me wrong dio.contigs.good.groups files. Let’s check the number of sequences being trimmed and clean:
dio.trim.contigs.bad.accnos: 4,666,759 (I know, too many crapy sequences, but let’s forget that for the sake of this discussion).
dio.trim.contigs.good.fasta: 3,464,354
The sum of these two is 8,131,113, the correct number of total sequences after make.contigs. I expected then 3,464,354 sequences in the dio.contigs.good.groups file, but instead I got 3,465,917 with v. 1.39.0 or 3,464,094 in v. 1.41.1. In any case, I can’t go further with the SOP since in the unique.seqs command it always tells me that the fasta and groups files differ.
Any idea of what’s going on here?