…and supposedly the same for all the files that happen to have the same specified barcode in oligos.txt
So, mothur is combining reads from different sff files because the barcode happens to be the same? Is this intentional? Why? Is the step computationally heavy? Can this behavior be disabled? I’m only interested in the *.shhh.trim.fasta files that result from each sff file.
And a related question: am I correct in assuming that the *.shhh.trim.fasta files include all the sequences that passed QC (i.e. with no dereplication, grouping, or anything like that)?
You need to have a separate oligos.txt file for each sff file (e.g. oligos1.txt, oligos2.txt, etc.). Many of your files have the same barcodes (e.g. 2/5, 9/10). This is likely causing the problems you are seeing.
Thank you for the reply. This is not necessarily a problem, as long as all the .shhh.trim.fasta files are derived from single sff files, as at least their naming suggests. I’d really appreciate it if you could also answer my second question about the .shhh.trim.fasta files including all the sequences that passed QC.
Mothur should have warned you about duplicate barcodes in the oligos file. Unless you create separate oligos files for each sff, mothur stores the oligos file like:
If you want each sff file to use only the barcode assigned to its file, and to know which sff file the sequences with duplicate barcodes came from, then you need separate oligos files:
oligos1.txt
forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
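
If you have many sff files, writing these oligos files by hand gets tedious. Here is a minimal Python sketch of one way to do it; it is not part of mothur, just a helper script. It assumes a single forward primer shared by every file and one barcode per sff file. The helper names (find_shared_barcodes, write_oligos_files), the output directory, and every barcode and group other than the file1 line above are made up for illustration, so substitute your own.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: one forward primer shared by all sff files (taken from the example above).
FORWARD_PRIMER = "ACGGGGCGCAGCAGGCGCGA"

# Hypothetical mapping of group name -> barcode. Only the file1 entry comes from
# the example above; the other entries are placeholders.
BARCODES = {
    "file1": "TAGCGC",
    "file2": "TAGCGC",  # placeholder: deliberately shares a barcode with file1
    "file3": "ACGTAC",  # placeholder
}


def find_shared_barcodes(barcodes):
    """Return {barcode: [groups]} for every barcode used by more than one group."""
    by_barcode = defaultdict(list)
    for group, barcode in barcodes.items():
        by_barcode[barcode].append(group)
    return {bc: groups for bc, groups in by_barcode.items() if len(groups) > 1}


def write_oligos_files(barcodes, primer, out_dir="."):
    """Write oligos1.txt, oligos2.txt, ... with one barcode line per file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (group, barcode) in enumerate(barcodes.items(), start=1):
        path = out / f"oligos{index}.txt"
        path.write_text(f"forward {primer}\nbarcode {barcode} {group}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Report barcodes that would collide if everything lived in a single oligos file.
    for barcode, groups in find_shared_barcodes(BARCODES).items():
        print(f"barcode {barcode} is shared by: {', '.join(groups)}")
    # "oligos_out" is a hypothetical output directory.
    for path in write_oligos_files(BARCODES, FORWARD_PRIMER, "oligos_out"):
        print(f"wrote {path}")
```

Each generated file then contains only the barcode for its own sff file, so sequences with a shared barcode stay traceable to the file they came from.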
The *.shhh.trim.fasta files have gone through the denoising step and, if you provided the oligos file in the trim.seqs command, the barcodes and primers have been removed.