sff.multiple outputs some weird files

Hello,

I use the following as input for sff.multiple():

sfffiles.txt:

/../sff/file1.sff oligos.txt
/../sff/file2.sff oligos.txt
/../sff/file3.sff oligos.txt
/../sff/file4.sff oligos.txt
/../sff/file5.sff oligos.txt
/../sff/file6.sff oligos.txt
/../sff/file7.sff oligos.txt
/../sff/file8.sff oligos.txt
/../sff/file9.sff oligos.txt
/../sff/file10.sff oligos.txt
/../sff/file11.sff oligos.txt
/../sff/file12.sff oligos.txt

oligos.txt:

forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
barcode TACATA file2
barcode TAGCTA file3
barcode TAGCTA file4
barcode TACATA file5
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file9
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12

Among other files in the output dir I see:

file1.file6.flow
file1.file6.shhh.counts
file1.file6.shhh.fasta
file1.file6.shhh.groups
file1.file6.shhh.names
file1.file6.shhh.qual

…and presumably the same for all the files that happen to have the same barcode specified in oligos.txt.

So, mothur is combining reads from different sff files because the barcode happens to be the same? Is this intentional? Why? Is the step computationally heavy? Can this behavior be disabled? I’m only interested in the *.shhh.trim.fasta files that result from each sff file.

And a related question, am I correct in assuming that the *.shhh.trim.fasta files include all the sequences that passed QC (i.e. no dereplication/grouping or something like that)?

You need to have a separate oligos file for each sff file (e.g. oligos1.txt, oligos2.txt, etc.). Many of your files have the same barcodes (e.g. 2/5, 9/10). This is likely causing the problems you are seeing.

Thank you for the reply. This is not necessarily a problem, as long as all the .shhh.trim.fasta files are derived from single sff files, as at least their naming suggests. I’d really appreciate it if you could also answer my second question about the .shhh.trim.fasta files including all the sequences that passed QC.

Thanks again

Mothur should have warned you about duplicate barcodes in the oligos file. Unless you create a separate oligos file for each sff file, mothur stores the oligos file like this, with only the last group listed for each duplicated barcode:

forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12
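
To check an oligos file for shared barcodes before running sff.multiple, something along these lines could help. It is only a sketch in plain Python, not anything mothur provides: the function name `find_duplicate_barcodes` is made up for illustration, and it only looks at three-column `barcode` lines like the ones in this thread.

```python
from collections import defaultdict


def find_duplicate_barcodes(oligos_text):
    """Map each barcode used more than once to the groups that share it."""
    groups_by_barcode = defaultdict(list)
    for line in oligos_text.splitlines():
        fields = line.split()
        # Only handle lines shaped like: barcode <sequence> <group>
        if len(fields) == 3 and fields[0].lower() == "barcode":
            groups_by_barcode[fields[1].upper()].append(fields[2])
    return {bc: groups for bc, groups in groups_by_barcode.items() if len(groups) > 1}


# The oligos.txt posted at the top of this thread.
OLIGOS = """\
forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
barcode TACATA file2
barcode TAGCTA file3
barcode TAGCTA file4
barcode TACATA file5
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file9
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12
"""

duplicates = find_duplicate_barcodes(OLIGOS)
for barcode, groups in duplicates.items():
    print(barcode, "->", ", ".join(groups))
# TAGCGC -> file1, file6
# TACATA -> file2, file5, file11
# TAGCTA -> file3, file4, file9, file10
```

The last group in each list (file6, file11, file10) is the one that survives in the stored listing above.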

If you want each sff file to use only the barcode assigned to it, and to know which sff file the sequences with duplicate barcodes came from, then you need separate oligos files:

oligos1.txt:

forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1

oligos2.txt:

forward ACGGGGCGCAGCAGGCGCGA
barcode TACATA file2

Or, if you want the sequences with the same barcode assigned to the same group, you could do something like:

forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC 1_6
barcode ACTCAG 7
barcode TACTCG 8
barcode TAGCTA 3_4_9_10
barcode TACATA 2_5_11
barcode ATAGTA 12
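
Either layout can be generated from the original oligos.txt instead of being typed by hand. The following is a rough sketch in plain Python, not a mothur feature: the helper names (`parse_oligos`, `per_file_oligos`, `merged_oligos`) and the `oligosN.txt` naming scheme are assumptions for illustration, and it assumes group names of the form `fileN` as in this thread.

```python
def parse_oligos(oligos_text):
    """Return (non-barcode lines, [(barcode, group), ...]) from oligos file text."""
    other_lines, barcodes = [], []
    for line in oligos_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].lower() == "barcode":
            barcodes.append((fields[1], fields[2]))
        elif fields:
            other_lines.append(" ".join(fields))  # e.g. the forward primer line
    return other_lines, barcodes


def short_name(group, prefix="file"):
    """Strip the assumed 'file' prefix: 'file6' -> '6'."""
    return group[len(prefix):] if group.startswith(prefix) else group


def per_file_oligos(oligos_text):
    """Option 1: one oligos file per group, each with a single barcode line."""
    other_lines, barcodes = parse_oligos(oligos_text)
    return {
        f"oligos{short_name(group)}.txt":
            "\n".join(other_lines + [f"barcode {bc} {group}"]) + "\n"
        for bc, group in barcodes
    }


def merged_oligos(oligos_text):
    """Option 2: one line per unique barcode, with a combined group name."""
    other_lines, barcodes = parse_oligos(oligos_text)
    merged = {}
    for bc, group in barcodes:
        merged.setdefault(bc, []).append(short_name(group))
    lines = other_lines + [f"barcode {bc} {'_'.join(names)}" for bc, names in merged.items()]
    return "\n".join(lines) + "\n"


# A shortened version of the oligos.txt from this thread.
SAMPLE = """\
forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
barcode TACATA file2
barcode TAGCGC file6
"""

per_file = per_file_oligos(SAMPLE)   # keys: oligos1.txt, oligos2.txt, oligos6.txt
merged = merged_oligos(SAMPLE)       # TAGCGC becomes group 1_6, TACATA group 2
```

With the first option, each line of sfffiles.txt would then point at its own oligos file, for example `/../sff/file1.sff oligos1.txt`.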

The *.shhh.trim.fasta files have gone through the denoising step, and if you provided the oligos file to the trim.seqs command, the barcodes and primers have been removed.