sff.multiple outputs some weird files..

rhinoceros · September 5, 2013, 12:03pm

Hello,

As input for sff.multiple() I use:

sfffiles.txt:

/../sff/file1.sff oligos.txt
/../sff/file2.sff oligos.txt
/../sff/file3.sff oligos.txt
/../sff/file4.sff oligos.txt
/../sff/file5.sff oligos.txt
/../sff/file6.sff oligos.txt
/../sff/file7.sff oligos.txt
/../sff/file8.sff oligos.txt
/../sff/file9.sff oligos.txt
/../sff/file10.sff oligos.txt
/../sff/file11.sff oligos.txt
/../sff/file12.sff oligos.txt

oligos.txt:

forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
barcode TACATA file2
barcode TAGCTA file3
barcode TAGCTA file4
barcode TACATA file5
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file9
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12
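
A quick way to spot this kind of clash before running sff.multiple() is to list every barcode that is assigned to more than one group. The sketch below is a standalone Python helper, not part of mothur. It only assumes the "barcode SEQUENCE GROUP" line layout shown in oligos.txt above.

```python
from collections import defaultdict

# The oligos.txt contents from the post above.
OLIGOS = """\
forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
barcode TACATA file2
barcode TAGCTA file3
barcode TAGCTA file4
barcode TACATA file5
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file9
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12
"""


def find_duplicate_barcodes(oligos_text):
    """Return {barcode: [groups]} for every barcode listed under more than one group."""
    groups_by_barcode = defaultdict(list)
    for line in oligos_text.splitlines():
        fields = line.split()  # split() also copes with runs of spaces or tabs
        if len(fields) >= 3 and fields[0].lower() == "barcode":
            groups_by_barcode[fields[1].upper()].append(fields[2])
    return {bc: groups for bc, groups in groups_by_barcode.items() if len(groups) > 1}


for barcode, groups in find_duplicate_barcodes(OLIGOS).items():
    print(barcode, ", ".join(groups))
# TAGCGC file1, file6
# TACATA file2, file5, file11
# TAGCTA file3, file4, file9, file10
```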

Among other files in the output dir I see:

file1.file6.flow
file1.file6.shhh.counts
file1.file6.shhh.fasta
file1.file6.shhh.groups
file1.file6.shhh.names
file1.file6.shhh.qual

…and presumably the same for all the other files that share a barcode in oligos.txt.

So, mothur is combining reads from different sff files because the barcode happens to be the same? Is this intentional? Why? Is the step computationally heavy? Can this behavior be disabled? I’m only interested in the *.shhh.trim.fasta files that result from each sff file.

And a related question, am I correct in assuming that the *.shhh.trim.fasta files include all the sequences that passed QC (i.e. no dereplication/grouping or something like that)?

pschloss · September 5, 2013, 6:25pm

You need a separate oligos file for each sff file (e.g. oligos1.txt, oligos2.txt, etc.). Many of your files have the same barcodes (e.g. 2/5, 9/10). This likely causes the problems you are seeing.

rhinoceros · September 6, 2013, 6:39am

Thank you for the reply. This is not necessarily a problem, as long as all the .shhh.trim.fasta files are derived from single sff files, as at least their naming suggests. I’d really appreciate it if you could also answer my second question about the .shhh.trim.fasta files including all the sequences that passed QC.

Thanks again

westcott · September 6, 2013, 5:10pm

Mothur should have warned you about duplicate barcodes in the oligos file. Unless you create a separate oligos file for each sff file, mothur stores the oligos file like this:

forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12
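
The listing above is what you get if later lines for the same barcode overwrite earlier ones, so only the last group named for each barcode survives. The Python sketch below models that behaviour. It is an assumption inferred from the listing, not mothur's own code, and it shows which groups silently drop out of the shared oligos.txt.

```python
# The shared oligos.txt from the original question.
OLIGOS = """\
forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
barcode TACATA file2
barcode TAGCTA file3
barcode TAGCTA file4
barcode TACATA file5
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file9
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12
"""


def stored_barcodes(oligos_text):
    """Model (assumed, not mothur's source): keep only the last group listed per barcode."""
    group_for_barcode = {}
    for line in oligos_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].lower() == "barcode":
            # Assigning to an existing key replaces the earlier group.
            group_for_barcode[fields[1].upper()] = fields[2]
    return group_for_barcode


stored = stored_barcodes(OLIGOS)
all_groups = {line.split()[2] for line in OLIGOS.splitlines() if line.startswith("barcode")}
print("groups that no longer appear:", sorted(all_groups - set(stored.values())))
# groups that no longer appear: ['file1', 'file2', 'file3', 'file4', 'file5', 'file9']
```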

If you want each sff file to use only the barcode assigned to it, and to know which sff file the sequences with duplicate barcodes came from, then you need separate oligos files:

oligos1.txt:
forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1

oligos2.txt:
forward ACGGGGCGCAGCAGGCGCGA
barcode TACATA file2
…
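
Writing twelve near-identical oligos files by hand is tedious, so here is a small Python sketch that generates oligos1.txt to oligos12.txt plus a matching sfffiles.txt in the same two-column layout as the original. The /../sff paths are the elided paths from the question and must be replaced with real ones. The sketch writes bare oligos file names into sfffiles.txt; whether mothur finds them depends on where it looks for input files (an assumption worth checking), so full paths are the safer choice.

```python
from pathlib import Path

PRIMER = "ACGGGGCGCAGCAGGCGCGA"

# (sff file, barcode, group) as given in the original sfffiles.txt and oligos.txt.
# "/../sff" is the elided path from the post; substitute your real directory.
SAMPLES = [
    ("/../sff/file1.sff", "TAGCGC", "file1"),
    ("/../sff/file2.sff", "TACATA", "file2"),
    ("/../sff/file3.sff", "TAGCTA", "file3"),
    ("/../sff/file4.sff", "TAGCTA", "file4"),
    ("/../sff/file5.sff", "TACATA", "file5"),
    ("/../sff/file6.sff", "TAGCGC", "file6"),
    ("/../sff/file7.sff", "ACTCAG", "file7"),
    ("/../sff/file8.sff", "TACTCG", "file8"),
    ("/../sff/file9.sff", "TAGCTA", "file9"),
    ("/../sff/file10.sff", "TAGCTA", "file10"),
    ("/../sff/file11.sff", "TACATA", "file11"),
    ("/../sff/file12.sff", "ATAGTA", "file12"),
]


def split_oligos(samples, primer=PRIMER):
    """Build one oligos file per sff file and the sfffiles.txt text that pairs them."""
    oligos_files = {}
    sff_lines = []
    for i, (sff, barcode, group) in enumerate(samples, start=1):
        name = f"oligos{i}.txt"
        oligos_files[name] = f"forward {primer}\nbarcode {barcode} {group}\n"
        sff_lines.append(f"{sff} {name}")
    return oligos_files, "\n".join(sff_lines) + "\n"


def write_all(out_dir, oligos_files, sfffiles_text):
    """Write the generated files into out_dir (created if missing)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in oligos_files.items():
        (out / name).write_text(text)
    (out / "sfffiles.txt").write_text(sfffiles_text)


oligos_files, sfffiles_text = split_oligos(SAMPLES)
print(oligos_files["oligos2.txt"], end="")
# forward ACGGGGCGCAGCAGGCGCGA
# barcode TACATA file2
```

Calling write_all("split_oligos", oligos_files, sfffiles_text) (the directory name is only an example) puts everything on disk; the generated sfffiles.txt is then given to sff.multiple() in place of the old one.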

Or, if you want the sequences with the same barcode assigned to the same group, you could do something like this:

forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC 1_6
barcode ACTCAG 7
barcode TACTCG 8
barcode TAGCTA 3_4_9_10
barcode TACATA 2_5_11
barcode ATAGTA 12
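
The merged file above can also be derived mechanically from the original oligos.txt by joining the groups that share a barcode with underscores. A Python sketch of that follows. Dropping the "file" prefix (so file1 and file6 become 1_6) just mirrors the naming in the example; the strip_prefix parameter is a hypothetical convenience, not a mothur option.

```python
# The shared oligos.txt from the original question.
OLIGOS = """\
forward ACGGGGCGCAGCAGGCGCGA
barcode TAGCGC file1
barcode TACATA file2
barcode TAGCTA file3
barcode TAGCTA file4
barcode TACATA file5
barcode TAGCGC file6
barcode ACTCAG file7
barcode TACTCG file8
barcode TAGCTA file9
barcode TAGCTA file10
barcode TACATA file11
barcode ATAGTA file12
"""


def merge_shared_barcodes(oligos_text, strip_prefix="file"):
    """Give each barcode a single group made of all its groups joined with '_'."""
    header_lines = []        # non-barcode lines such as the forward primer
    groups_by_barcode = {}   # barcode -> group names, in order of appearance
    for line in oligos_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].lower() == "barcode":
            group = fields[2]
            if strip_prefix and group.startswith(strip_prefix):
                group = group[len(strip_prefix):]
            groups_by_barcode.setdefault(fields[1].upper(), []).append(group)
        elif fields:
            header_lines.append(" ".join(fields))
    barcode_lines = [f"barcode {bc} {'_'.join(groups)}" for bc, groups in groups_by_barcode.items()]
    return "\n".join(header_lines + barcode_lines) + "\n"


print(merge_shared_barcodes(OLIGOS), end="")
# forward ACGGGGCGCAGCAGGCGCGA
# barcode TAGCGC 1_6
# barcode TACATA 2_5_11
# barcode TAGCTA 3_4_9_10
# barcode ACTCAG 7
# barcode TACTCG 8
# barcode ATAGTA 12
```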

The *.shhh.trim.fasta files have gone through the denoising step and, if you provided the oligos file to the trim.seqs command, the barcodes and primers have been removed.
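
On whether these files hold every read that passed QC: the output listing above also contains *.shhh.names files. Assuming mothur's usual names-file layout (two tab-separated columns: a representative sequence name, then a comma-separated list of all the reads it stands for; check this against your own files), the Python sketch below compares the number of FASTA records with the number of reads listed in the matching names file. If the second count is larger, the FASTA file holds one representative per set of identical reads rather than every read. The toy inputs are invented for illustration.

```python
# Toy inputs, invented for illustration only.
TOY_FASTA = """\
>seqA
ACGTACGT
>seqB
ACGTTCGT
"""
TOY_NAMES = "seqA\tseqA,seqC,seqD\nseqB\tseqB\n"


def count_fasta_records(fasta_text):
    """Count sequences by their '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def count_names_reads(names_text):
    """Sum the reads listed in the second (comma-separated) column of a names file."""
    total = 0
    for line in names_text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            total += sum(1 for name in parts[1].split(",") if name)
    return total


print(count_fasta_records(TOY_FASTA), count_names_reads(TOY_NAMES))
# 2 4
```

For real data, read a *.shhh.trim.fasta file and its matching names file (for example with pathlib.Path.read_text()) and pass the text to these two functions.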
