trim.flows and shhh.flows on Junior+ data

JamesoK · January 19, 2015, 3:49pm

Hi,

I have a 16S amplicon dataset covering the V1-V3 region (amplicons just over 500 bp) generated on the Junior+ (LA3 analysis pipeline). I ran trim.flows and shhh.flows but ran into a problem:

The fasta file extracted with sffinfo contained 46596 sequences with a median length of 467 bases, and the flow file indicated 1670 flows. I ran trim.flows as follows:

trim.flows(flow=rund.flow, minflows=900, maxflows=900, order=B, oligos=rund.oligos.txt, bdiffs=1, pdiffs=2, processors=2)

This appended 712 files from process.
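To see why minflows matters here: trim.flows discards any read that does not reach minflows usable flows and ignores flows beyond maxflows, so with minflows=900 every read whose clean flowgram ends before flow 900 is dropped. The Python sketch below illustrates that keep/trim logic on toy flowgrams. It is not mothur's code: the truncation rule in `usable_flows` (stop at the first intensity more than 0.3 from a whole number) is a hypothetical stand-in, mothur's actual signal/noise test may differ, and both function names are illustrative only.

```python
from typing import List, Sequence


def usable_flows(flowgram: Sequence[float], max_dev: float = 0.3) -> int:
    """Count leading flows before the first 'ambiguous' intensity.

    HYPOTHETICAL rule for illustration only: a flow is ambiguous when its
    value lies more than max_dev away from the nearest whole number of bases.
    mothur's real signal/noise test may differ.
    """
    for i, value in enumerate(flowgram):
        if abs(value - round(value)) > max_dev:
            return i
    return len(flowgram)


def trim_flowgrams(flowgrams: List[List[float]], minflows: int, maxflows: int) -> List[List[float]]:
    """Keep reads with at least minflows usable flows, then cut them to maxflows."""
    kept = []
    for fg in flowgrams:
        n = usable_flows(fg)
        if n >= minflows:          # too few clean flows -> read is discarded
            kept.append(fg[:min(n, maxflows)])
    return kept


# Toy example: the second read turns ambiguous at its second flow (0.55).
reads = [
    [1.02, 0.05, 2.1, 0.98],
    [0.97, 0.55, 1.01, 0.02],
]
kept = trim_flowgrams(reads, minflows=3, maxflows=3)
print(len(kept))  # 1
```

Setting minflows equal to maxflows, as in the command above, means every surviving read is trimmed to exactly that many flows, at the cost of discarding all shorter ones.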

I then ran:

shhh.flows(file=rund.flow.files, order=B, lookup=lookup.txt, processors=12)

However, the resulting shhh.fasta file only contained 15341 non-unique sequences.

Do you know why I am losing so many sequences? Should I adjust the number of flows?

pschloss · January 21, 2015, 2:22pm

You should have a shhh.names file that indicates the names of the duplicate sequences. If you run…

summary.seqs(fasta=shhh.fasta, name=shhh.names)

What total number of sequences does it report?
Pat
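Pat's suggestion relies on the shhh.names file: shhh.fasta holds one representative per unique sequence, and each names-file line maps that representative to every read it stands for. As a rough cross-check outside mothur, a minimal Python sketch that tallies unique and total reads from such lines could look like this. It assumes the usual two-column, whitespace-separated names format (representative name, then a comma-separated list of member names including itself); `count_from_names` is a hypothetical helper, not part of mothur.

```python
from typing import Iterable, Tuple


def count_from_names(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unique, total) read counts from mothur names-file lines.

    Assumed format per line: representative name, whitespace, then a
    comma-separated list of every read that representative stands for.
    """
    unique = 0
    total = 0
    for line in lines:
        fields = line.split()
        if not fields:             # skip blank lines
            continue
        unique += 1
        total += len(fields[1].split(","))
    return unique, total


# Toy example: two unique sequences standing for four reads in total.
example = ["seqA\tseqA,seqB,seqC", "seqD\tseqD"]
print(count_from_names(example))  # (2, 4)
```

An open file handle can be passed directly (for example `open("shhh.names")`), since each line is split on whitespace, which also drops the trailing newline. The total should line up with what summary.seqs reports when given the same name file.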
