trim.flows and shhh.flows on Junior+ data


I have a 16S amplicon dataset of the V1-V3 region (amplicon size of just over 500 bp) generated on the Junior+ (LA3 analysis pipeline). I ran trim.flows and shhh.flows on it but encountered a problem:

There were 46596 sequences in the fasta file extracted using sffinfo with a median length of 467 bases. There were 1670 flows indicated in the flow file. I ran the following for trim.flows:

trim.flows(flow=rund.flow, minflows=900, maxflows=900, order=B, oligos=rund.oligos.txt, bdiffs=1, pdiffs=2, processors=2)

This appended 712 files from process.

I then ran:

shhh.flows(file=rund.flow.files, order=B, lookup=lookup.txt, processors=12)

However, the resulting shhh.fasta file contained only 15341 non-unique sequences.

Do you know why I am losing so many sequences? Should I adjust the number of flows?

You should have a shhh.names file that indicates the names of the duplicate sequences. If you run…

summary.seqs(fasta=shhh.fasta, name=shhh.names)

What total number of sequences does it report?
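As a rough cross-check outside mothur, the names file can also be tallied directly. The sketch below assumes the usual two-column names layout (a representative name, then a comma-separated list of every sequence it stands for); the function name count_names and the example records are made up for illustration.

```python
import io

def count_names(handle):
    """Return (unique, total) sequence counts from a names-style file.

    Assumes each non-empty line has two whitespace-separated columns:
    the representative sequence name, then a comma-separated list of
    all sequence names it represents (including itself).
    """
    unique = 0
    total = 0
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        unique += 1
        total += len(fields[1].split(","))
    return unique, total

# Made-up example: two representatives standing for four reads in total.
example = io.StringIO("seqA\tseqA,seqB,seqC\nseqD\tseqD\n")
print(count_names(example))
```

To use it on real output, open the names file written by shhh.flows in place of the in-memory example and compare the total with the number of reads that went in; the first number should match the count of sequences in the fasta file.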