Hi,
I have a 16S amplicon dataset covering the V1-V3 regions (amplicon size just over 500 bp), generated on the Junior+ (LA3 analysis pipeline). I have run trim.flows and shhh.flows but encountered a problem:
The fasta file extracted using sffinfo contained 46596 sequences, with a median length of 467 bases, and the flow file indicated 1670 flows.
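For reference, this is roughly how I extracted and summarized the reads (rund.sff is my raw file; sffinfo and summary.seqs are standard mothur commands, and the # lines are just my notes):

# extract fasta, qual, and flow data from the raw sff file
sffinfo(sff=rund.sff, fasta=T, qfile=T, flow=T)
# report the read count and length distribution of the extracted reads
summary.seqs(fasta=rund.fasta, processors=2)

I ran the following for trim.flows: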
trim.flows(flow=rund.flow, minflows=900, maxflows=900, order=B, oligos=rund.oligos.txt, bdiffs=1, pdiffs=2, processors=2)
This produced 712 files.
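To narrow down where reads are being dropped, I was thinking of re-running trim.flows with its fasta option so the surviving reads can be counted directly (a sketch; fasta=T is a standard trim.flows parameter, and rund.trim.fasta is the output name I'd expect from mothur's usual naming convention):

# same trimming, but also translate the surviving flowgrams to fasta
trim.flows(flow=rund.flow, minflows=900, maxflows=900, order=B, oligos=rund.oligos.txt, bdiffs=1, pdiffs=2, fasta=T, processors=2)
# count the reads that passed trimming
summary.seqs(fasta=rund.trim.fasta, processors=2)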
I then ran:
shhh.flows(file=rund.flow.files, order=B, lookup=lookup.txt, processors=12)
However, the resulting shhh.fasta file contained only 15341 sequences in total (non-unique count).
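For what it's worth, I got that count with summary.seqs, passing the names file so redundant reads are included in the total (file names assumed from mothur's usual output naming):

# count denoised reads; the name file maps each unique sequence back to all of its reads
summary.seqs(fasta=rund.shhh.fasta, name=rund.shhh.names, processors=2)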
Do you know why I am losing so many sequences? Should I adjust the number of flows?