Metrics for shhh.flows and trim.flows

SandraBA · July 12, 2012, 11:12am

Hello,
I trimmed and denoised my 454 16S data sets quite some time ago and recorded the no. of sequences that passed out of the total sequences. A reviewer, however, suggests adding metrics for the no. of sequences that were discarded based on length, bp differences in the primer, barcode, etc. Is this information available from any of the resulting files, or can it be generated?
Thank you, Sandra

pschloss · July 12, 2012, 12:05pm

Oy, what a pain. If you look at the scrap.flow/scrap.fasta files you will see each sequence name is followed by several letters. You can count the number of sequences that have each code. Here are some of the codes to get you going…

l = rejected because of length
b = rejected because of the barcode
f = rejected because of the forward primer

Also note that some of these will occur together. For example, if trim.flows/seqs can’t find the barcode it also can’t find the primer. Also, if a sequence is only 20 bases it likely can’t find the forward primer. Also, if you only have 10 samples on a plate with 96 samples, then you will appear to reject a lot of things because of the barcode.
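If you'd rather not count by hand, something like the following rough Python sketch could tally the codes from a scrap.fasta file. It is only a sketch under a few assumptions: that each header line starts with ">" and that the rejection letters are the last token after the sequence name (separated by whitespace and/or a "|"). Check a few headers in your own file first. The function name count_scrap_codes and the example sequence names are made up for illustration and are not part of mothur.

```python
from collections import Counter


def count_scrap_codes(lines):
    """Tally rejection codes found on FASTA header lines.

    Assumption: a header looks like ">name | bf" or ">name bf", i.e. the
    codes are the last token after the sequence name. Adjust the parsing
    if your scrap file is laid out differently.
    """
    per_code = Counter()   # each letter counted once per sequence
    per_combo = Counter()  # the full code string, e.g. "bf"
    for line in lines:
        if not line.startswith(">"):
            continue  # skip the sequence lines
        tokens = line[1:].replace("|", " ").split()
        if len(tokens) < 2:
            continue  # header has a name but no codes
        codes = tokens[-1]
        per_combo[codes] += 1
        per_code.update(set(codes))
    return per_code, per_combo


# Made-up headers, only to show the two kinds of tallies.
example = [
    ">seq1 | bf", "ACGT",
    ">seq2 | l", "ACG",
    ">seq3 | bf", "AC",
    ">seq4|lf", "A",
]
per_code, per_combo = count_scrap_codes(example)
```

To run it on a real file, open the file and pass the handle in, e.g. count_scrap_codes(open("yourdata.scrap.fasta")), where the file name is a placeholder. Because the codes occur together, per_code will add up to more than the number of scrapped sequences; per_combo keeps the combinations separate, so you can see, for example, how many sequences failed on both barcode and primer.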

Seems like a dumb thing for a reviewer to ask for…
Pat

SandraBA · July 12, 2012, 12:39pm

Heh, I agree. Will see if I can find a way around it, 'cause that sounds like quite a hassle.
Thanks though!
