Hello,
I trimmed and denoised my 454 16S data sets quite some time ago and recorded the number of sequences that passed out of the total. A reviewer, however, suggests adding metrics for the number of sequences that were discarded based on length, bp differences in the primer, barcode, etc. Is this information available from any of the resulting files, or can it be generated?
Thank you, Sandra
Oy, what a pain. If you look at the scrap.flow/scrap.fasta files you will see each sequence name is followed by several letters. You can count the number of sequences that have each code. Here are some of the codes to get you going…
l = rejected because of length
b = rejected because of the barcode
f = rejected because of the forward primer
Also note that some of these will occur together. For example, if trim.flows/seqs can’t find the barcode, it also can’t find the primer. Likewise, if a sequence is only 20 bases long, it likely can’t find the forward primer. And if you only have 10 samples on a plate of 96 samples, then you will appear to reject a lot of things because of the barcode.
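If you want to script the counting, here is a rough Python sketch. It is not part of mothur, and it assumes each scrap header looks something like ">seqname|bf", with the code letters after the last "|". Check a few headers in your own file first and change the delimiter if they differ. The function name and the "(no code)" label are made up for illustration.

```python
from collections import Counter


def count_scrap_codes(lines, delimiter="|"):
    """Tally rejection codes from the header lines of a scrap fasta file.

    Assumption: each header looks like '>seqname|bf', i.e. the code
    letters follow the last delimiter. Adjust `delimiter` if needed.
    """
    combos = Counter()   # whole code strings, e.g. 'bf'
    letters = Counter()  # individual letters, e.g. 'b' and 'f'
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        _name, sep, code = header.rpartition(delimiter)
        code = code.strip() if sep else ""
        if not code:
            combos["(no code)"] += 1
            continue
        combos[code] += 1
        # Count each letter once per sequence, so codes that occur
        # together (such as b and f) show up under both letters.
        letters.update(set(code))
    return combos, letters
```

You could pass it an open file handle for your scrap.fasta and print both counters. The combination counts add up to the number of scrapped sequences, while the per-letter counts can add up to more, because of the co-occurrence mentioned above. Only l, b, and f are explained here; any other letters are tallied as they appear, without interpretation.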
Seems like a dumb thing for a reviewer to ask for…
Pat
Heh, I agree. I will see if I can find a way around it, because that sounds like quite a hassle.
Thanks though!