Trim.seqs: statistics on the reasons why it failed

Is there a simple way to get an overview of the reasons why sequences failed (i.e., ended up in the scrap file)?
How many “b”, “f”, etc.?
A large fraction of my sequences fail, and I am having difficulty understanding why.
Thanks in advance for your help.

http://www.mothur.org/wiki/Trim.seqs explains what the various codes mean. If you are running this on a Mac/Linux box, you can do the following to get the counts:

grep ">" my_file.scrap.fasta | cut -f 2 -d "|" | sort | uniq -c
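If you want to reuse the count, the pipeline can be wrapped in a small shell function. This is a sketch, not part of mothur: the name `count_scrap_codes` is hypothetical, and it assumes scrap headers of the form `>readname|codes`, as in this thread. It uses straight ASCII quotes, because curly quotes (which often appear after copying from a web page) are not treated as quoting by the shell.

```shell
# Hypothetical helper wrapping the grep | cut | sort | uniq pipeline.
# Usage: count_scrap_codes FILE.scrap.fasta
count_scrap_codes() {
  grep '^>' "$1" |     # keep only FASTA header lines
    cut -f 2 -d '|' |  # second |-separated field = failure code(s)
    sort |             # group identical codes so uniq can count them
    uniq -c            # print "count code" for each distinct code
}
```

For example: `count_scrap_codes my_file.scrap.fasta`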


For flow files from trim.flows you can do...

cut -f 1 -d " " my_file.scrap.flow | cut -f 2 -d "|" | sort | uniq -c

Pat

Thanks!
It works really well for the fasta file. For flow files the output looks good too, except that there is a “720” category. Maybe that is normal? See below.
1 720
24 b
29319 bf
136854 f
70664 l
33 lb
21241 lbf
67280 lf

The 720 is the first line of the file - you are using 720 flows. We strongly encourage people to use minflows/maxflows=450…
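Because that first line (the number of flows) contains no "|", `cut` passes it through unchanged, which is how it ends up as a category of its own. Here is a minimal sketch that skips the line before counting. Like the original command, it assumes that the read name and its codes form the first space-separated field of every other line. `count_flow_scrap_codes` is a hypothetical name.

```shell
# Hypothetical helper: counts trim.flows failure codes, skipping line 1,
# which only holds the number of flows (e.g. 720).
# Usage: count_flow_scrap_codes FILE.scrap.flow
count_flow_scrap_codes() {
  tail -n +2 "$1" |    # drop the flow-count line
    cut -f 1 -d ' ' |  # first space-separated field = "readname|codes"
    cut -f 2 -d '|' |  # keep only the failure code(s)
    sort |
    uniq -c
}
```

Adding `-s` to the second `cut` (suppress lines that contain no delimiter) would have a similar effect.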

You are losing ~30k because the barcode is bad and 137k because the primer is bad. I’d check your oligos file to make sure you have the right sequences. Did you happen to sequence in both directions? If so, you’ll need to run trim.flows twice, using different forward primers, and then analyze the data sets separately.
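A single read can fail for several reasons, so the per-category counts do not directly say how many reads had a bad barcode or primer overall. The following sketch re-tallies the `uniq -c` output by individual letter. It assumes each letter in a code is one failure reason (see the trim.seqs wiki page for what each letter means). `tally_by_reason` is a hypothetical name.

```shell
# Hypothetical helper: reads "count code" lines (uniq -c output) from a
# file or stdin and adds each row's count to every letter in its code,
# so an "lbf" read is counted under l, b and f.
tally_by_reason() {
  awk '$2 ~ /^[a-z]+$/ {             # skip non-code rows such as 720
         for (i = 1; i <= length($2); i++)
           total[substr($2, i, 1)] += $1
       }
       END { for (r in total) print r, total[r] }' "${1:--}" | sort
}
```

For example: `cut -f 1 -d " " my_file.scrap.flow | cut -f 2 -d "|" | sort | uniq -c | tally_by_reason`. Given the table posted above, it should print b 50617, f 254694 and l 159218. These totals are larger than ~30k and 137k because those figures appear to match the bf and f rows alone, while the tally also counts reads such as lbf and lf.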

Pat

Actually, we amplify and sequence two different regions of the 16S gene, V1-V3 and V4-V6. That is why we have so many f's. And yes, I do analyze the regions separately. Thanks for the explanation and your advice.

Hi,

I was wondering: would the command below also work for the trim.seqs output using quality scores?

grep ">" my_file.scrap.fasta | cut -f 2 -d "|" | sort | uniq -c

This is an excerpt of what I get when I run this command:

1 l xy=99_3458
1 l xy=99_3559
1 l xy=99_3777
1 l xy=99_614
1 l xy=99_811
1 l xy=99_841
1 l xy=99_905
1 l xy=99_919
1 n xy=171_1001
1 n xy=340_2877
1 n xy=478_1667
1 n xy=819_627

I have had success running this command for the output of the “flowgram route” of the 454 SOP, but for some reason it doesn’t work if trim.seqs is used in the “quality score strategy” of the 454 SOP.

Can you post the output of this command?

head my_file.scrap.fasta

Hi,

Thank you for the quick reply!

This is an excerpt of what I see after running the command you suggested:

ITZIMBT01BTFGB|bf xy=628_489
CTAGCGAACATCCCGGATTAGATACCCTGGTAGTCCATGCCGTAAACGGTGGGCGCTAGGTGTGGGGTCCTTCCACGGATTCCGTGCCGTAGCTAACGCATTAAGCGCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGCCCGTCACAAGCGGCGGAGCATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTAGGCTTGACATATACAGGACGACGGCAGAGATGTCGTTTCCCTTGTGGCTTGTATACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCGCAACGAGCGCAACCCCTGTCTCATGTTGCCAGCACGTAATGGTGGGGACTCGTGAGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTAGGGCTTCACACATGCTACAATGGCTAGTACAGAGGGCTGCGAGACCGTGAGGTGGAGCGAATCCCTTAAAGCTGGTCTCAGTTCGGATTGGGGTCTGCAACTCGACCCCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGT
ITZIMBT01BEK47|bf xy=459_249
CTAGCGAACATCCGGATTAGATACCCGGGTAGTCCACACCGTAAACGATGAACACTAGGTGTTAGGAGGTTTCCGCCTCTTAGTGCCGAAGCTAACGCATTAAGTGTTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCTTTGAAGCTTTTAGAGATAGAAGTGTTCTCTTCGGAGACAAAGTGACAGGTGGTGCATGGTCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATTGTTAGTTGCCAGCATTCAGATGGGCACTCTAGCGAGACTGCCGGTGACAAACCGGAGGAAGGCGGGGACGACGTCAGATCAGTCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGCGTATACAACGAGTTGCCAACCCGCGAGGGTGAGCTAATCTCTTAAAGTACGTCTCAGTTCGGATTGTAGTCTGCAACTCGACTACATGAAGTCGGAATCGCTAGTAATCGCGGATCAGCACGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACC
ITZIMBT01A3SX8|bf xy=336_958
CTAGCGAACATCCCGGATTAGATACCCTGTAGTCCATGCCGTAAACGTTGGGCACTAGGTGTGGGGAGCATTCCACGTTTTCCGCGCCGTAGCTAACGCATTAAGTGCCCCGCCTGGGGAGTAGCGGCCGCAAGGCTAAAACTCAAAAGGAAGTTGACGGGGGCCCGCACAAGCGGCGGAGCATGCTGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATGCACTGGACGGCTGCAGAGATGTGGCTTTCTTTGGACTGGTGCACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGACGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCAAGCATGCTACAATGGTCGGTACAATGGGTTCGAAACTGTGAGGTGGAGCGAATCCCAAAAGCCGGCCTCAGTTCGGATTGGGGTCTGCAACTCGACCCCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGT

Thank you in advance!

Hi again

I decided to go ahead and learn what the elements of this command mean, and I think I figured out a solution!

This seems to work: grep ">" file.scrap.fasta | cut -f 2 -d "|" | cut -f 1 -d " " | sort | uniq -c
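The same counts can also be produced in a single awk pass. The script splits each header on "|" and keeps only the first word after it, dropping the " xy=..." suffix. This is an alternative sketch, not something mothur provides, and `count_scrap_codes_awk` is a hypothetical name.

```shell
# Hypothetical helper: one-pass equivalent of
#   grep ">" | cut -f 2 -d "|" | cut -f 1 -d " " | sort | uniq -c
# Usage: count_scrap_codes_awk FILE.scrap.fasta
count_scrap_codes_awk() {
  awk -F '|' '/^>/ {
      split($2, word, " ")   # "bf xy=628_489" -> word[1] = "bf"
      n[word[1]]++
    }
    END { for (code in n) print n[code], code }' "$1" | sort -b -k 2
}
```

Output lines have the form "count code", sorted by code.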

Thank you and also thank you for Mothur, it is an amazing tool!

Pedro

It looks like the barcode (b) and forward primer (f) are incorrect; you might double-check that you have the right sequences in your oligos file.