No. of unique sequences decreased to 15% after chimera.uch

Kumari_Richa · October 24, 2014, 4:33pm

Hello,

I used the greengenes reference taxonomy and alignment files. After running chimera.uchime and remove.seqs, I lost 85% of my unique sequences. Why?
Another question: why did screen.seqs not work? The summary output is below.

mothur > align.seqs(fasta=stability.trim.contigs.trim.good.unique.fasta, reference=gg.refalign, flip=T)

mothur > summary.seqs(fasta=current)

Start End NBases Ambigs Polymer NumSeqs
Minimum: 5 2263 370 0 3 1
2.5%-tile: 9 2266 370 0 4 312
25%-tile: 9 2266 371 0 4 3119
Median: 9 2266 372 0 5 6237
75%-tile: 9 2266 373 0 5 9355
97.5%-tile: 9 2266 376 0 6 12161
Maximum: 13 2293 376 0 8 12472
Mean: 9.01379 2266.03 372.169 0 4.71208
# of Seqs: 12472

mothur > screen.seqs(fasta=stability.trim.contigs.trim.good.unique.align, count=stability.trim.contigs.trim.good.count_table, summary=stability.trim.contigs.trim.good.unique.summary, start=9, end=2266, maxhomop=8)

It took 1 secs to screen 12472 sequences.

mothur > summary.seqs(fasta=current, count=current)

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 5 2266 370 0 3 1
2.5%-tile: 9 2266 370 0 4 3339
25%-tile: 9 2266 372 0 4 33387
Median: 9 2266 372 0 5 66774
75%-tile: 9 2266 373 0 5 100161
97.5%-tile: 9 2266 376 0 6 130209
Maximum: 9 2293 376 0 8 133547
Mean: 8.99997 2266 372.232 0 4.62525

# of unique seqs: 12400

total # of seqs: 133547

Looking forward to your suggestions.

pschloss · October 24, 2014, 7:36pm

I used the greengenes reference taxonomy and alignment files. After running chimera.uchime and remove.seqs, I lost 85% of my unique sequences. Why?

Because a lot of your unique reads were chimeras. What percentage of your total reads were discarded?

Another question: why did screen.seqs not work? The summary output is below.

Looks like it worked, what am I missing?

Kumari_Richa · October 27, 2014, 10:19am

Hi Dr. Schloss,

I realised from your question (what percentage of my total reads was discarded) that I did not lose most of the total sequences. I still have 98% of the total sequences. This means only 2% of the total sequences account for the 85% of unique sequences that were rubbish :shock: …

2) I wanted to exclude sequences below 5 (start) and above 2266 (end), but I can still see them in the summary.seqs output. I can, however, see a reduction in the number of sequences.
Thank you.
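
For readers following along, the gap between unique and total sequences described in this post can be made concrete with a small Python sketch. The counts below are hypothetical and are not taken from this dataset; `loss_fractions` is an illustrative helper, not a mothur function. It shows how removing many low-abundance unique sequences can discard most of the uniques while costing very few reads.

```python
def loss_fractions(counts, removed):
    """Return (fraction of unique sequences removed, fraction of reads removed).

    counts  -- dict mapping each unique sequence name to its read count
    removed -- names of the unique sequences that were discarded
    """
    total_unique = len(counts)
    total_reads = sum(counts.values())
    removed_reads = sum(counts[name] for name in removed)
    return len(removed) / total_unique, removed_reads / total_reads


# Hypothetical count table: 3 abundant uniques and 17 singletons.
counts = {f"abundant_{i}": 300 for i in range(3)}
counts.update({f"rare_{i}": 1 for i in range(17)})

# Assume every singleton was flagged as chimeric and removed.
removed = [name for name in counts if name.startswith("rare_")]

unique_lost, reads_lost = loss_fractions(counts, removed)
print(f"unique sequences lost: {unique_lost:.0%}")
print(f"total reads lost: {reads_lost:.1%}")
```

In this toy table, 85% of the unique sequences are removed, yet they carry under 2% of the reads, which is the same pattern as the one reported above.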

pschloss · October 28, 2014, 4:00pm

The start option removes sequences that start after the start position, and the end option removes those that end before the end position.
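
As a rough illustration of that rule, here is a minimal Python sketch of the filter logic. It is not mothur's implementation; `passes_screen` and the example coordinates are hypothetical, with thresholds matching the `start=9, end=2266` values used in the screen.seqs call earlier in the thread.

```python
def passes_screen(seq_start, seq_end, start, end):
    """Keep a sequence only if it starts at or before `start`
    and ends at or after `end`."""
    return seq_start <= start and seq_end >= end


# Hypothetical (start, end) alignment coordinates for four sequences.
examples = [(5, 2266), (9, 2293), (13, 2293), (9, 2263)]

kept = [coords for coords in examples
        if passes_screen(*coords, start=9, end=2266)]
print(kept)
```

Under this rule a sequence starting at position 5 is kept, because it starts before position 9, and a sequence ending at 2293 is kept, because it ends after 2266. Only sequences that start late (13) or end early (2263) are dropped, which is consistent with the post-screen summary still showing a minimum start of 5 and a maximum end of 2293.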
