Minimum Quality Requirements for analysing Amplicon Sequencing data ( 16S / 18S ) with mothur pipeline

meta_analyst · October 8, 2017, 7:24am

Hi,

I am planning to run a 16S V4 analysis on amplicon sequencing data using mothur. When I quality-filter the reads with a simple set of criteria, I obtain a very low number of good-quality reads per sample.

Conditions used to quality-filter the reads:

Minimum length of read: 200
Minimum Q-value of each base position: 15
Minimum Mean Q-value of each read: 20
Maximum Ns allowed per read: 4
QC software used: prinseq/0.20.4
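
To make the combined effect of these thresholds concrete, here is a minimal Python sketch. It is not prinseq's implementation. The function name `passes_filter` and the example read are hypothetical, and it assumes that "Minimum Q-value of each base position" means every base in the read must reach Q15. Under that assumption a single low-quality base is enough to discard an otherwise good read.

```python
def passes_filter(seq, quals, min_len=200, min_base_q=15, min_mean_q=20, max_ns=4):
    """Return True if a read satisfies all four thresholds listed above.

    seq   -- the read as a string of bases
    quals -- per-base Phred scores, one integer per base
    """
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") > max_ns:
        return False
    if min(quals) < min_base_q:
        # Assumed interpretation: one base below the cutoff rejects the read.
        return False
    if sum(quals) / len(quals) < min_mean_q:
        return False
    return True


# Hypothetical 250-base read: Q35 everywhere except one Q10 base at the end.
seq = "ACGT" * 62 + "AC"
one_bad_base = [35] * 249 + [10]
all_good = [35] * 250

print(passes_filter(seq, one_bad_base))  # rejected by the per-base cutoff
print(passes_filter(seq, all_good))      # passes all four checks
```

If the per-base cutoff is what removes most reads, relaxing or dropping that one criterion while keeping the mean-quality check would show it quickly.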

Sample Num_R1_before Num_R2_before Num_R1_after Num_R2_after
test_sample_01 600381 600381 16 16
test_sample_02 493191 493191 10 10
test_sample_03 435412 435412 8 8
test_sample_04 460862 460862 20 20
test_sample_05 567018 567018 4 4
test_sample_06 407389 407389 3 3
test_sample_07 549802 549802 6 6
test_sample_08 403641 403641 2 2
test_sample_09 292051 292051 3 3
test_sample_10 444006 444006 10 10
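
For scale, the retained fraction can be computed directly from the counts in the table. This is a small Python sketch using only the numbers above; R1 and R2 counts are identical in every row, so only one of each pair is used, and the variable names are illustrative.

```python
# (reads before filtering, reads after filtering) per sample, from the table above
counts = {
    "test_sample_01": (600381, 16),
    "test_sample_02": (493191, 10),
    "test_sample_03": (435412, 8),
    "test_sample_04": (460862, 20),
    "test_sample_05": (567018, 4),
    "test_sample_06": (407389, 3),
    "test_sample_07": (549802, 6),
    "test_sample_08": (403641, 2),
    "test_sample_09": (292051, 3),
    "test_sample_10": (444006, 10),
}

# Fraction of reads that survive filtering, per sample
retention = {name: after / before for name, (before, after) in counts.items()}
for name, fraction in retention.items():
    print(f"{name}: {fraction:.4%} retained")

total_before = sum(before for before, _ in counts.values())
total_after = sum(after for _, after in counts.values())
print(f"overall: {total_after} of {total_before} reads retained")
```

Every sample keeps fewer than 1 read in 10,000, which points at the filter settings or the read layout rather than at a few unusually poor samples.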

My Questions:

Is it common to see such low-quality sequences from amplicon sequencing libraries on the Illumina MiSeq platform?
Based on the QC outcomes above, can I reject these samples as being of too low quality for metagenomic analysis with the mothur pipeline?
Are there any specific recommendations I should follow when analysing this kind of amplicon sequencing data within the mothur framework?

Thank you.

pschloss · October 9, 2017, 11:38am

Have you run make.contigs without doing the quality filtering? We go directly from the sequencer to make.contigs. Have you sequenced a mock community?

Pat

meta_analyst · October 11, 2017, 8:36am

Please excuse my late reply. Yes, I initially ran make.contigs without any external quality filtering. As I am currently working with test data, I don’t have a mock community.

Basically, I proceeded the standard way:
make.contigs → summary.seqs → screen.seqs → unique.seqs → count.seqs → align.seqs → summary.seqs

When I saw the final summary.seqs output ( shown below ), I thought that maybe the read quality was to blame for the poor results. That was when I started looking at the quality of my data.

mothur > summary.seqs(fasta=fulltest.trim.contigs.good.unique.align, count=fulltest.trim.contigs.good.count_table, processors=8)

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 2 13421 2 0 2 331814
25%-tile: 2 13421 2 0 2 3318132
Median: 13421 13423 2 0 2 6636263
75%-tile: 13421 13423 290 0 4 9954394
97.5%-tile: 13421 13423 290 0 6 12940712
Maximum: 13423 13423 316 0 8 13272525
Mean: 9446.18 13412.4 87.4228 0 2.81171

# of unique seqs: 8336372

total # of seqs: 13272525

In particular, I am puzzled that aligned sequences can start at position 13421 and end at position 13423, yet have a length of 290. I am eager to know how this could happen.
I would also like to know whether one can still pursue a 16S analysis with mothur in the absence of a mock community.
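
One possible reading of the Start/End/NBases rows, offered as an assumption rather than a confirmed diagnosis: summary.seqs appears to report percentiles for each column independently, so a row of the table need not describe any single sequence. The Python sketch below uses made-up records and a simple nearest-rank quantile, which is an assumed definition and not necessarily the one mothur uses. It mixes a minority of full-length alignments with a majority of near-empty ones and reproduces the same pattern.

```python
def column_quantile(values, q):
    """Nearest-rank quantile of one column, sorted on its own (assumed definition)."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical (start, end, nbases) records: 3 full-length and 7 near-empty alignments.
full_length = (2, 13421, 290)
near_empty = (13421, 13423, 2)
records = [full_length] * 3 + [near_empty] * 7

# Split the records into columns, as a per-column summary would.
starts, ends, nbases = zip(*records)

for label, q in [("25%-tile", 0.25), ("Median", 0.50), ("75%-tile", 0.75)]:
    row = (
        column_quantile(starts, q),
        column_quantile(ends, q),
        column_quantile(nbases, q),
    )
    print(label, row, "is an actual record:", row in records)

row_75 = tuple(column_quantile(col, 0.75) for col in (starts, ends, nbases))
```

In this toy data the 75%-tile row combines a start of 13421, an end of 13423 and 290 bases, although no record has those three values together. If the same holds for the real data, the table would mean that most sequences are near-empty alignments of about 2 bases, while a smaller group spans positions 2 to 13421 with about 290 bases.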

Thank you.

pschloss · October 12, 2017, 12:25pm

You might also want to check out the rename option in make.contigs which converts the names to numbers.

Pat
