Blast against PATRIC Pathogen Database?

danieln · April 7, 2014, 3:18am

Hi,

I wonder if we could further classify unique 16S sequences to detect bacterial species or OTUs that are potentially pathogenic in the human gut (PMID: 21896772)?

I noticed that a similar approach has already been demonstrated through the HMP (PMID: 22699609), and I would like to know whether this is possible with mothur?

Thank you.

Daniel

pschloss · April 9, 2014, 5:56pm

Absolutely. You would have to come up with the database, but you can certainly do it. I’d encourage you to take a look at our RDP and SILVA reference taxonomies and use them as a basis for yours. If you have questions or problems as you go along building it, let us know.

danieln · April 15, 2014, 4:27am

Hi Pat,

Thanks for the very helpful advice! I then ran the mothur commands as best I understood them and generated the following summary outputs:

“patric.bacteria.ssu.fasta” (raw extracted small subunit ribosomal sequences of variable length)
            Start    End      NBases   Ambigs     Polymer  NumSeqs
Minimum:    1        61       61       0          2        1
2.5%-tile:  1        96       96       0          3        688
25%-tile:   1        355      355      0          5        6872
Median:     1        1438     1438     0          5        13744
75%-tile:   1        1528     1528     0          6        20616
97.5%-tile: 1        1564     1564     0          7        26800
Maximum:    1        3038     3038     34         29       27487
Mean:       1        1018.75  1018.75  0.0294321  5.15946

# of Seqs: 27487

“patric.bacteria.ssu.align” (screen.seqs using silva.bacteria.fasta)
            Start    End      NBases   Ambigs     Polymer  NumSeqs
Minimum:    0        0        0        0          1        1
2.5%-tile:  1044     1815     68       0          3        688
25%-tile:   1044     15966    329      0          5        6872
Median:     1044     43116    1399     0          5        13744
75%-tile:   1053     43116    1463     0          6        20616
97.5%-tile: 40960    43116    1483     0          7        26800
Maximum:    43116    43116    1612     34         29       27487
Mean:       5065.95  32624.9  969.006  0.0274675  5.12144

# of Seqs: 27487

“patric.bacteria.ssu.good.align”
            Start    End      NBases   Ambigs     Polymer  NumSeqs
Minimum:    1044     43116    1375     0          4        1
2.5%-tile:  1044     43116    1407     0          5        283
25%-tile:   1044     43116    1450     0          5        2826
Median:     1044     43116    1464     0          6        5651
75%-tile:   1044     43116    1469     0          6        8476
97.5%-tile: 1044     43116    1486     0          7        11019
Maximum:    1044     43116    1612     0          8        11301
Mean:       1044     43116    1457.76  0          5.57694

# of Seqs: 11301

“patric.bacteria.ssu.good.unique.align”
            Start    End      NBases   Ambigs     Polymer  NumSeqs
Minimum:    1044     43116    1375     0          4        1
2.5%-tile:  1044     43116    1407     0          5        270
25%-tile:   1044     43116    1450     0          5        2699
Median:     1044     43116    1464     0          6        5398
75%-tile:   1044     43116    1468     0          6        8096
97.5%-tile: 1044     43116    1486     0          7        10525
Maximum:    1044     43116    1612     0          8        10794
Mean:       1044     43116    1457.61  0          5.58236

# of Seqs: 10794

Finally, I ran the classify.seqs command (template=silva.bacteria.fasta, taxonomy=silva.bacteria.silva.tax, probs=F, cutoff=80) to generate the wang.taxonomy files that will serve as the PATRIC reference taxonomy.

Judging from the sequence statistics above, do you think this approach is sound enough to produce a quality database?

Thank you again!

Daniel
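
For readers following along, the sequence counts reported above can be turned into a quick retention summary. This is only an illustrative Python sketch; the `COUNTS` list and the `retention` helper are names made up for this example, and the numbers are copied from the summaries in this post.

```python
# Sequence counts copied from the summaries above (raw, screened, unique).
COUNTS = [
    ("patric.bacteria.ssu.fasta", 27487),
    ("patric.bacteria.ssu.good.align", 11301),
    ("patric.bacteria.ssu.good.unique.align", 10794),
]


def retention(counts):
    """Return (name, n, lost_vs_previous_step, percent_of_first_step) rows."""
    first = counts[0][1]
    previous = first
    rows = []
    for name, n in counts:
        rows.append((name, n, previous - n, 100.0 * n / first))
        previous = n
    return rows


for name, n, lost, pct in retention(COUNTS):
    print(f"{name}: {n} seqs, {lost} lost at this step, {pct:.1f}% of raw")
```

Most of the loss happens at the screening step, which is where short or poorly aligned sequences are removed.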

pschloss · April 15, 2014, 10:48am

Looks good. If I were you, I would look into which sequences got chucked for being too short or not aligning well. You might look for full-length versions of those sequences elsewhere and then bring them into the db.

Pat
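
One possible way to act on this suggestion is to compare sequence names between the raw FASTA file and the screened alignment and list whatever went missing, shortest first. The sketch below is a hedged illustration in plain Python, not part of mothur; `read_fasta` and `discarded` are hypothetical helper names, and it assumes sequence names are unchanged between the two files. (mothur's screen.seqs should also write a `.bad.accnos` file naming the removed sequences, which may be the simpler route.)

```python
def read_fasta(path):
    """Return {sequence_name: sequence} for a FASTA file (aligned or not)."""
    chunks = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first whitespace-delimited token as the name.
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def discarded(raw_path, kept_path):
    """List (name, raw_length) for sequences in raw_path absent from kept_path."""
    raw = read_fasta(raw_path)
    kept = read_fasta(kept_path)
    missing = [(name, len(seq)) for name, seq in raw.items() if name not in kept]
    # Shortest first, so likely partial sequences show up at the top.
    return sorted(missing, key=lambda row: row[1])
```

For the files in this thread, a call such as `discarded("patric.bacteria.ssu.fasta", "patric.bacteria.ssu.good.align")` would return the names and raw lengths of the sequences that were removed, which could then be looked up elsewhere for full-length versions.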
