Modifying SOP code with updated reference and taxonomy alignments

Hello,

I have worked through the mothur SOP, and am now starting to process my own data. I am wondering what parts of the SOP code I need to modify. Here are a few things I have questions about:

  1. pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)
    --> First, what exactly is the silva.bacteria.fasta file? Second, the SOP states that you should use the updated SILVA reference files. Does this mean I should find an updated version of the silva.bacteria.fasta file? If so, where do I find that?

  2. classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)
    --> What exactly are the trainset9_032012.pds.fasta and trainset9_032012.pds.tax files? I am assuming I should also get updated versions of these two files. Where do I find them?

  3. Is there anything else in the SOP that needs to be updated or modified when working with your own data?

Many thanks.

Hi Alyssa,

In the SOP, we write…

You can easily substitute these choices (and should) for the reference and taxonomy alignments using the updated Silva reference files, RDP reference files, and Greengenes-formatted databases. We use the above files because they’re compact and do a pretty good job. The various classification references perform differently with different sample types so your mileage may vary.

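The links in that passage of the SOP (Silva reference files, RDP reference files, and Greengenes-formatted databases) are what you’re looking for.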

  1. The Silva reference files are available on the "Silva reference files" page. The silva.bacteria.fasta file we use in the SOP is from release 102.

  2. The trainset9_032012 files are version 9 of the RDP training set.
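
As a rough sketch, here is what the two commands from the question might look like once newer files are swapped in. The reference file names below are only examples (they assume a SILVA release 132 alignment and version 16 of the RDP training set); use whatever names your downloaded files actually have. The start and end coordinates are carried over unchanged from the SOP command, which assumes you sequenced the same region as the SOP data.

```
# Trim the newer SILVA reference alignment to your amplicon region.
# silva.nr_v132.align is an example name, not a required one.
pcr.seqs(fasta=silva.nr_v132.align, start=11894, end=25319, keepdots=F, processors=8)

# Classify against a newer RDP training set.
# trainset16_022016.pds.fasta/.tax are example names; the reference and
# taxonomy files must come from the same release.
classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=trainset16_022016.pds.fasta, taxonomy=trainset16_022016.pds.tax, cutoff=80)
```

Only the fasta= argument of pcr.seqs and the reference= and taxonomy= arguments of classify.seqs change here. The other input names will follow whatever prefix your own files have instead of "stability".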