Hi there,
I am following the MiSeq SOP for analysis of my sequences.
For classify.seqs... (sorry, I have several questions at once :roll: )
I am wondering what is in trainset9_032012.pds.fasta (and trainset9_032012.pds.tax). Is the “trainset” just a newer version of the SILVA database? What does the ‘pds’ extension stand for?
Where does silva.bacteria.fasta (along with nogap.bacteria.fasta and silva.bacteria.silva.tax), which was used in the past, fit into all this? Why not use that instead, since it has ~4000 more sequences?
Also, why do we use ‘reference’ and ‘taxonomy’ instead of ‘template’ and ‘taxonomy’? In other words, what is the difference between ‘reference’ and ‘template’?
The command from the MiSeq SOP:
mothur > classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)
The older command, for comparison:
mothur > classify.seqs(fasta=abrecovery.fasta, template=nogap.bacteria.fasta, taxonomy=silva.bacteria.silva.tax)
Thanks!!