Hi,
Currently I am processing a mock community that was sequenced in triplicate. It consists of 19 different species from 6 phyla. Some are quite closely related, some are more divergent. They were pooled “evenly” based upon 16S copy number as determined with qPCR and are all fully sequenced strains. To verify purity sanger was performed on the genomic extracts.
Now I’m in a bit of a pickle on how to get some error rates here. The seq.error documentation page is rather poorly documented (http://www.mothur.org/wiki/Seq.error) and I would like to know how I should proceed.
What did I try so far:
- Make a fasta file with the partial 16S of each (type) strain from either RDP or EMBL repositories
- Align this fasta with the SILVA v119 reference alignment
- Use pcr.seqs to trim to primer region of my actual mock data
- Run filter.seqs
- Run mock data through MiSeq SOP until point of error comparison
And here starts the pickle. I am supposed I have to set aligned=T ? Does this mean that both the fasta and the reference have to be aligned? I tried it and I get a whole bunch of errors complaining that they are not the same length. This is quite weird as I used pcr.seqs to trim to the correct primer region, should I include an extra screen. seqs after step 3? Is it okay to use the reference sequences from the databases, or would it be a better idea to use the sanger reads from the strains that we’ve put in (which are, theoretically, the same).
Comments & suggestions are warmly welcome
Kind regards,
FM