Silva DBs

Hello!

Before I describe my problem, you should know that I’m on Mothur 1.33.3; I only saw today that you’d released v1.34 (I don’t know whether it makes a difference, but just in case). I’m following the MiSeq SOP but using only the forward primer (803F).

I’m stuck on a trivial problem that many people have already written to you about (primer positions), but since there was a lot of information out there, I tried a few things to get myself out of trouble before asking for your help. Now I mostly wonder which DB to trust and how to choose among the different results I got. So here goes: my primers are 803F and 1392R. After reading the forum, I took a 16S sequence from E. coli (NCBI Reference Sequence: NC_000913.3), cut off the first 803 bases, and kept only bp 804 to bp 1392. I aligned this fragment using the SINA aligner on the Silva website and got the following results:

  1. Start position is 26387
  2. Stop position is 41790
  3. Length is 552 bp
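
For anyone who wants to reproduce the trimming step, here is a minimal Python sketch. It assumes 1-based, inclusive coordinates (so bp 804 to bp 1392 should give 589 bases). `extract_region` is a hypothetical helper, and the sequence below is a synthetic placeholder, not the real E. coli 16S gene.

```python
def extract_region(seq, start, end):
    """Return bases start..end of seq, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


# Synthetic placeholder standing in for a full-length 16S gene (assumption).
placeholder_16s = "ACGT" * 400  # 1600 bases

fragment = extract_region(placeholder_16s, 804, 1392)
print(len(fragment))  # 1392 - 804 + 1 = 589
```

Checking the fragment length this way before aligning makes it easier to notice when the aligned result reports fewer bases than were submitted.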

Then, just because I was having tons of fun, and also to check that the start and stop positions were correct, I did the same thing pretending I was using the 27f primer: I took the same 16S gene and removed the first 27 bp. The results were:

  1. Start position is 1162
  2. Stop position is 43282
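
The start and stop positions above are column numbers in the alignment, not positions in the ungapped gene. Here is a minimal sketch of how they can be read off a single aligned sequence. It assumes `.` and `-` are the only gap characters; `alignment_span` is a hypothetical helper and the toy sequence is made up.

```python
def alignment_span(aligned_seq, gap_chars=".-"):
    """Return (start, end, nbases) for one aligned sequence.

    start and end are the 1-based alignment columns of the first and last
    non-gap characters; nbases counts the non-gap characters.
    """
    columns = [i for i, ch in enumerate(aligned_seq, start=1)
               if ch not in gap_chars]
    if not columns:
        raise ValueError("sequence contains only gap characters")
    return columns[0], columns[-1], len(columns)


# Toy aligned sequence (made up for illustration).
toy = "....AC--GT-A..."
start, end, nbases = alignment_span(toy)
print(start, end, nbases)
```

Comparing `nbases` with the length of the unaligned input shows whether every base made it into the alignment.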

Also, I’ve tried aligning on Mothur using the following command:
align.seqs(fasta=trial803_1392.fasta, reference=silva.seed_v119.align)
as well as the following command:
align.seqs(fasta=trial803_1392.fasta, reference=silva.bacteria.fasta)

and I get this:

summary.seqs(fasta=silva.seed_v119.8mer)

We found more than 25% in sequence 1.33.3 to be ambiguous. Mothur is not to set up to process protein sequences

Start End Nbases Ambigs Polymer NumSeqs
Minimum 1 90824876 90824876 90824876 90824876 90824876
2.5%-tile 1 90824876 90824876 90824876 90824876 90824876
25%-tile 1 90824876 90824876 90824876 90824876 90824876
Median 1 90824876 90824876 90824876 90824876 90824876
75%-tile 1 90824876 90824876 90824876 90824876 90824876
97.5%-tile 1 90824876 90824876 90824876 90824876 90824876
Maximum 1 90824876 90824876 90824876 90824876 90824876
Mean 1 90824876 90824876 90824876 90824876 90824876

# of seqs: 90824876

So, clearly, I’m missing something…

Here are my specific questions:

  1. Why does Silva tell me that my fragment is 552 bp when it is actually 589 bp?
  2. Why does Silva tell me that the 27f primer starts at position 1162, whereas I’ve read on this forum that it is 1044? Is it because there are slight differences between silva.seed_v119.align, silva.bacteria.fasta, and the Silva version used by the SINA aligner on the Silva website? If so, should I worry about it or not?
  3. Why does my command to align a 16S fragment against silva.bacteria.fasta or silva.seed_v119.align, using Mothur, not work?

Sorry, I don’t think I’m far from getting it, but I’m not quite there yet.

Many thanks in advance for your help!

Actually, I was told the answer to my question, in case anyone out there is interested: it’s better to use several sequences to find the primer positions (for example, the first 1000 sequences from your fasta file). You align these with align.seqs against the newest Silva reference provided for Mothur, and the alignment gives you the positions. Using only one sequence will probably give a slightly biased result. That’s what I was told, anyway.
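
In case it helps, here is a minimal Python sketch of the "take the first N sequences" step. It assumes a plain FASTA file with `>` header lines; `iter_fasta` and `first_n_records` are hypothetical helpers, and the in-memory demo stands in for a real file.

```python
import io
from itertools import islice


def iter_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_n_records(handle, n):
    """Return the first n FASTA records as a list of (header, sequence)."""
    return list(islice(iter_fasta(handle), n))


# Tiny in-memory example; real use would pass an opened FASTA file instead.
demo = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGTT\n>seq3\nTTAA\n")
subset = first_n_records(demo, 2)
print(subset)
```

With a real file you would call it with n=1000, write the records back out as FASTA, and use that smaller file as the fasta input to align.seqs.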
Good day, everyone!