Hello,
I am looking at the V4 region of 16S. I am a little confused about how the alignment file is generated with pcr.seqs. I have looked at the MiSeq SOP and I understand start=11894, but why is end=25319? Is that for the entire 16S gene? What is the original reference for the alignment? Which version of silva.bacteria.fasta is used in the MiSeq SOP tutorial, 102 or 119? Is it OK to use that file with the same start and end parameters to look at the V4 region? I had the sequencing carried out at Source BioScience, so I have the primers but not their positions.
Kind regards
Catherine
I am looking at the V4 region of 16S. I am a little confused about how the alignment file is generated with pcr.seqs. I have looked at the MiSeq SOP and I understand start=11894, but why is end=25319? Is that for the entire 16S gene? What is the original reference for the alignment?
The first thing to keep in mind is that the SILVA reference alignment is 50,000 columns long, while the 16S rRNA gene is only about 1,500 nt long. That means most columns in the alignment contain only gap characters for any given sequence. So the V4 region starts at position 11894 and ends at position 25319 in alignment space; those are column numbers in the alignment, not nucleotide positions in the gene.
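To make the difference between alignment columns and nucleotide positions concrete, here is a minimal Python sketch on a made-up toy alignment row (not real SILVA data). It assumes gaps are written as '-' or '.', as in mothur/SILVA alignments; the function names are hypothetical and not part of mothur.

```python
from typing import Optional

# Characters treated as gaps (assumption: '-' and '.' as in mothur/SILVA files).
GAP_CHARS = set("-.")


def base_count(aligned_seq: str) -> int:
    """Number of actual nucleotides (non-gap characters) in an aligned row."""
    return sum(1 for ch in aligned_seq if ch not in GAP_CHARS)


def column_to_base_position(aligned_seq: str, column: int) -> Optional[int]:
    """Map a 1-based alignment column to a 1-based nucleotide position.

    Returns None when this sequence has a gap in that column.
    """
    if aligned_seq[column - 1] in GAP_CHARS:
        return None
    # Count the bases up to and including this column.
    return base_count(aligned_seq[:column])


# Toy row: 16 alignment columns but only 4 real bases.
toy = "..--AC--G-T...--"
print(len(toy), base_count(toy))        # 16 4
print(column_to_base_position(toy, 5))  # 1  (the 'A')
print(column_to_base_position(toy, 9))  # 3  (the 'G')
print(column_to_base_position(toy, 7))  # None (a gap column)
```

The same idea scales up: a roughly 1,500 nt sequence spread across 50,000 columns means column numbers like 11894 and 25319 say nothing directly about base positions in the gene.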
Which version of silva.bacteria.fasta is used in the MiSeq SOP tutorial, 102 or 119?
The one we link to on the wiki page is 102. But as we mention, if you are doing this for “real”, you’ll probably want to use 119 so you can say you’re on the cutting edge (although I haven’t really seen it make a difference). We provide the link to the updated SILVA and RDP files there as well. The alignment coordinates will not change between 102 and 119. The only thing that changed was the number and identity of the sequences in the database.
Thank you, that's very helpful. I have run align.seqs against Silva.SEED.v119. The start and end alignment positions that came back were 13862 and 23444, so I ran pcr.seqs with those positions. In the summary, the minimum start position is 1 and the end is 9582. To run another screen.seqs with the .align, name, and group files, should the start and end positions be the 1 and 9582 from the pcr.seqs summary output?
Catherine
I’d run summary.seqs on your data and see what the start and end positions are and go from there.
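One way to see why the positions change after pcr.seqs is that trimming the reference to a column window renumbers the columns, so everything aligned against the trimmed file lives in a new coordinate system starting at 1. The Python sketch below illustrates this on toy rows. It assumes trimming keeps columns start..end inclusive (mothur's exact boundary convention may differ by one) and that, as in summary.seqs, "start" and "end" mean the first and last columns holding a base; the helper names are hypothetical.

```python
from typing import Tuple

# Characters treated as gaps (assumption: '-' and '.' as in mothur/SILVA files).
GAP_CHARS = set("-.")


def trim_columns(aligned_seq: str, start: int, end: int) -> str:
    """Keep 1-based columns start..end inclusive (assumed convention)."""
    return aligned_seq[start - 1:end]


def start_end(aligned_seq: str) -> Tuple[int, int]:
    """1-based columns of the first and last non-gap characters."""
    cols = [i + 1 for i, ch in enumerate(aligned_seq) if ch not in GAP_CHARS]
    if not cols:
        raise ValueError("sequence contains only gaps")
    return cols[0], cols[-1]


# Toy reference row: bases occupy columns 6..15 of a 20-column alignment.
ref = "....." + "ACGTACGTAC" + "....."
print(start_end(ref))        # (6, 15) in the full alignment's coordinates

# After trimming to that window, the same bases sit in columns 1..10.
trimmed = trim_columns(ref, 6, 15)
print(start_end(trimmed))    # (1, 10)

# A toy read aligned against the trimmed reference reports its own span
# in the trimmed coordinates, which is what summary.seqs on your data shows.
read = "..ACGTAC.."
print(start_end(read))       # (3, 8)
```

In other words, the start and end values that make sense for screen.seqs are the ones summary.seqs reports for your own aligned reads in the trimmed coordinate system, not the original SILVA column numbers.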