pcr.seq

We have sequenced 16S rRNA amplicons of the V3-V4 region using paired-end chemistry on the MiSeq platform. The primers, including Illumina adapters, were:

PCR1F_460 5’ CTTTCCCTACACGACGCTCTTCCGATCTACGGRAGGCAGCAG
PCR1R_460 5’ GGAGTTCAGACGTGTGCTCTTCCGATCTTACCAGGGTATCTAATCCT

I am not a bioinformatician, but I have been following the steps in the MiSeq SOP and could run the example files smoothly. I have a few queries about analyzing my own set of sequences.

a) At the step:

pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)
What should be written for start and end for my sequences? Or how can I find the start and end positions in the alignment?

b) At the steps:

system(rename silva.bacteria.pcr.fasta silva.v4.fasta)
summary.seqs(fasta=silva.v4.fasta)
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)

Is it correct to write v4 for my sequences, or do I have to modify something else?

c) The sequencing company has also provided the reads already merged with FLASH. Could you please tell me how to process already-paired data in mothur? From which step of the SOP should I start if I want to analyze the paired data? Do I still need to make the stability files, or something else?

pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)
What should be written for start and end for my sequences? Or how can I find the start and end positions in the alignment?

You should take one of your sequences or E. coli’s 16S rRNA gene and trim it to your primers. Then align the sequence and run summary.seqs. You’ll see the start and end positions there. Those are the numbers to use.
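
As a rough sketch of those steps in mothur: this assumes an E. coli 16S rRNA gene saved as ecoli.fasta and a small oligos file named pcr.oligos (both file names are hypothetical). It also assumes that the gene-specific parts of the primers are what is left after removing the leading Illumina adapter portions (CTTTCCCTACACGACGCTCTTCCGATCT and GGAGTTCAGACGTGTGCTCTTCCGATCT) from the sequences posted above. The output file names are what mothur would be expected to produce, so check what it actually reports.

```
# pcr.oligos is a hypothetical two-line text file containing
# the primers without adapters (an assumption, see above):
#   forward ACGGRAGGCAGCAG
#   reverse TACCAGGGTATCTAATCCT

# 1. Trim the full-length sequence to the region between the primers
pcr.seqs(fasta=ecoli.fasta, oligos=pcr.oligos)

# 2. Align the trimmed sequence against the full-length reference
align.seqs(fasta=ecoli.pcr.fasta, reference=silva.bacteria.fasta)

# 3. Report where the aligned fragment starts and ends
summary.seqs(fasta=ecoli.pcr.align)
```

The Start and End columns printed by summary.seqs are the alignment coordinates to pass as start= and end= when running pcr.seqs on silva.bacteria.fasta.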

Is it correct to write v4 for my sequences, or do I have to modify something else?

In the rename step, you could change v4 to v34, and do the same wherever that file name appears in later commands, so that the names make sense.
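
For example, the three commands from the question might become the following. This is only a renaming sketch: the name silva.v34.fasta is an arbitrary label, and nothing else about the commands changes.

```
# Same steps as in the SOP, with the reference renamed to reflect V3-V4
system(rename silva.bacteria.pcr.fasta silva.v34.fasta)
summary.seqs(fasta=silva.v34.fasta)
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v34.fasta)
```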

c) The sequencing company has also provided the reads already merged with FLASH. Could you please tell me how to process already-paired data in mothur? From which step of the SOP should I start if I want to analyze the paired data? Do I still need to make the stability files, or something else?

You really want to get your raw data. When you upload the data to NCBI you’ll need the raw data. Also, the output from make.contigs is better than any other output we’ve seen.
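
A minimal sketch of starting from the raw reads, following the layout used in the MiSeq SOP. The sample and fastq file names are hypothetical placeholders, and the file name stability.files is simply the one the SOP uses.

```
# stability.files lists one sample per line:
# sample name, forward-read fastq, reverse-read fastq, for example:
#   sampleA sampleA_R1.fastq sampleA_R2.fastq
#   sampleB sampleB_R1.fastq sampleB_R2.fastq

# Assemble the paired reads into contigs
make.contigs(file=stability.files, processors=8)
```

With the raw reads, the analysis would start at this step, so the stability file is still needed.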

Finally, since you’ve sequenced the V3-V4 region, you are likely to run into many problems down the road. You should consult this blog post…

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/

Pat

Thanks a lot for your suggestions.