Analyzing seqs from different regions

Hi all,

I’ve got seqs from the V3 and V8 regions from my collaborator (for each sample, I have files from both V3 and V8). Should I analyze them separately, or could I simply combine the data from the two regions and analyze them as a whole?



Some of my thoughts here: I think I should align the V3 and V8 regions separately to guarantee good alignment quality. My main question is at what point I should combine my separate analyses of V3 and V8. Could I combine the fasta and count files of V3 and V8 as soon as I finish the alignment and filter steps? Could the merge.files command do the job?


I would not combine… You do not know which V3 reads correspond to which V8 reads. Some lineages would be in both, some would be absent in one, biases would be different… Analyze them separately, IMO.

Thanks for your reply! I am analyzing the V3 region. However, there are too many unique reads after the whole mothur process. Originally there were 30 million reads, 10 million of them unique, and after mothur there are still 2 million unique reads, which is obviously too many. Do you have any idea how I can further reduce the number of unique reads? I’ve trimmed the silva reference to the V3 region for the alignment step.

The V3 region is 195 nt long (see “Customize your reference alignment for your favorite region”). I’m not sure how long the V8 region is, but I think it’s probably also shorter than 250 nt.

If you are using 2x250 nt reads, then you likely still have barcodes and primers on the sequences, and it’s possible that you are sequencing beyond the end of the fragment. The error rate goes way up when your reads are longer than the region. The MiSeq can be run with 2x250 chemistry while telling the instrument to generate only 195 nt in each direction.

If you are using a different read length, then you don’t have complete overlap of the two reads and won’t get adequate denoising of the data (see “Why do I have such a large distance matrix”).
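To make the read length versus region length arithmetic concrete, here is a rough Python sketch. The function name `paired_read_overlap` and its output keys are made up for illustration, and it assumes an idealized fragment of fixed length with no primers, barcodes, or length variation, so treat the numbers as a back-of-the-envelope check rather than anything mothur computes.

```python
def paired_read_overlap(read_len: int, fragment_len: int) -> dict:
    """Summarize how two paired-end reads of equal length cover a fragment.

    Hypothetical helper: assumes both reads start at opposite ends of a
    fragment of exactly fragment_len nt (no primers or barcodes).
    """
    # Bases of each read that actually fall inside the fragment.
    in_fragment = min(read_len, fragment_len)
    # Bases each read runs past the far end of the fragment.
    read_through = read_len - in_fragment
    # Fragment positions covered by at least one read.
    covered = min(fragment_len, 2 * in_fragment)
    # Fragment positions covered by both reads (the part that can be denoised).
    overlap = max(0, 2 * in_fragment - fragment_len)
    return {
        "read_through_per_read": read_through,
        "overlap": overlap,
        "single_coverage": covered - overlap,
        "complete_overlap": overlap == fragment_len,
    }


if __name__ == "__main__":
    # 2x250 on a 195 nt region: full overlap, but each read overshoots.
    print(paired_read_overlap(250, 195))
    # 2x150 on the same region: no overshoot, but only partial overlap.
    print(paired_read_overlap(150, 195))
```

With 2x250 reads on a 195 nt region, each read overshoots the fragment by 55 nt; with 2x150 reads, only 105 of the 195 positions are covered by both reads, so the remaining 90 rely on a single read.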

Ultimately, I think your problem is a data quality problem. Do you know how the sequencing data were generated?


Did you run pre.cluster? If you post your commands, we might be able to tell you whether the data are noisy or not…

Thanks for your help! I think I may have figured out my problem: I’m actually working on the 16S V1-V3 region. The large number of uniques is probably caused by the high error rate of my data.

