Help with contigs

Dear all,
I’ve sequenced the V3-V4 region with an Illumina v2 kit, 2x150 (paired-end); the original amplicon is around 630 bp. In theory I can’t build contigs from overlapping reads, right? The reads are much shorter than the amplicon.
How can I proceed in mothur after running “make.file” to list R1 and R2 for each sample?
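The overlap question above comes down to simple arithmetic, sketched here in Python. The function name is made up for illustration, and the calculation ignores primers and any quality trimming, so treat the number as a rough estimate only.

```python
def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Bases covered by both reads of a pair; a negative value means a gap."""
    return 2 * read_length - amplicon_length


# Values from the post: 2x150 reads and an amplicon of about 630 bp.
print(expected_overlap(150, 630))  # -330, i.e. roughly 330 bp that neither read covers
```

A negative result means the forward and reverse reads never meet, so any contig built from them would not rest on a real overlap.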

I started by running make.contigs, which produced the following summary:

            Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:    1      35   35      0       2        1
2.5%-tile:  1      157  157     0       4        199596
25%-tile:   1      249  249     0       4        1995954
Median:     1      288  288     2       5        3991907
75%-tile:   1      296  296     7       6        5987860
97.5%-tile: 1      300  300     28      7        7784217
Maximum:    1      302  302     55      150      7983812
Mean:       1      264  264     5       4

# of unique seqs: 7983812
total # of seqs: 7983812

Can I still use this result and go on with the analysis?
Thank you so much in advance.

I’d suggest only using the first read and running it through the phylotype-based pipeline. With no overlap between the reads, things like alignment won’t make sense. Can you regenerate the data using the V4 region with 2x250 nt reads?


Hi Pat,
thank you for your reply.
Where can I find this phylotype-based pipeline to have a look?

Unfortunately, for lack of budget we cannot obtain new data for the V4 region with a different sequencing kit, so I’m quite worried about using this data. It is the only data we have.
Thanks once again

You would need to adapt the phylotype-based approach found in the MiSeq SOP…


In the MiSeq SOP I could find the following passage:
## Phylotype-based analysis
Phylotype-based analysis is the same as OTU-based analysis, but at a different taxonomic scale. We will leave you on your own to replicate the OTU-based analyses described above with the phylotype data.

So it seems I can follow the same steps described earlier for the OTU-based analysis, just at a different taxonomic scale. Sorry for my lack of knowledge, but how do I do that? Should I skip merging the reads with “make.contigs” and instead take the R1 reads straight to “screen.seqs” to remove homopolymers, ambiguous reads, etc.?

Thanks once again

I’d suggest taking R1 and doing something like using screen.seqs/chop.seqs to trim the sequences to a common length (perhaps 200 nt) in place of make.contigs and then running them through the rest of the pipeline.
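As a rough illustration of what trimming to a common length does, here is a minimal Python sketch. It is not mothur code: the function name and the toy reads are hypothetical, and dropping reads shorter than the target is a choice made for this sketch. Note that the target cannot exceed the read length, so with 150-nt reads the "perhaps 200 nt" figure would have to be lowered.

```python
def chop_to_common_length(seqs: dict[str, str], length: int) -> dict[str, str]:
    """Keep the first `length` bases of each read; drop reads that are shorter."""
    return {name: seq[:length] for name, seq in seqs.items() if len(seq) >= length}


# Toy reads (hypothetical); real R1 reads would be trimmed to a much larger length.
reads = {"r1": "ACGTACGTAC", "r2": "ACGTAC", "r3": "TTGACCGTAAGG"}
print(chop_to_common_length(reads, 8))  # {'r1': 'ACGTACGT', 'r3': 'TTGACCGT'}
```

After this step every remaining read has the same length and starts at the same primer end, which is what lets the R1 reads stand in for the contigs in the later steps.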

Thank you very much Pat!!

Hi Pat,
me again.

In this case, using only R1 from 2x150 bp sequencing of the V3-V4 region, would you recommend running classify.seqs on OTUs (97%) or on ASVs, and why?
Thank you so much

I would recommend classifying your sequences using classify.seqs and then pooling things with the same family or genus. The data will be too low quality to trust them as 97% OTUs or ASVs.
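The pooling step described above can be sketched in Python. This assumes taxonomy strings of the form `Name(confidence);Name(confidence);` as written by classify.seqs, and that family and genus sit at positions 5 and 6, which depends on the reference taxonomy used. The function names, read names, and assignments are made up for illustration.

```python
from collections import Counter


def taxon_at_level(taxonomy: str, level: int) -> str:
    """Return the name at a 1-based rank from a 'Name(conf);Name(conf);' string."""
    ranks = [r.split("(")[0] for r in taxonomy.strip().strip(";").split(";")]
    return ranks[level - 1] if len(ranks) >= level else "unclassified"


def pool_by_level(assignments: dict[str, str], level: int) -> Counter:
    """Count reads per taxon at the requested rank."""
    return Counter(taxon_at_level(tax, level) for tax in assignments.values())


# Hypothetical assignments; rank 5 is assumed to be family, rank 6 genus.
assignments = {
    "read1": "Bacteria(100);Firmicutes(100);Clostridia(100);Clostridiales(100);Lachnospiraceae(100);Blautia(92);",
    "read2": "Bacteria(100);Firmicutes(100);Clostridia(100);Clostridiales(100);Lachnospiraceae(98);Roseburia(85);",
    "read3": "Bacteria(100);Bacteroidetes(100);Bacteroidia(100);Bacteroidales(100);Bacteroidaceae(100);Bacteroides(100);",
}
print(pool_by_level(assignments, 5))  # Counter({'Lachnospiraceae': 2, 'Bacteroidaceae': 1})
```

Pooling at family or genus trades resolution for robustness: two reads that differ by a few sequencing errors still land in the same bin as long as their classification agrees.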

Hey Pat,
thanks for that.
Then I will use only the file “” output by classify.seqs and not proceed to the next steps of the SOP? Different samples will end up with different numbers of sequences. Can I subsample them for comparisons?
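To make the subsampling question concrete, here is a minimal Python sketch of drawing the same number of reads from every sample. It is an illustration only, not mothur code: the function name and sample contents are hypothetical, and subsampling to the size of the smallest sample is an assumption of this sketch rather than advice given in the thread.

```python
import random


def subsample(samples: dict[str, list[str]], seed: int = 1) -> dict[str, list[str]]:
    """Draw, without replacement, as many reads per sample as the smallest sample has."""
    depth = min(len(reads) for reads in samples.values())
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return {name: rng.sample(reads, depth) for name, reads in samples.items()}


# Hypothetical samples holding genus-level labels, one per read.
samples = {
    "sampleA": ["Blautia", "Blautia", "Roseburia", "Bacteroides", "Bacteroides"],
    "sampleB": ["Bacteroides", "Blautia", "Roseburia"],
}
rarefied = subsample(samples)
print({name: len(reads) for name, reads in rarefied.items()})  # {'sampleA': 3, 'sampleB': 3}
```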
