unique.seqs command

johnpenders1978 · December 8, 2009, 12:22pm

Hi,

I want to analyze data from a 454 Titanium run containing 16S V1-V3 region amplicons with an average length of ~600 bp. Because the amplicon is longer than the maximum read length, the data set contains sequences of varying read lengths.
As a result, unique.seqs finds very few identical sequences: many sequences are nearly identical and differ only slightly in read length. Consequently, I end up with a unique.fasta file that still contains 200,000-300,000 sequences. Clustering then becomes a problem: mothur runs for days, and the distance matrix has already reached 30 GB without finishing.
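To make the problem concrete, here is a minimal Python sketch of exact-match dereplication. It is an illustration only, not mothur's implementation; the function name `exact_unique` and the toy reads are made up for this example. Reads that share the same start but stop at different positions are different strings, so they never collapse.

```python
from collections import defaultdict


def exact_unique(reads):
    """Group read names by exact sequence identity (illustrative only)."""
    groups = defaultdict(list)
    for name, seq in reads.items():
        groups[seq].append(name)
    return dict(groups)


# Hypothetical toy reads: same organism, different read lengths.
reads = {
    "read_a": "ACGTACGTAC",
    "read_b": "ACGTACGTAC",    # exact duplicate of read_a
    "read_c": "ACGTACGT",      # same start, two bases shorter
    "read_d": "ACGTACGTACGG",  # same start, two bases longer
}

groups = exact_unique(reads)
# Only read_a and read_b collapse; read_c and read_d stay separate.
print(len(groups), "unique sequences from", len(reads), "reads")
```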

My question is whether it is possible to collapse sequences that are identical over their full length into a longer sequence. In other words, I would like to treat sequences that are identical but differ slightly in length as the same sequence, and include only the longest of them in my unique.fasta file.

Is this currently possible with mothur or can it be made possible? Or do you have any other suggestions to overcome this problem?

Thanks,
John

pschloss · December 8, 2009, 2:51pm

John,

So the solution is for you to run unique.seqs twice. Run it the first time as you describe. Then align the sequences and filter them so that they all cover the same region. Then run unique.seqs again, followed by dist.seqs, cluster, etc. It’s a bad idea to compare sequences that don’t completely overlap, as the 16S rRNA gene does not vary uniformly across the gene.
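A toy Python sketch of why the second unique.seqs pass helps. It is not mothur code; the helper names and the aligned reads are hypothetical. It assumes an alignment where `.` marks positions beyond the end of a read and `-` marks internal gaps, and it drops every column in which any read has a `.` or all reads have a `-`. After that, reads that differed only in length become identical, while a real base difference is kept.

```python
def filter_columns(aligned, trump="."):
    """Keep only columns with no trump character that are not all-gap."""
    length = len(next(iter(aligned.values())))
    keep = []
    for i in range(length):
        column = [seq[i] for seq in aligned.values()]
        if trump in column:
            continue  # at least one read does not reach this position
        if all(c == "-" for c in column):
            continue  # column carries no information
        keep.append(i)
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}


def unique_count(seqs):
    """Number of distinct sequence strings."""
    return len(set(seqs.values()))


# Hypothetical aligned reads, all padded to the same alignment length.
aligned = {
    "read_a": "ACGT-ACGTAC..",
    "read_c": "ACGT-ACGT....",
    "read_d": "ACGT-ACGTACGG",
    "read_e": "ACGT-ACCT....",  # genuine base difference (G -> C)
}

filtered = filter_columns(aligned)
print(unique_count(aligned), "unique before filtering")
print(unique_count(filtered), "unique after filtering")
```

The price is that the columns only the longest reads cover are discarded, so everything is compared over the shared region.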

Pat

jamesafoster · December 8, 2009, 5:25pm

I second what Pat says.

Also, be sure to run summary.seqs() and look for long homopolymers and N’s, then screen.seqs() to get rid of them. These problems were really messing up my downstream analysis of V1 data.
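For anyone wondering what is being checked here, a rough Python sketch of the two measurements: the longest homopolymer run and the number of N's per read. This is not how mothur implements it, and the function names and the thresholds (`max_ambig=0`, `max_homop=8`) are assumptions chosen only for illustration.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of one repeated character."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def passes_screen(seq, max_ambig=0, max_homop=8):
    """True if the read has few enough N's and no overly long homopolymer.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    seq = seq.upper()
    return seq.count("N") <= max_ambig and longest_homopolymer(seq) <= max_homop


# Hypothetical unaligned reads.
reads = {
    "clean": "ACGTACGGTCA",
    "has_n": "ACGTNACGGTC",
    "long_run": "ACG" + "T" * 10 + "A",
}

kept = [name for name, seq in reads.items() if passes_screen(seq)]
print(kept)
```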

arifch · February 11, 2013, 7:38am

Hi,
I want to use the unique.seqs command to create unique sequences that take only mismatched bases into account and disregard indels/gaps. I am not sure if there is a way to do it.
For example:
seq1 CTCGGGATTTCCTGGGAGCA
seq2 CTCGGGATT-CCTGAGAGCA
seq3 CTCGGGATTTCCTGAGAGCA

Currently the unique.seqs command treats these 3 sequences as 3 unique sequences. However, I want to collapse seq2 and seq3 into 1 unique sequence (disregarding the gap/indel) and keep seq1 as a separate unique sequence (because of the mismatch at position 15, where seq1 has G and the others have A).

Is there a way to do this?

Thank you,
Arif

pschloss · February 11, 2013, 1:19pm

Nope.
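Outside mothur, the comparison being asked for could be sketched in Python along these lines. This is purely an illustration with hypothetical function names, not a mothur feature. It assumes the sequences are already aligned to the same length and treats a gap in either sequence as matching anything.

```python
def same_ignoring_gaps(a, b, gap="-"):
    """True if two aligned sequences agree at every column where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return all(x == y or x == gap or y == gap for x, y in zip(a, b))


def collapse(aligned):
    """Greedy grouping: each sequence joins the first representative it matches."""
    groups = {}
    for name, seq in aligned.items():
        for rep in groups:
            if same_ignoring_gaps(aligned[rep], seq):
                groups[rep].append(name)
                break
        else:
            groups[name] = [name]  # no match, so start a new group
    return groups


aligned = {
    "seq1": "CTCGGGATTTCCTGGGAGCA",
    "seq2": "CTCGGGATT-CCTGAGAGCA",
    "seq3": "CTCGGGATTTCCTGAGAGCA",
}

groups = collapse(aligned)
print(groups)
```

Note that this kind of matching is not transitive: a sequence with many gaps can match two sequences that do not match each other, so the greedy result depends on input order.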
