Unique.seqs is removing a lot of sequences

ds1405 · February 2, 2021, 9:57am

Is this normal? It seems to be removing a lot of my sequences…

summary.seqs(fasta=stability.trim.contigs.trim.good.fasta, processors=5)

	    	Start	End	NBases	Ambigs	Polymer	NumSeqs
    Minimum:		1	150	150	0	3	1
    2.5%-tile:	1	177	177	0	3	447237
    25%-tile:	1	177	177	0	4	4472366
    Median: 		1	178	178	0	5	8944732
    75%-tile:	1	197	197	0	5	13417097
    97.5%-tile:	1	202	202	0	6	17442226
    Maximum:		1	220	220	0	52	17889462
    Mean:		1	186	186	0	4
    # of Seqs:	17889462

It took 320 secs to summarize 17889462 sequences.

unique.seqs(fasta=stability.trim.contigs.trim.good.fasta, processors=6)


stability.trim.contigs.trim.good.names
stability.trim.contigs.trim.good.unique.fasta


summary.seqs(fasta=stability.trim.contigs.trim.good.unique.fasta, processors=1)

Using 5 processors.

    		Start	End	NBases	Ambigs	Polymer	NumSeqs
    Minimum:		1	150	150	0	3	1
    2.5%-tile:	1	177	177	0	3	252900
    25%-tile:	1	177	177	0	4	2528993
    Median: 		1	178	178	0	5	5057986
    75%-tile:	1	197	197	0	5	7586979
    97.5%-tile:	1	202	202	0	6	9863072
    Maximum:		1	220	220	0	52	10115971
    Mean:		1	184	184	0	4
    # of Seqs:	10115971

It took 48 secs to summarize 10115971 sequences.

I am now trying to run count.seqs(name=stability.trim.contigs.trim.good.names, group=stability.contigs.good.groups) and it is doing nothing (has been for the last 10 minutes).

Is this normal?

Edit: formatting

pschloss · February 2, 2021, 2:36pm

If you run summary.seqs like this…

summary.seqs(fasta=stability.trim.contigs.trim.good.unique.fasta, name=stability.trim.contigs.trim.good.names, processors=1)

You should get a line at the bottom that says something like…

Total: 17889462

You have a lot of unique sequences and I suspect that is what is causing count.seqs to take so long to run. Can you also let us know what region you are sequencing and with what chemistry? That looks like a short region.

Pat
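For anyone wondering where that total comes from: a names file has two columns, the representative sequence name and a comma-separated list of every read it stands for, so the total is the sum of the list lengths. Here is a minimal Python sketch of that bookkeeping. The function name `count_names` and the three example rows are made up for illustration, and this is not how mothur itself implements it.

```python
from typing import Iterable, Tuple


def count_names(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unique, total) for rows in the two-column names format.

    Column 1 is the representative sequence name; column 2 is a
    comma-separated list of all reads collapsed into it.
    """
    unique = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _representative, members = line.split(None, 1)
        unique += 1
        total += len(members.split(","))
    return unique, total


# Hypothetical rows, not taken from the data in this thread.
example = [
    "seqA\tseqA,seqB,seqC",
    "seqD\tseqD",
    "seqE\tseqE,seqF",
]
print(count_names(example))  # (3, 6)
```

On a real file you could pass an open file handle instead of the list, for example `count_names(open("stability.trim.contigs.trim.good.names"))`, though with millions of rows it will take a while.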

ds1405 · February 3, 2021, 5:13pm

Hi Pat

mothur > summary.seqs(fasta=stability.trim.contigs.trim.good.unique.fasta, name=stability.trim.contigs.trim.good.names, processors=1)

Using 1 processors.

    		Start	End	NBases	Ambigs	Polymer	NumSeqs
    Minimum:		1	150	150	0	3	1
    2.5%-tile:	1	177	177	0	3	447237
    25%-tile:	1	177	177	0	4	4472366
    Median: 		1	178	178	0	5	8944732
    75%-tile:	1	197	197	0	5	13417097
    97.5%-tile:	1	202	202	0	6	17442226
    Maximum:		1	220	220	0	52	17889462
    Mean:		1	186	186	0	4
    # of unique seqs:	10115971
    total # of seqs:	17889462

It took 247 secs to summarize 17889462 sequences.

Output File Names:

stability.trim.contigs.trim.good.unique.summary

So I guess that’s okay, right? I am looking at the V3 region which is about 180bp using Illumina MiSeq (is that the right sort of answer? I’m very new to this…)

leocadio · February 3, 2021, 8:05pm

Looks a bit noisy to me (more than half uniques)… Which parameters did you use in make.contigs?
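To put a number on "more than half uniques", here is a quick Python check using the two counts reported by summary.seqs earlier in the thread. The variable names are just for illustration.

```python
# Counts taken from the summary.seqs output posted above.
total_seqs = 17889462
unique_seqs = 10115971

# Reads that unique.seqs collapsed into an identical representative.
duplicates_merged = total_seqs - unique_seqs

# Share of the dataset that is still a distinct sequence.
unique_fraction = unique_seqs / total_seqs

print(duplicates_merged)          # 7773491
print(f"{unique_fraction:.1%}")   # 56.5%
```

So unique.seqs did not discard anything; it merged about 7.8 million duplicate reads, and roughly 56.5% of the reads remain as distinct sequences.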

pschloss · February 4, 2021, 9:55pm

Did you do 2x250 or 2x150 reads? If you’re doing 2x250, then sequencing too much can be a problem because it generates a higher error rate. If you’re doing 2x150, then you aren’t going to have fully overlapping reads, which could also be causing problems.

Pat
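The overlap arithmetic behind that question can be sketched in a few lines of Python. This assumes the amplicon is about 180 bp, as stated above, and ignores primers, adapters, and quality trimming, so treat the numbers as rough. The function names `overlap_bp` and `read_through_bp` are hypothetical helpers, not mothur commands.

```python
def overlap_bp(read_length: int, amplicon_length: int) -> int:
    """Bases of the amplicon covered by both the forward and reverse read."""
    return max(0, min(amplicon_length, 2 * read_length - amplicon_length))


def read_through_bp(read_length: int, amplicon_length: int) -> int:
    """Bases each read extends past the far end of the amplicon."""
    return max(0, read_length - amplicon_length)


amplicon = 180  # approximate V3 length given in the thread

for read_length in (150, 250):
    both = overlap_bp(read_length, amplicon)
    single = amplicon - both          # covered by only one read
    extra = read_through_bp(read_length, amplicon)
    print(read_length, both, single, extra)
```

Under these assumptions, 2x150 leaves about 60 bp of the amplicon (roughly 30 bp at each end) covered by only one read, while 2x250 covers the whole amplicon twice but each read runs about 70 bp past the end of the fragment.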
