mothur

Unique.seqs is removing a lot of sequences

Is this normal? It seems to be removing a lot of my sequences…

summary.seqs(fasta=stability.trim.contigs.trim.good.fasta, processors=5)

	    	Start	End	NBases	Ambigs	Polymer	NumSeqs
    Minimum:		1	150	150	0	3	1
    2.5%-tile:	1	177	177	0	3	447237
    25%-tile:	1	177	177	0	4	4472366
    Median: 		1	178	178	0	5	8944732
    75%-tile:	1	197	197	0	5	13417097
    97.5%-tile:	1	202	202	0	6	17442226
    Maximum:		1	220	220	0	52	17889462
    Mean:		1	186	186	0	4
    # of Seqs:	17889462

It took 320 secs to summarize 17889462 sequences.

unique.seqs(fasta=stability.trim.contigs.trim.good.fasta, processors=6)


stability.trim.contigs.trim.good.names
stability.trim.contigs.trim.good.unique.fasta
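For anyone new to this step: unique.seqs does not discard reads. It collapses reads with exactly identical sequences into one representative and records every collapsed read name in the .names file. The sketch below shows only that idea; it is not mothur's implementation, and the read names and sequences are hypothetical toy data.

```python
def dereplicate(records):
    """Collapse reads with identical sequences (conceptually what unique.seqs does).

    records: iterable of (name, sequence) pairs.
    Returns (unique_records, names):
      unique_records: list of (representative_name, sequence), first-seen order
      names: dict representative_name -> every read name with that sequence
    """
    rep_for_seq = {}  # exact sequence -> name of the first read seen with it
    names = {}
    unique_records = []
    for name, seq in records:
        rep = rep_for_seq.get(seq)
        if rep is None:
            # First time this exact sequence appears: it becomes the representative.
            rep_for_seq[seq] = name
            names[name] = [name]
            unique_records.append((name, seq))
        else:
            # Exact duplicate: only record the read name under its representative.
            names[rep].append(name)
    return unique_records, names


def names_file_lines(names):
    """Format as representative<TAB>comma-separated member names."""
    return [f"{rep}\t{','.join(members)}" for rep, members in names.items()]


# Hypothetical toy data: r2 differs from the others by one base.
reads = [("r1", "ACGT"), ("r2", "ACGA"), ("r3", "ACGT"), ("r4", "ACGT")]
uniques, names = dereplicate(reads)
print(len(reads), len(uniques))   # 4 2
print(names_file_lines(names))    # ['r1\tr1,r3,r4', 'r2\tr2']
```

Because matching is exact, a single base difference anywhere in a read (as with r2) produces a new unique, so noisy reads inflate the unique count.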


summary.seqs(fasta=stability.trim.contigs.trim.good.unique.fasta, processors=1)

Using 5 processors.

    		Start	End	NBases	Ambigs	Polymer	NumSeqs
    Minimum:		1	150	150	0	3	1
    2.5%-tile:	1	177	177	0	3	252900
    25%-tile:	1	177	177	0	4	2528993
    Median: 		1	178	178	0	5	5057986
    75%-tile:	1	197	197	0	5	7586979
    97.5%-tile:	1	202	202	0	6	9863072
    Maximum:		1	220	220	0	52	10115971
    Mean:		1	184	184	0	4
    # of Seqs:	10115971

It took 48 secs to summarize 10115971 sequences.

I am now trying to run count.seqs(name=stability.trim.contigs.trim.good.names, group=stability.contigs.good.groups) and it seems to be doing nothing (it has been like this for the last 10 minutes).

Is this normal?
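Conceptually, count.seqs combines the names file (representative to member reads) with the groups file (read to sample) into a per-sample abundance table. The sketch below is a simplified illustration under that assumption, not mothur's code; the sample names are hypothetical and the column names only approximate mothur's count table. It does show that every member read has to be looked up, so the work grows with the total number of reads (here about 17.9 million) as well as the number of uniques (about 10.1 million rows).

```python
from collections import Counter


def count_table(names, groups):
    """Per-group abundance of each representative sequence.

    names:  dict representative -> list of member read names (from a names file)
    groups: dict read name -> group/sample name (from a groups file)
    """
    # Every member read is looked up in the groups mapping, so the work grows
    # with the total number of reads, not just the number of uniques.
    return {rep: Counter(groups[m] for m in members) for rep, members in names.items()}


def count_table_lines(table):
    """Render a tab-separated table: representative, total, then one column per group."""
    all_groups = sorted({g for counts in table.values() for g in counts})
    lines = ["\t".join(["Representative_Sequence", "total", *all_groups])]
    for rep, counts in table.items():
        # Counter returns 0 for groups in which this unique never occurs.
        row = [rep, str(sum(counts.values()))] + [str(counts[g]) for g in all_groups]
        lines.append("\t".join(row))
    return lines


# Hypothetical toy input: two uniques, four reads, two samples.
names = {"r1": ["r1", "r3", "r4"], "r2": ["r2"]}
groups = {"r1": "sampleA", "r2": "sampleA", "r3": "sampleB", "r4": "sampleB"}
for line in count_table_lines(count_table(names, groups)):
    print(line)
```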


If you run summary.seqs like this…

summary.seqs(fasta=stability.trim.contigs.trim.good.unique.fasta, name=stability.trim.contigs.trim.good.names, processors=1)

You should get a line at the bottom that says something like…

Total: 17889462

You have a lot of unique sequences and I suspect that is what is causing count.seqs to take so long to run. Can you also let us know what region you are sequencing and with what chemistry? That looks like a short region.

Pat
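Passing name= lets summary.seqs report both the number of uniques and the total number of reads they represent. In the output posted in this thread, the name=-weighted run reproduces the percentile NumSeqs and the Mean of 186 from the original 17,889,462-read summary, while the unique-only summary gives a Mean of 184, which is consistent with each unique being weighted by its read count. The sketch below illustrates that arithmetic on hypothetical toy data; the function names are illustrative and not part of mothur.

```python
def names_totals(lines):
    """Return (number of uniques, total reads) from names-file lines
    shaped like 'representative<TAB>name1,name2,...'."""
    n_unique = n_total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        _rep, members = line.split("\t", 1)
        n_unique += 1
        n_total += len(members.split(","))
    return n_unique, n_total


def mean_length(lengths, counts=None):
    """Mean length over uniques; if counts are given, weight each unique by its reads."""
    if counts is None:
        return sum(lengths) / len(lengths)
    return sum(length * c for length, c in zip(lengths, counts)) / sum(counts)


# Hypothetical toy names file and unique-sequence lengths.
lines = ["r1\tr1,r3,r4", "r2\tr2"]
print(names_totals(lines))               # (2, 4)
print(mean_length([177, 197]))           # 187.0
print(mean_length([177, 197], [3, 1]))   # 182.0
```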

Hi Pat

mothur > summary.seqs(fasta=stability.trim.contigs.trim.good.unique.fasta, name=stability.trim.contigs.trim.good.names, processors=1)

Using 1 processors.

    		Start	End	NBases	Ambigs	Polymer	NumSeqs
    Minimum:		1	150	150	0	3	1
    2.5%-tile:	1	177	177	0	3	447237
    25%-tile:	1	177	177	0	4	4472366
    Median: 		1	178	178	0	5	8944732
    75%-tile:	1	197	197	0	5	13417097
    97.5%-tile:	1	202	202	0	6	17442226
    Maximum:		1	220	220	0	52	17889462
    Mean:		1	186	186	0	4
    # of unique seqs:	10115971
    total # of seqs:	17889462

It took 247 secs to summarize 17889462 sequences.

Output File Names:

stability.trim.contigs.trim.good.unique.summary

So I guess that’s okay, right? I am sequencing the V3 region, which is about 180 bp, on an Illumina MiSeq (is that the right sort of answer? I’m very new to this…)

Looks a bit noisy to me (more than half are uniques)… Which parameters did you use in make.contigs?
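The "more than half uniques" figure can be checked directly from the two counts reported by summary.seqs earlier in the thread:

```python
# Counts taken from the summary.seqs output in this thread.
total_reads = 17_889_462
unique_seqs = 10_115_971

unique_fraction = unique_seqs / total_reads
reads_per_unique = total_reads / unique_seqs

print(f"{unique_fraction:.1%}")    # 56.5%
print(f"{reads_per_unique:.2f}")   # 1.77
```

In other words, on average each unique sequence represents fewer than two reads, which is what makes the dataset look noisy.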

Did you do 2x250 or 2x150 reads? If you’re doing 2x250, then there can be a problem with sequencing too much, which generates a higher error rate. If you’re doing 2x150, then you aren’t going to have fully overlapping reads, which could also be causing problems.

Pat
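The read-length question comes down to simple geometry. The sketch below assumes an amplicon of about 180 bp (the figure quoted above; the merged lengths in the summaries range from 150 to 220 bp with a median of 178) and assumes both reads start exactly at opposite ends of it, ignoring primers and adapters. It is an illustration, not something mothur computes.

```python
def pair_geometry(read_len, amplicon_len):
    """Return (overlap, single_read_bp, overhang) in bp for a read pair.

    overlap:        amplicon bases read by both the forward and reverse read
    single_read_bp: amplicon bases read by only one of the two reads
    overhang:       bases each read runs past the far end of the amplicon
    Assumes both reads start exactly at opposite ends of the amplicon.
    """
    overlap = max(0, min(2 * read_len - amplicon_len, amplicon_len))
    covered = min(2 * read_len, amplicon_len)  # amplicon bases read at least once
    single_read_bp = covered - overlap
    overhang = max(0, read_len - amplicon_len)
    return overlap, single_read_bp, overhang


amplicon = 180  # approximate V3 length quoted in the thread
for read_len in (150, 250):
    print(f"2x{read_len}:", pair_geometry(read_len, amplicon))
# 2x150: (120, 60, 0)
# 2x250: (180, 0, 70)
```

Under these assumptions, 2x150 leaves about 60 bp covered by only one read, while with 2x250 each read runs roughly 70 bp past the end of the amplicon, which is one way to read the "sequencing too much" point.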
