filtering uniques and chimera problems

lmac · October 22, 2010, 9:16pm

Hi Pat,

Iâ€™ve got two small problems:

I am working with about 34K sequences, that are trimmed of oligos and all range between 250 to 560bp size, no ambiguities, and therefore I jumped into unique option to obtain a non-redundant set of sequences. With this I reduce my dataset to about 32.8K, which is not impressive. Am I skipping something?

Iâ€™ve aligned the file to see how it works, and then hopping to use the chimera slayer, the mothur comes back with an error:

mothur > chimera.slayer(fasta=hgut.align, template=silva.bacteria.fasta)
Checking sequences from hgut.align …
Reading sequences from silva.bacteria.fasta…Done.
Reading sequences from hgut.align…[ERROR]: St9bad_alloc has occurred in the Sequence class function Sequence.

I kind of guess that this might be because of the large alignment and I probably have to obtain a non-redundant dataset…but would appreciate your suggestions.
Thanks.

pschloss · October 25, 2010, 4:16pm

I am working with about 34K sequences, that are trimmed of oligos and all range between 250 to 560bp size, no ambiguities, and therefore I jumped into unique option to obtain a non-redundant set of sequences. With this I reduce my dataset to about 32.8K, which is not impressive. Am I skipping something?

You’ll probably see the biggest reduction after running filter.seqs with trump=. and running pre.cluster

[ERROR]: St9bad_alloc has occurred in the Sequence class function Sequence.

We typically see this when you don’t have enough memory. Do you know how much RAM you have on your computer?

lmac · October 26, 2010, 2:40pm

Thanks for the answer!

When I use trump option with v.1.13.0, I am getting:
mothur > trim.seqs(fasta=hgut.fasta, qfile=hgut.qual, maxambig=0, maxhomop=8, flip=T, processors=2, trump=.)
trump is not a valid parameter.
The valid parameters are: fasta, flip, oligos, maxambig, maxhomop, minlength, maxlength, qfile, qthreshold, qwindowaverage, qstepsize, qwindowsize, qaverage, rollaverage, allfiles, qtrim, tdiffs, pdiffs, bdiffs, processors, outputdir, and inputdir.

It seems that this version does not recognise trump=. command.

Without trim option, precluster run removed 914 out of 32867 sequences (total left=31K)
Any other suggestions/advises on how to get less redundant dataset?
thank you.

pschloss · October 26, 2010, 2:53pm

Sorry - that was supposed to be filter.seqs (corrected now)

lmac · October 26, 2010, 4:42pm

Ok, tried after preclustering>alignment>filtering, and without preclustering and the resultimg alignment is realy bad:
mothur > summary.seqs(fasta=hgut.trim.unique.align)

Start End NBases Ambigs Polymer
Minimum: 0 0 0 0 1
2.5%-tile: 1044 1062 3 0 1
25%-tile: 1044 1062 9 0 3
Median: 1044 1139 10 0 3
75%-tile: 43012 43116 38 0 3
97.5%-tile: 43113 43116 54 0 4
Maximum: 43116 43117 224 0 7

of Seqs: 32867

and after preclustering:

mothur > summary.seqs(fasta=hgut.trim.unique.precluster.align)

Start End NBases Ambigs Polymer
Minimum: 0 0 0 0 1
2.5%-tile: 1044 1062 3 0 1
25%-tile: 1044 1062 9 0 3
Median: 1044 1139 10 0 3
75%-tile: 43017 43116 38 0 3
97.5%-tile: 43113 43116 54 0 4
Maximum: 43116 43117 224 0 7

of Seqs: 31953

if ignoring and moving to the filter option- all alignment is removed.

any other way to reduce the dataset before alignment, I mean unique step or other, not filtering after aligning step?
(my PC has 12.0 GB by the way).

pschloss · October 26, 2010, 5:36pm

Two things…

Be sure to follow the Costello example analysis. Even if you have removed the oligos, you need to trim the bad quality scoring bases from the sequences. Also, you want to do unique.seqs > align.seqs > chimera.slayer > filter.seqs > unique.seqs > preclustering. You seem to have it a bit backward.
I think your sequences are oriented in the wrong direction. When you run trim.seqs you can flip the sequences. Alternatively, in align.seqs you can also have the aligner automatically try to flip them for you.

lmac · October 26, 2010, 6:31pm

Pat, thanks for your help.

For some reason it looks like mothur is not trimming at all based on qual file (see bellow).

mothur > trim.seqs(fasta=hgut.fasta, qfile=hgut.qual, maxambig=0, maxhomop=8, flip=T, processors=2)
_GM2SIF004IV1XN is in your fasta file and not in your quality file, not using quality file.

When checked, both files, the fasta and qual have the sequence.

I repeated from scratch and again mothur doesn’t read qual telling that there is an error.

mothur > trim.seqs(fasta=hg.fasta, qfile=hg.qual)
Error reading quality file, name blank at position, -1

I am opening both files and both are completely identical.
thanks.

pschloss · October 27, 2010, 4:26pm

Do both sequences have the “_” character? I suspect they don’t and that may have been introduced accidentally.

lmac · October 27, 2010, 5:41pm

no, they are both identical.

here is first seq from qual:

1_GM2SIF004IJDI0 rank=0022353 x=3384.0 y=170.0 length=418
37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37

here is the fasta:

1_GM2SIF004IJDI0 rank=0022353 x=3384.0 y=170.0 length=418
GAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGAACGATGAAAGGAGCTTGCTTTTTTCAAAGTTAGCGGCGCACGGGTGAGTAAC

westcott · October 27, 2010, 6:32pm

Would you mind sending your logfile and input files to mothur.bugs@gmail.com?

lmac · November 2, 2010, 12:30am

Any idea what went wrong with qual files and why mothur is not recognizing them? i sent them separately each with hgut name to mothur.bugs@gmail.com.
thanks a lot for your help!!!

westcott · November 2, 2010, 12:47pm

Your quality file contains 35,275 sequences and your fasta file contains 34,839.

pressure · November 4, 2010, 3:45pm

Hello,

I seem to be having similar problems as the original poster. Can you give some elaboration on “flipping the sequences” as suggested above. How would I have the aligner automatically flip my sequences.

Thanks.

pschloss · November 5, 2010, 9:59am

when you run the align.seqs command use flip=T as one of the options. See…

http://www.mothur.org/wiki/Align.seqs#flip_and_threshold

pressure · November 5, 2010, 1:15pm

Thanks. I realized that I was following the Costello example too closely. I had included the Flip=T when it was not necessary and that sent me on the wrong trajectory.

Thanks.

lmac · November 9, 2010, 2:41pm

When I run the chimera slayer, after following Costello’s example:

mothur > chimera.slayer(fasta=1234.trim.unique.good.filter.fasta, template=silva.gold.filter.fasta)
Checking sequences from 1234.trim.unique.good.filter.fasta …
Reading sequences from silva.gold.filter.fasta…Done.
Reading sequences from 1234.trim.unique.good.filter.fasta…Done.
and after this mothur closes instantly.

What could you recommend for this issue?

thank you.

pschloss · November 10, 2010, 1:54pm

How much RAM does your computer have? That would be my first guess. That or the reference files got corrupted and you may need to download fresh copies.

pschloss · December 15, 2010, 1:03pm

Amblyomma,

Can you tell us what platform you are using to run mothur and what version of mothur you are using (v.1.15)?

Thanks,Pat

pschloss · December 15, 2010, 9:58pm

Ah - you forgot to include silva.gold.align when you ran filter.seqs… try this…

filter.seqs(fasta=silva.gold.align-good.fasta, …)
chimera.slayer(template=silva.gold.filter.fasta, candidate=good.filter.fasta)

Topic		Replies	Views
chimera.slayer error Commands in mothur	8	8217	August 13, 2010
1 sequence remains after align.seqs Commands in mothur	7	10110	June 17, 2011
error with filter.seqs or chimera.uchime??? mothur bugs	3	1459	March 13, 2017
align.seqs issue/ groups removal mothur bugs	5	2822	March 19, 2015
Align.seqs removing most basepairs Commands in mothur	1	2046	August 15, 2013

filtering uniques and chimera problems

of Seqs: 32867

of Seqs: 31953

Related topics