filtering uniques and chimera problems

Hi Pat,

I’ve got two small problems:

I am working with about 34K sequences, that are trimmed of oligos and all range between 250 to 560bp size, no ambiguities, and therefore I jumped into unique option to obtain a non-redundant set of sequences. With this I reduce my dataset to about 32.8K, which is not impressive. Am I skipping something?

I’ve aligned the file to see how it works, and then hopping to use the chimera slayer, the mothur comes back with an error:

mothur > chimera.slayer(fasta=hgut.align, template=silva.bacteria.fasta)
Checking sequences from hgut.align …
Reading sequences from silva.bacteria.fasta…Done.
Reading sequences from hgut.align…[ERROR]: St9bad_alloc has occurred in the Sequence class function Sequence.

I kind of guess that this might be because of the large alignment and I probably have to obtain a non-redundant dataset…but would appreciate your suggestions.
Thanks.

I am working with about 34K sequences, that are trimmed of oligos and all range between 250 to 560bp size, no ambiguities, and therefore I jumped into unique option to obtain a non-redundant set of sequences. With this I reduce my dataset to about 32.8K, which is not impressive. Am I skipping something?

You’ll probably see the biggest reduction after running filter.seqs with trump=. and running pre.cluster

[ERROR]: St9bad_alloc has occurred in the Sequence class function Sequence.

We typically see this when you don’t have enough memory. Do you know how much RAM you have on your computer?

Thanks for the answer!

When I use trump option with v.1.13.0, I am getting:
mothur > trim.seqs(fasta=hgut.fasta, qfile=hgut.qual, maxambig=0, maxhomop=8, flip=T, processors=2, trump=.)
trump is not a valid parameter.
The valid parameters are: fasta, flip, oligos, maxambig, maxhomop, minlength, maxlength, qfile, qthreshold, qwindowaverage, qstepsize, qwindowsize, qaverage, rollaverage, allfiles, qtrim, tdiffs, pdiffs, bdiffs, processors, outputdir, and inputdir.

It seems that this version does not recognise trump=. command.

Without trim option, precluster run removed 914 out of 32867 sequences (total left=31K)
Any other suggestions/advises on how to get less redundant dataset?
thank you.

Sorry - that was supposed to be filter.seqs (corrected now)

Ok, tried after preclustering>alignment>filtering, and without preclustering and the resultimg alignment is realy bad:
mothur > summary.seqs(fasta=hgut.trim.unique.align)

Start End NBases Ambigs Polymer
Minimum: 0 0 0 0 1
2.5%-tile: 1044 1062 3 0 1
25%-tile: 1044 1062 9 0 3
Median: 1044 1139 10 0 3
75%-tile: 43012 43116 38 0 3
97.5%-tile: 43113 43116 54 0 4
Maximum: 43116 43117 224 0 7

of Seqs: 32867

and after preclustering:

mothur > summary.seqs(fasta=hgut.trim.unique.precluster.align)

Start End NBases Ambigs Polymer
Minimum: 0 0 0 0 1
2.5%-tile: 1044 1062 3 0 1
25%-tile: 1044 1062 9 0 3
Median: 1044 1139 10 0 3
75%-tile: 43017 43116 38 0 3
97.5%-tile: 43113 43116 54 0 4
Maximum: 43116 43117 224 0 7

of Seqs: 31953

if ignoring and moving to the filter option- all alignment is removed.

any other way to reduce the dataset before alignment, I mean unique step or other, not filtering after aligning step?
(my PC has 12.0 GB by the way).

Two things…

  1. Be sure to follow the Costello example analysis. Even if you have removed the oligos, you need to trim the bad quality scoring bases from the sequences. Also, you want to do unique.seqs > align.seqs > chimera.slayer > filter.seqs > unique.seqs > preclustering. You seem to have it a bit backward.

  2. I think your sequences are oriented in the wrong direction. When you run trim.seqs you can flip the sequences. Alternatively, in align.seqs you can also have the aligner automatically try to flip them for you.

Pat, thanks for your help.

For some reason it looks like mothur is not trimming at all based on qual file (see bellow).

mothur > trim.seqs(fasta=hgut.fasta, qfile=hgut.qual, maxambig=0, maxhomop=8, flip=T, processors=2)
_GM2SIF004IV1XN is in your fasta file and not in your quality file, not using quality file.

When checked, both files, the fasta and qual have the sequence.

I repeated from scratch and again mothur doesn’t read qual telling that there is an error.

mothur > trim.seqs(fasta=hg.fasta, qfile=hg.qual)
Error reading quality file, name blank at position, -1

I am opening both files and both are completely identical.
thanks.

Do both sequences have the “_” character? I suspect they don’t and that may have been introduced accidentally.

no, they are both identical.

here is first seq from qual:

1_GM2SIF004IJDI0 rank=0022353 x=3384.0 y=170.0 length=418
37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37

here is the fasta:

1_GM2SIF004IJDI0 rank=0022353 x=3384.0 y=170.0 length=418
GAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGAACGATGAAAGGAGCTTGCTTTTTTCAAAGTTAGCGGCGCACGGGTGAGTAAC

Would you mind sending your logfile and input files to mothur.bugs@gmail.com?

Any idea what went wrong with qual files and why mothur is not recognizing them? i sent them separately each with hgut name to mothur.bugs@gmail.com.
thanks a lot for your help!!!

Your quality file contains 35,275 sequences and your fasta file contains 34,839.

Hello,

I seem to be having similar problems as the original poster. Can you give some elaboration on “flipping the sequences” as suggested above. How would I have the aligner automatically flip my sequences.

Thanks.

when you run the align.seqs command use flip=T as one of the options. See…

http://www.mothur.org/wiki/Align.seqs#flip_and_threshold

Thanks. I realized that I was following the Costello example too closely. I had included the Flip=T when it was not necessary and that sent me on the wrong trajectory.

Thanks.

When I run the chimera slayer, after following Costello’s example:

mothur > chimera.slayer(fasta=1234.trim.unique.good.filter.fasta, template=silva.gold.filter.fasta)
Checking sequences from 1234.trim.unique.good.filter.fasta …
Reading sequences from silva.gold.filter.fasta…Done.
Reading sequences from 1234.trim.unique.good.filter.fasta…Done.
and after this mothur closes instantly.

What could you recommend for this issue?

thank you.

How much RAM does your computer have? That would be my first guess. That or the reference files got corrupted and you may need to download fresh copies.

Amblyomma,

Can you tell us what platform you are using to run mothur and what version of mothur you are using (v.1.15)?

Thanks,Pat

Ah - you forgot to include silva.gold.align when you ran filter.seqs… try this…

filter.seqs(fasta=silva.gold.align-good.fasta, …)
chimera.slayer(template=silva.gold.filter.fasta, candidate=good.filter.fasta)