Hi,
I’m using the latest version of mothur and have tried quality-trimming my sequences with qwindowaverage, but no matter what value I use, all the sequences get scrapped. The quality file doesn’t seem to be the problem, since trimming based on qaverage works well. Any ideas what the problem may be?
Thanks!
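For readers unfamiliar with window-based quality trimming, here is a generic sketch in Python. It is not mothur's implementation: the function name, the convention of cutting at the start of the first failing window, and the demo scores are all assumptions made for illustration.

```python
def window_trim_length(quals, window_size=50, min_avg=35.0):
    """Return how many leading bases survive a generic sliding-window trim.

    Assumption: the read is cut at the start of the first window whose
    mean score falls below min_avg. mothur's own rule may differ.
    """
    n = len(quals)
    if n == 0:
        return 0
    # A read shorter than the window is treated as a single window.
    size = min(window_size, n)
    for start in range(0, n - size + 1):
        window = quals[start:start + size]
        if sum(window) / size < min_avg:
            return start
    return n


# Hypothetical read: 60 good bases followed by 40 poor ones.
scores = [40] * 60 + [10] * 40
print(window_trim_length(scores))        # 19
print(window_trim_length([40] * 100))    # 100 (nothing trimmed)
print(window_trim_length([10] * 100))    # 0 (first window already fails)
```

Under this convention a read whose first window already falls below the threshold is trimmed to zero length, which is one way a strict threshold could leave nothing behind.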
bunbury,
Can you check whether you have v.1.16.1? We sent out a quick release that fixed a problem like this, and you may have 1.16.0.
pat
Yes, I have v.1.16.1. I think the problem may have been that I downloaded the 64-bit version (Mac). Now I am working with the 32-bit version, and it let me trim sequences based on quality scores with no problem. However, I noticed that if I trim sequences based on quality scores, ambiguous bases, homopolymers, sequence length and oligos all at the same time, I get a different result than if I run trim.seqs independently and consecutively for each parameter.
For example:
My data set includes 543703 sequences. If I run this command: trim.seqs(fasta=healthy.fasta, oligos=healthy.oligos, qfile=healthy.qual, maxambig=0, maxhomop=8, pdiffs=2, minlength=250, maxlength=390, qwindowaverage=35, qwindowsize=50, processors=2) I get 242709 sequences in the trim file, while if I run one step at a time (first oligos, then ambig and homop, then sequence length, and then quality score) I end up with 449456 sequences. Any idea why this may be?
Hmm… That doesn’t make sense. The order of things in trim.seqs is to remove the primers/barcodes, trim by quality scores and then to cull sequences that don’t match the length, homop, or ambig parameters. If anything I’d expect fewer sequences doing it piecemeal because the quality trimming should remove regions of sequences that are long, have long homopolymers or have ambiguities. Can you send us the fasta/qual/oligos file to take a look (mothur.bugs@gmail.com)?
I think I figured out what my problem is. You are right that the order matters. Maybe you meant to say in your reply that the order should be quality first, then oligos, etc. If I remove the oligos first, my sequences no longer match the quality file. By trimming step by step, starting with quality, I get the same result as when I trim everything in one command. Does that make sense?
thanks…just trying to understand what I’m doing!
Right, sorry - quality first, oligos second. I’m glad to see people are trying to dig into the black box. You guys keep us on our toes!
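The mismatch described above (sequences shortened by oligo removal while the quality file still holds the original scores) can be checked with a small Python sketch. It assumes a plain FASTA file and a 454-style .qual file with space-separated integer scores under matching names; the function names and the demo records are hypothetical.

```python
def read_records(lines):
    """Parse FASTA-style lines into {name: body}, joining body lines with spaces."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # keep only the first token as the name
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: " ".join(parts) for key, parts in records.items()}


def length_mismatches(fasta_lines, qual_lines):
    """List (name, n_bases, n_scores) for reads whose lengths disagree."""
    seqs = read_records(fasta_lines)
    quals = read_records(qual_lines)
    bad = []
    for name, seq in seqs.items():
        n_bases = len(seq.replace(" ", ""))
        n_scores = len(quals.get(name, "").split())
        if n_bases != n_scores:
            bad.append((name, n_bases, n_scores))
    return bad


# Hypothetical data: read2 lost two bases but kept all six scores.
fasta = [">read1", "ACGTACGT", ">read2", "ACGT"]
qual = [">read1", "40 40 38 35 30 30 25 20", ">read2", "40 40 38 35 30 30"]
print(length_mismatches(fasta, qual))    # [('read2', 4, 6)]
```

An empty list means every sequence has exactly as many quality scores as bases, so the two files can safely be used together.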
I am facing the same problem with the trim.seqs command: all of my sequences are getting scrapped, even with the default settings.
I am using the latest version (v1.18.1; 64-bit for Linux) and following the Costello analysis example on my Roche 454 barcoded pyrosequencing results.
The one major difference between our approach and Costello’s is that we amplified a region of around 400 bp, running from V1 to V3.
thanks,
Dharmesh
Have you looked at the codes in the scrap.fasta file, and can you tell us what you see?
Sequence names in the scrap.fasta file are followed by the codes “bf” or “f”. In a few cases “bfq” is also present.
Dharmesh
I would double check that you have your barcode and primer sequences correct in the oligos file
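To see how often each scrap code occurs, a short Python tally can help. This sketch assumes the code is the last field of each header line in scrap.fasta, separated from the sequence name by "|"; that separator, the function name, and the demo headers are assumptions, so adjust them to match the actual file.

```python
from collections import Counter


def tally_scrap_codes(lines, sep="|"):
    """Count the trailing code on each FASTA header line.

    Assumption: headers look like '>name|code'. Headers without the
    separator are counted under '(no code)'.
    """
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue                      # skip sequence lines
        header = line[1:].strip()
        _name, found, code = header.rpartition(sep)
        counts[code if found else "(no code)"] += 1
    return counts


# Hypothetical headers mirroring the codes reported above.
demo = [">seqA|bf", "ACGT", ">seqB|f", "ACGT", ">seqC|bfq", "ACGT", ">seqD|bf", "ACGT"]
print(tally_scrap_codes(demo).most_common())
# [('bf', 2), ('f', 1), ('bfq', 1)]
```

For a real run, pass an open file handle in place of the demo list. If nearly every read carries the same code, that suggests a systematic cause such as the oligos file rather than read quality.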