I think I've run into a bug in trim.seqs. My scrap file contains no sequences, and my trim file contains quite a few empty ones. I believe this happens because the qthreshold option trims each sequence back to the first base that fails the filter instead of separating good and bad sequences into two files as it's supposed to. I can see this being useful in some cases, but it deviates from the documented behavior for this command. Is there a way to make it do what it's supposed to do?
Yeah, the job of trim.seqs is to trim the sequences. If an oligo can't be found or a sequence doesn't meet some criterion, the sequence is sent to the scrap heap; otherwise it is kept. You're right that with qthreshold the sequence is trimmed at the first quality score below your threshold. For now, to kick out sequences that end up too short, I'd suggest setting the minimum length option (minlength). I'll add this to the to-do list: if a trimmed sequence is blank, throw it in the scrap heap. Sound good?
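The behavior described above (trim at the first low-quality base, then rely on a minimum length to catch sequences that end up blank or too short) can be sketched in Python. This is a hypothetical illustration, not mothur's source: the function names `trim_at_qthreshold` and `route_sequence` are made up, and the sketch assumes one integer quality score per base, with "below the threshold" meaning strictly less than `qthreshold`.

```python
from typing import List, Tuple


def trim_at_qthreshold(seq: str, quals: List[int], qthreshold: int) -> str:
    """Return the bases before the first quality score below qthreshold."""
    if len(seq) != len(quals):
        raise ValueError("expected one quality score per base")
    for i, q in enumerate(quals):
        if q < qthreshold:  # assumption: "below" means strictly less than
            return seq[:i]
    return seq  # every base passed, so nothing is trimmed


def route_sequence(seq: str, quals: List[int], qthreshold: int,
                   minlength: int = 1) -> Tuple[str, str]:
    """Trim at the threshold, then scrap anything shorter than minlength.

    With the default minlength=1, only blank (fully trimmed) sequences
    are scrapped.
    """
    trimmed = trim_at_qthreshold(seq, quals, qthreshold)
    if len(trimmed) < minlength:
        return "scrap", trimmed
    return "trim", trimmed


if __name__ == "__main__":
    quals = [30, 30, 12, 30, 30, 30]
    print(route_sequence("ACGTAC", quals, 20))               # ('trim', 'AC')
    print(route_sequence("ACGTAC", quals, 20, minlength=3))  # ('scrap', 'AC')
```

Here a read whose first score already fails is trimmed to an empty string and lands in the scrap output, which is the "blank sequences go to the scrap heap" rule proposed above.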
For some QC trials we're doing, we'd actually like the qthreshold option to behave like the other options, i.e., move passing sequences to the trim file and failing sequences to the scrap file without altering the original sequence. I can see your logic for 99% of cases, though. It would be nice to have both behaviors, if that's not asking too much. :mrgreen:
Thanks to your suggestion, trim.seqs in version 1.7 will include a qtrim parameter. By default qtrim is false, meaning that if a sequence falls below the qthreshold, it goes to the scrap file. If qtrim is set to true and a sequence falls below the qthreshold, it is trimmed and put in the trim file.
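The two qtrim settings described above can be sketched as follows. This is a hypothetical Python illustration of the described behavior, not mothur code: `route_with_qtrim` and `first_failing_base` are made-up names, and the sketch assumes one integer quality score per base, with "below the threshold" meaning strictly less than `qthreshold`.

```python
from typing import List, Optional, Tuple


def first_failing_base(quals: List[int], qthreshold: int) -> Optional[int]:
    """Index of the first score below qthreshold, or None if all pass."""
    return next((i for i, q in enumerate(quals) if q < qthreshold), None)


def route_with_qtrim(seq: str, quals: List[int], qthreshold: int,
                     qtrim: bool = False) -> Tuple[str, str]:
    """Route one read to the trim or scrap output, modeled on the qtrim description."""
    if len(seq) != len(quals):
        raise ValueError("expected one quality score per base")
    cut = first_failing_base(quals, qthreshold)
    if cut is None:
        return "trim", seq        # all bases pass: keep the read unchanged
    if not qtrim:
        return "scrap", seq       # qtrim false: scrap the whole read, unaltered
    return "trim", seq[:cut]      # qtrim true: trim at the first failing base


if __name__ == "__main__":
    quals = [30, 30, 12, 30, 30, 30]
    print(route_with_qtrim("ACGTAC", quals, 20))              # ('scrap', 'ACGTAC')
    print(route_with_qtrim("ACGTAC", quals, 20, qtrim=True))  # ('trim', 'AC')
```

With `qtrim=True`, a read whose very first score fails comes back as an empty string in the trim output. The sketch deliberately does not filter it, since that is the case the minimum length option discussed in this thread is meant to catch.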
Does anyone know what the -1 value for the minimum sequence start and end means? I've been seeing this after qthreshold trimming as well. Searching the summary file for the -1 length and tracking it down in the trimmed fasta file shows that it comes from sequences with no length (there is more than one). The original sequences were short and were trimmed to zero, even though the quality scores of their first bases are above the specified threshold. What does this mean?
The -1 should be 0; this will be fixed in version 1.7. It just means that some of your sequences were trimmed to zero length.
I'd like to look into the problem of mothur trimming sequences to zero even though their first bases are above the threshold. Would you mind sending your files to mothur.bugs@gmail.com?