I am running a mothur analysis of bacterial 16S rRNA gene sequences from soil samples. I have searched the forum and still do not have a good grasp of what exactly the trump=. option does and why it’s better to use filter.seqs with this option than without. I realize it removes a column that contains a trump character in its alignment, but does that mean that it will remove the entire column even if only one of the sequences out of all of them contains that character? (Meaning that even those sequences that don’t contain this character will also have that column removed?)
I ran my analysis with and without the trump=. option, and I have 17,000 fewer sequences and about 30,000 fewer OTUs after using it. I assume this is because the sequences that were removed using trump=. did not align or had this trump character, but do I want these sequences removed?
I am still learning about sequence analysis, and I appreciate any clarification that will help me understand this process better. Thank you!
does that mean that it will remove the entire column even if only one of the sequences out of all of them contain that character?
Yes. Essentially, you want all of your sequences to start and end at the same position – sequences with extraneous data before or after your target amplicon should be trimmed.
It is very important to ensure you have properly run screen.seqs before filtering, though! Improperly short sequences – those that start after the correct start alignment position or end before the correct end position – will result in unnecessary truncation of all sequences.
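To make the truncation risk concrete, here is a small Python sketch of the column rule described above. It is only an illustration, not mothur's actual code: the function name trump_filter and the toy sequences are made up for this example.

```python
def trump_filter(aligned, trump="."):
    """Toy model: drop every alignment column in which ANY sequence
    has the trump character. Not mothur's implementation."""
    length = len(next(iter(aligned.values())))
    keep = [i for i in range(length)
            if all(seq[i] != trump for seq in aligned.values())]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in aligned.items()}

# Hypothetical aligned reads that all span the full region.
well_screened = {
    "seqA": "ACGTAC-GTT",
    "seqB": "ACG-ACAGTT",
}

# Same reads plus one short read that starts late and ends early.
with_short_read = dict(well_screened, seqC="...TACAG..")

print(trump_filter(well_screened))    # nothing is trimmed
print(trump_filter(with_short_read))  # every read is cut to seqC's span
```

In this toy case a single short read costs every other read five columns, which is why screening on start and end positions first matters.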
Thank you. I did use screen.seqs in order to remove sequences that did not start and end at the proper positions before running filter.seqs.
So, do you mean to say that the TRUMP option will remove the trump characters that occur before the beginning and after the end of the alignments? Or does it also remove any that might be within the alignment? I think that is where I got a bit lost, not sure if it is removing ALL trump characters, or if it is only certain ones.
To clarify a bit: mothur (and some other programs) use two different alignment gap characters.
- is an internal gap: sequence is found both before and after it.
. is a terminal gap: sequence is found only before or only after it, not both.
So when you filter with trump=. it will only remove columns (alignment positions) that occur at the ends of the sequences, not inside the sequences, because the . gap is never found there.
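If it helps to see the two gap characters side by side, here is a rough Python sketch. It assumes the simple rule discussed in this thread (a column goes if any sequence has the trump character in it); the helper name trump_columns and the three sequences are invented for illustration and are not part of mothur.

```python
def trump_columns(aligned, trump="."):
    """Indices of columns where at least one sequence has the trump character."""
    return [i for i, column in enumerate(zip(*aligned)) if trump in column]

# Hypothetical alignment: '.' pads the ends, '-' is a gap inside a read.
aligned = [
    "..ACG-TACGT",
    "TTACGATAC..",
    "TTACG-TACGT",
]

removed = trump_columns(aligned)
kept = ["".join(base for i, base in enumerate(seq) if i not in removed)
        for seq in aligned]

print(removed)  # only columns at the two ends
for seq in kept:
    print(seq)  # the internal '-' column is still there
```

Only the end columns that contain a . are dropped. The column holding the internal - gaps stays, because no sequence has a . in it.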
Fantastic! Thank you so much, this was the clarification I needed and where I was confused.