Some of you sequences generated alignments that eliminated too many bases, a list is provided in crdd1.shhh.trim.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well. It took 186 secs to align 25814 sequences.
Some of you sequences generated alignments that eliminated too many bases, a list is provided in crdd1.shhh.trim.unique.flip.accnos. If the reverse compliment proved to be better it was reported. It took 378 secs to align 25814 sequences.
My question is: should I modify the command somehow to avoid eliminating too many bases? Or is it acceptable to continue with the downstream analysis using the files generated by one of the commands I used, even though a considerable number of bases were eliminated?
What does the output look like? How many sequences show up in the flip.accnos file? Sometimes spurious DNA fragments get through to this point, and because they aren’t 16S sequences, weird things happen. If there aren’t a ton of them and you’re sure the sequences are in the right direction, I wouldn’t worry. They’ll get culled when you run screen.seqs.
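If it helps to put a number on "a ton of them", here is a minimal Python sketch for counting the names in an accnos file and expressing that as a fraction of the whole dataset. It assumes the accnos file is a plain list with one sequence name per line; the function names and the sample names below are made up for illustration and are not part of mothur.

```python
def count_accnos(lines):
    """Count sequence names, assuming one name per non-empty line."""
    return sum(1 for line in lines if line.strip())


def flagged_fraction(n_flagged, n_total):
    """Fraction of the dataset that was flagged by the aligner."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_flagged / n_total


if __name__ == "__main__":
    # Hypothetical accnos content; with a real file you would pass
    # open("your_file.accnos") instead of this list.
    example = ["seq_a\n", "seq_b\n", "\n", "seq_c\n"]
    n = count_accnos(example)
    print(n, "flagged;", f"{flagged_fraction(n, 1000):.2%}", "of 1000 reads")
```

A very small fraction would fit the "don't worry" case above; a large one suggests checking read orientation before going further.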
Rather than start a new thread, I figure this is an appropriate place to post about a similar problem.
When we run align.seqs and then get the summary, we are left with very few sequences, of which the smallest is 0 bp, the largest is 60 bp, and the median is 13 bp. Prior to this command we have no problems, with large numbers of sequences all 200-350 bp. When I view the accnos file there are many sequence names in it, but what concerns us most is that the fasta file generated by align.seqs consists mostly of “.” with the odd nucleotide base. For example…
…A… …and so on…
We have tried many approaches with align.seqs to overcome this but have been unsuccessful. Is there something we are doing wrong, or is this because the data we get back from the lab is poor?
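For anyone who wants to check an aligned fasta file for the same symptom outside of mothur, this is a rough Python sketch that counts the real bases in each aligned sequence, treating "." and "-" as padding or gap characters, and reports the smallest, median, and largest counts. The function names are hypothetical, and the three sequences in the example are invented, not taken from our data.

```python
from statistics import median


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def base_count(aligned):
    """Number of characters that are not '.' or '-'."""
    return sum(1 for ch in aligned if ch not in ".-")


def summarize(lines):
    """Return (min, median, max) base counts, or None if no sequences."""
    counts = [base_count(seq) for _, seq in read_fasta(lines)]
    if not counts:
        return None
    return min(counts), median(counts), max(counts)


if __name__ == "__main__":
    # Invented alignment lines for illustration only.
    demo = [">s1", "..A..", ">s2", "..ACGT--AC..", ">s3", "....."]
    print(summarize(demo))
```

If the median here is far below the unaligned read length, most of each read is being dropped at the alignment step rather than earlier.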
This may be happening because in trim.seqs you used flip=T when you sequenced from left to right. If you sequence from left to right (i.e. from the 5’ to 3’ end of the gene) then you don’t want to use flip=T.
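To make the orientation point concrete: flipping a read means taking its reverse complement, and a read that has been flipped when it shouldn't have been will not line up with a reference that is in the forward direction. A minimal Python sketch of the operation, assuming plain A/C/G/T/N sequences (the function name is just for illustration, not a mothur call):

```python
# Translation table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    read = "AACG"
    flipped = reverse_complement(read)
    print(read, "->", flipped)
    # Flipping twice gives back the original read.
    print(reverse_complement(flipped) == read)
```

Flipping twice returns the original read, so if the reads were already 5’ to 3’, rerunning trim.seqs without the flip should restore the expected orientation.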
Thanks Pat, there were just 4 sequences in the accnos file out of a total of 90944 sequences. I guess I am fine; a similar thing happens when analyzing 18S datasets.