Help with align.seqs

Hi All,
I am analyzing my MiSeq 300x300 run data. After running the command: align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v34.fasta)

I get the following message:

“Some of you sequences generated alignments that eliminated too many bases, a list is provided in stability.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.”

Also, I looked at the flip.accnos file and there are a whole lot of sequences in this file. So, my question is, is it advisable to continue with the analysis?

Thanks
Jatinder

If you run align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v34.fasta, flip=t) mothur will align your sequences as well as the reverse compliment of your sequences and choose the better alignment.

Thank you. I will run the command as you have suggested.

Jatinder

I am also working with non-aligned sequences, even when using align.seqs(flip=t) in mothur 1.33.3 OS X 10.9 build on OS X 10.9 MacBook. I am aligning to entire silva.bacteria.fasta without trimming

Three questions:

  1. everything was sequenced in the same direction; should I even set the flip parameter to t (true)? About half of non-aligned sequences have better reverse complement alignments, but maybe those reads are just garbage from non-specific PCR. Should I be stringent and go with flip=f ?

  2. could using the entire silva.bacteria.fasta be the cause of bad alignments?

  3. Do Schloss’ dual-index primers for the V4 region also amplify in Archaea?. If yes, maybe I should concatenate bacterial and archaeal reference alignments.

Thanks!

Paul

  1. everything was sequenced in the same direction; should I even set the flip parameter to t (true)? About half of non-aligned sequences have better reverse complement alignments, but maybe those reads are just garbage from non-specific PCR. Should I be stringent and go with flip=f ?

Hmmm… I’m suspicious. If half are better when flipped, that sounds a lot like the read adapter fused to the forward and reverse primers were used. You might look at those that do better with flip=T to see whether they start with the (rc of the?) reverse primer.

  1. could using the entire silva.bacteria.fasta be the cause of bad alignments?

Nah, we use silva.bacteria.fasta for everything.

  1. Do Schloss’ dual-index primers for the V4 region also amplify in Archaea?. If yes, maybe I should concatenate bacterial and archaeal reference alignments.

They might at some very low frequency, but I wouldn’t expect it.

Could you please how to prepare silval.v34.fasta file?

my reads were amplifed by v3-v4 region.

so, i want to prepare silva.v34.fasta file for alignment.

please consider it.

thanks.

please post this on a more pertinent thread or make your own thread.

pat

Hi Pat,

I just ran align.seqs and the accnos file show that about half of my sequences had a better alignment with the reverse complement whereas the other half was not used because the reverse complement did not produce a better alignment. I have used my own local and much smaller reference database to align my seqs to (109 seqs).

Ok, this is a bit complicated from here. I am dealing with a functional gene (nodC) rather than 16S. I ran a blastn on my local db, which includes 109 nodC seqs (not published yet) plus another ~3000 nodC seqs that I downloaded from NCBI. About half of my seqs (~70k seqs) matched something in this blastn search, with many of ~70k seqs matching the 109 nodC seqs I have, and the others matched to the 3000 nodC seqs I downloaded from NCBI. I took the non-matched seqs (~69k) and did an NCBI blastn and found another ~4k more matches. I’ve combined the matched seqs together (~70k + ~4k) and used get.seqs to pull out the seqs from my fasta file to only work with these matched sequences. For the align.seqs, I’ve taken these ~74k seqs and tried to align it to the 109 nodC seqs of my own local db, but this is where I end up having about 50% of the seqs saying that the reverse complement produced a better alignment, and the other half saying the reverse complement did NOT produce a better alignment. I have several questions for you:

  1. Should most of these seqs align to this reference db given that when I did a local blastn with my reference db, most of them came up matching with the seqs in my reference db? (i.e. Seq 1 matches Subj x in the reference db using blastn, but when I try to align Seq 1 to the reference db, it may not align even though it matched subj x quite well in blastn.) (Sorry, this may be more than a “Mothur” question.)

  2. I read in a different thread that you mentioned not using flip=T in the trim.seqs if the align.seqs leads to many seqs aligning better when it is reverse complemented. Well, it has more to do with if the sequences were read left to right to not use flip=T, but it seems like this may pose a problem down the line with align.seqs (??). Does this mean don’t use the flip=T parameter when you run trim.seqs, but you can use it for the align.seqs?

  3. Is there a way to pull out the seqs that did not get used in the align.seqs (i.e. “reverse complement did NOT produce a better alignment so it was not used, please check sequences”) within Mothur so that I can try to align some of these sequences by hand? Or do I have to write some sort of Linux code - which would suck since I’m not good at that.

Thanks so much. I hope it wasn’t too complicated to follow.

  1. Should most of these seqs align to this reference db given that when I did a local blastn with my reference db, most of them came up matching with the seqs in my reference db? (i.e. Seq 1 matches Subj x in the reference db using blastn, but when I try to align Seq 1 to the reference db, it may not align even though it matched subj x quite well in blastn.) (Sorry, this may be more than a “Mothur” question.)

Is it possible that your sequences could be coming off of the sequencer in either direction? Generally with 16S we know that the sequences are all in the same direction. But if you have done 454 and put the sequencing adapter on both ends it’s possible to get the 5-3’ and 3-5’ reads. This would explain the problem. Also, it’s possible that half of your reads are just bad or are the product of non-specific amplification and they don’t align well because they aren’t homologous to nodC.

  1. I read in a different thread that you mentioned not using flip=T in the trim.seqs if the align.seqs leads to many seqs aligning better when it is reverse complemented. Well, it has more to do with if the sequences were read left to right to not use flip=T, but it seems like this may pose a problem down the line with align.seqs (??). Does this mean don’t use the flip=T parameter when you run trim.seqs, but you can use it for the align.seqs?

flip=T is useful in trim.seqs if you know all of your reads are in the opposite direction. flip=T is useful in align.seqs if you don’t know which direction the reads are in

  1. Is there a way to pull out the seqs that did not get used in the align.seqs (i.e. “reverse complement did NOT produce a better alignment so it was not used, please check sequences”) within Mothur so that I can try to align some of these sequences by hand? Or do I have to write some sort of Linux code - which would suck since I’m not good at that.

you can use remove.seqs with the flip.accnos file.

Possibly worse…ion torrent rather than 454 or Illumina. And from a few years ago rather than with more updated technology.

Thanks on the tip for remove.seqs!