chimera checked fasta as input file for dist.seqs?

Hello,

I apologize if this is a basic question or if it has already been covered, but after searching I was unable to find anything that helped. I am using mothur to analyze paired-end MiSeq data (about 1.5 million reads). I have run the following steps of the pipeline on my data, in this order: make.contigs, screen.seqs, unique.seqs, align.seqs, screen.seqs, filter.seqs (using trump=.), unique.seqs, chimera.uchime, remove.seqs. (I did not run pre.cluster because I am doing some comparisons.)

At this point, I want to make a distance matrix using dist.seqs. My specific questions are:

  1. Is the chimera-checked fasta file (the output of remove.seqs) still “aligned”? That is, does it still contain gaps, or were they all removed by the subsequent steps (screen.seqs, filter.seqs, etc.)? I know that the trump=. option removes terminal . characters, but I’m unclear whether internal gaps still exist.
  2. In either case (whether or not the chimera-checked fasta file has gaps), can I use this file as the input fasta for dist.seqs?
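One way to answer question 1 directly is to inspect the file. The Python sketch below is illustrative only and is not a mothur command: it assumes `-` and `.` are the gap characters (as in mothur-style alignments) and counts gap characters that fall between a sequence's first and last base, i.e. internal gaps. The helper names `parse_fasta` and `internal_gaps` and the toy sequences are hypothetical.

```python
# Assumption: '-' and '.' are the only gap characters in the aligned fasta.
GAP_CHARS = set("-.")

def parse_fasta(text):
    """Parse FASTA text into a dict of {sequence name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {n: "".join(parts) for n, parts in records.items()}

def internal_gaps(seq):
    """Count gap characters lying between the first and last base."""
    bases = [i for i, c in enumerate(seq) if c not in GAP_CHARS]
    if not bases:
        return 0  # all-gap sequence: no internal region
    inner = seq[bases[0]:bases[-1] + 1]
    return sum(1 for c in inner if c in GAP_CHARS)

# Hypothetical toy input standing in for a chimera-checked fasta file.
example = """>seqA
..ACG-TA--CGT..
>seqB
..ACGTTAC-CGT..
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq), internal_gaps(seq))
# seqA 15 3
# seqB 15 1
```

A nonzero internal-gap count on real data would indicate the file is still aligned; for a large file you would read it from disk and pass the text to `parse_fasta`.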

I am running the program right now, and while it’s taking a long time (it’s been over an hour now), I’m not getting any errors yet. However, based on the SOP, the program uses gaps/insertions and mismatches to determine distance, so I’m just curious whether my chimera-checked fasta file is in the correct format for dist.seqs to work properly.
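To illustrate how gaps and mismatches can both feed into a distance, here is a simplified Python sketch of a "onegap"-style calculation on two aligned sequences. It is an illustration under stated assumptions, not mothur's implementation: dist.seqs has its own `calc` and `countends` options that change how gaps and terminal gaps are scored. In this sketch, each mismatched base counts as one difference, a run of consecutive gap-vs-base columns counts as a single difference, and columns where both sequences have a gap are skipped. The function name `onegap_distance` is hypothetical.

```python
# Assumption: '-' and '.' are the gap characters in the alignment.
GAP_CHARS = set("-.")

def onegap_distance(a, b):
    """Simplified distance between two aligned sequences of equal length.

    - mismatched bases: one difference per column
    - a run of consecutive gap-vs-base columns: one difference in total
    - columns where both sequences have a gap: ignored
    Returns differences / compared positions (a gap run counts as one position).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    compared = 0
    in_gap_run = False
    for x, y in zip(a, b):
        x_gap, y_gap = x in GAP_CHARS, y in GAP_CHARS
        if x_gap and y_gap:
            continue  # shared gap column: skipped, does not end a gap run
        if x_gap or y_gap:
            if not in_gap_run:
                diffs += 1     # only the first column of a gap run is penalized
                compared += 1
            in_gap_run = True
        else:
            in_gap_run = False
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else 0.0

print(onegap_distance("ACGT", "ACGA"))      # 0.25
print(onegap_distance("AC--GT", "ACTTGT"))  # 0.2
```

The point of the sketch is that the gap characters carry information: stripping them would change which columns are compared, which is why the input to dist.seqs is expected to be an alignment.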

Thank you for the clarification,
Anna

Hi,

  1. The chimera-checked fasta file is still aligned and will contain internal gaps where needed.

  2. I’m not sure whether dist.seqs will accept an unaligned fasta file. I haven’t tried this in a long time, but I seem to remember mothur throwing an error if the sequences in your fasta file are different lengths. There is the pairwise.seqs() command for working with unaligned fasta files, but I’ve never tried it and imagine it would be much slower than dist.seqs().
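If mothur does reject sequences of unequal length, as remembered above, a quick pre-check before starting a long dist.seqs run is to confirm that every sequence in the fasta has the same length. A minimal Python sketch, assuming the sequences are already loaded into a name-to-sequence dict; `alignment_length` and the toy data are hypothetical and not part of mothur:

```python
from collections import Counter

def alignment_length(seqs):
    """Return the shared length of all sequences, or raise ValueError
    naming the sequences whose length differs from the most common one."""
    if not seqs:
        raise ValueError("no sequences")
    counts = Counter(len(s) for s in seqs.values())
    common, _ = counts.most_common(1)[0]
    odd = sorted(name for name, s in seqs.items() if len(s) != common)
    if odd:
        raise ValueError(f"not aligned: {odd} differ from length {common}")
    return common

# Hypothetical toy data: an aligned set and an unaligned set.
aligned = {"seqA": "..ACG-TA--CGT..", "seqB": "..ACGTTAC-CGT.."}
unaligned = {"seqA": "ACGTACGT", "seqB": "ACGTTACCGT", "seqC": "ACGTACGA"}

print(alignment_length(aligned))  # 15
try:
    alignment_length(unaligned)
except ValueError as err:
    print(err)  # not aligned: ['seqB'] differ from length 8
```

Equal lengths are necessary but not sufficient for a meaningful alignment; this only catches the obvious case of an unaligned file.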

Thank you dwaite! I figured that since I didn’t get an error message, things would be okay, but I wasn’t sure whether my file was aligned, or whether dist.seqs would work on an unaligned file (in case my file wasn’t aligned). Thank you so much for the clarification!

Anna