I apologize if this is a basic question or if it has been covered before, but my search did not turn up anything that answered it. I am using mothur to analyze paired-end MiSeq data (about 1.5 million reads). I have run the following steps of the pipeline on my data, in this order: make.contigs, screen.seqs, unique.seqs, align.seqs, screen.seqs, filter.seqs (using trump=.), unique.seqs, chimera.uchime, remove.seqs. (I skipped pre.cluster because I am doing some comparisons.)
At this point, I want to make a distance matrix using dist.seqs. My specific questions are:
- Is the chimera-checked fasta file (the output of remove.seqs) still “aligned”? That is, does it still contain gaps, or were they all removed by the earlier steps (screen.seqs, filter.seqs, etc.)? I know that the trump=. option removes terminal . characters, but I’m unclear whether internal gaps still exist.
- In either case (whether or not the chimera-checked fasta file has gaps), can I use this file as the input fasta for dist.seqs?
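For what it's worth, the first question could probably be checked directly by counting gap characters in the file. Below is a minimal Python sketch of that check, independent of mothur. The function name `summarize_alignment` and the example sequences are made up for illustration, and it assumes the file is ordinary FASTA with `-` and `.` as the gap characters. If every sequence has the same length and `-` characters are present, the file is presumably still an alignment.

```python
from collections import Counter


def summarize_alignment(fasta_lines):
    """Return (set of sequence lengths, Counter of gap characters).

    fasta_lines is any iterable of text lines in FASTA format,
    for example an open file handle or a list of strings.
    """
    lengths = set()
    gaps = Counter()
    seq_parts = []

    def flush():
        # Record the sequence collected so far, if there is one.
        if seq_parts:
            seq = "".join(seq_parts)
            lengths.add(len(seq))
            gaps["-"] += seq.count("-")
            gaps["."] += seq.count(".")
            seq_parts.clear()

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()  # a new header ends the previous record
        else:
            seq_parts.append(line)
    flush()  # record the last sequence in the input
    return lengths, gaps


# Made-up example: two aligned sequences of equal length.
example = [">seq1", "AC--GT", ">seq2", "ACTTGT"]
lengths, gaps = summarize_alignment(example)
print("sequence lengths:", lengths)
print("gap counts:", dict(gaps))
```

To run it on a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.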
I am running the program right now, and while it's taking a long time (it's been over an hour now), I’m not getting any errors so far. However, based on the SOP, the program uses gaps/insertions and mismatches between reads to determine distance, so I’m just curious whether my chimera-checked fasta file is in the correct format for dist.seqs to work properly.
Thank you for the clarification,