Hi all. The pairwise.seqs command appears to estimating distances incorrectly. I noticed this while calculating pairwise distances for some fungal data and replicated the result with a subset of the esophagus dataset. The distances are, on average, 23 % larger than those calculated after aligning against core_set_aligned.imputed.fasta and 29 % larger than using ESPRIT (using the same gapopen [10] and gapextend [-0.5] parameters as the ESPRIT default), with a large range (0 - 73 % larger).
Here’s the mothur screen output. I was running the 32-bit version on my MacBook Pro. I can supply additional files if they are needed.
Best regards,
Jeff
mothur v.1.17.0 Last updated: 2/25/2011
by
Patrick D. Schloss
Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org
When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.
Distributed under the GNU General Public License
Type ‘help()’ for information on the commands that are available
Type ‘quit()’ to exit program
mothur > sub.sample(fasta=esophagus.fasta)
Sampling 71 from 710.
Output File Names:
esophagus.subsample.fasta
mothur > unique.seqs(fasta=esophagus.subsample.fasta)
Output File Names: esophagus.subsample.unique.fasta esophagus.subsample.names
mothur > align.seqs(candidate=esophagus.subsample.unique.fasta, template=core_set_aligned.imputed.fasta)
Reading in the core_set_aligned.imputed.fasta template sequences... DONE. Aligning sequences from esophagus.subsample.unique.fasta ... 70 It took 3 secs to align 70 sequences.
Output File Names: esophagus.subsample.unique.align esophagus.subsample.unique.align.report
mothur > dist.seqs(fasta=esophagus.subsample.unique.align)
0 0
69 0
Output File Name:
esophagus.subsample.unique.dist
It took 0 to calculate the distances for 70 sequences.
mothur > system(cp esophagus.subsample.unique.dist esophagus.align.dist)
mothur > pairwise.seqs(fasta=esophagus.subsample.unique.fasta)
Processing sequences from esophagus.subsample.unique.fasta …
0 0
69 39
It took 39 to calculate the distances for 70 sequences.
Output File Name:
esophagus.subsample.unique.dist
mothur > system(cp esophagus.subsample.unique.dist esophagus.pairwise.dist)
mothur > read.dist(column=esophagus.align.dist, name=esophagus.subsample.names)
********************###########
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||
It took 0 secs to read
mothur > cluster()
unique 2 69 1
0.00 9 50 3 2 0 0 0 0 0 1
0.01 16 23 7 1 1 1 1 0 0 0 0 0 0 0 0 0 1
0.02 17 15 9 1 1 1 0 0 0 1 0 0 0 0 0 0 0 1
0.03 19 14 6 2 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
0.04 20 11 4 4 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.06 20 11 3 3 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.07 21 10 3 3 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0.08 24 10 3 2 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.09 24 10 2 2 1 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.12 33 10 2 2 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.13 33 8 1 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.18 33 6 2 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.21 33 5 2 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.24 35 4 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.25 35 1 2 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.27 35 1 2 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
0.29 54 1 2 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.30 57 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.32 59 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.33 62 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.37 71 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Output File Names:
esophagus.align.fn.sabund
esophagus.align.fn.rabund
esophagus.align.fn.list
It took 0 seconds to cluster
mothur > read.dist(column=esophagus.pairwise.dist, name=esophagus.subsample.names)
********************###########
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||
It took 0 secs to read
mothur > cluster()
unique 2 67 2
0.00 10 46 4 1 1 0 0 0 0 0 1
0.01 17 29 6 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.02 17 23 9 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.03 17 20 9 2 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.04 18 19 8 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1
0.05 18 18 7 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1
0.07 18 18 7 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1
0.10 18 18 6 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.11 18 16 4 3 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.13 18 14 5 3 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.15 18 13 5 3 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1
0.18 18 11 6 3 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1
0.25 18 7 8 3 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1
0.30 18 4 8 4 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1
0.35 20 4 7 4 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0.41 20 3 6 5 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0.43 20 3 4 5 2 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0.44 20 2 3 6 2 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0.47 20 2 2 6 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0.48 20 1 2 3 2 1 2 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0.72 20 0 0 3 3 0 3 0 0 0 0 0 1 0 0 0 0 0 0 0 1
0.73 24 0 0 2 2 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0.74 24 0 0 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0.75 24 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1
0.76 31 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
0.77 31 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0.78 71 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Output File Names:
esophagus.pairwise.fn.sabund
esophagus.pairwise.fn.rabund
esophagus.pairwise.fn.list
It took 0 seconds to cluster
mothur > pairwise.seqs(fasta=esophagus.subsample.unique.fasta, align=needleman, gapopen=-10, gapextend=-0.5)
Processing sequences from esophagus.subsample.unique.fasta …
0 0
69 36
It took 36 to calculate the distances for 70 sequences.
Output File Name:
esophagus.subsample.unique.dist
mothur > system(cp esophagus.subsample.unique.dist esophagus.pwGapParam.dist)
mothur > read.dist(column=esophagus.pwGapParam.dist, name=esophagus.subsample.names)
********************###########
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||
It took 0 secs to read
mothur > cluster()
unique 2 67 2
0.00 10 46 4 1 1 0 0 0 0 0 1
0.01 17 29 6 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.02 17 21 10 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.03 17 18 10 2 1 0 1 0 0 0 0 0 0 0 0 0 0 1
0.04 18 17 9 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1
0.05 18 16 8 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1
0.07 18 16 7 2 1 1 0 0 1 0 0 0 0 0 0 0 0 0 1
0.10 18 16 7 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.12 18 15 6 2 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.13 18 13 4 4 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.18 18 11 5 4 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.23 18 9 6 4 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.26 18 7 7 4 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.27 18 5 8 4 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.28 18 5 6 4 2 1 0 0 0 0 0 1 0 0 0 0 0 0 1
0.32 20 5 5 4 2 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0.35 20 4 4 5 2 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0.38 20 2 5 5 2 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0.39 20 2 3 5 3 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0.41 20 1 3 5 2 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
0.46 23 1 3 4 2 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0.51 23 0 1 4 2 3 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0.61 23 0 1 4 1 2 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0.68 23 0 1 3 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0.70 23 0 1 2 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1
0.71 23 0 0 2 0 1 0 1 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1
0.72 23 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1
0.75 34 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
0.76 53 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.77 58 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0.78 71 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Output File Names:
esophagus.pwGapParam.fn.sabund
esophagus.pwGapParam.fn.rabund
esophagus.pwGapParam.fn.list
It took 0 seconds to cluster
mothur > quit()