dist.seqs and cluster

Hello! I’d like to calculate the coverage, Shannon, and Chao1 indices for my samples with mothur, even though I ran the rest of the sequence analysis in QIIME. I have a single fasta file containing all the sequences, aligned with the PyNAST algorithm. Can I use it as input to dist.seqs, or do I have to split the sequences per sample? If so, how should I set up the scripts? Also, can I use the PyNAST-aligned sequences, or do I have to align them with MUSCLE?
Finally, which script can I use to calculate these indices (coverage, Shannon, and Chao1) from a single file containing sequences from all 10 samples?
Thanks
Francesca

Why would you have to use muscle? We’re generally pretty negative about using muscle.

mothur doesn’t use scripts like QIIME. I’d suggest following along with the Schloss SOP. You would pick up at dist.seqs and go through the alpha diversity steps (although I’d really encourage people to start at the beginning).

Pat

Hi Pat! Thanks for your prompt reply… I saw the Schloss SOP, but I have some more questions:
First of all, I didn’t understand whether I can use an aligned .fasta file containing sequences from several samples as input to the dist.seqs command…
Then, which file do I have to use as input to the summary.single command? Can I use the output from dist.seqs without clustering first?
I’m sorry, but I’m a beginner with mothur and sequence analysis and I’m a bit confused…
Thanks
Francesca

Right - if you have aligned sequences stored in a fasta-formatted file you can start right after align.seqs or go directly to dist.seqs. If you start right after align.seqs then you can make sure the sequences overlap in the same region, trim them to the same alignment coordinates, and check them for chimeras. After running dist.seqs you’ll have to run cluster to get OTU assignments and then run summary.single to get the parameters you want.

Hope this helps,
Pat
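
For readers wondering what the three requested values measure once sequences have been clustered into OTUs, here is a minimal Python sketch that computes them from per-OTU sequence counts for a single sample. It is not mothur's code: the function names and the example counts are made up for illustration, coverage is taken to be Good's coverage, Shannon uses the natural log, and Chao1 is written in its bias-corrected form. mothur's own calculators may differ in detail.

```python
import math

def goods_coverage(counts):
    """Good's coverage: 1 - (number of singleton OTUs / total sequences)."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / total

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical sample: five OTUs containing 5, 3, 1, 1 and 2 sequences.
example_counts = [5, 3, 1, 1, 2]
print(goods_coverage(example_counts))
print(shannon(example_counts))
print(chao1(example_counts))
```

All three functions need per-OTU counts, which is why a distance matrix alone is not enough and clustering has to come first.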

Hi, this is a different question from Francesca's, but it still deals with the distance matrix and cluster…
I ran dist.seqs with cutoff 0.15 on 454 data, then ran cluster with only these options (cluster(column=final.dist, name=final.names)) and got this output:

label numOtus Otu00001 Otu00002 Otu00003 Otu00004
unique 47274 IK6R0XE02DIDUT,IK6R0XE02EDI09,IK6R0XE02DZU0O,IK6R0XE02EO9JP,IK6R
0.01 29883 IOBVYG303HFE00,IOBVYG303GFF89,IOBVYG304H7U3W,IOBVYG303F27L6,IOBV
0.02 18960 IOBVYG303GQOV9,JQ82DGA01BHS0K,IOBVYG303FJBYZ,IK6R0XE02DK5DJ,IOBV
0.03 13951 IOBVYG304JKR2R,IOBVYG304IHW9N,IK6R0XE02CZV8G,IOBVYG304JU2TB,IOBV
0.04 10886 IOBVYG304IX8H3,IOBVYG304IRL5Y,IK6R0XE02D94AF,IOBVYG304IO10X,IK6R
0.05 8713 JHSA45F02DGPRL,JHSA45F02DNBR2,JHSA45F02DLTFQ,JHSA45F02C8H6K,JHSA
0.06 7131 IK6R0XE02EE9AQ,IOBVYG304JU8VH,IOBVYG303GBP89,IOBVYG304H86AI,IOBV
final.an.list (END)

Does this mean that I don't have sequences more distant than 94%, so I wouldn't be able to define OTUs at a higher level than 94% similarity? And when we define OTUs at a distance of 0.03, does that correspond to roughly 97% similarity between the sequences within an OTU? Or have I got this concept wrong?

Sorry for the confusion, but I want to make sure I can understand the meaning of the distances and also how to use the cutoff and label options in dist.seqs, cluster and make.shared.

Thanks!

https://www.mothur.org/wiki/Frequently_asked_questions#Why_does_the_cutoff_change_when_I_cluster_with_average_neighbor.3F
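
As a small illustration of how the labels in the output above relate to percent similarity, the Python sketch below reads the first two columns of each row (label and numOtus) and converts each distance label to a similarity as 100 × (1 − distance). The function names are hypothetical and this is not part of mothur. It assumes whitespace-separated fields and skips the header row and any row whose second field is not a number.

```python
def distance_to_similarity(distance):
    """Convert a distance label such as 0.03 to percent similarity (97.0)."""
    return round(100 * (1 - distance), 2)

def parse_list_summary(lines):
    """Return (label, numOtus, percent similarity) for each data row."""
    summary = []
    for line in lines:
        fields = line.split()
        # Skip the header, blank lines, and anything without a numeric OTU count.
        if len(fields) < 2 or fields[0] == "label" or not fields[1].isdigit():
            continue
        label = fields[0]
        num_otus = int(fields[1])
        # "unique" groups identical sequences, so it has no numeric distance.
        similarity = None if label == "unique" else distance_to_similarity(float(label))
        summary.append((label, num_otus, similarity))
    return summary

# Rows shortened from the output quoted above; the OTU membership is elided.
rows = [
    "label numOtus Otu00001 Otu00002",
    "unique 47274 IK6R0XE02DIDUT,IK6R0XE02EDI09",
    "0.03 13951 IOBVYG304JKR2R,IOBVYG304IHW9N",
    "0.06 7131 IK6R0XE02EE9AQ,IOBVYG304JU8VH",
]
for label, num_otus, similarity in parse_list_summary(rows):
    print(label, num_otus, similarity)
```

On these rows the 0.03 label maps to 97% similarity and the 0.06 label to 94%. Why the list stops at 0.06 even though dist.seqs was run with a cutoff of 0.15 is the subject of the FAQ entry linked above.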