Phylotype-based analysis

Hi,

I would like to do a phylotype-based analysis of my MiSeq data for the v34 region. As practice, I processed the example data in the MiSeq SOP, binned the sequences into phylotypes, and generated the stability.an.shared file. I noticed that the system did not generate the stability…0.03.cons.taxonomy file. I went ahead with the next command, count.groups(shared=stability.an.shared). It generated the output: Mock contains 4061. Total seqs: 4061. (The stability.an.shared file contains only one group, Mock. Is this correct?) In the SOP, under the count.groups command, it says the smallest sample had 2441 sequences in it. So, the question is: where did we get this number “2441” from? As this number is used in the next command and many commands in the analysis. Is this analysis on the correct path?

Thanks,

Jatinder

In the phylotype section of the SOP there are 4 steps…

mothur > remove.groups(count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, groups=Mock)


mothur > phylotype(taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy)

mothur > make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, label=1)

mothur > classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, label=1)

The last one should generate the cons.taxonomy file. From the looks of your output, I suspect you missed running the first command

Hello Pat, Thanks. I had gone through all the commands you suggested and it did generate a cons.taxonomy file, but I was a little confused because the file names are a little different: …tx.1.cons.taxonomy instead of …unique_list.0.03.cons.taxonomy. Please comment on the second part of the question, regarding the output of the count.groups command.
Thank you.

So, the question is: where did we get this number “2441” from? As this number is used in the next command and many commands in the analysis. Is this analysis on the correct path?

It was the size of the smallest library.
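To make that concrete: count.groups prints one "<group> contains <n>." line per sample, as in the log output quoted in this thread, and the smallest of those counts is the number passed on to the later commands. Here is a minimal Python sketch of that lookup, assuming that line format. The group names and counts in the example are invented for illustration and are not the SOP's real values.

```python
import re

def smallest_group(count_groups_output):
    """Return (group, count) for the smallest library in count.groups screen output."""
    counts = {}
    for line in count_groups_output.splitlines():
        # Matches lines such as "Mock contains 4061."; "Total seqs: ..." lines do not match.
        match = re.match(r"^\s*(\S+) contains (\d+)\.?\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    if not counts:
        raise ValueError("no '<group> contains <n>' lines found")
    group = min(counts, key=counts.get)
    return group, counts[group]

# Hypothetical output: invented group names and counts.
example = """sampleA contains 5200.
sampleB contains 3100.
sampleC contains 4750.

Total seqs: 13050.
"""
print(smallest_group(example))
```

If the only matching line is "Mock contains 4061.", the shared file holds just the Mock group, which suggests the wrong shared file was counted rather than a problem with the SOP's number.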

Thanks, Pat. So, this number comes from the output of the command count.groups(shared=stability.an.shared)? When I used this command, what I got was: Mock contains 4,061 sequences, and the file stability.an.shared has only Mock sequences. Please clarify.

Thanks

Jatinder

See above…

From the looks of your output, I suspect you missed running the first command

Hi Pat, Thanks much. I have copied and pasted below from the log file what was done.

mothur > remove.groups(count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, groups=Mock)

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.

Removed 46 sequences from your fasta file.
Removed 4061 sequences from your count file.
Removed 46 sequences from your taxonomy file.

Output File names:
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta
stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy


mothur > phylotype(taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy)
1
2
3
4
5
6

Output File Names:
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.sabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.list


mothur > make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, label=1)
1

Output File Names:
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.shared
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D0.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D1.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D141.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D142.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D143.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D144.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D145.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D146.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D147.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D148.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D149.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D150.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D2.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D3.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D5.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D6.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D7.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D8.rabund
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.F3D9.rabund


mothur > classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, label=1)
reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
1 64

Output File Names:
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.1.cons.taxonomy
stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.1.cons.tax.summary


mothur > system(mv stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.shared stability.an.shared)
mothur > count.groups(shared=stability.an.shared)
Mock contains 4061.

Total seqs: 4061.

Output File Names:
count.summary

So, I believe I have gone through all the necessary commands.


Please comment.

Thanks,

Jatinder

Based on the commands you entered, I believe that this…

mothur > system(mv stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.shared stability.an.shared)

should be this…

mothur > system(mv stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.shared stability.an.shared)

Note that you didn’t redo the cluster/cluster.split and make.shared commands on the OTU data, just the phylotype data.

Pat
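A quick way to confirm which groups a shared file actually holds before running count.groups is to read its Group column. The Python sketch below assumes the usual tab-delimited shared layout with a header line (label, Group, number of bins, then one count column per bin). The small table in the example is invented, and groups_in_shared is an illustrative helper, not part of mothur.

```python
def groups_in_shared(shared_text):
    """Map each (label, group) in a shared file to its total sequence count."""
    totals = {}
    lines = [ln for ln in shared_text.splitlines() if ln.strip()]
    for line in lines[1:]:  # assumption: the first line is the header row
        fields = line.split("\t")
        label, group = fields[0], fields[1]
        # fields[2] is the number of bins; the remaining fields are per-bin counts.
        totals[(label, group)] = sum(int(x) for x in fields[3:])
    return totals

# Invented example with two groups at label 1.
example = "\n".join([
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3",
    "1\tsampleA\t3\t10\t5\t0",
    "1\tsampleB\t3\t2\t7\t1",
])
for (label, group), total in groups_in_shared(example).items():
    print(label, group, total)
```

A file that lists only Mock here is a leftover from the OTU-based steps, not the phylotype-based shared file produced by make.shared above.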

Hi Pat,

Thanks much. I will give this a try.

Jatinder

Hi Pat,

That worked, and I went smoothly through to the alpha diversity part of the SOP.

Now, I have two questions:

  1. In the beta diversity part of the analysis, the SOP uses the file stability.an.0.03.subsample.shared in heatmap.bin, venn, get.communitytype, metastats and many other commands. I did not generate this file. However, from the sub.sample(shared=stability.an.shared, size=2241) command, I generated the file stability.an.1.subsample.shared. So, should I be using this file for the beta diversity analysis?

  2. The phylotype analysis did not generate the file stability.trim.contigs.good.unique.precluster.pick.pick.pick.an.unique_list.0.03.cons.taxonomy. So, which file should I rename to stability.an.cons.taxonomy for use in the beta diversity analysis?

Thank you,

Jatinder

You want the same files, but they will have a tx in them.
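As a rough illustration of that rule, the Python sketch below filters a list of file names down to the ones with ".tx." in them that end in .shared or .cons.taxonomy. The file names are the ones from the log pasted earlier in this thread; phylotype_outputs is a hypothetical helper, not a mothur command.

```python
def phylotype_outputs(filenames):
    """Pick the phylotype-based shared and cons.taxonomy files from a list of names."""
    wanted = (".shared", ".cons.taxonomy")
    return [name for name in filenames
            if ".tx." in name and name.endswith(wanted)]

prefix = ("stability.trim.contigs.good.unique.good.filter.unique."
          "precluster.pick.pds.wang.pick.pick")
files = [
    prefix + ".taxonomy",
    prefix + ".tx.list",
    prefix + ".tx.shared",
    prefix + ".tx.1.cons.taxonomy",
    prefix + ".tx.1.cons.tax.summary",
]
for name in phylotype_outputs(files):
    print(name)
```

With those names, the two files that come back are the …tx.shared file and the …tx.1.cons.taxonomy file.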

Hi Pat,

Thank you. I have 5 different files ending in “.taxonomy”. However, only one of these has “tx” in it: stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.tx.1.cons.taxonomy. So, is this the right file to use for the beta diversity analysis?

Also, is the file stability.an.1.subsample.shared the right file to use in place of stability.an.0.03.subsample.shared? Please clarify.

Thank you,

Jatinder

Are you following and understanding the SOP? To generate the phylotype-based files you need to follow the instructions found here:

http://www.mothur.org/wiki/MiSeq_SOP#Phylotypes

This will generate a shared file and a cons.taxonomy file for you that are based on the phylotype data. A shared file is what is used in the beta diversity analysis. If you look at the beta diversity analysis steps, you will see this.

Pat

Thank you.

Yes, I am following the SOP for phylotype-based analysis. Before analyzing my own data, I wanted to go through the SOP using the data you provide and make sure that I was getting the expected output. I also wanted to make sure that I was using the right files for the analysis, hence the very basic questions about the file names.