Is there an easy way to get the number of unique OTUs for every group? A Venn diagram gives you the number of unique OTUs for up to 4 groups, but what if you have 5 or more groups?
Anders
The get.sharedotu command will do this for you. For example, if you have 5 groups (a, b, c, d, e) and set the groups parameter to a, the output file will report only the OTUs that contain sequences exclusively from group a. For more information on this command, please refer to the wiki: http://www.mothur.org/wiki/Get.sharedotu.
Thanks. However, I can’t figure out how to get it right. Could you give an example of the command line (e.g. for the Sogin data)? As I see it, the example in the help file is not correct: one must specify a group file, not the names of the groups of interest (at least I get an error when I try using group names). Also, which list file is needed? The example in the help file does not specify one.
Best regards,
Anders
mothur > get.sharedotu(list=sogin.list, group=sogin.groups, label=0.06, groups=55R)
0.06
mothur > quit()
The command outputs a file, sogin.0.06.names, that looks like this:
D4WT9DQ07D56XR|55R|4972
D4WT9DQ07D5229|55R|4972
D4WT9DQ07D53B0|55R|4974
D4WT9DQ07D53OZ|55R|4976
D4WT9DQ07D58Z3|55R|4977
D4WT9DQ07D59JT|55R|4978
D4WT9DQ07D5W4I|55R|4982
D4WT9DQ08E25K3|55R|4982
D4WT9DQ07EIAN1|55R|4985
D4WT9DQ07EF3OX|55R|4985
D4WT9DQ07D610G|55R|4985
D4WT9DQ07D61FJ|55R|4986
D4WT9DQ07D62T9|55R|4987
D4WT9DQ07EEZJ4|55R|4989
D4WT9DQ07D67IN|55R|4989
…
Each line has the format: name of sequence|group|bin number.
So you can see that bin 4985 contains 3 sequences, all belonging to group 55R. We are adding the number of bins to the output in version 1.7, so it will look like this:
mothur > get.sharedotu(list=sogin.list, group=sogin.groups, label=0.06, groups=55R)
0.06 247
mothur > quit()
where 247 is the number of OTUs unique to group 55R at distance 0.06. For now, you can only get that number by counting the bins in the output file.
If you add the fasta parameter:
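Until the count is printed by mothur itself, counting the bins can be scripted. The following is a minimal sketch in Python (not part of mothur), assuming the output file has exactly the name|group|bin layout shown above; the function name count_bins and the sample list are made up for illustration.

```python
from collections import Counter

def count_bins(lines):
    """Return (number of distinct bins, Counter of sequences per bin).

    Assumes each line looks like: sequence_name|group|bin_number
    """
    per_bin = Counter()
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue  # skip blank lines or anything not in name|group|bin form
        _name, _group, bin_id = parts
        per_bin[bin_id] += 1
    return len(per_bin), per_bin

# A few lines copied from the sogin.0.06.names excerpt above.
sample = [
    "D4WT9DQ07D5W4I|55R|4982",
    "D4WT9DQ08E25K3|55R|4982",
    "D4WT9DQ07EIAN1|55R|4985",
    "D4WT9DQ07EF3OX|55R|4985",
    "D4WT9DQ07D610G|55R|4985",
    "D4WT9DQ07D61FJ|55R|4986",
    "D4WT9DQ07D62T9|55R|4987",
]

n_bins, per_bin = count_bins(sample)
print(n_bins)           # number of distinct bins in the sample
print(per_bin["4985"])  # number of sequences in bin 4985
```

To run it on the real output, pass an open file handle instead of the sample list, e.g. count_bins(open("sogin.0.06.names")). Repeating the get.sharedotu call once per group and counting the bins each time gives the number of unique OTUs for every group, however many groups there are.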
mothur > get.sharedotu(list=sogin.list, group=sogin.groups, label=0.06, groups=55R, fasta=sogin.fasta)
0.06 247
you also get the file sogin.0.06.fasta, which looks like this:
D4WT9DQ07EGF3W|55R|5120
AGGGGTTTGATCAATGATACGGAACTTTGTGAAGCAGAAGGTGCCGTTTGGAACGTATACAC
D4WT9DQ07EG5LU|55R|5110
TAGGCTTGACATGCAGATTACTGACCGAAGGAGCACCGCAAGGGAGTCTGCAC
D4WT9DQ07EH3NE|55R|5134
TACTCTTGACATCTACGGAAGACTGCAGAGATGCGGTTGTGCCGTTCGGGAACCGTAAGAC
D4WT9DQ07EGY18|55R|5129
…
This lets you analyze the sequences from a specific group, or set of groups, together. With multiple groups, the output file gives you the bins that contain only sequences from those groups. Example:
mothur > get.sharedotu(list=sogin.list, group=sogin.groups, label=0.06, groups=55R-115R)
0.06 33
D4WT9DQ11GSV4T|115R|4975
D4WT9DQ12HEUE5|115R|4975
D4WT9DQ12HEN3J|115R|4975
D4WT9DQ11G0PSE|115R|4975
D4WT9DQ12HOY4Y|115R|4975
D4WT9DQ07D53GE|55R|4975
D4WT9DQ12HJLQ3|115R|5073
D4WT9DQ07D971U|55R|5073
D4WT9DQ12HGK6F|115R|5100
D4WT9DQ07EFHF7|55R|5100
D4WT9DQ07EEMMZ|55R|5100
D4WT9DQ07EHO1H|55R|5100
D4WT9DQ07EH3U1|55R|5100
D4WT9DQ07D8SPT|55R|5100
D4WT9DQ07D7UPP|55R|5100
D4WT9DQ08E31YY|55R|5100
D4WT9DQ08E3M34|55R|5100
D4WT9DQ08E1LWS|55R|5100
D4WT9DQ08EUNXH|55R|5100
D4WT9DQ07EFHF4|55R|5100
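To see how each bin in a multi-group output is split between the groups, the lines can be tallied per bin. This is a rough Python sketch (not a mothur feature), again assuming the name|group|bin layout shown above; groups_per_bin is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def groups_per_bin(lines):
    """Map each bin number to a Counter of {group: number of sequences}.

    Assumes each line looks like: sequence_name|group|bin_number
    """
    bins = defaultdict(Counter)
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue  # skip anything that is not a name|group|bin record
        _name, group, bin_id = parts
        bins[bin_id][group] += 1
    return bins

# A few lines copied from the 55R-115R output above.
sample = [
    "D4WT9DQ11GSV4T|115R|4975",
    "D4WT9DQ12HEUE5|115R|4975",
    "D4WT9DQ07D53GE|55R|4975",
    "D4WT9DQ12HJLQ3|115R|5073",
    "D4WT9DQ07D971U|55R|5073",
]

composition = groups_per_bin(sample)
for bin_id, counts in sorted(composition.items()):
    print(bin_id, dict(counts))
```

The number of keys in the returned dictionary is the number of bins, which should match the count mothur reports once version 1.7 prints it.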
I hope this example helps. Feel free to ask any questions.