combine results from get.oturep and make.shared

I would like to combine the information from the make.shared command and the get.oturep command so that I will have a list of each OTU with the number of representatives in each group and then an OTU identification for the representative sequence. For instance:

label Group numOTUs OTU001 OTU002 OTU003 OTU004
0.03 A 4 45 22 9 1
0.03 B 4 17 81 0 0
representative sequence GX45912 GX48723 GX13245 GX44621

If I do not include a groups or names file for the get.oturep command will this choose a representative sequence that is equidistant from all other sequences in that OTU from all of my groups? Will the order of output of the representative OTUs in the fasta file be the same as in the rank-abundance file from the make.shared command?

You might be interested in the create.database command, http://www.mothur.org/wiki/Create.database. It provides output like:

OTUNumber Abundance repSeqName repSeq OTUConTaxonomy
1 6307 GQY1XT001C296C A-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…
2 5124 GQY1XT001A3TJI G-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…
3 3177 GQY1XT001CS2B8 G-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…
4 2947 GQY1XT001CD9IB G-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…

You can also get the abundances broken down by group.

If I do not include a groups or names file for the get.oturep command will this choose a representative sequence that is equidistant from all other sequences in that OTU from all of my groups?

Mothur finds the representative the same way whether or not you provide a group file. The method parameter controls how the representative sequence is selected; the choices are distance and abundance. The distance method finds the sequence with the smallest maximum distance to the other sequences in the OTU. If a tie occurs, the sequence with the smallest average distance is selected. The abundance method chooses the most abundant sequence in the OTU as the representative. If you provide a group file and do not select groups, mothur will include a list of the groups found in the OTU in the rep fasta file. For example: get.oturep(fasta=final.fasta, name=final.names, group=final.groups, list=final.an.list, column=final.dist, label=0.03)

GQY1XT001C296C 1|6104|F003D000-F003D002-F003D004-F003D006-F003D008-F003D142-F003D144-F003D146-F003D148-F003D150
A-G-T-G-A-GC--GA-G-A-AG-T-A--TG-C-GG-A-ATG-C-G-T-G-GT-GT-A-G-CGGT-G-AAA--TG-C-AT-AG--AT-ATC-A-C-G…
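
To make the distance rule described above concrete, here is a minimal Python sketch of it. This is not mothur's own code: the function name pick_representative and the dictionary-of-pairs distance format are made up for illustration, and the sketch assumes a distance is available for every pair of sequences in the OTU.

```python
def pick_representative(members, dist):
    """Pick the member with the smallest maximum distance to the others.

    members: list of sequence names in one OTU
    dist: dict mapping frozenset({a, b}) -> pairwise distance
          (hypothetical format chosen for this sketch)
    Ties on the maximum distance go to the smallest average distance.
    """
    if len(members) == 1:
        return members[0]

    def score(name):
        ds = [dist[frozenset((name, other))] for other in members if other != name]
        # Tuples compare element by element: max distance first, then average.
        return (max(ds), sum(ds) / len(ds))

    return min(members, key=score)


# Toy distances, for illustration only
dist = {
    frozenset(("seq1", "seq2")): 0.01,
    frozenset(("seq1", "seq3")): 0.03,
    frozenset(("seq2", "seq3")): 0.02,
}
rep = pick_representative(["seq1", "seq2", "seq3"], dist)
print(rep)
```

In this toy OTU, seq2 is the only sequence whose largest distance to any other member is 0.02 rather than 0.03, so it is the one selected.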

If you select groups, mothur will create a rep fasta and rep names file for each group you select. For example: get.oturep(fasta=final.fasta, name=final.names, group=final.groups, list=final.an.list, column=final.dist, label=0.03, groups=all)

GQY1XT001BQRGU 1|409|F003D000
A-G-T-G-G-GC--GA-G-A-AG-T-A--TG-C-GG-A-ATG-C-G-T-G-GT-GT-A-G-CGGT-G-AAA--TG-C-AT-AG--AT-ATC-…


Will the order of output of the representative OTUs in the fasta file be the same as in the rank-abundance file from the make.shared command?

In the fasta file, each sequence header has the form: seqName otuNumber|abundance|groups. So in the example above, the 1 refers to OTU1. As long as you used the same list file to make the shared file, the OTU numbers correspond and you can compare the files.
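
If you want to line the rep fasta up against the shared file in a script, the header can be split apart along these lines. This is only a sketch: parse_rep_header is a hypothetical helper, and it assumes the header looks exactly like the examples in this thread (a numeric OTU number, and group names joined with "-", so group names that themselves contain "-" would be split incorrectly).

```python
def parse_rep_header(header):
    """Split 'seqName otuNumber|abundance|groups' into its parts."""
    # Drop a leading '>' if present, then separate the name from the label.
    name, label = header.lstrip(">").split(None, 1)
    fields = label.strip().split("|")
    otu = int(fields[0])
    abundance = int(fields[1])
    # The groups field is only present when a group file was supplied.
    groups = fields[2].split("-") if len(fields) > 2 and fields[2] else []
    return {"name": name, "otu": otu, "abundance": abundance, "groups": groups}


info = parse_rep_header("GQY1XT001BQRGU 1|409|F003D000")
print(info["otu"], info["abundance"], info["groups"])
```

The otu value can then be used as the key for looking up the matching column in the shared file.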

Wow, what a great option to be able to create a database! That is exactly what I needed.

As for my question on choosing a representative sequence for each OTU, I wasn’t very clear, and I think you answered it anyway. I chose distance for the creation of OTUs and when I ran get.oturep I asked for a representative sequence to be chosen for each OTU and for each group. So I was wondering if the representative sequences would be equidistant from all other sequences for all groups within that OTU, or if the representative sequence would be equidistant from all other sequences in that OTU for that one group.

I feel like this would be easier to convey visually. But I think from your answer that the first case is true: the representative is equidistant (I think of it as “centered”) in the middle of the OTU, which was created by examining all sequences from all groups. The representative is not centered in the OTU for just the one group for which it is a representative.

If you choose to find a representative for each group, mothur separates the OTU by group and then chooses the representative considering only the sequences in that OTU in that group. Just to be clear, here is a simplified example:

OTU1 - seq1,seq2,seq3,seq4,seq5,seq6
seq1,seq2,seq3 belong to group1
seq4,seq5,seq6 belong to group2

If no groups are used, mothur finds a representative from the entire OTU: seq1,seq2,seq3,seq4,seq5,seq6.
If groups are used, mothur finds a representative for group1 from: seq1,seq2,seq3, and a representative for group2 from: seq4,seq5,seq6.
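
The same example written as a small Python sketch, in case it helps to see the candidate pools side by side. The function split_otu_by_group and the group_of mapping are hypothetical names used for illustration, not anything from mothur itself.

```python
from collections import defaultdict


def split_otu_by_group(members, group_of):
    """Partition one OTU's sequence names by the group each belongs to."""
    by_group = defaultdict(list)
    for name in members:
        by_group[group_of[name]].append(name)
    return dict(by_group)


otu1 = ["seq1", "seq2", "seq3", "seq4", "seq5", "seq6"]
group_of = {
    "seq1": "group1", "seq2": "group1", "seq3": "group1",
    "seq4": "group2", "seq5": "group2", "seq6": "group2",
}

# Without groups: one candidate pool, the whole OTU.
whole_pool = otu1
# With groups: one candidate pool per group.
group_pools = split_otu_by_group(otu1, group_of)
print(whole_pool)
print(group_pools)
```

Whichever method is selected (distance or abundance) is then applied within each pool, so a per-group representative only reflects the sequences from that group.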