combine results from get.oturep and make.shared

lmsteinberg · August 1, 2013, 7:06pm

I would like to combine the information from the make.shared command and the get.oturep command so that I will have a list of each OTU with the number of representatives in each group and then an OTU identification for the representative sequence. For instance:

label Group numOTUs OTU001 OTU002 OTU003 OTU004
0.03 A 4 45 22 9 1
0.03 B 4 17 81 0 0
representative sequence GX45912 GX48723 GX13245 GX44621

If I do not include a groups or names file for the get.oturep command will this choose a representative sequence that is equidistant from all other sequences in that OTU from all of my groups? Will the order of output of the representative OTUs in the fasta file be the same as in the rank-abundance file from the make.shared command?

westcott · August 2, 2013, 12:09pm

You might be interested in the create.database command. It provides output like:

OTUNumber Abundance repSeqName repSeq OTUConTaxonomy
1 6307 GQY1XT001C296C A-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…
2 5124 GQY1XT001A3TJI G-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…
3 3177 GQY1XT001CS2B8 G-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…
4 2947 GQY1XT001CD9IB G-GC--GA-G-A-A-G-T-A … GT-GAA Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);…
…

You can also get the abundances broken down by group.

If I do not include a groups or names file for the get.oturep command will this choose a representative sequence that is equidistant from all other sequences in that OTU from all of my groups?

Mothur finds the representative the same way whether or not you provide a group file. The method parameter controls how the representative sequence is chosen; the choices are distance and abundance. The distance method finds the sequence with the smallest maximum distance to the other sequences in the OTU. If a tie occurs, the sequence with the smallest average distance is selected. The abundance method chooses the most abundant sequence in the OTU as the representative.

If you provide a group file and do not select groups, mothur will include a list of the groups found in the OTU in the rep fasta file. For example:

get.oturep(fasta=final.fasta, name=final.names, group=final.groups, list=final.an.list, column=final.dist, label=0.03)

GQY1XT001C296C 1|6104|F003D000-F003D002-F003D004-F003D006-F003D008-F003D142-F003D144-F003D146-F003D148-F003D150
A-G-T-G-A-GC--GA-G-A-AG-T-A--TG-C-GG-A-ATG-C-G-T-G-GT-GT-A-G-CGGT-G-AAA--TG-C-AT-AG--AT-ATC-A-C-G…

If you select groups, mothur will create a rep fasta and rep names file for each group you select. For example:

get.oturep(fasta=final.fasta, name=final.names, group=final.groups, list=final.an.list, column=final.dist, label=0.03, groups=all)

GQY1XT001BQRGU 1|409|F003D000
A-G-T-G-G-GC--GA-G-A-AG-T-A--TG-C-GG-A-ATG-C-G-T-G-GT-GT-A-G-CGGT-G-AAA--TG-C-AT-AG--AT-ATC-…

Will the order of output of the representative OTUs in the fasta file be the same as in the rank-abundance file from the make.shared command?

In the rep fasta file, each sequence header has the form: seqName otuNumber|abundance|groups. So in the examples above, the 1 refers to OTU1. As long as you used the same list file to make the shared file, you can compare the files.
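If you want to line the rep fasta up against the shared file in a script, the header can be split into its parts. The following is only a rough Python sketch based on the header layout described above. The function name parse_rep_header is made up for illustration, and it assumes that the name and the info field are separated by whitespace, that a leading ">" may be present, and that group names do not themselves contain "-".

```python
def parse_rep_header(header):
    """Split a rep fasta header of the form 'seqName otuNumber|abundance|groups'.

    Hypothetical helper for illustration; not part of mothur.
    """
    # Assumption: the header may or may not still carry the fasta '>' marker.
    name, info = header.lstrip(">").split(None, 1)
    fields = info.strip().split("|")
    # The OTU number is kept as text so it can be matched against other files as-is.
    otu = fields[0]
    abundance = int(fields[1])
    # The groups field is only expected when a group file was supplied.
    # Assumption: groups are joined with '-' and contain no '-' themselves.
    groups = fields[2].split("-") if len(fields) > 2 and fields[2] else []
    return {"name": name, "otu": otu, "abundance": abundance, "groups": groups}


example = parse_rep_header("GQY1XT001BQRGU 1|409|F003D000")
```

Here example["otu"] is "1", which is the value you would match against the corresponding OTU column of a shared file made from the same list file.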

lmsteinberg · August 5, 2013, 1:22pm

Wow, what a great option to be able to create a database! That is exactly what I needed.

As for my question on choosing a representative sequence for each OTU, I wasn’t very clear, and I think you answered it anyway. I chose distance for the creation of OTUs and when I ran get.oturep I asked for a representative sequence to be chosen for each OTU and for each group. So I was wondering if the representative sequences would be equidistant from all other sequences for all groups within that OTU, or if the representative sequence would be equidistant from all other sequences in that OTU for that one group.

I feel like this would be easier to convey visually. But I think from your answer that the first case is true: the representative is equidistant (I think of it as “centered”) in the middle of the OTU which was created in examining all sequences from all groups. The representative is not centered in the OTU for just the one group for which it is a representative.

westcott · August 5, 2013, 3:40pm

If you choose to find a representative for each group, mothur separates the OTU by group and then chooses the representative considering only the sequences in that OTU that belong to that group. Just to be clear, here is a simplified example:

OTU1 - seq1,seq2,seq3,seq4,seq5,seq6
seq1,seq2,seq3 belong to group1
seq4,seq5,seq6 belong to group2

If no groups are used mothur finds a representative from the entire OTU: seq1,seq2,seq3,seq4,seq5,seq6.
If groups are used, mothur finds a representative for group1 from: seq1,seq2,seq3, and a representative for group2 from: seq4,seq5,seq6.
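To make the difference concrete, here is a small Python sketch of the distance rule as described in this thread (smallest maximum distance, ties broken by smallest average distance), applied once to the whole OTU and once per group. This is not mothur's code. The function names, the positions, and the resulting distances are all invented for illustration; a real run would take its distances from the column or phylip file.

```python
from collections import defaultdict


def pick_representative(seqs, distance):
    """Pick the sequence with the smallest maximum distance to the others.

    Ties are broken by the smallest average distance.
    """
    if len(seqs) == 1:
        return seqs[0]

    def score(s):
        d = [distance(s, t) for t in seqs if t != s]
        return (max(d), sum(d) / len(d))

    return min(seqs, key=score)


def representatives_by_group(seqs, group_of, distance):
    """Split the OTU by group, then pick a representative within each group."""
    members = defaultdict(list)
    for s in seqs:
        members[group_of[s]].append(s)
    return {g: pick_representative(m, distance) for g, m in members.items()}


# Hypothetical data: each sequence sits at a point on a line, and the
# distance between two sequences is the gap between their points.
positions = {"seq1": 0, "seq2": 1, "seq3": 2, "seq4": 4, "seq5": 5, "seq6": 9}


def distance(a, b):
    return abs(positions[a] - positions[b])


otu1 = ["seq1", "seq2", "seq3", "seq4", "seq5", "seq6"]
group_of = {"seq1": "group1", "seq2": "group1", "seq3": "group1",
            "seq4": "group2", "seq5": "group2", "seq6": "group2"}

whole_otu_rep = pick_representative(otu1, distance)
per_group_reps = representatives_by_group(otu1, group_of, distance)
```

With these made-up distances, the whole-OTU representative is seq4 (it ties with seq5 on maximum distance and wins on average distance), while the per-group representatives are seq2 for group1 and seq5 for group2. So the representative for a group need not be the same sequence as the representative for the whole OTU.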
