get.groups from shared file


I have a .shared file that contains several groups. Now I want to extract the original fasta sequences (i.e. the raw sequences with primers and barcodes) from one of those groups.

I have tried the following:
get.groups(shared=final.shared, groups=control, group=test.groups, fasta=original.fasta)

I get a .fasta file with sequences from the group in question. However, this .fasta file contains MORE sequences than the sum of the OTU counts for that sample in final.shared: 2944 vs. 2600. 2944 is the number of sequences assigned to the “control” group in the test.groups file. So I essentially get all raw sequences of the control group listed in my test.groups file, not only those represented in my .shared file.

What do I need to do to get only the sequences that are represented in the .shared file?

Kind regards, Anke.

You are correct: the get.groups command relies on the group file to decide which sequences belong to each group. It looks like there may be a mismatch between your shared and group files. Here’s how you would get the results you are looking for:

mothur > list.seqs(list=theListFileYouUsedToCreateTheSharedFile) - list the sequences that are in the list file
mothur > get.seqs(group=test.groups) - keep only those sequences in the group file (uses the accnos file from list.seqs)
mothur > make.shared(list=current, group=current) - recreate the shared file from the matching list and group files
mothur > get.groups(shared=current, groups=control, group=current, fasta=original.fasta) - select the control group's sequences from the fasta file
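If you want to confirm that the shared and group files agree before and after these steps, a rough check outside mothur could look like the sketch below. It is not part of mothur. The function names are made up for illustration, and it assumes the usual layouts: a group file with two whitespace-separated columns (sequence name, group) and a shared file with a header line followed by rows of label, group, numOtus and then one count per OTU.

```python
from collections import Counter


def group_file_counts(path):
    """Count sequences per group in a two-column (sequence name, group) file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                counts[fields[1]] += 1
    return counts


def shared_file_counts(path):
    """Sum the OTU abundances for every (label, group) row of a shared file."""
    totals = {}
    with open(path) as handle:
        next(handle, None)  # skip the header line (label, Group, numOtus, Otu...)
        for line in handle:
            fields = line.split()
            if len(fields) < 4:
                continue  # blank or malformed row
            label, group = fields[0], fields[1]
            # fields[2] is numOtus; the remaining columns are per-OTU counts
            totals[(label, group)] = sum(int(x) for x in fields[3:])
    return totals


# Example use (file names taken from the question above):
#   group_file_counts("test.groups")["control"]
#   shared_file_counts("final.shared")
```

If the two numbers for the control group differ, as with 2944 vs. 2600 above, the group file contains sequences that are not in the list file the shared file was built from.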