Hi, so I am dealing with a collaborator who doesn't want to give me all the files used to generate the OTUs for our project. I would like to extract sequences for specific samples from the fasta file he sent me, but the example on the get.groups command page indicates that to do this I need a groups file (or at least, that is the file type used in the example).
Is it possible to use the fasta option in get.groups with a list or a shared file instead? My collaborator tells me these should be good enough for what I want to do, though I suspect he doesn't understand what I'm trying to do. My goal is to extract the nice, clean, trimmed mothur sequences I want and map them to sequences from an entirely different file using usearch.
Not really. The problem is that the shared file has group names but no sequence names, just counts, so there is no way to relate the groups to sequences. The list file has sequence names but no groups, so you can't relate sequences to groups from it either. In theory you could assign the singleton OTUs to groups using the list and shared files: look up the sequence name in the list file, then see which group has an abundance of one for that OTU in the shared file. For any OTU with an abundance > 1, there would be no way to determine which sequence in the list file was represented by which count in the shared file.
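The singleton trick described above can be sketched in Python. This is a minimal sketch, not a mothur command, and it rests on several assumptions: both files are tab-delimited and start with a header row (`label`, `numOtus`, `Otu001`, `Otu002`, and so on in the list file; `label`, `Group`, `numOtus`, `Otu001`, and so on in the shared file), you only care about one distance label such as `0.03`, and the OTU column names match between the two files. The function names and the sample and sequence names are hypothetical. It only assigns a group when the OTU's list entry holds exactly one sequence name and its total count across all groups in the shared file is 1; every other OTU is skipped, because its counts cannot be tied back to individual sequences.

```python
def parse_list(lines, label):
    """Return {otu_name: [sequence names]} for one label of a list file.

    Assumes a header row 'label<TAB>numOtus<TAB>Otu001...' followed by data
    rows in which each OTU cell is a comma-separated list of sequence names.
    """
    otu_names = lines[0].rstrip("\n").split("\t")[2:]
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == label:
            return {otu: cell.split(",") for otu, cell in zip(otu_names, fields[2:])}
    raise ValueError(f"label {label!r} not found in list file")


def parse_shared(lines, label):
    """Return {group: {otu_name: count}} for one label of a shared file.

    Assumes a header row 'label<TAB>Group<TAB>numOtus<TAB>Otu001...'.
    """
    otu_names = lines[0].rstrip("\n").split("\t")[3:]
    counts = {}
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == label:
            counts[fields[1]] = {otu: int(c) for otu, c in zip(otu_names, fields[3:])}
    return counts


def assign_singletons(otu_seqs, group_counts):
    """Map sequence name -> group, but only for singleton OTUs.

    An OTU qualifies if the list file gives it exactly one sequence name and
    the shared file gives it a total count of 1, so the one group holding
    that count must own the sequence. All other OTUs are ambiguous and skipped.
    """
    assignments = {}
    for otu, seqs in otu_seqs.items():
        per_group = {group: counts.get(otu, 0) for group, counts in group_counts.items()}
        if len(seqs) == 1 and sum(per_group.values()) == 1:
            owner = next(group for group, n in per_group.items() if n == 1)
            assignments[seqs[0]] = owner
    return assignments


# Hypothetical toy data: Otu001 has three sequences and is skipped;
# Otu002 and Otu003 are singletons and can be assigned.
LIST_LINES = [
    "label\tnumOtus\tOtu001\tOtu002\tOtu003",
    "0.03\t3\tseqA,seqB,seqC\tseqD\tseqE",
]
SHARED_LINES = [
    "label\tGroup\tnumOtus\tOtu001\tOtu002\tOtu003",
    "0.03\tsampleA\t3\t2\t1\t0",
    "0.03\tsampleB\t3\t1\t0\t1",
]

if __name__ == "__main__":
    otus = parse_list(LIST_LINES, "0.03")
    counts = parse_shared(SHARED_LINES, "0.03")
    # Print one "sequence<TAB>group" line per assignable singleton.
    for seq, group in sorted(assign_singletons(otus, counts).items()):
        print(f"{seq}\t{group}")
```

The printed lines use the same two-column layout (sequence name, then group) as a mothur groups file, so in principle they could be fed to get.groups together with the fasta file, but they would only ever cover the singleton sequences.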