Is there a way to generate a new fasta file from the subsampled shared file? When I went through the MiSeq SOP, I saw that after the sub.sample command (prior to the OTU-based analysis section), there was no option describing how to create a fasta file of the subsampled data. If that isn’t possible, is there a quick way to remove the sequences/OTUs in the fasta file that aren’t in the subsampled shared file?
There is no way to select the subsampled sequences from a shared file. When the shared file is subsampled, mothur only looks at the counts in each OTU in each group; there are no sequence names to reference. You can get what you are looking for in a slightly different way, by subsampling the list file together with the count or group file.
mothur > sub.sample(list=final.opti_mcc.list, count=final.count_table, persample=t) - the subsample size is set to the size of your smallest group.
mothur > list.seqs(list=current) - list names of sequences in subsample
mothur > get.seqs(accnos=current, fasta=yourFastaFile, taxonomy=yourTaxonomyFile) - select the subsampled sequences from your other files
mothur > make.shared(list=current, count=current) - create a shared file from your subsampled list and count files.
Now the subsampled list, shared, fasta, count and taxonomy files all match.
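If you want to double-check outside mothur that the files really do match, a rough Python sketch along these lines may help. It assumes the accnos file holds one sequence name per line and that each fasta header starts with ">" followed by the name, with anything after the first whitespace ignored. The function names and the file names in the usage note are hypothetical, not part of mothur.

```python
def read_accnos(lines):
    """Collect sequence names, assuming one name per non-empty line."""
    return {line.strip() for line in lines if line.strip()}


def read_fasta_names(lines):
    """Collect sequence names from fasta header lines (those starting with '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            # Assumption: the name is the text after '>' up to the first whitespace.
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_names(accnos_lines, fasta_lines):
    """Return (names in accnos but not in fasta, names in fasta but not in accnos)."""
    wanted = read_accnos(accnos_lines)
    present = read_fasta_names(fasta_lines)
    return sorted(wanted - present), sorted(present - wanted)
```

To run it on real files, open both and pass the file handles, for example `with open("subsample.accnos") as a, open("subsample.fasta") as f: missing, extra = compare_names(a, f)`, substituting the names of your own files. If both lists come back empty, the accnos and fasta files contain the same set of sequence names.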
Hello, I am using this suggestion for subsampling and following it with get.oturep in order to rename the sequences in the fasta file to have OTU names that match the shared file (so I can make a tree with matching sequence names and import everything into phyloseq). I must be missing something, though: when I run get.oturep, there is a long list of sequence names that appear to be missing from the fasta file. I’ve checked to make sure all the files are current. The sequence of commands is:
get.groups - selecting a subset of my data on group, count, list, fasta, names and taxonomy files
sub.sample - using resulting list and count files from get.groups, persample=t
list.seqs - subsampled list file
get.seqs - resulting accnos, plus fasta and taxonomy from get.groups - this reports different numbers of sequences selected from the fasta and taxonomy files
dist.seqs - fasta is output from get.seqs, phylip format
get.oturep - current phylip, current fasta, current subsampled list - this gives me 495 missing sequences, presumably ones that are in the list file but not the fasta?
I’m getting output, but I’m concerned about the missing sequences.