Generate fasta file from sub.sample shared file

Haemophiluser · December 12, 2017, 5:47pm

Hi,
Is there a way to generate a new fasta file from the subsampled shared file? When I went through the MiSeq SOP, I saw that after the sub.sample command (prior to the OTU-based analysis section), there was no option describing how to create a fasta file of the subsampled data. If that isn’t possible, is there a quick way to remove the sequences/OTUs in the fasta file that aren’t in the subsampled shared file?
Thanks!

westcott · December 19, 2017, 5:37pm

There is not a way to select the subsampled sequences from a shared file. When the shared file is subsampled, mothur only looks at the counts in each OTU in each group; there are no sequence names to reference. You can get what you are looking for in a slightly different way, by subsampling the list file together with the count or group file.

mothur > sub.sample(list=final.opti_mcc.list, count=final.count_table, persample=t) - the subsample size is set to the size of your smallest group.
mothur > list.seqs(list=current) - list names of sequences in subsample
mothur > get.seqs(accnos=current, fasta=yourFastaFile, taxonomy=yourTaxonomyFile) - select the subsampled sequences from your other files
mothur > make.shared(list=current, count=current) - create a shared file from your subsampled list and count files.

Now the subsampled list, shared, fasta, count and taxonomy files all match.
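To make the selection step concrete, here is a minimal Python sketch of what keeping only the listed names from a fasta file amounts to. It is an illustration, not mothur's implementation: the function names `read_accnos` and `filter_fasta` are hypothetical, and it assumes the accnos file has one sequence name per line and that each FASTA header starts with `>` followed by the name.

```python
def read_accnos(text):
    """Return the set of names in accnos-style text (assumed: one name per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def filter_fasta(fasta_text, keep):
    """Keep only FASTA records whose name (first token after '>') is in keep."""
    out = []
    keeping = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The header decides whether this record's lines are kept.
            parts = line[1:].split()
            keeping = bool(parts) and parts[0] in keep
        if keeping:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

In practice get.seqs does this for you; the sketch is only useful for spot-checking a small file by hand.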

ADL · July 3, 2018, 3:16pm

Hello, I am using this suggestion for sub-sampling and following it with get.oturep in order to rename the sequences in the fasta file to have OTU names that match the shared file (so I can make a tree with matching sequence names and import everything into phyloseq). I must be missing something though: when I run get.oturep, there is a long list of sequence names that appear to be missing from the fasta file. I’ve checked to make sure all the files are current. The sequence of commands is:

get.groups - selecting a subset of my data on group, count, list, fasta, names and taxonomy files

sub.sample - using resulting list and count files from get.groups, persample=t

list.seqs - subsampled list file

get.seqs - resulting accnos, plus fasta and taxonomy from get.groups - this reports different numbers of sequences selected from the fasta and taxonomy files

dist.seqs - fasta is output from get.seqs, phylip format

get.oturep - current phylip, current fasta, current subsampled list - this gives me 495 missing sequences, presumably ones that are in the list file but not the fasta?

I’m getting output, but I’m concerned about the missing sequences
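One way to see exactly which names are affected is to compare the list file against the fasta file directly. The Python sketch below is a rough diagnostic, not part of mothur. All function names are hypothetical. It assumes the list file is tab-separated with the label in column 1, the OTU count in column 2, and one comma-separated group of sequence names per remaining column, with an optional header line starting with `label`.

```python
def names_in_list_text(text, label=None):
    """Collect sequence names from the first matching line of list-file text."""
    for line in text.splitlines():
        # Skip blank lines and the optional header (assumed to start with 'label').
        if not line.strip() or line.startswith("label\t"):
            continue
        fields = line.split("\t")
        if label is None or fields[0] == label:
            names = set()
            # Columns 3 onward are OTUs; each holds comma-separated names.
            for otu in fields[2:]:
                names.update(n for n in otu.split(",") if n)
            return names
    return set()


def names_in_fasta_text(text):
    """Collect the first token of every '>' header line."""
    names = set()
    for line in text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def missing_from_fasta(list_text, fasta_text, label=None):
    """Return sorted names present in the list file but absent from the fasta."""
    return sorted(names_in_list_text(list_text, label) - names_in_fasta_text(fasta_text))
```

Reading both files with `open(path).read()` and passing the text to `missing_from_fasta` gives the names to investigate; if the result is non-empty, the list and fasta files did not come from the same selection step.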
