Downstream from subsample

jenz · February 17, 2012, 10:13am

I have recently run into trouble continuing an analysis downstream from sub.sample.
I need to subsample a fasta file and a list file, but the command will not accept both together along with a group file. (It says it wants to build a new group file from either one or the other.)
If I want to continue with an analysis that forks in one direction from the fasta file and in another from the list file, how can I keep the two synchronized? As I understand it, if I run sub.sample(size=170) on the fasta file and then sub.sample(size=170) on the list file, each will contain 170 randomly selected sequences per group, which means they will not be the same 170 sequences?

To elaborate: the trouble creeps in at this step of my work.
dist.seqs(fasta=CNP.final.fasta, cutoff=0.10, processors=7)
cluster(column=CNP.final.dist, name=CNP.final.names,method=furthest)
make.shared(list=CNP.final.fn.list, group=CNP.proovikaupa.groups, label=0.05)
count.groups()
#in the count the smallest one is 17904
sub.sample(shared=CNP.final.fn.shared, fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17904)
dist.seqs(fasta=CNP.final.subsample.unique.fasta, output=lt, processors=7)
clearcut(phylip=CNP.final.subsample.unique.phylip.dist)
#then a lot of collect, rarefaction, heatmap etc. commands on the subsampled shared file (left out here)
#but next I want to look at the relabund file to find out the classification of the most abundant OTUs
get.relabund(shared=CNP.final.fn.0.05.subsample.shared, label=0.05)
#now I would like to run classify.otu, which requires a list file. How can I get a list file that has been subsampled in the same way as the shared file (and the others before it)?

I tried to …
cluster(phylip=CNP.final.subsample.unique.phylip.dist, name=CNP.final.subsample.names,method=furthest)

to get a list file from the distance matrix built from the resampled fasta file.

classify.otu(taxonomy=CNP.final.taxonomy, list=CNP.final.subsample.unique.phylip.fn.list, name=CNP.final.names)
#but it turned out that the number of OTUs this gave (2032 at 0.05) was different from the number of OTUs in CNP.final.0.05.subsample.shared (2240)

westcott · February 17, 2012, 1:12pm

The CNP.final.fn.subsampled.shared file created by the sub.sample command will not be the same as a file created by running sub.sample on a fasta file and then running dist.seqs, cluster and make.shared, because there is no way to relate the subsampled sequences in a fasta file to those in a shared file.

What you want to do is run:

sub.sample(fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17904)
dist.seqs(fasta=CNP.final.subsample.unique.fasta, output=lt, processors=7)
cluster(phylip=CNP.final.subsample.unique.phylip.dist, name=CNP.final.subsample.names,method=furthest)
make.shared(list=CNP.final.subsample.unique.phylip.fn.list, group=CNP.proovikaupa.subsample.groups)

This will give you a shared file containing your subsampled sequences.
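To illustrate why one subsample has to feed every downstream file, here is a minimal Python sketch (not mothur code). The pool of 1000 sequence ids, the `subsample` helper and the seeds are all hypothetical; the two seeds simply stand in for two separate sub.sample runs.

```python
import random

def subsample(ids, size, seed):
    """Draw `size` ids without replacement using a dedicated random generator."""
    rng = random.Random(seed)
    return set(rng.sample(ids, size))

# Hypothetical pool of sequence ids belonging to one group.
sequence_ids = [f"seq{i}" for i in range(1000)]

# Two independent draws of the same size, as if sub.sample were run twice.
draw_a = subsample(sequence_ids, 170, seed=1)
draw_b = subsample(sequence_ids, 170, seed=2)

# Both draws have the requested size, but they pick different sequences.
print(len(draw_a), len(draw_b))
print("identical draws:", draw_a == draw_b)
print("sequences shared by both draws:", len(draw_a & draw_b))
```

Because the two draws only partly overlap, files built from them cannot be matched up afterwards. Subsampling the fasta, name and group files once and deriving the list and shared files from that single result avoids the problem.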

jenz · February 17, 2012, 2:19pm

So I will. And if I run classify.otu(list=CNP.final.subsample.unique.phylip.fn.list), will I then get OTU classifications that are in line with the OTUs in the generated shared file?
If so, then thank you for the answer. Sometimes I get a bit lost in a mothur pipeline :lol:

westcott · February 17, 2012, 9:17pm

And if I run classify.otu(list=CNP.final.subsample.unique.phylip.fn.list), will I then get OTU classifications that are in line with the OTUs in the generated shared file?

Yup.

jenz · February 20, 2012, 10:13am

From sub.sample(fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17904)
I get the resampled fasta file CNP.final.subsample.unique.fasta.
I then classify the sequences:
classify.seqs(fasta=CNP.final.subsample.unique.fasta, template=gg_99.pds.ng.fasta, taxonomy=gg_99.pds.tax, cutoff=80, processors=7)
so that I can use the generated taxonomy file to classify the OTUs in the generated shared file:
classify.otu(list=CNP.final.subsample.unique.phylip.fn.list, taxonomy=CNP.final.subsample.unique.pds.taxonomy)

Mothur fills my log file with error messages like: DBNW5DQ1:84:B04B8ABXX:5:1101:5641:2214:1:N:0:_78bp_78.0_0.94_CN_12A is not in your taxonomy file. I will not include it in the consensus.
What's up with that? Where did these sequences get lost? Right now I'm waiting for mothur to finish the command (or fill my hard drive with error messages :lol: ). Should I use some other taxonomy file, and if so, how would I get it? I need a taxonomy file that is in line with the subsampled list file, and I thought I could get it from the subsampled fasta file.

To be clearer, here are the actual commands in order.

dist.seqs(fasta=CNP.final.fasta, cutoff=0.10, processors=7)
cluster(column=CNP.final.dist, name=CNP.final.names,method=furthest)
make.shared(list=CNP.final.fn.list, group=CNP.proovikaupa.groups, label=0.05)
count.groups()
sub.sample(shared=CNP.final.fn.shared, fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17910)
dist.seqs(fasta=CNP.final.subsample.unique.fasta, output=lt, processors=7)
cluster(phylip=CNP.final.subsample.unique.phylip.dist, name=CNP.final.subsample.names,method=furthest, cutoff=0.10)
make.shared(list=CNP.final.subsample.unique.phylip.fn.list, group=CNP.proovikaupa.subsample.groups, label=0.05)
classify.seqs(fasta=CNP.final.subsample.unique.fasta,template=gg_99.pds.ng.fasta, taxonomy=gg_99.pds.tax, cutoff=80, processors=7)
classify.otu(list=CNP.final.subsample.unique.phylip.fn.list, taxonomy=CNP.final.subsample.unique.pds.taxonomy)

The last command produces a bunch of error messages.

westcott · February 20, 2012, 12:49pm

You need to include the name file in the classify.otu command. The taxonomy file only includes the unique sequences, but the list file contains all the sequences. Mothur is looking for the taxonomies of the redundant sequences and can’t find them.
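The mismatch described above can be sketched in Python. This is only an illustration under stated assumptions, not mothur's implementation: the sequence names, the taxonomy strings and the helpers `parse_names` and `lookup_taxonomy` are hypothetical. It relies on the name file layout of one unique representative, a tab, then a comma-separated list of every sequence that representative stands for.

```python
def parse_names(lines):
    """Map every sequence name to the unique representative that stands for it."""
    rep_of = {}
    for line in lines:
        rep, members = line.rstrip("\n").split("\t")
        for member in members.split(","):
            rep_of[member] = rep
    return rep_of

def lookup_taxonomy(seq, taxonomy, rep_of=None):
    """Return the taxonomy for `seq`, going through its representative if a map is given."""
    key = rep_of.get(seq, seq) if rep_of else seq
    return taxonomy.get(key)

# Hypothetical name file: seqA represents seqA, seqB and seqC; seqD is alone.
names_lines = ["seqA\tseqA,seqB,seqC", "seqD\tseqD"]

# Hypothetical taxonomy file contents: only the unique sequences are classified.
taxonomy = {"seqA": "Bacteria;Firmicutes;", "seqD": "Bacteria;Proteobacteria;"}

# One OTU from the list file, which names every sequence, redundant ones included.
otu = ["seqA", "seqB", "seqC"]

without_names = [lookup_taxonomy(s, taxonomy) for s in otu]
with_names = [lookup_taxonomy(s, taxonomy, parse_names(names_lines)) for s in otu]

print(without_names)  # seqB and seqC come back as None: "not in your taxonomy file"
print(with_names)     # every member resolves through its representative
```

Without the name file, the redundant sequences have no entry to look up, which matches the flood of "is not in your taxonomy file" messages. With it, each of them inherits the classification of its representative.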
