Downstream from subsample

I have recently run into trouble continuing an analysis downstream from sub.sample.
I need to subsample a fasta file and a list file, but the command will not accept both together with a group file. (It says it wants to build a new group file from either one or the other.)
But if my analysis forks in one direction from the fasta file and in another from the list file, how can I keep the two in sync? As I understand it, if I run sub.sample(size=170) on the fasta file and then sub.sample(size=170) on the list file, each output will contain 170 randomly selected sequences per group. Does this mean they will not be the same 170 sequences?

To elaborate: the trouble creeps in at this step of my work.
dist.seqs(fasta=CNP.final.fasta, cutoff=0.10, processors=7)
cluster(column=CNP.final.dist, name=CNP.final.names,method=furthest)
make.shared(list=CNP.final.fn.list, group=CNP.proovikaupa.groups, label=0.05)
count.groups()
#in the count, the smallest group has 17904 sequences
sub.sample(shared=CNP.final.fn.shared, fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17904)
dist.seqs(fasta=CNP.final.subsample.unique.fasta, output=lt, processors=7)
clearcut(phylip=CNP.final.subsample.unique.phylip.dist)
#then a lot of collect, rarefaction, heatmap etc. for the subsampled shared file (left out here)
#but next, I want to look at the relabund file to find out the classification of the most abundant OTUs
get.relabund(shared=CNP.final.fn.0.05.subsample.shared, label=0.05)
#now I would like to run classify.otu, which requires a list file. How can I get a list file that has been subsampled in the same way as the shared file (and the others) were before?

I tried to …
cluster(phylip=CNP.final.subsample.unique.phylip.dist, name=CNP.final.subsample.names,method=furthest)

to get a list file from the distance file that came from the subsampled fasta.

classify.otu(taxonomy=CNP.final.taxonomy, list=CNP.final.subsample.unique.phylip.fn.list, name=CNP.final.names)
#but it turned out the number of OTUs given by this (0.05: 2032) was different from the number of OTUs in CNP.final.0.05.subsample.shared (2240)

The CNP.final.fn.subsampled.shared file created by the sub.sample command will not be the same as a shared file created by running sub.sample on a fasta file and then running dist.seqs, cluster and make.shared, because there is no way to relate the subsampled sequences in a fasta file to those in a shared file.
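
As a toy illustration of why two separate subsampling runs do not line up, here is a minimal Python sketch. It is not mothur's sampling code; the pool of sequence IDs, the `subsample` helper and the seeds are all made up for the example. Each run draws its own random 170 IDs, so the two selections overlap only by chance.

```python
import random

# Hypothetical pool of 1000 sequence IDs standing in for one group.
pool = [f"seq{i}" for i in range(1000)]


def subsample(ids, size, seed):
    """Draw `size` IDs without replacement, using a separate random generator."""
    return set(random.Random(seed).sample(ids, size))


# Two independent draws, analogous to subsampling two files in separate runs.
# The seeds are arbitrary; they only make the example repeatable.
draw_a = subsample(pool, 170, seed=1)
draw_b = subsample(pool, 170, seed=2)

# Both draws have the requested size, but they share only part of their IDs.
shared_ids = draw_a & draw_b
print(len(draw_a), len(draw_b), len(shared_ids))
```

Subsampling once and deriving every downstream file from that single draw avoids the mismatch.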

What you want to do is run:

sub.sample(fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17904)
dist.seqs(fasta=CNP.final.subsample.unique.fasta, output=lt, processors=7)
cluster(phylip=CNP.final.subsample.unique.phylip.dist, name=CNP.final.subsample.names,method=furthest)
make.shared(list=CNP.final.subsample.unique.phylip.fn.list, group=CNP.proovikaupa.subsample.groups)

This will give you a shared file containing your subsampled sequences.

So I will. And if I run classify.otu(list=CNP.final.subsample.unique.phylip.fn.list), will I then get OTU classifications that are in line with the OTUs in the generated shared file?
If so, then thank you for the answer. Sometimes I get a bit lost in a mothur pipeline :lol:

And if I run classify.otu(list=CNP.final.subsample.unique.phylip.fn.list), will I then get OTU classifications that are in line with the OTUs in the generated shared file?

Yup, :slight_smile:

From sub.sample(fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17904)
I get the subsampled fasta file CNP.final.subsample.unique.fasta.
Then I classify the sequences:
classify.seqs(fasta=CNP.final.subsample.unique.fasta,template=gg_99.pds.ng.fasta, taxonomy=gg_99.pds.tax, cutoff=80, processors=7)
so that I can use the generated taxonomy file to classify the OTUs in the generated shared file:
classify.otu(list=CNP.final.subsample.unique.phylip.fn.list, taxonomy=CNP.final.subsample.unique.pds.taxonomy)

mothur fills my log file with error messages like: DBNW5DQ1:84:B04B8ABXX:5:1101:5641:2214:1:N:0:_78bp_78.0_0.94_CN_12A is not in your taxonomy file. I will not include it in the consensus.
What’s up with that? Where did these sequences get lost? Right now I’m waiting for mothur to finish the command (or fill my hard drive with error messages :lol: ). Should I use some other taxonomy file, and if so, how would I get one that is in line with the subsampled list file? I thought I could get it from the subsampled fasta file.

To be clearer, here are the actual commands in order.

dist.seqs(fasta=CNP.final.fasta, cutoff=0.10, processors=7)
cluster(column=CNP.final.dist, name=CNP.final.names,method=furthest)
make.shared(list=CNP.final.fn.list, group=CNP.proovikaupa.groups, label=0.05)
count.groups()
sub.sample(shared=CNP.final.fn.shared, fasta=CNP.final.fasta, name=CNP.final.names, group=CNP.proovikaupa.groups, persample=T, size=17910)
dist.seqs(fasta=CNP.final.subsample.unique.fasta, output=lt, processors=7)
cluster(phylip=CNP.final.subsample.unique.phylip.dist, name=CNP.final.subsample.names,method=furthest, cutoff=0.10)
make.shared(list=CNP.final.subsample.unique.phylip.fn.list, group=CNP.proovikaupa.subsample.groups, label=0.05)
classify.seqs(fasta=CNP.final.subsample.unique.fasta,template=gg_99.pds.ng.fasta, taxonomy=gg_99.pds.tax, cutoff=80, processors=7)
classify.otu(list=CNP.final.subsample.unique.phylip.fn.list, taxonomy=CNP.final.subsample.unique.pds.taxonomy)

The result is a bunch of error messages.

You need to include the name file in the classify.otu command. The taxonomy file only includes the unique sequences, but the list file contains all the sequences. Mothur is looking for the taxonomies of the redundant sequences and can’t find them.
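
To make the mismatch concrete, here is a small Python sketch. It is only a toy model, not mothur's implementation, and every sequence name, taxonomy string and helper function in it is invented for the example. The taxonomy covers only the unique representatives, the OTU lists every sequence, and the name file is what maps each redundant sequence back to its representative.

```python
# Toy name file: unique representative -> every sequence name it stands for.
names = {
    "seqA": ["seqA", "seqA_dup1", "seqA_dup2"],
    "seqB": ["seqB"],
}

# Toy taxonomy, as produced by classifying only the unique sequences.
taxonomy = {
    "seqA": "Bacteria;Firmicutes;",
    "seqB": "Bacteria;Proteobacteria;",
}

# Toy OTU from a list file: it contains the redundant sequences too.
otu = ["seqA", "seqA_dup1", "seqA_dup2", "seqB"]


def missing_from(sequences, tax):
    """Return the sequences that have no entry in the taxonomy mapping."""
    return [seq for seq in sequences if seq not in tax]


def expand_taxonomy(tax, name_map):
    """Give every redundant sequence the taxonomy of its representative."""
    expanded = {}
    for representative, members in name_map.items():
        for member in members:
            expanded[member] = tax[representative]
    return expanded


# Without the name file, the duplicates cannot be looked up.
missing = missing_from(otu, taxonomy)

# With the name file, every sequence in the OTU has a taxonomy.
full = expand_taxonomy(taxonomy, names)
still_missing = missing_from(otu, full)
print(missing, still_missing)
```

In this thread that would presumably mean adding name=CNP.final.subsample.names to the classify.otu call, since that is the name file passed to cluster above. The exact file name is an assumption based on the commands posted here.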