Hello all,
My name is Vladimir and this is my first topic in the forum, so I really hope you can help me with the following:
I’m currently working on an Amazon Web Services m4.4xlarge instance (16 cores, 64 GB RAM, 600 GB of HDD storage) launched from the 64-bit Windows Base AMI.
I’m processing 13 samples together (one stability.files listing all 13) for a network analysis. Right now I can’t cluster the distance matrix: the dist.seqs command created a 368 GB dist file, and the cluster command can’t read it. It throws an error recommending that I use the 64-bit version of mothur, contact Pat Schloss, etc. The main goal of my pipeline is to get the .shared file that the make.shared command creates, so that I can take it into other software and build the network I want between the 13 samples.
I’ve read here that you recommend the hcluster command because it doesn’t hold the matrix in RAM, but I don’t know exactly how to use it: it asks for a column file and a name file, and I don’t know what to use for the name file (the cluster command, by contrast, takes column and count). Can anyone explain in more detail how to use hcluster with the files I have listed?
I haven’t yet tried setting cutoff=0.20 in the cluster command, as I once saw suggested here in the forum.
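Before choosing a cutoff, it may help to know how many pairs in the existing dist file fall under lower thresholds. Below is a rough Python sketch (not a mothur command) that streams through the file one line at a time, so it never loads the matrix into RAM. It assumes the column-format file has three whitespace-separated fields per line (two sequence names and a distance); the function name `count_pairs_under` and the example cutoffs are only illustrative.

```python
def count_pairs_under(path, cutoffs):
    """Stream a column-format distance file and count the pairs
    whose distance is at or below each cutoff."""
    counts = {c: 0 for c in cutoffs}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            dist = float(fields[2])
            for c in cutoffs:
                if dist <= c:
                    counts[c] += 1
    return counts

# Example call (assumption: run from the folder that holds the dist file):
# count_pairs_under(
#     "stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist",
#     [0.03, 0.10, 0.20],
# )
```

If only a small fraction of the pairs sits at or below a lower threshold, a distance file written with that lower cutoff should be correspondingly smaller.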
PS: I already ran this on another 13 samples with a 90 GB distance file and successfully got my .shared file.
Here is the pipeline I’m using:
*make.contigs(file=stability.files, processors=16)
*screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxambig=0, maxlength=292)
*unique.seqs(fasta=stability.trim.contigs.good.fasta)
*count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)
*align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva123.fasta, flip=T)
*screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=2, end=13423, maxhomop=8)
*filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)
*unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)
*pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)
*chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t, processors=1)
*remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos)
*classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=silva123.fasta, taxonomy=silva.nr_v123.tax, cutoff=80)
*remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.nr_v123.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
*summary.seqs -------> # of unique seqs= 160226 / total # of seqs= 624926
*dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20)
------------------------------------(the 2 commands below are the ones I need to run but can’t yet)--------------------------------------------------------------------
*cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table)
*make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, label=0.03)
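For a sense of scale, here is a small Python sketch (not part of the mothur pipeline) that computes how many sequence pairs the 160226 unique sequences reported by summary.seqs could produce. The function name is just illustrative. Since dist.seqs was run with cutoff=0.20, only the pairs at or below that distance end up in the file, so this number is an upper bound on the lines in the dist file.

```python
def max_pairwise_distances(n_unique: int) -> int:
    """Number of distinct pairs among n_unique sequences: n * (n - 1) / 2."""
    return n_unique * (n_unique - 1) // 2

n_unique = 160226  # unique sequences reported by summary.seqs above
pairs = max_pairwise_distances(n_unique)
print(f"possible pairs: {pairs:,}")
```

That works out to about 12.8 billion possible pairs.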
Thank you!