I’m running an analysis based on the MiSeq SOP. I’ve used the SOP to generate all the standard outputs and would now like to use create.database.
It’s all running well until the create.database command outputs a long list of “[ERROR]: OTU size info does not match for bin x. The contaxonomy file indicated the OTU represented y sequences, but the repfasta file had z. These should match. Make sure you are using files for the same distance.”
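To see which bins disagree, the comparison the error describes can be sketched roughly as below. This is only a sketch in Python, not mothur's own code. The function names are hypothetical, and it assumes that the cons.taxonomy file is tab-separated with OTU, Size and Taxonomy columns, and that each rep.fasta header looks like `>seqname Otu0001|size|groups`. Both assumptions should be checked against the actual files. The toy data at the bottom is invented purely for illustration.

```python
import io


def read_constaxonomy_sizes(handle):
    """Return {otu_label: size} from a cons.taxonomy-style file.

    Assumed layout (tab-separated): OTU, Size, Taxonomy, with a header row.
    """
    sizes = {}
    for line_number, line in enumerate(handle):
        fields = line.rstrip("\n").split("\t")
        if line_number == 0 and fields[0] == "OTU":
            continue  # skip the header row
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def read_repfasta_sizes(handle):
    """Return {otu_label: size} from rep.fasta header lines.

    Assumed header layout: '>seqname<whitespace>Otu0001|size|groups'.
    """
    sizes = {}
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence lines carry no size information
        parts = line[1:].split()
        if len(parts) < 2:
            continue  # header without the OTU|size field
        info = parts[1].split("|")
        if len(info) < 2:
            continue
        sizes[info[0]] = int(info[1])
    return sizes


def find_size_mismatches(cons_sizes, rep_sizes):
    """List (otu, cons_size, rep_size) for OTUs whose sizes differ."""
    return [
        (otu, cons_sizes[otu], rep_sizes[otu])
        for otu in sorted(cons_sizes)
        if otu in rep_sizes and cons_sizes[otu] != rep_sizes[otu]
    ]


if __name__ == "__main__":
    # Invented toy data, standing in for the real files.
    toy_cons = io.StringIO(
        "OTU\tSize\tTaxonomy\n"
        "Otu0001\t10\tBacteria(100);\n"
        "Otu0002\t5\tBacteria(100);\n"
    )
    toy_rep = io.StringIO(
        ">seqA\tOtu0001|10|F3D0\nACGT\n"
        ">seqB\tOtu0002|7|F3D0-F3D1\nACGT\n"
    )
    mismatches = find_size_mismatches(
        read_constaxonomy_sizes(toy_cons), read_repfasta_sizes(toy_rep)
    )
    for otu, cons_size, rep_size in mismatches:
        print(otu, cons_size, rep_size)
```

With real files, the two readers could be given `open("final.opti_mcc.0.03.cons.taxonomy")` and `open("final.opti_mcc.0.03.rep.fasta")` instead of the toy handles.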
I’ve tried lots of combinations of inputs but am still not sure where I’m going wrong. A summary of the most recent commands I’ve tried:
Commands as per the SOP up to summary.tax
Then, rename the files as follows, just to try and keep it clear in my head:
system(cp stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta final.fasta)
system(cp stability.contigs.good.groups final.groups)
system(cp stability.trim.contigs.good.names final.names)
system(cp stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table final.count_table)
system(cp stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy final.taxonomy)
system(cp stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tax.summary final.tax.summary)
Then run the following:
dist.seqs(fasta=final.fasta, cutoff=0.20, processors=32)
cluster(column=final.dist, count=final.count_table)
make.shared(list=final.opti_mcc.list, count=final.count_table, label=0.03)
classify.otu(list=final.opti_mcc.list, count=final.count_table, taxonomy=final.taxonomy, label=0.03)
get.oturep(list=final.opti_mcc.list, label=0.03, fasta=final.fasta, column=final.dist, name=final.names)
create.database(shared=final.opti_mcc.shared, label=0.03, repfasta=final.opti_mcc.0.03.rep.fasta, repname=final.opti_mcc.0.03.rep.names, constaxonomy=final.opti_mcc.0.03.cons.taxonomy)
Any thoughts on where I might be going wrong?
Many thanks!