cluster(column=xxx) & count table

maracashay · July 9, 2015, 1:45pm

Hi everyone,

I am working through the dist.seqs and cluster commands. dist.seqs runs with 1 processor and I don’t get any error messages, but when I run cluster it gives me “[ERROR]: M00780_M00780_85_000000000-AF4Y6_1_1114_12083_12138 is not in your count table. Please correct.” Sequence M00780_85_000000000-AF4Y6_1_1114_12083_12138 is in my count table, so does this mean that my .dist file has the wrong sequence name? If so, is there a way to rename the sequence in my .dist file? I tried to open the .dist file, but it is too large to open on my computer.

Thanks all!
-Mara

westcott · July 15, 2015, 12:25pm

Could you post the version of mothur you are using and the command you ran with mothur?

sebasdiazz · July 15, 2015, 5:39pm

Hi,
I am having the same problem with the cluster command. It gives me: “Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||||[ERROR]: HWI-M00234_281_000000000-AF58H_1_2114_807HWI-M00234_281_000000000-AF58H_1_1111_20778_22116 is not in your count table. Please correct.”

I am using mothur v.1.35.1, and with the output files of the uchime command, I ran these commands:
-dist.seqs(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20, processors=4)
-cluster(column=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table)

Thanks,
Sebastián

westcott · July 15, 2015, 6:56pm

Did you run chimera.uchime with dereplicate=t? Then run remove.seqs(fasta=inputFiletoChimeraUchime, accnos=current). It looks like you did 2 selection commands after chimera.uchime. Could you post those?

sebasdiazz · July 15, 2015, 7:27pm

Hi,

I ran this uchime command:
-chimera.uchime(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t, processors=4)

Later I removed the chimeras with:
-remove.seqs(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.count_table, accnos=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)

I didn’t use the output file generated with uchime:
kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table

Finally, I removed the undesirable sequences, this time using the uchime count file:
-remove.lineage(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.nr_v119.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

Thanks,
Sebastián

westcott · July 16, 2015, 3:04am

Thanks for posting your commands. Could you send the output files from chimera.uchime and your taxonomy file to mothur.bugs@gmail.com so I can take a closer look at the issue for you?

westcott · July 16, 2015, 3:16pm

Later I removed the chimeras with:
-remove.seqs(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.count_table, accnos=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)

I didn’t use the output file generated with uchime:
kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table

If you didn’t use the file generated by chimera.uchime, then the dereplicate=t option is effectively ignored. By default, remove.seqs removes duplicates as well, so any sequence flagged as chimeric in any sample will be removed from all samples.

Finally, I removed the undesirable sequences, this time using the uchime count file:
-remove.lineage(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.nr_v119.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

You should also include the taxonomy file used above in the remove.seqs command so that the chimeric sequences are removed from the taxonomy file too. Otherwise you will have mismatches later on.
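For anyone wanting to check for such mismatches before clustering, here is a minimal Python sketch that compares the sequence names in a fasta file, a count table, and a taxonomy file. It is not part of mothur. The function names are hypothetical, and it assumes the usual layouts: fasta headers starting with ">", a tab-delimited count table whose header row begins with Representative_Sequence, and a tab-delimited taxonomy file with the name in the first column. Lines starting with "#" are skipped in case the count table carries comment lines.

```python
def fasta_names(path):
    """Collect names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def table_names(path, header_name=None):
    """Collect first-column names from a tab-delimited file.

    Blank lines, lines starting with '#', and the header row (if
    header_name is given) are skipped.
    """
    names = set()
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            name = line.split("\t", 1)[0].strip()
            if name != header_name:
                names.add(name)
    return names


def report_mismatches(fasta_path, count_path, taxonomy_path):
    """Return, per file, the names that are NOT present in all three files."""
    sets = {
        "fasta": fasta_names(fasta_path),
        "count": table_names(count_path, header_name="Representative_Sequence"),
        "taxonomy": table_names(taxonomy_path),
    }
    shared = set.intersection(*sets.values())
    return {label: names - shared for label, names in sets.items()}
```

If every set in the result is empty, the three files agree on their sequence names. Any name listed under one file is missing from at least one of the others.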

maracashay · July 18, 2015, 4:58pm

Hi,

I am using the 1.35.1 version and the command that I used is dist.seqs(fasta=cb.unique.precluster.pick.pick.fasta, cutoff=0.20) which gave me the output /home/cloutierml/Cave_Bacteria/cb.unique.precluster.pick.pick.dist.

Then I ran cluster(column=cb.unique.precluster.pick.pick.dist, count=cb.unique.precluster.uchime.pick.pick.count_table) and I get this -->
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||[ERROR]: M00780_85_000000000-AF4Y6_1_1115_15346M00780_85_000000000-AF4Y6_1_1115_15346_13527 is not in your count table. Please correct.

Every time I run dist.seqs and cluster, I get the same error, but with a different sequence. It seems like the dist.seqs command isn’t working correctly, but I have used 1 processor each time that I have run it, so I really don’t know what I should do.

Does anyone have any ideas on how to either fix the problem or to remove the single sequence that seems to be messed up in my .dist file?

Thanks!
-Mara

westcott · July 20, 2015, 5:48pm

M00780_85_000000000-AF4Y6_1_1115_15346M00780_85_000000000-AF4Y6_1_1115_15346_13527 looks like a merge of 2 sequence names. We have seen an error like this before; see the thread “Large dist.seqs producing corrupt files?”. Can you try rerunning the dist.seqs command with processors=1, and then the cluster command?
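If the .dist file is too large to open in an editor, it can still be read one line at a time. The following is a rough Python sketch, not a mothur feature, that streams a column-format distance file and reports lines that do not have three fields or that mention a name missing from the count table. The function names and the max_report limit are hypothetical, and it assumes each distance line is "nameA nameB distance" separated by whitespace and that the count table header row begins with Representative_Sequence.

```python
def load_count_names(count_path):
    """Read the first column of a tab-delimited count table into a set."""
    names = set()
    with open(count_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            name = line.split("\t", 1)[0].strip()
            if name != "Representative_Sequence":  # skip the header row
                names.add(name)
    return names


def scan_column_dist(dist_path, known_names, max_report=20):
    """Stream a column-format distance file and list suspicious lines.

    Returns tuples of (line number, problem, detail). Scanning stops
    once at least max_report problems have been collected.
    """
    problems = []
    # errors="replace" keeps the scan going if the file holds stray bytes.
    with open(dist_path, errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue
            if len(fields) != 3:
                problems.append((lineno, "expected 3 fields", line.rstrip("\n")))
            else:
                for name in fields[:2]:
                    if name not in known_names:
                        problems.append((lineno, "unknown name", name))
            if len(problems) >= max_report:
                break
    return problems
```

Only the set of names is held in memory, so the size of the distance file itself should not matter. A merged name like the one above would show up as an "unknown name" entry along with its line number.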

maracashay · July 20, 2015, 6:24pm

Hey!

Yes, in fact, I have run the dist.seqs command 4 times, each followed by the cluster command, and each time a different sequence is messed up. Every time I have used the dist.seqs and cluster commands, I have used 1 processor. I can run it again, but it is taking me almost 3 days to run both commands, so if there is another way to get around this problem, I would really appreciate it.

Thanks!
-Mara

westcott · July 20, 2015, 8:48pm

The other way we have seen this error is when someone runs out of disk space. Could that be an issue? How large is your distance file? If dist.seqs is taking 3 days, the file may be too large to process. Pat has written a blog post about this issue: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/.
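To get a feel for whether disk space could be the limit, here is a back-of-the-envelope Python sketch. It is only an estimate under stated assumptions: n unique sequences give at most n(n-1)/2 pairwise distances, the cutoff keeps some unknown fraction of them (kept_fraction, a hypothetical parameter), and each line of a column-format file is taken to cost two names plus about 10 bytes for separators, the distance, and the newline. The function names are made up for illustration.

```python
import shutil


def max_pairs(n_unique):
    """Upper bound on pairwise distances among n_unique sequences."""
    return n_unique * (n_unique - 1) // 2


def estimate_column_bytes(n_unique, avg_name_len, kept_fraction=1.0):
    """Rough size of a column-format distance file in bytes.

    Assumption: each line holds two names plus about 10 bytes for
    separators, the distance value, and the newline.
    """
    bytes_per_line = 2 * avg_name_len + 10
    return int(max_pairs(n_unique) * kept_fraction * bytes_per_line)


def fits_on_disk(n_unique, avg_name_len, directory=".", kept_fraction=1.0):
    """Compare the estimate with the free space on the volume holding directory."""
    free = shutil.disk_usage(directory).free
    needed = estimate_column_bytes(n_unique, avg_name_len, kept_fraction)
    return needed <= free, needed, free
```

With names of about 44 characters, like the ones in this thread, 100,000 unique sequences could produce up to 4,999,950,000 pairs, so even a small kept fraction adds up quickly. The real number depends on the cutoff and on how similar the sequences are.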
