cluster(column=xxx) & count table

Hi everyone,

I am working through the dist.seqs and cluster commands. Dist.seqs runs with 1 processor and gives no error messages. When I run cluster, it gives me “[ERROR]: M00780_M00780_85_000000000-AF4Y6_1_1114_12083_12138 is not in your count table. Please correct.” Sequence M00780_85_000000000-AF4Y6_1_1114_12083_12138 is in my count table, so does this mean that my .dist file has the wrong sequence name? If so, is there a way to rename the sequence in my .dist file? I tried to open the .dist file, but it is too large to open on my computer.

Thanks all!
-Mara

Could you post the version of mothur you are using and the command you ran with mothur?

Hi,
I am having the same problem with the cluster command. It gives me: “Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||||[ERROR]: HWI-M00234_281_000000000-AF58H_1_2114_807HWI-M00234_281_000000000-AF58H_1_1111_20778_22116 is not in your count table. Please correct.”

I am using mothur v.1.35.1, and with the output files of the uchime command, I ran these commands:
-dist.seqs(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20, processors=4)
-cluster(column=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table)

Thanks,
Sebastián

Did you run chimera.uchime with dereplicate=t? Then run remove.seqs(fasta=inputFiletoChimeraUchime, accnos=current). It looks like you did 2 selection commands after chimera.uchime. Could you post those?

Hi,

I ran this uchime command:
-chimera.uchime(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t, processors=4)

Later I removed the chimeras with:
-remove.seqs(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.count_table, accnos=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)

I didn’t use the output file generated with uchime:
kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table

Finally, I removed the undesirable sequences, this time using the uchime count_table file:
-remove.lineage(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.nr_v119.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

Thanks,
Sebastián

Thanks for posting your commands. Could you send the output files from chimera.uchime and your taxonomy file to mothur.bugs@gmail.com so I can take a closer look at the issue for you?

Later I removed the chimeras with:
-remove.seqs(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.count_table, accnos=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)

I didn’t use the output file generated with uchime:
kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table

If you didn’t use the file generated by chimera.uchime, then the dereplicate=t option is effectively ignored. By default, remove.seqs removes duplicates as well, so any sequence flagged as chimeric in any sample will be removed from all samples.

Finally, I removed the undesirable sequences, this time using the uchime count_table file:
-remove.lineage(fasta=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=kissingbug.trim.contigs.good.unique.good.filter.unique.precluster.pick.nr_v119.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

You should also include the taxonomy file in the remove.seqs command above, so that the chimeric sequences are removed from the taxonomy file as well. Otherwise you will have mismatches later on.
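
To see whether such mismatches already exist, the sequence names in the three files can be compared directly. The following is a minimal Python sketch, not part of mothur. The function names are hypothetical, and it assumes the fasta name is the text after ">" up to the first whitespace, that the count table starts with a "Representative_Sequence" header line, and that the first column of the taxonomy file is the sequence name.

```python
def fasta_names(path):
    """Collect sequence names from a fasta file (text after '>' up to whitespace)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">") and line[1:].strip():
                names.add(line[1:].split()[0])
    return names


def first_column_names(path):
    """Collect first-column names, skipping the count table header and '#' lines."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue
            if fields[0] == "Representative_Sequence":  # assumed header label
                continue
            names.add(fields[0])
    return names


def report_mismatches(fasta, count_table, taxonomy):
    """For each file, return the names that are not shared by all three files."""
    sets = {
        "fasta": fasta_names(fasta),
        "count": first_column_names(count_table),
        "taxonomy": first_column_names(taxonomy),
    }
    common = set.intersection(*sets.values())
    return {label: names - common for label, names in sets.items()}


# Example call (file names are placeholders for your own files):
# extra = report_mismatches("my.fasta", "my.count_table", "my.taxonomy")
# for label, names in extra.items():
#     print(label, len(names), sorted(names)[:5])
```

If every set in the result is empty, the three files name the same sequences. Any non-empty set points to the file that was left out of an earlier remove.seqs or remove.lineage step.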


Hi,

I am using the 1.35.1 version and the command that I used is dist.seqs(fasta=cb.unique.precluster.pick.pick.fasta, cutoff=0.20) which gave me the output /home/cloutierml/Cave_Bacteria/cb.unique.precluster.pick.pick.dist.

Then I ran cluster(column=cb.unique.precluster.pick.pick.dist, count=cb.unique.precluster.uchime.pick.pick.count_table) and I get this -->
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||[ERROR]: M00780_85_000000000-AF4Y6_1_1115_15346M00780_85_000000000-AF4Y6_1_1115_15346_13527 is not in your count table. Please correct.

Every time I run dist.seqs and cluster, I get the same error, but with a different sequence. It seems like the dist.seqs command isn’t working correctly, but I have used 1 processor each time I have run it, so I really don’t know what I should do.

Does anyone have any ideas on how to either fix the problem or to remove the single sequence that seems to be messed up in my .dist file?

Thanks!
-Mara

M00780_85_000000000-AF4Y6_1_1115_15346M00780_85_000000000-AF4Y6_1_1115_15346_13527 looks like a merge of 2 sequence names. We have seen an error like this before (see “Large dist.seqs producing corrupt files?”). Can you try rerunning the dist.seqs command with processors=1, and then the cluster command?
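
When the .dist file is too large to open in an editor, it can still be read one line at a time to find the damaged entries. The following is a minimal Python sketch, not a mothur feature. It assumes the column-format file has three whitespace-separated fields per line (name, name, distance) and that the count table starts with a "Representative_Sequence" header. The function names are hypothetical.

```python
def load_count_names(count_path):
    """Read the sequence names from the first column of a count table."""
    names = set()
    with open(count_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue
            if fields[0] == "Representative_Sequence":  # assumed header label
                continue
            names.add(fields[0])
    return names


def find_bad_lines(dist_path, known_names, limit=20):
    """Return up to `limit` (line_number, text) pairs that look corrupt.

    A line is treated as corrupt if it does not have exactly three fields,
    if either name is missing from the count table, or if the third field
    is not a number.
    """
    bad = []
    with open(dist_path) as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            ok = (
                len(fields) == 3
                and fields[0] in known_names
                and fields[1] in known_names
            )
            if ok:
                try:
                    float(fields[2])
                except ValueError:
                    ok = False
            if not ok:
                bad.append((number, line.rstrip("\n")))
                if len(bad) >= limit:
                    break
    return bad


# Example call (file names are placeholders for your own files):
# names = load_count_names("my.count_table")
# for number, text in find_bad_lines("my.dist", names):
#     print(number, text)
```

If the flagged lines show two names run together or a line cut short, that is consistent with the file being damaged while it was written, rather than with a wrong name in the count table.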

Hey!

Yes, in fact, I have run the dist.seqs command 4 times, each followed by the cluster command, and each time a different sequence is messed up. Every time I have used the dist.seqs and cluster commands, I have used 1 processor. I can run it again, but it is taking me almost 3 days to run both commands, so if there is another way to get around this problem, I would really appreciate it.

Thanks!
-Mara

The other way we have seen this error is when someone runs out of disk space. Could that be an issue? How large is your distance file? If dist.seqs is taking 3 days, the file may be too large to process. Pat has written a blog post about this issue: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/.
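
A rough way to check the disk space question is to compare a worst-case size for the distance file with the free space on the drive. The following is a minimal Python sketch under stated assumptions: every pair of unique sequences is written (the cutoff normally keeps far fewer), each line holds two names, a distance of about 6 characters, two separators and a newline, and all names have about the same length. The function names are hypothetical.

```python
import shutil


def max_pairs(n_unique):
    """Number of distinct sequence pairs, n * (n - 1) / 2."""
    return n_unique * (n_unique - 1) // 2


def estimate_dist_bytes(n_unique, name_length, distance_chars=6):
    """Worst-case size in bytes of a column-format distance file."""
    # One line = two names + distance text + two separators + newline.
    bytes_per_line = 2 * name_length + distance_chars + 3
    return max_pairs(n_unique) * bytes_per_line


def free_bytes(directory="."):
    """Free space in bytes on the drive that holds `directory`."""
    return shutil.disk_usage(directory).free


# Example call (the sequence count is a placeholder for your own data):
# name = "M00780_85_000000000-AF4Y6_1_1115_15346_13527"
# needed = estimate_dist_bytes(n_unique=200000, name_length=len(name))
# print(needed, free_bytes("."), needed > free_bytes("."))
```

Because the number of pairs grows with the square of the number of unique sequences, doubling the unique sequences roughly quadruples the worst-case file size.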