Errors in cluster.split

Hello

I get the following error:
[ERROR]: M03557_132_000000000-KYB35_1_ is not in your count table. Please correct.
/******************************************/
[ERROR]: Could not open stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.singletons.temp

mothur >
[ERROR]: You are missing (
[ERROR]: Invalid.

when running cluster.split.

I didn’t encounter any error when I did the MiSeq SOP tutorial.

Please let me know if you need any other info to find out what went wrong. Thank you for your help.

Hi

I’m happy to help, but I would need to see what command it is having problems with along with how you ran it.

Pat

Hi Dr. Schloss

I am sorry for my late reply. Thank you so much for following up.

I followed the workflow in the mothur MiSeq SOP, using my existing MiSeq data: 24 samples of the V3-V4 region, sequenced on a MiSeq v3 600-cycle flow cell. (Running more than 24 samples had already failed with the error “Killed” at summary.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table).)

Below are the commands that I used:
mothur > make.file(inputdir=., type=fastq, prefix=stability)
mothur > make.contigs(file=stability.files)
mothur > summary.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table)
mothur > screen.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table, maxambig=0, maxlength=475, maxhomop=8)
mothur > unique.seqs(fasta=stability.trim.contigs.good.fasta, count=stability.contigs.good.count_table)
mothur > summary.seqs(count=stability.trim.contigs.good.count_table)
mothur > align.seqs(fasta=ecoli_v3v4_recheck.fasta, reference=silva.seed_v138_1.align)
mothur > pcr.seqs(fasta=silva.seed_v138_1.align, start=6388, end=25316, keepdots=FALSE)
mothur > rename.file(input=silva.seed_v138_1.pcr.align, new=silva.v3v4recheck.fasta)
mothur > summary.seqs(fasta=silva.v3v4recheck.fasta)
mothur > align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v3v4recheck.fasta)
mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)
mothur > screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, start=1, end=18929)
mothur > summary.seqs(fasta=current, count=current)
mothur > filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)
(I didn’t get any information about the length of the filtered alignment, etc., after this command finished.)
mothur > unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)
mothur > pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)
mothur > chimera.vsearch(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
mothur > classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.count_table, reference=trainset18_062020.pds.fasta, taxonomy=trainset18_062020.pds.tax)
mothur > remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
mothur > summary.tax(taxonomy=current, count=current)
I didn’t include a mock community, so I skipped the “Assessing error rates” and “Preparing for analysis” sections.
mothur > cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pds.wang.pick.taxonomy, taxlevel=7, cutoff=0.03)

For the last command, after calculating distances for 170 groups, the output is as follows:
Output File Names:
stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.169.dist

//
/
/
Finding singletons (ignore ‘Removing group’ messages):

Running command: remove.seqs()
[ERROR]: M03557_132_000000000-KYB35_1_ is not in your count table. Please correct.
/******************************************/
[ERROR]: Could not open stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.singletons.temp

mothur >
[ERROR]: You are missing (
[ERROR]: Invalid.

I appreciate your input/advice on this. Thank you!

It looks like you ran remove.lineage before cluster.split but didn’t use the outputted fasta and count_table files. You did use the outputted taxonomy file. I think you want something like this…

mothur > cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pds.wang.pick.taxonomy, taxlevel=7, cutoff=0.03)

There should be a ‘pick’ in those file names to reflect that sequences from remove.lineage were removed.

Pat
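The point above is that the fasta, count_table, and taxonomy files given to cluster.split should all come from the same processing step. One way to check this outside mothur is to compare the sequence names in two of the files. The following is only a minimal sketch in Python, not part of mothur; the function names are hypothetical, and it assumes the name is the first whitespace-delimited field on each line and that count_table header or comment lines start with "Representative_Sequence" or "#".

```python
def fasta_names(lines):
    """Collect sequence names: the text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def first_column_names(lines):
    """Collect first-column names from count_table or taxonomy lines.

    Skips blank lines, '#' comment lines, and the count_table header row.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "Representative_Sequence":
            continue
        names.add(fields[0])
    return names


def name_mismatches(fasta_lines, table_lines):
    """Return (names only in the fasta, names only in the table), both sorted."""
    in_fasta = fasta_names(fasta_lines)
    in_table = first_column_names(table_lines)
    return sorted(in_fasta - in_table), sorted(in_table - in_fasta)


# Hypothetical toy data, not taken from the thread.
fasta = [">seqA\n", "ACGT\n", ">seqB\n", "ACGA\n", ">seqC\n", "ACGG\n"]
count_table = [
    "Representative_Sequence\ttotal\tS1\tS2\n",
    "seqA\t3\t2\t1\n",
    "seqB\t1\t0\t1\n",
]
only_fasta, only_table = name_mismatches(fasta, count_table)
print("only in fasta:", only_fasta)
print("only in count table:", only_table)
```

With real files, open file handles can be passed in place of the lists, for example the fasta and the .pick.taxonomy file from the commands above. Two empty lists would suggest the files describe the same set of sequences.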

Hi Dr. Schloss

The only output I got from remove.lineage is stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pds.wang.pick.taxonomy.

I do not have any fasta and count_table files with “pick” in their names.

So when I ran the next command summary.tax (taxonomy=current, count=current), it used stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.count_table as input file for the count parameter.

Where does it go wrong?

Thank you!

Best Regards
Stephanie

If you look in the taxonomy file from before remove.lineage, do you see any of the taxa that are listed for removal? If not, skip that step and let me know if you still get an error message.

Pat
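The check suggested above (whether the pre-remove.lineage taxonomy file contains any of the taxa listed for removal) can also be scripted. The following is a rough sketch in Python with hypothetical function names. It assumes a tab-separated taxonomy file with the sequence name in the first column and a semicolon-separated lineage in the second, where each label may carry a bootstrap value such as (100). It matches whole labels only, so it may not reproduce mothur's own matching rules exactly.

```python
import re


def strip_confidence(taxon):
    """Remove a trailing bootstrap value such as '(100)' from a taxon label."""
    return re.sub(r"\(\d+\)$", "", taxon)


def count_listed_taxa(lines, taxa):
    """Count sequences whose lineage contains each listed taxon label."""
    hits = {name: 0 for name in taxa}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lineage = {strip_confidence(t) for t in fields[1].split(";") if t}
        for name in taxa:
            if name in lineage:
                hits[name] += 1
    return hits


# Taxon list as written in the remove.lineage command in this thread.
taxa = "Chloroplast-Mitochondria-unknown-Archaea-Eukaryota".split("-")

# Hypothetical toy taxonomy lines, not taken from the thread.
toy = [
    "seq1\tBacteria(100);Firmicutes(98);Bacilli(95);\n",
    "seq2\tBacteria(100);Cyanobacteria(100);Chloroplast(99);\n",
    "seq3\tArchaea(100);Euryarchaeota(90);\n",
]
hits = count_listed_taxa(toy, taxa)
for name in taxa:
    print(name, hits[name])
```

If every count is zero for the real taxonomy file, there would be nothing for remove.lineage to remove, which is consistent with no .pick fasta or count_table being needed.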

Hi,

I’m analyzing Illumina MiSeq data by using the MiSeq SOP. I have an error in the cluster.split step.
This is the command I used:

cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, cutoff=0.03)

When I analyzed the full data (100 samples), it initially went well: the data were split into 186 groups, and files were saved up to ‘final.185.dist’ (this part took about 20 hr, using 24 GB RAM, swap=6 GB, processors=16, mothur v.1.48.0).
However, immediately after that, I got the following error message (and all the .temp files were deleted):

//
/
/
Finding singletons (ignore ‘Removing group’ messages):

Running command: remove.seqs()
[ERROR]: MN00371_330_000H is not in your count table. Please correct.
/******************************************/
[ERROR]: Could not open final.singletons.temp

I was looking at the ‘final.count_table’ file, and I see lines like these:
MN00371_330_000H2KWVN_1_12101_14525_14333 1 7,1
MN00371_330_000H2KWVN_1_12103_18577_5199 1 16,1
MN00371_330_000H2KWVN_1_21101_4124_19781 1 88,1
MN00371_330_000H2KWVN_1_13103_12662_6662 1 35,1
MN00371_330_000H2KWVN_1_21102_12699_8286 1 37,1

Interestingly, when I analyzed only a section of my data (18 samples), the process went well (the data was split into 149 groups; the ‘final.dist’ file is 27GB).

I appreciate any relevant input.
Thanks!

This usually means that you renamed the wrong files. What does the command look like where you renamed things?

Pat
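One observation from the post above: the name in the error message (MN00371_330_000H) is shorter than the names in the quoted count_table lines. A quick way to confirm whether a reported name is an exact entry, only the beginning of longer entries, or absent altogether is sketched below in Python. The function names are hypothetical and this is not mothur functionality; it only describes the mismatch and does not explain its cause.

```python
def first_column(lines):
    """Return first-column names, skipping blanks, '#' lines, and the header."""
    names = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "Representative_Sequence":
            continue
        names.append(fields[0])
    return names


def match_reported_name(reported, table_lines):
    """Classify `reported` as an 'exact' name, a 'prefix' of names, or 'absent'."""
    names = first_column(table_lines)
    if reported in names:
        return "exact", [reported]
    longer = [name for name in names if name.startswith(reported)]
    if longer:
        return "prefix", longer
    return "absent", []


# The five count_table lines quoted in the post above.
quoted = [
    "MN00371_330_000H2KWVN_1_12101_14525_14333 1 7,1",
    "MN00371_330_000H2KWVN_1_12103_18577_5199 1 16,1",
    "MN00371_330_000H2KWVN_1_21101_4124_19781 1 88,1",
    "MN00371_330_000H2KWVN_1_13103_12662_6662 1 35,1",
    "MN00371_330_000H2KWVN_1_21102_12699_8286 1 37,1",
]
kind, matches = match_reported_name("MN00371_330_000H", quoted)
print(kind, len(matches))
```

For the quoted lines, the reported name is classified as a prefix of all five entries rather than an exact match. Running the same check against the full final.count_table (passing an open file handle) would show whether that holds for the whole file.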

Dear Dr. Schloss,

Thanks for your reply.

I followed the SOP instructions until the “Assessing error rates” stage. Unfortunately, in our study, we have not co-sequenced a mock community (I understand it was a mistake!). Therefore, I skipped this stage; meaning, I did not include the get.groups(…, groups=Mock), seq.error(…), and remove.groups(…, groups=Mock) stages.

Accordingly, my rename.file() command, which gives the files the prefix=final, was as follows:

mothur > rename.file(fasta=hproject.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, count=hproject.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=hproject.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pds.wang.pick.taxonomy, prefix=final)

I’m not sure if it is relevant, but since I sent my question, I have performed the following:

  • I ran the analysis successfully with only a section of my data (18 samples). The cluster.split command was executed smoothly with no error, and the process went well (the data was split into 149 groups; the ‘final.dist’ file was 27GB). The downstream analyses were fine all the way.

  • I performed a phylotype-based analysis with all the data (n=100 samples). It went fine; including alpha diversity, beta diversity measurements, generation of NMDS and PCOA plots, amova, homova, and population-level analyses (my samples are from 3 different environments).

  • In parallel, I also ran the traditional approach using the dist.seqs command. As I expected, it took an unreasonably long time (almost 8 days). The final.dist file is very large (212 GB); it took 637642 secs to find distances for 1404486 sequences, with 6248166449 distances below the cutoff of 0.03. Now I’m waiting for cluster(column=final.dist, count=final.count_table), which I expect will take some time.

I still would like to understand what I was doing wrong with my initial approach, and to be able to perform the analysis based on OTUs using the cluster.split command, as recommended (with my entire data).

I appreciate any relevant input.

Thanks for your time and expertise!

Tal

Hi Tal,

It’s hard to say at this point if we can’t reproduce the problem. If I had to guess, when you ran it previously you used different files that had undergone different steps in preprocessing.

Pat

Hi Dr. Schloss

No, none of the taxa listed for removal are in the taxonomy file. I will rerun as you suggested and let you know if I encounter the error again.

Thank you!

Best Regards
Stephanie
