chimera.vsearch and/or classify.seqs error - no sequences?

Hi all,

I am processing a 16S rRNA sequencing run of a mock community DNA sample (from Zymo; it should have 8 bacterial members) through the Mothur pipeline. I am following the MiSeq SOP online and notes I have from a one-day workshop Pat taught at EDAMAME. Anyway, things were going pretty smoothly until I reached the chimera.vsearch step. The output should include a count table (which I can then use in the classify.seqs step), but no count table appears in the output. Below are my commands and the results from chimera.vsearch and the steps after it. It seemed like this step worked; it just didn't give a count table…

After the chimera.vsearch step, I still tried to run classify.seqs with the current count table (it looks like it used the one from the pre.cluster step), then remove.lineage, and then summary.tax(), but there are no sequences. Even when I run summary.tax() on the file that comes out of classify.seqs (before removing lineages), it reports 0 sequences. I'm not exactly sure what is going on here. I'm assuming something is not lining up correctly from the chimera.vsearch step onward, but maybe it's something else? As a note, I've already run this mock sample's sequencing data through the QIIME and UPARSE pipelines and everything seemed OK, so I don't think there's anything technically wrong with it, at least not that I can detect.

Any help you could provide would be much appreciated!

Thank you in advance,
Vanessa



Chimera.vsearch: chimera.vsearch(fasta=PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)

Using 1 processors.
/opt/mothur/1.39.0/mothurvsearch file does not exist. Checking path…
Found vsearch in your path, using /home/resgoodman/lab/tools/vsearch/1.11.1/bin/vsearch
Checking sequences from PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.fasta …
vsearch v1.11.1_linux_x86_64, 1514.0GB RAM, 48 cores

Reading file PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.temp 100%
6050 nt in 24 seqs, min 251, max 254, avg 252
Masking 100%
Sorting by abundance 100%
Counting unique k-mers 100%
Detecting chimeras 100%
Found 4 (16.7%) chimeras, 20 (83.3%) non-chimeras,
and 0 (0.0%) borderline sequences in 24 unique sequences.
Taking abundance information into account, this corresponds to
15 (0.1%) chimeras, 16979 (99.9%) non-chimeras,
and 0 (0.0%) borderline sequences in 16994 total sequences.

It took 0 secs to check 24 sequences. 4 chimeras were found.

Output File Names:
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.chimeras
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos

Removed chimeras:
remove.seqs(fasta=PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos)
Removed 4 sequences from your fasta file.

Output File Names:
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta



classify.seqs(fasta=current, count=current, reference=reference/trainset9_032012.pds.fasta, taxonomy=reference/trainset9_032012.pds.tax, cutoff=80)
Using PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.count_table as input file for the count parameter.
Using PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.

Using 1 processors.
Reading template taxonomy… DONE.
Reading template probabilities… DONE.
It took 5 seconds get probabilities.
Classifying sequences from PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta …
Processing sequence: 20

It took 0 secs to classify 20 sequences.


It took 0 secs to create the summary file for 20 sequences.
Output File Names:
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.tax.summary

Remove lineage:

remove.lineage(fasta=current, count=current, taxonomy=PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
Using PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.count_table as input file for the count parameter.
Using PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.


Output File Names:
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta
PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.count_table

summary.tax()
Using PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy as input file for the taxonomy parameter.

It took 0 secs to create the summary file for 0 sequences.


Output File Names: PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tax.summary
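In the log above, remove.seqs was given only the fasta file, while classify.seqs and remove.lineage then picked up an older ...pick.pick.count_table as the current count file, so the fasta file and count table may no longer describe the same set of sequences. A minimal Python sketch for checking that, and for checking whether the count table has per-group columns at all, is below. It assumes the plain tab-delimited count_table layout (a header of Representative_Sequence, total, then one column per group); the function names are hypothetical and not part of mothur.

```python
import csv
import sys


def read_fasta_names(path):
    """Return the set of sequence names (first word after '>') in a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(line[1:].split()[0])
    return names


def read_count_table(path):
    """Return (group_names, sequence_names) from a tab-delimited count_table.

    Assumes the plain layout: a header 'Representative_Sequence<TAB>total',
    optionally followed by one column per group, then one row per sequence.
    Lines starting with '#' are skipped.
    """
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t")
                if row and not row[0].startswith("#")]
    header, body = rows[0], rows[1:]
    groups = header[2:]  # anything after 'total' is a per-group column
    return groups, {row[0] for row in body}


def check_pair(fasta_path, count_path):
    """Report whether the count table has groups and which names disagree."""
    fasta_names = read_fasta_names(fasta_path)
    groups, count_names = read_count_table(count_path)
    return {
        "has_groups": bool(groups),
        "only_in_fasta": sorted(fasta_names - count_names),
        "only_in_count": sorted(count_names - fasta_names),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_pair.py file.fasta file.count_table
    print(check_pair(sys.argv[1], sys.argv[2]))
```

If has_groups is False, the count table carries only totals; a non-empty only_in_fasta or only_in_count list means the two files are out of sync.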

Can you upgrade to our current version, 1.39.4? https://github.com/mothur/mothur/releases

Hi Sarah,

I upgraded to the newest version of Mothur (v. 1.39.4), but I am still having the same issue and am now also encountering a new, different issue.

  1. Rerunning the script under the newest Mothur version still does not produce a count table from chimera.vsearch, and I also tried chimera.uchime. Is this because I'm only using one sample (my mock community sample)? I suspected this might be the issue, so I decided to start from the beginning, include a second sample, and run the exact MiSeq_SOP data in parallel.

  2. The new issue is that, now that I am starting from the beginning, I cannot run the make.file command: it fails with a segmentation fault, even though I'm running on our high-memory node and did not have this problem previously. I get the same error with the MiSeq SOP data using the exact command and script from the MiSeq SOP site. See below.

Any help you can provide on this would be much appreciated. Thanks,

Vanessa

make.file(inputdir=files/, type=gz, numcols=3, prefix=controls)
Setting input directory to: files/
Segmentation fault (core dumped)

make.file(inputdir=MiSeq_SOP, type=fastq, prefix=stability)
Setting input directory to: MiSeq_SOP/
Segmentation fault (core dumped)

Also, following up on my previous reply: our HPC programmer looked into this issue on our end (the segmentation fault and core dump from make.file) and was not sure how to proceed. It happened for her too, whether she ran the command on our interactive login node or on the high-memory node. Do you think we should go back to a previous version?

Thanks,
Vanessa

Hi all,

I was just wondering if anyone had any suggestions/comments/ideas on moving forward from the errors I’m encountering above. I’m not sure how to continue processing my samples using Mothur. Any help would be greatly appreciated!

Thank you,
Vanessa

Does your count file contain group information? If it does not, mothur does not output a new count file. This is because dereplicate=t only applies to datasets with more than one sample. The dereplicate parameter indicates whether you want sequences flagged as chimeric in one sample to be removed from all samples. By default dereplicate=f, meaning that if mothur finds a sequence to be chimeric in one group, it will be removed from all groups. You can remove the chimeric sequences from your count file by including the count file in the remove.seqs command (this is the same as setting dereplicate=f in the chimera.vsearch command).

mothur > chimera.vsearch(fasta=PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=PositiveControl/Positive.trim.contigs.good.unique.good.filter.unique.precluster.count_table)

mothur > remove.seqs(fasta=current, count=current, accnos=current)
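The dereplicate behaviour described in this reply can be modelled with a toy Python sketch. This is not mothur's implementation: the count table is assumed to be a dict of {sequence: {group: count}}, chimera calls are assumed to be (sequence, group) pairs, and remove_chimeras plus the sample names are hypothetical. With a single group the two settings give the same result, which matches the point that dereplicate=t only matters when there is more than one sample.

```python
from typing import Dict, Set, Tuple

# Toy count table: {sequence_name: {group_name: read_count}}
CountTable = Dict[str, Dict[str, int]]


def remove_chimeras(counts: CountTable,
                    flagged: Set[Tuple[str, str]],
                    dereplicate: bool) -> CountTable:
    """Return a new count table with chimeric reads removed.

    flagged holds (sequence_name, group_name) pairs called chimeric.
    dereplicate=False: a sequence flagged in any group is dropped from every group.
    dereplicate=True:  a sequence is dropped only from the groups where it was flagged.
    """
    flagged_anywhere = {seq for seq, _ in flagged}
    result: CountTable = {}
    for seq, per_group in counts.items():
        if not dereplicate and seq in flagged_anywhere:
            continue  # chimeric somewhere, so removed everywhere
        # Keep only the groups where this sequence was not flagged.
        kept = {g: n for g, n in per_group.items() if (seq, g) not in flagged}
        if sum(kept.values()) > 0:  # drop rows with no reads left
            result[seq] = kept
    return result


if __name__ == "__main__":
    two_groups = {"seqA": {"F3D0": 5, "F3D1": 7},
                  "seqB": {"F3D0": 2, "F3D1": 4}}
    calls = {("seqB", "F3D0")}
    print(remove_chimeras(two_groups, calls, dereplicate=False))  # seqB gone from both groups
    print(remove_chimeras(two_groups, calls, dereplicate=True))   # seqB kept only in F3D1

    one_group = {"seqA": {"Positive": 10}, "seqB": {"Positive": 3}}
    one_call = {("seqB", "Positive")}
    # With a single group the two settings agree.
    print(remove_chimeras(one_group, one_call, False) == remove_chimeras(one_group, one_call, True))
```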

Hi Sarah,

Thanks! That makes sense. The other issue we're still experiencing, though, is that with the newest version of Mothur (1.39.4) we can't get past make.file because it gives a segmentation fault (see below). This is with the MiSeq SOP test data set (not just one sample). I had our computing analyst test it too (interactively on our login node and on the high-memory node), and she wasn't sure how to proceed other than changing Mothur versions. Any help you can provide on this would be great. Right now, we're not sure how to move forward with processing the data in Mothur given this error.

Thanks in advance,
Vanessa

make.file(inputdir=MiSeq_SOP, type=fastq, prefix=stability)
Setting input directory to: MiSeq_SOP/
Segmentation fault (core dumped)

Thanks for reporting this bug with make.file. We added the delim parameter in our latest version, and by default it is set to the '_' character. The _ character in the directory name is causing a parsing issue. This will be fixed in the next release. In the meantime, when using make.file, please remove any '_' from the directory names.

Note: Version 1.39.5 is our latest release
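Until a release with this fix is installed, renaming the directory (for example MiSeq_SOP to MiSeqSOP) is the simplest way to follow this advice. If the original directory name has to stay, a minimal Python sketch like the one below hard-links the reads into a new underscore-free directory without copying data. It assumes both directories are on the same filesystem (os.link fails otherwise) and only checks the path you pass as inputdir, not its parent directories; the helper name link_into_clean_dir and the directory name MiSeqSOP are hypothetical.

```python
import os
import sys
from pathlib import Path


def link_into_clean_dir(src_dir, dst_dir):
    """Hard-link every regular file in src_dir into dst_dir.

    dst_dir is the path you will pass to make.file as inputdir, so it must not
    contain '_'. Hard links need both directories on the same filesystem; they
    look like ordinary files and use no extra disk space.
    Returns the file names handled, in sorted order.
    """
    if "_" in str(dst_dir):
        raise ValueError(f"{dst_dir!r} still contains '_'")
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    linked = []
    for entry in sorted(Path(src_dir).iterdir()):
        if entry.is_file():
            target = dst / entry.name
            if not target.exists():  # leave existing files untouched
                os.link(entry, target)
            linked.append(entry.name)
    return linked


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python link_clean.py MiSeq_SOP MiSeqSOP, then in mothur:
    # make.file(inputdir=MiSeqSOP, type=fastq, prefix=stability)
    print(link_into_clean_dir(sys.argv[1], sys.argv[2]))
```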