Chimera.vsearch removes groups and generates empty files

Dear mothur developers and users,

I apologize for taking up your time, but I have encountered an error that I could not solve myself. I have checked the forum and other pages but did not find a similar problem. I'm new to mothur, so please be patient with me. The situation is as follows:

I am using mothur v.1.44.3 on a server with 99 GB RAM, 5 cores, and 2 TB of disk space assigned to this project. When I run chimera.vsearch, the command successfully generates the fasta and count_table files for each group, but afterwards it only creates empty output files (…vsearch.pick.count_table, …vsearch.chimeras, …vsearch.accnos). Because chimera.vsearch takes a long time to run for me (more than a day or two), I only noticed the problem when I checked on mothur and saw an error message printed only to the screen: "Removing group: EA1A201803 because all sequences have been removed." (please check the attached screenshot).
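Before waiting days for another run, it can help to see which groups end up with no reads at all, both in the input count table and in the …vsearch.pick.count_table afterwards. The Python sketch below sums each group column and lists groups whose total is zero. It assumes the plain, uncompressed, tab-separated count table layout (`Representative_Sequence`, `total`, then one column per group); compressed count tables are not handled, and the function names and the second group name are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def group_totals(handle: TextIO) -> Dict[str, int]:
    """Sum the abundance of every group column in a count table.

    Assumes an uncompressed, tab-separated layout:
    Representative_Sequence<TAB>total<TAB>group1<TAB>group2 ...
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    groups = header[2:]  # columns after the sequence name and the total
    totals = {g: 0 for g in groups}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for g, value in zip(groups, row[2:]):
            totals[g] += int(value)
    return totals


def empty_groups(totals: Dict[str, int]) -> List[str]:
    """Return the groups whose summed abundance is zero, sorted by name."""
    return sorted(g for g, n in totals.items() if n == 0)


if __name__ == "__main__":
    # Tiny inline example; for real data use open("<your>.count_table").
    example = (
        "Representative_Sequence\ttotal\tEA1A201803\tGROUP_B\n"
        "seq1\t5\t0\t5\n"
        "seq2\t3\t0\t3\n"
    )
    totals = group_totals(io.StringIO(example))
    print(totals)
    print(empty_groups(totals))
```

Groups reported as empty in the output table but not in the input table are the ones mothur dropped during the run.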


My command is as follows:

chimera.vsearch(fasta=/home/tendre/metagenomics_kati/data/2018/precluster/2018.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=/home/tendre/metagenomics_kati/data/2018/precluster/2018.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=T, outputdir=/home/tendre/metagenomics_kati/data/2018/chimera)

I have no idea where to look for a solution or where this problem originates. Please help me solve it.
I have attached my log file; the chimera.vsearch run starts at line 662 (https://ufile.io/hb3mbf53).

Recently, I also noticed that chimera.vsearch reports an error about an empty file. I have no idea how this is possible :frowning: Please see:

Are you seeing this with our current version, 1.47.0 (Release Version 1.47.0 on the mothur/mothur GitHub page)? If so, can you send your log, fasta, and count files to mothur.bugs@gmail.com?

Dear Dr. Westcott,

Previously, I used 1.44.3 because we have Ubuntu 18 installed on the server, but after your reply I asked our admin, and we installed the newest release, 1.47, on an Ubuntu 20 machine. However, I just realized that version 1.47 does not create a .groups file when running make.contigs. May I ask how I can provide this information in this version?

I haven't yet had a chance to test whether this version produces the same error, because I need to reach that point in my pipeline, and the question above has stopped me :frowning:

The 1.47.0 release includes several changes that will affect end-user scripts and batch files. Some are mandatory and some are optional. The MiSeq SOP reflects these changes. Here are some of the highlights:

  1. Addition of the mothurhome keyword. It can be used throughout mothur but is perhaps most helpful with the make.file command.

    mothur > make.file(inputdir=mothurhome, type=fastq, prefix=stability)

  2. Changes intended to move users from name/group files to count tables. This starts with the make.contigs and trim.seqs commands: neither outputs a group file any more; instead, both output count files. Batch files and scripts will need updating, since a group file is no longer produced.

  3. Screening options have been added to make.contigs. You can still run screening separately, but doing it within make.contigs is faster because the files do not have to be reprocessed.

    mothur > make.contigs(file=stability.files, maxambig=0, maxlength=275)

  4. Unique.seqs now outputs a count file by default.

  5. Chimeras are now removed by default. You can still run the remove.seqs command without error, but it is no longer necessary.

  6. BLAST options have been removed, so any batch files or scripts that use BLAST as an option will fail.
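Because make.contigs now writes a count file instead of a group file, the group assignment of each read lives in the count table's group columns, and later commands take it through count= rather than group=. If some external tool still expects a two-column sequence/group listing, the Python sketch below derives one. It assumes the uncompressed, tab-separated count table layout (`Representative_Sequence`, `total`, then one column per group); the function names and example group names are hypothetical. After unique.seqs a representative sequence can carry counts in several groups, so the sketch emits one line per nonzero (sequence, group) pair.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def group_pairs(handle: TextIO) -> Iterator[Tuple[str, str, int]]:
    """Yield (sequence, group, count) for every nonzero cell of a count table.

    Assumes an uncompressed, tab-separated layout:
    Representative_Sequence<TAB>total<TAB>group1<TAB>group2 ...
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    groups = header[2:]  # group columns follow the name and total columns
    for row in reader:
        if not row:
            continue  # skip blank lines
        name = row[0]
        for group, value in zip(groups, row[2:]):
            count = int(value)
            if count > 0:
                yield name, group, count


def write_group_listing(handle: TextIO, out: TextIO) -> None:
    """Write 'sequence<TAB>group' lines, one per nonzero pair."""
    for name, group, _count in group_pairs(handle):
        out.write(f"{name}\t{group}\n")


if __name__ == "__main__":
    table = (
        "Representative_Sequence\ttotal\tF3D0\tF3D1\n"
        "read1\t1\t1\t0\n"
        "read2\t1\t0\t1\n"
    )
    out = io.StringIO()
    write_group_listing(io.StringIO(table), out)
    print(out.getvalue(), end="")
```

Within mothur itself no such conversion is needed; passing the count file along (count=current) keeps the group information.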


Your batch file might look like this:

make.file(inputdir=…/MiSeq_SOP, type=gz, prefix=stability) - create a file listing the paired fastq files

make.contigs(file=stability.files, maxambig=0, maxlength=275, maxhomop=8) - assemble paired reads, screening for length, ambiguous bases, and homopolymers

summary.seqs(fasta=current, count=current) - summarize dataset

unique.seqs(count=current) - combine identical sequences. Be sure to include the count file here so mothur will retain the group assignments from make.contigs.

summary.seqs(count=current) - summarize dataset

align.seqs(fasta=current, reference=silva.v4.fasta) - align sequences to the V4 region

screen.seqs(fasta=current, count=current, start=1969, end=11551) - screen to ensure good overlap of reads

filter.seqs(fasta=current, vertical=T, trump=.) - filter the alignment, removing gap-only columns and columns containing "."

unique.seqs(fasta=current, count=current) - combine identical reads

pre.cluster(fasta=current, count=current, diffs=2) - merge sequences that differ by at most 2 bases

chimera.vsearch(fasta=current, count=current, dereplicate=t) - identify and remove chimeras

summary.seqs(count=current) - summarize dataset

classify.seqs(fasta=current, count=current, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80) - classify reads

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota) - remove contaminants
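The steps above are written as "command - description", and mothur expects only the commands in a batch file. The Python sketch below strips the descriptions so the steps can be saved one per line into a batch file, which mothur can then run as `mothur stability.batch`. The file name `stability.batch`, the shortened step list, and the helper names are hypothetical; splitting on the first " - " works here because none of these commands contain a space-hyphen-space sequence.

```python
from pathlib import Path
from typing import Iterable, List

# A few of the annotated steps, in the "<command> - <description>" form used above.
ANNOTATED_STEPS = [
    "make.contigs(file=stability.files, maxambig=0, maxlength=275, maxhomop=8) - assemble paired reads",
    "unique.seqs(count=current) - combine identical sequences",
    "chimera.vsearch(fasta=current, count=current, dereplicate=t) - identify and remove chimeras",
]


def strip_annotations(lines: Iterable[str]) -> List[str]:
    """Keep only the mothur command, dropping the ' - description' suffix."""
    commands = []
    for line in lines:
        command = line.split(" - ", 1)[0].strip()
        if command:  # ignore blank lines
            commands.append(command)
    return commands


def write_batch(path: Path, lines: Iterable[str]) -> None:
    """Write one mothur command per line to a batch file at `path`."""
    path.write_text("\n".join(strip_annotations(lines)) + "\n")


if __name__ == "__main__":
    # Print instead of writing, so running the sketch has no side effects.
    print("\n".join(strip_annotations(ANNOTATED_STEPS)))
```

To create the file, call `write_batch(Path("stability.batch"), ANNOTATED_STEPS)` after adding the remaining steps.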

