Trim.seqs removing all seqs (processing single-end reads)

jbreil · August 5, 2020, 12:06am

Hello,

I, like some others I have seen on the forums here, am in a situation where my MiSeq run produced very low quality reverse reads and so I would like to attempt to use just the forward reads for my analysis. The forward reads are of a high average overall quality, with FastQC (suggesting that the per base average quality score is well above 25 across the entire region).

I am attempting to process these single-end files into a fasta that I can then integrate into the MiSeq_SOP protocol. I have been using the forum post here as a guide for doing so.

I am currently working with mothur in the command line. I used the following for-loop to run fastq.info on each of my forward (R1) fastq files from my MiSeq run, since the fastq.info() file parameter would not accept a file with only a groups and a forward read column. The loop is as follows:

--Begin-------------------------------------------------------------------------------------------------------------------------------
for i in *.fastq
do
	/Users/Brad/Desktop/mothur/mothur 
             "#fastq.info(fastq=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/$i)"
done
--End---------------------------------------------------------------------------------------------------------------------------------

Please note the mother command itself was entered as one line; the line change is a product of the topic entry window. I cannot figure out how to attach the logfile, but here is a sample of the log files from this loop for one of my fastq files:

--Begin fastq.info() log-----------------------------------------------------------------------------------------------------------

Mac version

Using ReadLine,Boost,HDF5,GSL
mothur v.1.44.1
Last updated: 08/04/2020
by
Patrick D. Schloss

Department of Microbiology & Immunology

University of Michigan
#http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

For questions and analysis support, please visit our forum at https://forum.mothur.org

Type 'quit()' to exit program

[NOTE]: Setting random seed to 19760620.

Script Mode


mothur > fastq.info(fastq=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fastq)
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
110000
120000
130000
140000
150000
154990

Output File Names: 
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fasta
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.qual


mothur > quit()


It took 17 seconds to run 2 commands from your script.

--End fastq.info() log-------------------------------------------------------------------------------------------------------------

This loop successfully produced .fasta and .qual files for each fastq. I am now trying to use trim.seqs()
to remove seqs with a low average quality score from each fasta. This is where I am running into trouble. Each time I run the command, all seqs are removed. To test whether I was too stringent in my parameterization of qaverage, I set the parameter to 1 and still all seqs were removed. The command I am running looks as follows:

--Begin-------------------------------------------------------------------------------------------------------------------------------
/Users/Brad/Desktop/mothur/mothur "#trim.seqs(fasta=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fasta, qfile=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.qual, qaverage=25)"
--End---------------------------------------------------------------------------------------------------------------------------------

The log file from this command is as follows:

--Begin trim.seqs() log-----------------------------------------------------------------------------------------------------------

Mac version

Using ReadLine,Boost,HDF5,GSL
mothur v.1.44.1
Last updated: 08/04/2020
by
Patrick D. Schloss

Department of Microbiology & Immunology

University of Michigan
#http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

For questions and analysis support, please visit our forum at https://forum.mothur.org

Type 'quit()' to exit program

[NOTE]: Setting random seed to 19760620.

Script Mode


mothur > trim.seqs(fasta=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fasta, qfile=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.qual, qaverage=1)

Using 8 processors.

Removing group: scrap because all sequences have been removed.
It took 21 secs to trim 154990 sequences.


Output File Names: 
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.trim.fasta
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.scrap.fasta
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.trim.qual
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.scrap.qual


mothur > quit()


It took 21 seconds to run 2 commands from your script.

--End trim.seqs() log-------------------------------------------------------------------------------------------------------------

Since I have independent confirmation that the average quality of my reads is good (certainly higher than an average of 1 across all bases) I assume I am missing some required parameter in trim.seqs, but cannot figure out what else the command would need based on the trim.seqs() documentation.

Can anyone think of what might be causing all of my sequences to be removed? I greatly appreciate any help or insights.

Cheers,
jbreil

EDIT: This seems to be happening for any operation I attempt. For instance, if I replace qaverage with a minlength and maxlength option, again despite sequences clearly falling within those parameters, all of them are getting removed. Perhaps the program is not reading my fasta correctly? Summary.seqs does successfully summarize the same fasta however.

jbreil · August 5, 2020, 12:08am

I apologize for that post looking so janked up. I clearly was using some formatting options I did not intend to use; I think I turned my commands into titles.

EDIT: I was able to fix that so it looks cleaner

pschloss · August 6, 2020, 8:23pm

With 454 data, we previously suggested using qwindowaverage=35, qwindowsize=50. This requires each 50 nt window across the sequence to have an average quality score of 35. If the average drops below 35, the sequence is trimmed just before that point. What you were doing was to keep reads where the average across the length was 25. That’s actually, not all that good. Bases with a quality score of 35 are 10-fold better than those with a score of 25.

Pat

jbreil · August 10, 2020, 7:54pm

Thank you for your response! I will certainly update these parameters once I have the operation working, but even with the avg qual score set to 1 over the default window (which to my knowledge should essentially mean that none of my seqs are trimmed since this is as lax as possible), all seqs are being removed. For this reason, I do not think this issue is related to how i parameterize my trimming criterion and is related to some other aspect of the operation or the files I have prepared. I, however, cannot figure out what it may be. I have manually inspected the fasta and qual files and everything seems relatively normal (no long strings of nulls or long strings of low qual bases in any of the first few seqs I inspected).

system · August 20, 2020, 7:54pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

westcott · August 24, 2020, 6:21pm

Thanks for reporting this issue. The trim.seqs command was expecting a name, count or group file. These files should not be a requirement for the type of quality screening you are doing with trim.seqs. I will fix this bug, but in the meantime here’s a workaround.

mothur > fastq.info(fastq=F8D147_S365_L001_R1_001.fastq) - extract fasta and files

mothur > unique.seqs(fasta=current) - create names file removing duplicate reads from the fasta file

mothur > list.seqs(fasta=current) - list unique reads

mothur > get.seqs(qfile=current) - select unique reads from the quality file

mothur > trim.seqs(fasta=current, name=current, qfile=current, qwindowaverage=35, qwindowsize=50) - trim reads for quality

westcott · August 28, 2020, 2:38pm

I released version 1.44.3 which includes the fix to trim.seqs. https://github.com/mothur/mothur/releases/tag/v1.44.3 Thanks again for your help resolving this issue.

Topic		Replies	Views
Can ''qwindowaverage'' help me ? Commands in mothur	3	2578	September 17, 2014
trim.seqs bugs mothur bugs	2	3199	May 9, 2011
Help please with a command o mothur Theory behind mothur	8	590	March 7, 2021
File error right from the start Commands in mothur	19	1681	June 12, 2020
problem with trim.seqs Theory behind mothur	9	10414	May 4, 2011

Trim.seqs removing all seqs (processing single-end reads)

Related topics