Hello,
I, like some others I have seen on the forums here, am in a situation where my MiSeq run produced very low quality reverse reads and so I would like to attempt to use just the forward reads for my analysis. The forward reads are of a high average overall quality, with FastQC (suggesting that the per base average quality score is well above 25 across the entire region).
I am attempting to process these single-end files into a fasta that I can then integrate into the MiSeq_SOP protocol. I have been using the forum post here as a guide for doing so.
I am currently working with mothur in the command line. I used the following for-loop to run fastq.info on each of my forward (R1) fastq files from my MiSeq run, since the fastq.info() file parameter would not accept a file with only a groups and a forward read column. The loop is as follows:
--Begin-------------------------------------------------------------------------------------------------------------------------------
for i in *.fastq
do
/Users/Brad/Desktop/mothur/mothur
"#fastq.info(fastq=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/$i)"
done
--End---------------------------------------------------------------------------------------------------------------------------------
Please note the mother command itself was entered as one line; the line change is a product of the topic entry window. I cannot figure out how to attach the logfile, but here is a sample of the log files from this loop for one of my fastq files:
--Begin fastq.info() log-----------------------------------------------------------------------------------------------------------
Mac version
Using ReadLine,Boost,HDF5,GSL
mothur v.1.44.1
Last updated: 08/04/2020
by
Patrick D. Schloss
Department of Microbiology & Immunology
University of Michigan
#http://www.mothur.org
When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.
Distributed under the GNU General Public License
Type 'help()' for information on the commands that are available
For questions and analysis support, please visit our forum at https://forum.mothur.org
Type 'quit()' to exit program
[NOTE]: Setting random seed to 19760620.
Script Mode
mothur > fastq.info(fastq=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fastq)
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
110000
120000
130000
140000
150000
154990
Output File Names:
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fasta
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.qual
mothur > quit()
It took 17 seconds to run 2 commands from your script.
--End fastq.info() log-------------------------------------------------------------------------------------------------------------
This loop successfully produced .fasta and .qual files for each fastq. I am now trying to use trim.seqs()
to remove seqs with a low average quality score from each fasta. This is where I am running into trouble. Each time I run the command, all seqs are removed. To test whether I was too stringent in my parameterization of qaverage, I set the parameter to 1 and still all seqs were removed. The command I am running looks as follows:
--Begin-------------------------------------------------------------------------------------------------------------------------------
/Users/Brad/Desktop/mothur/mothur "#trim.seqs(fasta=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fasta, qfile=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.qual, qaverage=25)"
--End---------------------------------------------------------------------------------------------------------------------------------
The log file from this command is as follows:
--Begin trim.seqs() log-----------------------------------------------------------------------------------------------------------
Mac version
Using ReadLine,Boost,HDF5,GSL
mothur v.1.44.1
Last updated: 08/04/2020
by
Patrick D. Schloss
Department of Microbiology & Immunology
University of Michigan
#http://www.mothur.org
When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.
Distributed under the GNU General Public License
Type 'help()' for information on the commands that are available
For questions and analysis support, please visit our forum at https://forum.mothur.org
Type 'quit()' to exit program
[NOTE]: Setting random seed to 19760620.
Script Mode
mothur > trim.seqs(fasta=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.fasta, qfile=/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.qual, qaverage=1)
Using 8 processors.
Removing group: scrap because all sequences have been removed.
It took 21 secs to trim 154990 sequences.
Output File Names:
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.trim.fasta
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.scrap.fasta
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.trim.qual
/Users/Brad/Desktop/mothur/Miseq_BRMB_0000_fastq_files/BRU_00410_S6_L001_R1_001.scrap.qual
mothur > quit()
It took 21 seconds to run 2 commands from your script.
--End trim.seqs() log-------------------------------------------------------------------------------------------------------------
Since I have independent confirmation that the average quality of my reads is good (certainly higher than an average of 1 across all bases) I assume I am missing some required parameter in trim.seqs, but cannot figure out what else the command would need based on the trim.seqs() documentation.
Can anyone think of what might be causing all of my sequences to be removed? I greatly appreciate any help or insights.
Cheers,
jbreil
EDIT: This seems to be happening for any operation I attempt. For instance, if I replace qaverage with a minlength and maxlength option, again despite sequences clearly falling within those parameters, all of them are getting removed. Perhaps the program is not reading my fasta correctly? Summary.seqs does successfully summarize the same fasta however.