Processing MiSeq single (unpaired) reads

RobinRohwer · April 24, 2017, 9:59pm

I am trying to process some old 16S datasets that were sequenced on Illumina before paired-end reads were widely used. How do I add them into mothur's MiSeq SOP? I am downloading from NCBI's SRA, so the files are demultiplexed fastq files with the primer sequences removed. My understanding is that make.contigs() is where the fastq files are turned into fasta files, but that command requires both forward and reverse files as input.
Thanks!

dwaite · April 25, 2017, 9:46pm

If your files are already demultiplexed and have no barcode/primer information, I usually find the quickest way is to quality filter each fastq file separately, and then merge the output files together.

A basic workflow is

Run fastq.info over each fastq file to split them into fasta and qual files.
Quality filter them with trim.seqs.
Merge the QC-ed fasta files together with merge.files to get your full fasta file.
Create a groups file using make.group.
Run unique.seqs to dereplicate the fasta file.
Run count.seqs over the resulting names file, and your groups file, to get the count table.
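For anyone wondering what the fastq.info step actually produces, here is a rough Python illustration (not mothur's code) of splitting fastq records into fasta and qual text. It assumes four-line records and Phred+33 quality encoding (the Sanger / Illumina 1.8+ convention); the function names are made up for this example.

```python
# Illustration only: mimics the idea behind fastq.info (fastq -> fasta + qual).
# Assumptions: 4-line fastq records and Phred+33 quality characters.


def parse_fastq(lines):
    """Yield (name, sequence, quality_string) tuples from 4-line fastq records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per fastq record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Keep only the read ID (text before the first whitespace).
        yield header[1:].split()[0], seq, qual


def phred33_scores(qual):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in qual]


def fastq_to_fasta_and_qual(lines):
    """Return (fasta_text, qual_text) for the records in `lines`."""
    fasta, qualfile = [], []
    for name, seq, qual in parse_fastq(lines):
        fasta.append(f">{name}\n{seq}\n")
        scores = " ".join(str(q) for q in phred33_scores(qual))
        qualfile.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qualfile)


if __name__ == "__main__":
    example = ["@read1 extra\n", "ACGT\n", "+\n", "II5!\n"]
    fasta_text, qual_text = fastq_to_fasta_and_qual(example)
    print(fasta_text, end="")
    print(qual_text, end="")
```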

From there, you should be able to go back to the MiSeq SOP at the alignment step using the *.unique.fasta file and the count table.
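If you have many samples, typing these commands by hand gets tedious. The sketch below is a hypothetical Python helper that writes the workflow out as a mothur batch file. It assumes the fastq files sit in the working directory, that mothur's default output names apply (sample.fastq -> sample.fasta/sample.qual, trim.seqs -> sample.trim.fasta, unique.seqs -> all.unique.fasta/all.names), and that make.group accepts an output= option in your mothur version. The qthreshold value and the "all" prefix are placeholders.

```python
# Hypothetical helper: builds the single-end workflow as mothur batch commands.
# Assumes mothur's default output file naming and files in the working directory;
# check the generated names against your mothur version before running.
from pathlib import PurePath


def single_end_batch(samples, qthreshold=25, prefix="all"):
    """Return mothur batch commands for demultiplexed single-end fastq files.

    `samples` maps a group name to its fastq file name (insertion order is kept).
    """
    commands = []
    trimmed = []
    for fastq in samples.values():
        stem = PurePath(fastq).stem  # "A.fastq" -> "A"
        commands.append(f"fastq.info(fastq={fastq})")
        commands.append(
            f"trim.seqs(fasta={stem}.fasta, qfile={stem}.qual, qthreshold={qthreshold})"
        )
        trimmed.append(f"{stem}.trim.fasta")

    joined = "-".join(trimmed)        # mothur separates multiple files with '-'
    groups = "-".join(samples.keys())  # group names in the same order as the files
    commands.append(f"merge.files(input={joined}, output={prefix}.fasta)")
    commands.append(f"make.group(fasta={joined}, groups={groups}, output={prefix}.groups)")
    commands.append(f"unique.seqs(fasta={prefix}.fasta)")
    commands.append(f"count.seqs(name={prefix}.names, group={prefix}.groups)")
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(single_end_batch({"A": "A.fastq", "B": "B.fastq"}), end="")
```

Save the returned text to a file (e.g. batch.txt) and run it with mothur batch.txt. Because group names are joined with '-', they should not contain dashes themselves.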

RobinRohwer · April 27, 2017, 6:56pm

Thanks!!!

One follow-up question: I am trying to quality filter with similar stringency to the standard SOP for paired end data.

In trim.seqs the default quality cutoff is qthreshold=25, and in make.contigs the default quality cutoff is insert=20. Are these equivalent cutoffs? i.e. should I set qthreshold to 20?

Also, should I include a screen.seqs in your suggested list of commands for this same reason? i.e. after make.group and before unique.seqs?

dwaite · April 27, 2017, 8:47pm

I'm pretty sure the insert in make.contigs is the same parameter as the qthreshold in trim.seqs. Personally, I try to go a bit higher than Q20 (at least Q25), but I think standard practice here varies a bit.
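To put Q20 versus Q25 in perspective: Phred quality scores are logarithmic, with per-base error probability P = 10^(-Q/10). A quick Python check of what each cutoff means:

```python
import math


def phred_to_error(q):
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score corresponding to error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 25, 30):
        p = phred_to_error(q)
        print(f"Q{q}: error probability {p:.4f}, about 1 in {round(1 / p)} bases")
```

So moving the cutoff from Q20 to Q25 lowers the tolerated per-base error rate from roughly 1 in 100 to roughly 1 in 316.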

I usually just wait until after alignment to run screen.seqs, because it only removes sequences that are too short (or too long), contain Ns, or contain suspicious homopolymers. You definitely need it after alignment to make sure your sequences have good start/stop positions; otherwise, filtering will get pretty messy. But if you want to screen the fasta/count files before alignment, it won't hurt. It would probably make your alignment faster, since there will be fewer sequences to process.
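As a rough illustration of what a pre-alignment screen removes, the Python sketch below applies length, ambiguity, and homopolymer checks to unaligned reads. The keyword names mirror screen.seqs options (minlength, maxlength, maxambig, maxhomop), but the logic is a simplified stand-in, not mothur's implementation: it treats any base other than A/C/G/T as ambiguous and ignores the count table, which the real command also updates. The thresholds in the example call are placeholders, not recommended values.

```python
# Simplified stand-in for screen.seqs-style checks on unaligned reads.
# Assumption: any character other than A, C, G, T counts as an ambiguous base.
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def ambiguous_bases(seq):
    """Number of bases that are not A, C, G, or T (e.g. N, R, Y)."""
    return sum(base not in "ACGT" for base in seq.upper())


def passes_screen(seq, minlength=0, maxlength=None, maxambig=None, maxhomop=None):
    """Return True if `seq` passes every length / ambiguity / homopolymer check."""
    if len(seq) < minlength:
        return False
    if maxlength is not None and len(seq) > maxlength:
        return False
    if maxambig is not None and ambiguous_bases(seq) > maxambig:
        return False
    if maxhomop is not None and longest_homopolymer(seq) > maxhomop:
        return False
    return True


def screen(records, **limits):
    """Split {name: seq} into (kept, removed) dictionaries."""
    kept, removed = {}, {}
    for name, seq in records.items():
        (kept if passes_screen(seq, **limits) else removed)[name] = seq
    return kept, removed


if __name__ == "__main__":
    reads = {"a": "ACGTACGT", "b": "ACGNNACG", "c": "AAAAAAAAACG", "d": "ACG"}
    kept, removed = screen(reads, minlength=5, maxambig=0, maxhomop=8)
    print("kept:", sorted(kept), "removed:", sorted(removed))
```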
