How to work with just forward reads?

Hi,

I have paired-end reads from an Illumina MiSeq run. I want to analyze my data using just the forward reads. Is this possible in mothur? If yes,

  1. How can I start while bypassing make.contigs? How should the first input file (known as stability.files in the MiSeq SOP) be created?
  2. How should I create the “primer.oligos” file to trim the attached forward primer? Adapters and barcodes were already removed from my dataset, so when I analyzed both forward and reverse reads I only needed to remove the primers. In that case, I used a “primer.oligos” file created in the following way:

primer GTGCCAGCMGCCGCGGTAA CCGYCAATTYMTTTRAGTTT

Looking forward to your suggestions.
Thanks a lot,
Richa

Hi Richa,

You can do this and would probably want to follow the quality score approach in the 454 SOP (https://mothur.org/wiki/454_SOP). That tutorial lays out examples for your oligos file. Give that a shot and let us know if you get stuck.

Pat

Hello Dr. Schloss,
Thank you very much for your reply. I still have some doubts about how to start, because my data are .fastq files and not .sff files. So I could not understand how to create the first input file with just the forward sequences, or which command should be the first one used to get the data into mothur. Kindly suggest how I should proceed.

Looking forward to your suggestion,
Thanks,
Richa

If you run fastq.info you’ll get a fasta and qual score file, which is what you’re looking for.

Pat
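
To make the fastq.info suggestion above more concrete, here is a minimal Python sketch of the kind of split it performs: each FASTQ record becomes one FASTA entry and one entry of space-separated quality scores. This is only an illustration, not mothur's implementation. The function name `split_fastq` is hypothetical, and the sketch assumes strict 4-line records with Phred+33 quality encoding.

```python
def split_fastq(fastq_text, offset=33):
    """Split FASTQ text into (fasta_text, qual_text).

    Assumes strict 4-line records and Phred+`offset` encoded qualities.
    Illustration only; in practice mothur's fastq.info does this step.
    """
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")

    fasta_parts, qual_parts = [], []
    for i in range(0, len(lines), 4):
        header, seq, _plus, quals = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"record header does not start with '@': {header!r}")
        name = header[1:].split()[0]                  # keep the read name only
        scores = [ord(ch) - offset for ch in quals]   # ASCII char -> Phred score
        fasta_parts.append(f">{name}\n{seq}\n")
        qual_parts.append(f">{name}\n{' '.join(map(str, scores))}\n")
    return "".join(fasta_parts), "".join(qual_parts)
```

The two returned strings correspond to the fasta and qual files that the later trim.seqs step takes as `fasta=` and `qfile=`.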

Hello Dr. Schloss,

Thanks for your reply.
I am trying to understand the 454 SOP at https://mothur.org/wiki/454_SOP, as you suggested. But what I am not able to understand is:

  1. After running fastq.info on the forward-read fastq files, I will have separate fasta and qual files for each sample. Now I want to proceed with the further screening steps, e.g. removing the attached forward primer, setting min and max length, etc. So how should I make the input file that lists each sample, so that they can all be processed together?

  2. Also, which command would be useful in my case for creating the reverse complement of the forward sequences that I have to use for my analyses, and at which step should I use it? I do not have “flow data”, and the SOP uses the screening steps below (which simultaneously created the reverse complement of each sequence), so I do not understand how to proceed:

mothur > trim.flows(flow=GQY1XT001.flow, oligos=GQY1XT001.oligos, pdiffs=2, bdiffs=1, processors=2)
mothur > shhh.flows(file=GQY1XT001.flow.files, processors=2)


Sincerely looking forward to further suggestions. Thank you very much for your help.

Richa

Use trim.seqs, not trim.flows (flows are 454 “raw” data). You’ll also need to skip shhh.flows.
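
As a rough illustration of what primer trimming with trim.seqs involves, the Python sketch below strips a degenerate forward primer (such as GTGCCAGCMGCCGCGGTAA from the oligos line earlier in the thread) from the start of a read. It is not mothur's algorithm: it only counts position-by-position mismatches and does not handle insertions or deletions. The function name `trim_forward_primer` is hypothetical, and its `pdiffs` argument simply mirrors the name of the mothur parameter.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def trim_forward_primer(seq, primer, pdiffs=0):
    """Return `seq` with the leading primer removed.

    Returns None when the read is shorter than the primer or when the
    number of mismatching positions exceeds `pdiffs`.
    """
    if len(seq) < len(primer):
        return None
    mismatches = sum(
        base not in IUPAC[code]
        for base, code in zip(seq.upper(), primer.upper())
    )
    if mismatches > pdiffs:
        return None
    return seq[len(primer):]
```

For example, a read starting with GTGCCAGCAGCCGCGGTAA or GTGCCAGCCGCCGCGGTAA matches the primer exactly, because M stands for A or C.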

Hi,
Thanks for the reply. Actually, this is what I am trying to understand: since I have to use “trim.seqs”, how should I

  1. make the input file that lists each .fasta file, so that they can be processed with trim.seqs and the later commands?
    Previously I followed the MiSeq SOP, in which the make.contigs step needs an input file (named stability.files in the MiSeq SOP) that lists the forward and reverse fastq files. Its main job is to make contigs, but it also creates a single file containing the sequences from all samples, which can then be processed together. Currently, though, I am trying to analyze my data using only the forward sequences. So how can I start?

  2. Since I will be using “trim.seqs”, which command can make the reverse complement of the forward sequences? And after which step should I use this command?

I apologize for repeating the same questions, but I am still in the learning phase. Sincere thanks for all the suggestions. Looking forward to your reply.

Richa

Hi!
I know this is old but I’ve recently had to do this so I thought I’d share my approach:

I have hundreds of files, so I wrote a Python script to generate the first few steps. For every *.fastq file in the current directory, it creates the following steps:

  1. fastq.info(fastq=sequence_file.fastq)
  2. trim.seqs(fasta=sequence_file.fasta, qfile=sequence_file.qual, qwindowaverage=20, minlength=250, processors=16) #Change these to suit you!
  3. Merge the *.trim.fasta files with merge.files
  4. Merge the count table files with merge.files

That then outputs what I call batchfile1. I run that and I end up with ABC.trim.contigs.fasta and ABC.trim.count_table. I then run batchfile2 which runs through all my normal steps from the MiSeq approach.

I’ve hosted both the Python script that generates batchfile1 and the subsequent batchfile2 on Git here. Hope this helps some people!
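
For readers who cannot reach the hosted script, here is a minimal sketch of how a generator for the first three steps listed above might look. It is not the poster's actual script. The names `batch_commands`, `write_batch`, `batchfile1`, and `merged.trim.fasta` are hypothetical, and the sketch assumes that fastq.info writes `<name>.fasta` and `<name>.qual` and that trim.seqs writes `<name>.trim.fasta`. The count table merge is left out because its file names depend on how your mothur version and options produce them.

```python
from pathlib import Path


def batch_commands(stems, merged_fasta="merged.trim.fasta", processors=16):
    """Build mothur batch-file text for a list of FASTQ base names.

    `stems` are file names without the .fastq extension. They should not
    contain '-', because merge.files uses '-' to separate its input files.
    """
    if not stems:
        raise ValueError("no FASTQ files given")
    lines = []
    for stem in stems:
        # Step 1: split each FASTQ into fasta + qual files.
        lines.append(f"fastq.info(fastq={stem}.fastq)")
        # Step 2: quality trimming; change these thresholds to suit your data.
        lines.append(
            f"trim.seqs(fasta={stem}.fasta, qfile={stem}.qual, "
            f"qwindowaverage=20, minlength=250, processors={processors})"
        )
    # Step 3: merge all trimmed fasta files into one.
    trimmed = "-".join(f"{stem}.trim.fasta" for stem in stems)
    lines.append(f"merge.files(input={trimmed}, output={merged_fasta})")
    return "\n".join(lines) + "\n"


def write_batch(fastq_dir=".", out_name="batchfile1"):
    """Write a batch file covering every *.fastq file in `fastq_dir`."""
    stems = sorted(p.stem for p in Path(fastq_dir).glob("*.fastq"))
    out_path = Path(fastq_dir) / out_name
    out_path.write_text(batch_commands(stems))
    return out_path
```

Calling `write_batch()` in the directory that holds the FASTQ files writes the batch file, which can then be passed to mothur on the command line.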