How to select minimum and maximum read lengths from summary.seqs output

Dear folks,

I am new to analyzing microbiome data and am trying to use mothur for my analysis. I have paired-end MiSeq (FASTQ) microbiome data. After running the make.contigs and summary.seqs commands, I need to screen the sequences. The summary output looks like this:

    mothur > summary.seqs(fasta=stability.trim.contigs.fasta)
    
    Using 2 processors.
    
                Start End     NBases   Ambigs   Polymer  NumSeqs
    Minimum:    1     35      35       0        3        1
    2.5%-tile:  1     290     290      0        4        98245
    25%-tile:   1     458     458      0        4        982449
    Median:     1     460     460      1        6        1964897
    75%-tile:   1     465     465      4        6        2947345
    97.5%-tile: 1     466     466      22       7        3831548
    Maximum:    1     602     602      59       289      3929792
    Mean:       1     450.165 450.165  3.55126  5.35893
    # of Seqs: 3929792
    
    Output File Names: 
    /Users/Destiny/Desktop/MicroBiome/OurData/stability.trim.contigs.summary

At this point I need to choose the minimum and maximum lengths for screen.seqs, but the summary output confuses me: I read that microbiome reads are supposed to be 250-300 bp, yet here the maximum length is about 600 bp.
Please help.
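
A note on the numbers: the 250-300 bp figure most likely refers to the length of a single MiSeq read (2x250 or 2x300 chemistry). make.contigs merges each forward/reverse pair into one contig, so contigs are expected to be longer than either read on its own. The minlength/maxlength options of screen.seqs act on sequence length, which corresponds to the NBases column. As a minimal sketch (plain Python, not part of mothur; the helper name `nbases_by_row` is hypothetical), you can pull that column out of the pasted output to see where most contigs fall:

```python
def nbases_by_row(summary_text):
    """Map each row label of a pasted summary.seqs table (e.g. 'Median')
    to its NBases value. Lines without a 'Label:' prefix are skipped."""
    header = None
    lengths = {}
    for line in summary_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Start":  # the header row has no label column
            header = fields
            continue
        if header is None or not fields[0].endswith(":"):
            continue  # e.g. '# of Seqs:', the command line, file names
        col = header.index("NBases")
        values = fields[1:]
        if len(values) > col:
            lengths[fields[0].rstrip(":")] = float(values[col])
    return lengths


# The table from the post above, pasted as plain text.
SUMMARY = """
  Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 35 35 0 3 1
2.5%-tile: 1 290 290 0 4 98245
25%-tile: 1 458 458 0 4 982449
Median:  1 460 460 1 6 1964897
75%-tile: 1 465 465 4 6 2947345
97.5%-tile: 1 466 466 22 7 3831548
Maximum: 1 602 602 59 289 3929792
Mean: 1 450.165 450.165 3.55126 5.35893
# of Seqs: 3929792
"""

lengths = nbases_by_row(SUMMARY)
print(lengths["25%-tile"], lengths["97.5%-tile"], lengths["Maximum"])
# 458.0 466.0 602.0
```

With these numbers the 25%-tile to 97.5%-tile span is 458-466 bp, so at most about 2.5% of contigs are longer than 466 bp; the 602 bp maximum is a tail rather than the typical contig.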

What region did you sequence? I’m guessing you did V4-V6 rather than just V4?

Dear kmitchell,

The V3-V4 region was sequenced (primers for bacteria and archaea, Takahashi et al. 2014). I'd like to ask one more thing. Since sequencing was done with forward and reverse primers, do I need to clip these adapters with a program such as cutadapt before using mothur, or should I not bother trimming adapters before mothur?

We used the standard Illumina primers, which are:

adapter F: TCGTCGGCAGCGTCAGATGTGTATAAGAGA
adapter R: GTCTCGTGGGCTCGGAGATGTGTATAAGAG

pro341F CCTACGGGNBGCASCAG
pro805R GACTACNVGGGTATCTAATCC

Reference (Table 1):

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105592#pone-0105592-t001

Thanks in advance
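
Both PCR primers contain IUPAC ambiguity codes (N, B, S, V), so each one is really a pool of related sequences, and any tool that trims them has to match a pattern rather than a fixed string. A small Python illustration (the function names are made up for this example) that expands the codes into a regular expression and counts how many concrete primer sequences each one covers:

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_regex(primer):
    """Turn a degenerate primer into a regex string, e.g. S -> [CG]."""
    return "".join(
        base if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer.upper()
    )


def n_variants(primer):
    """Number of concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


for name, seq in [("pro341F", "CCTACGGGNBGCASCAG"),
                  ("pro805R", "GACTACNVGGGTATCTAATCC")]:
    print(name, primer_regex(seq), n_variants(seq))
# pro341F CCTACGGG[ACGT][CGT]GCA[CG]CAG 24
# pro805R GACTAC[ACGT][ACG]GGGTATCTAATCC 12
```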

Did you use custom sequencing primers (à la Kozich/Caporasso) or the standard Illumina ones? Custom sequencing primers generally mean the PCR primers aren’t sequenced; standard primers mean the PCR primers are sequenced.
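
If you are unsure whether the PCR primers ended up in the reads, a quick empirical check is the fraction of forward (R1) reads that begin with the forward primer: with standard sequencing primers it would be expected to be high, with custom sequencing primers close to zero. A minimal Python sketch, assuming the R1 sequences are already loaded as plain strings and that the primer starts at the very first base (no spacer or padding bases); the three reads are made-up toy sequences, not real data:

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_pattern(primer):
    """Compile a degenerate primer into a regex (one character class per base)."""
    return re.compile("".join("[" + IUPAC[b] + "]" for b in primer.upper()))


def fraction_starting_with(reads, primer):
    """Fraction of reads whose first bases match the primer."""
    pattern = primer_pattern(primer)
    reads = [r.upper() for r in reads]
    if not reads:
        return 0.0
    # .match() only succeeds at position 0, i.e. at the start of the read.
    return sum(1 for r in reads if pattern.match(r)) / len(reads)


toy_reads = [
    "CCTACGGGAGGCAGCAGTGGGGAATATTG",  # starts with a pro341F variant
    "CCTACGGGTGGCACCAGTGGGGAATATTG",  # another pro341F variant
    "TGGGGAATATTGCACAATGGGCGAAAGCC",  # no primer at the start
]
print(round(fraction_starting_with(toy_reads, "CCTACGGGNBGCASCAG"), 2))
# 0.67
```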

To answer your original question, I’d remove sequences longer than 465 or 470 bp and shorter than 400 bp. I’m not sure about the minimum; V4 usually doesn’t need a minimum-length cutoff.
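
In mothur this would be something like screen.seqs(fasta=stability.trim.contigs.fasta, minlength=400, maxlength=465), possibly with your group or count file added. To show what those two bounds do, here is a Python sketch of the same kind of length screen; it is an illustration, not mothur's code, and it assumes both bounds are inclusive and that alignment gap characters ('-', '.') do not count toward length:

```python
def screen_by_length(records, minlength=None, maxlength=None):
    """Keep (name, sequence) records whose base count lies within the bounds.

    Illustrates the idea behind screen.seqs' minlength/maxlength options.
    Bounds are treated as inclusive (an assumption), None disables a bound,
    and gap characters ('-' and '.') are not counted as bases.
    """
    kept = []
    for name, seq in records:
        nbases = sum(1 for ch in seq if ch not in "-.")
        if minlength is not None and nbases < minlength:
            continue
        if maxlength is not None and nbases > maxlength:
            continue
        kept.append((name, seq))
    return kept


# Toy records (hypothetical names, dummy sequences of a given length).
records = [("short", "A" * 35), ("typical", "A" * 460),
           ("edge", "A" * 470), ("long", "A" * 602)]

print([n for n, _ in screen_by_length(records, minlength=400, maxlength=465)])
# ['typical']
print([n for n, _ in screen_by_length(records, maxlength=470)])
# ['short', 'typical', 'edge']
```

Leaving out minlength, as the reply suggests may be fine, leaves the lower end unfiltered, which is why "short" survives in the second call.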

Thank you for your kind reply. I will go with your suggestion and set only a maximum length for screen.seqs. Also, please take a look at my second question about primer/adapter trimming; I have just updated it.

Thanks once again; I look forward to your view on trimming adapters, along with quality trimming (>20), before using mothur.

Note: I got this idea from the following paper:

“The V4-V5 Illumina datasets were initially demultiplexed using MiSeq Reporter v2.0. The sequences corresponding to the forward and reverse primers were trimmed from the demultiplexed reads using cutadapt (http://code.google.com/p/cutadapt/) using similar stringency settings to those used for the 454 sequences. The trimmed read pairs were then merged into single contigs using SeqPrep (https://github.com/jstjohn/SeqPrep) followed by a length-filtering step prior to analysis with QIIME. The Illumina V4 read pairs were merged and length filtered in a similar manner as the V4-V5 reads to form single contigs prior to being demultiplexed with QIIME. Reads from all datasets were quality filtered using a Q20 minimum value during demultiplexing.”
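
For the Q20 idea in the quoted paper: MiSeq FASTQ quality strings are Phred+33 encoded, so a base's score is its ASCII code minus 33. One simple interpretation of a Q20 minimum is to cut each read at the first base scoring below 20; the paper's exact QIIME settings may well differ. A minimal Python sketch (the helper names are hypothetical):

```python
def phred33_scores(quality):
    """Decode a Phred+33 (Illumina 1.8+) quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def truncate_at_low_quality(seq, quality, min_q=20):
    """Cut the read just before the first base scoring below min_q.

    One simple reading of a 'Q20 minimum'; other tools use sliding windows
    or average quality instead.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    for i, q in enumerate(phred33_scores(quality)):
        if q < min_q:
            return seq[:i]
    return seq


print(phred33_scores("I#5"))                            # [40, 2, 20]
print(truncate_at_low_quality("ACGTACGT", "IIIII#II"))  # ACGTA
print(truncate_at_low_quality("ACGTACGT", "IIII5III"))  # ACGTACGT
```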

You can trim the primers with the make.contigs command and an oligos file: https://mothur.org/wiki/Oligos_File#Paired_Primers. The make.contigs command also has parameters that let you trim using the quality scores, insert (https://mothur.org/wiki/Make.contigs#insert) and deltaq (https://mothur.org/wiki/Make.contigs#deltaq).
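
To make insert and deltaq concrete, here is a simplified Python sketch of the per-position rules as the linked wiki pages describe them, applied to one column of the overlap between the forward and reverse reads. It is a paraphrase for illustration only: mothur's real implementation also performs the alignment and may differ in details, and the default values used below are an assumption that should be checked against your mothur version.

```python
def consensus_base(base_a, q_a, base_b, q_b, deltaq=6, insert=20):
    """Resolve one overlap position of two aligned reads (simplified).

    base_a/base_b are single bases or '-' for a gap; q_a/q_b are Phred scores
    (a gap's score is ignored). Returns the consensus base, 'N' for an
    unresolved mismatch, or None if the position is dropped.
    Defaults are assumed from the mothur wiki; verify for your version.
    """
    if base_a == "-" and base_b == "-":
        return None
    if base_a == "-" or base_b == "-":
        base, q = (base_b, q_b) if base_a == "-" else (base_a, q_a)
        # A base facing a gap is kept only if its quality is above `insert`.
        return base if q > insert else None
    if base_a == base_b:
        return base_a
    # Mismatch: the better base wins only if it leads by at least `deltaq`.
    if q_a - q_b >= deltaq:
        return base_a
    if q_b - q_a >= deltaq:
        return base_b
    return "N"


print(consensus_base("A", 30, "G", 20, deltaq=5))  # A    (30 - 20 >= 5)
print(consensus_base("A", 30, "G", 28, deltaq=5))  # N    (30 - 28 < 5)
print(consensus_base("A", 20, "-", 0))             # None (20 <= insert)
print(consensus_base("A", 25, "-", 0))             # A
```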