We characterized microbial diversity in several marine environments by paired-end Illumina sequencing (2x300) of the V1-V3 region, and we ended up with a huge number of unique contigs. From what I understood from other posts, this happens because when the reads do not fully overlap, many sequencing errors are not denoised the way they are when the reads do fully overlap (e.g. V4). We would therefore like to work on the forward (F) reads alone, which are of better quality than the reverse (R) reads, and see whether we can get any meaningful information from our data (instead of throwing all of it away ;-)). Is there an equivalent of the “trim.flows” command that reads .fastq files and generates reads trimmed by sequence quality? Thank you.
fastq.info will split your fastq file into the corresponding fasta and qual files. You can then quality trim with trim.seqs, which takes both files and has all the options for trimming (window size, minimum scores, etc.).
I don’t know if there’s a way to do it in a single step, but I’ve found that to be pretty quick in the past.
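If it helps to see what a window-based quality trim does to a read, here is a rough Python sketch. It is not mothur's code and makes no claim about what trim.seqs does internally. The function names, the Phred+33 encoding, the rule of cutting the read at the start of the first failing window, and the default values are all my own assumptions for illustration.

```python
def parse_fastq(lines):
    """Yield (name, sequence, scores) for each 4-line FASTQ record."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(rows) - 3, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        # Assumption: qualities are Phred+33 encoded.
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


def window_trim(seq, scores, window_size=50, min_average=35):
    """Cut the read at the start of the first window whose mean quality
    drops below min_average (hypothetical rule, for illustration only)."""
    n = len(scores)
    if n == 0:
        return seq, scores
    # Reads shorter than the window are checked as a single window.
    size = min(window_size, n)
    for start in range(0, n - size + 1):
        window = scores[start:start + size]
        if sum(window) / size < min_average:
            return seq[:start], scores[:start]
    return seq, scores


# Toy record: six high-quality bases followed by four low-quality ones.
record = ["@read1", "ACGTACGTAC", "+", "IIIIII####"]
for name, seq, scores in parse_fastq(record):
    trimmed_seq, trimmed_scores = window_trim(seq, scores, window_size=4, min_average=30)
    print(name, trimmed_seq)
```

Cutting at the start of the failing window is the conservative choice: it throws away a few good bases but never keeps a stretch whose average has already dropped.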
Thanks! Works perfectly. As expected, I find FAR fewer uniques working on the F reads than on the assembled contigs. Assembly definitely seems to be an issue here.