fastq.info

Hi all,
I recently obtained some bacterial sequences from a HiSeq2500. mothur gives me a segfault when I try to run fastq.info:

mothur > fastq.info(fastq=Hiseq_raw1.fq,format=illumina)

[ERROR]: finding negative quality scores, do you have the right format selected?


[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.

mothur > fastq.info(fastq=Hiseq_raw1.fq,format=illumina1.8+)

Output File Names:
Hiseq_raw1.fasta
Hiseq_raw1.qual

[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.

My questions are:
(1) What is the difference between illumina and illumina1.8+?
(2) How do I select a suitable format for my 16S rRNA gene sequences, which were generated on Illumina’s HiSeq2500 platform?
(3) Does the [WARNING] have a harmful effect on data analysis?
I look forward to your advice.

Regards,
Zhangyu.

Illumina has used three versions of its fastq quality encoding, and mothur provides two format options for Illumina data:

Illumina 1.3+: Phred+64, raw read quality scores typically 0 to 40; format in mothur = illumina
Illumina 1.5+: Phred+64, raw read quality scores typically 3 to 40; format in mothur = illumina
Illumina 1.8+: Phred+33, raw read quality scores typically 0 to 41; format in mothur = illumina1.8+

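Most likely all your fastq data is illumina1.8+, which is mothur’s default, so you do not need to provide the format option. With format=illumina, the quality characters are decoded with an offset of 64 instead of 33, so Phred+33 data produces negative scores; that is the error you saw in your first run. We have included other formats to make mothur flexible.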
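
If you want to check a file yourself before picking a format, the offsets above suggest a simple test: any quality character with an ASCII code below 64 cannot be Phred+64. Here is a minimal Python sketch of that idea. It is not part of mothur; the function names are made up for illustration, and it assumes plain 4-line fastq records (no wrapped sequence or quality lines).

```python
from itertools import islice


def quality_strings(fastq_lines, max_records=10000):
    """Yield the quality line (4th line) of each 4-line fastq record.

    Assumes unwrapped records; max_records limits how much is scanned.
    """
    for line in islice(fastq_lines, 3, 4 * max_records, 4):
        yield line.rstrip("\n")


def guess_mothur_format(qualities):
    """Guess the mothur format value from the lowest quality character.

    Hypothetical helper: returns "illumina1.8+" (Phred+33) if any
    character has an ASCII code below 64, otherwise "illumina" (Phred+64).
    """
    lowest = min(
        (ord(ch) for q in qualities for ch in q.strip()),
        default=None,
    )
    if lowest is None:
        raise ValueError("no quality characters found")
    # Below 64 would decode to a negative score under Phred+64.
    if lowest < 64:
        return "illumina1.8+"
    # Only a guess: Phred+33 data with uniformly high scores looks the same.
    return "illumina"
```

To use it, open the file and pass the handle through, for example `guess_mothur_format(quality_strings(open("Hiseq_raw1.fq")))`. The "illumina" result is the weaker of the two answers, so scan enough records that low-quality bases have a chance to show up.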

About the warning:
Mothur converts the ‘:’ characters to ‘_’ characters because ‘:’ is a special character in tree files. If your sequence names include ‘:’ characters, the tree files mothur creates cannot be read by tree software. The renaming does not affect your results, so you can ignore the warning.
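
To see why the ‘:’ matters, recall that in Newick-style tree files a leaf is written as name:branch_length. The Python sketch below uses a deliberately naive leaf parser (not mothur's code and not any particular tree program's) and a made-up Illumina-style read name to show how an unconverted name gets split in the wrong place.

```python
def sanitize_name(name):
    """Replace ':' with '_', as the mothur warning describes."""
    return name.replace(":", "_")


def parse_leaf(leaf):
    """Naive Newick leaf parser, for illustration only.

    Treats everything before the first ':' as the label and the
    rest as the branch length.
    """
    label, _, length = leaf.partition(":")
    return label, length


# Hypothetical read name in the usual colon-separated Illumina style.
raw_name = "HWI-ST1234:100:C1ABCACXX:1:1101:1234:2000"

# Unconverted: the label is cut at the first ':' and the "length" is garbage.
broken = parse_leaf(raw_name + ":0.05")

# Converted: the whole name survives and the branch length is readable.
clean_name = sanitize_name(raw_name)
fixed = parse_leaf(clean_name + ":0.05")

print(broken)
print(fixed)
```

Real tree software differs in how it fails on such names, but the ambiguity is the same: it cannot tell which ‘:’ introduces the branch length.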