Nanopore Fastq not recognized by Mothur

Hello everyone! We just got our first nanopore sequencing run done. I am exploring my options for analysis and, being a long-time user of Mothur, I am trying to use it for this.

I took the fastq files that passed the Nanopore basecalling and that were generated by their program MinKNOW.

I concatenated all files per barcode to get a single fastq file per barcode, then I went to mothur.
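For anyone who wants to script the concatenation step, here is a minimal Python sketch. It assumes each barcode has its own folder of `.fastq` or `.fastq.gz` chunks and that every chunk ends with a newline; the function name `concatenate_barcode` and the folder layout are illustrative assumptions, not something taken from MinKNOW or mothur.

```python
import gzip
import shutil
from pathlib import Path


def concatenate_barcode(barcode_dir, out_path):
    """Concatenate every .fastq / .fastq.gz chunk in barcode_dir into one plain FASTQ.

    Assumption: each chunk ends with a newline, so records stay intact.
    Write out_path outside barcode_dir so a rerun does not pick it up as a chunk.
    Returns the number of chunks that were concatenated.
    """
    chunks = sorted(
        p for p in Path(barcode_dir).iterdir()
        if p.name.endswith((".fastq", ".fastq.gz"))
    )
    with open(out_path, "wb") as out:
        for chunk in chunks:
            # gzip.open transparently decompresses .gz chunks
            opener = gzip.open if chunk.name.endswith(".gz") else open
            with opener(chunk, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return len(chunks)
```

Calling `concatenate_barcode("fastq_pass/barcode96", "barcode96.fastq")` (hypothetical paths) would then give one uncompressed fastq file for that barcode.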

My plan is to use fastq.info as a start to generate the fasta files. Unfortunately, mothur does not recognize the file as fastq.

"
mothur > fastq.info(file=nanopore_test1.file)
[WARNING]: Blank fasta name, ignoring read.
[WARNING]: missing sequence for , ignoring.[WARNING]: expected a name with + as a leading character, ignoring.[WARNING]: missing quality for , ignoring.[WARNING]: Blank fasta name, ignoring read.
"

This is an example of what the file looks like (copy-pasted):

@38676c8c-7575-4957-a161-5ce1fa18506a runid=2ecfd430e213202cb6f7538082e8ad79ba5f9760 read=16 ch=390 start_time=2024-01-31T16:44:23.549452+00:00 flow_cell_id=FAX64935 protocol_group_id=16s_essai1 sample_id=no_sample barcode=barcode96 barcode_alias=barcode96 parent_read_id=38676c8c-7575-4957-a161-5ce1fa18506a basecall_model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.2.0
TGTTATGTCCTATTTACTTCGTTCAGTTACGTATTGCTGGTGCTGCTGAACGGTCATAAGAGTCCACCATTTAACCTTTCTGTTGGTGCTGAATATTGCAGAGTTTGATTATGGCTCAGGATGAACGCTGGCGGCGTGCCTTAATCATGCCAAGTCGAGCGAACGGACAGAAGCTTGCTCTTCTGATGTTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTACCTATAAGACTGGGATAACTCGGGAACCGGAGCTAATACCGGATAGTATTTTGAACCGCATGGTTCAAAATATTATCGGTATTGGGTTCCGAAGTTATGCCGGTCTTATAGGTAGGTTATCACGTCTACTGGCCCGTTCCGCCGCTAACGTCCGGAGGGCTCCTCGTCCATTTCGCTCGACTTGCAGTGTATTAGGCACACCGCAGCGCTGATCCTGAGCCATGAAACTCTGCCGATATCAGCACCGACGGAAAGGTTAAATGATCAACTCTATAACGTTCGCGGCACCACTGGATA
+
$$β€˜(&((+%%%$&$((,.)>9+)310068311887568++'(39<<<<=<<<1,)22>4==@CDM=E???FCB;7;4427;A@@432200.+1003.731589DBA89=C<989EMBABA>>?<<>A@C777>;96&6912@<=52=:<:<;?;33>>876:&(-9/,+/+34/30.85:<>::7>@>==B?@8687;00094599922254A,/34034C872(-))%%7771101&/2=;<67:=@545–.234599;7&&%-,./.,9677822-/00/))/)/1-,14.–$#&(-%%&&,)%&%&%((%%(’+()$$,/β€˜%)(β€™β€˜&&%)($$)##%.0-+)-(()0,/$$$&(#$&β€˜β€™β€˜β€˜$$#$&$%$%,’&'33223)(),3)’$&$β€˜&####%)),+%&&,’%,++β€˜%#&',)β€™β€˜)0&&%%.+.β€˜β€™##&β€˜β€™()()($$%%)())))0+0,%$’((%&$$&)&&#β€˜$$%&$$’$&&(())&##%β€˜)%
@c0f18f2d-ca5d-48e6-9091-3abb99f64e94 runid=2ecfd430e213202cb6f7538082e8ad79ba5f9760 read=26 ch=443 start_time=2024-01-31T16:44:24.549452+00:00 flow_cell_id=FAX64935 protocol_group_id=16s_essai1 sample_id=no_sample barcode=barcode96 barcode_alias=barcode96 parent_read_id=c0f18f2d-ca5d-48e6-9091-3abb99f64e94 basecall_model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.2.0
ATGTAACCTACTGGTTCAGTTGCGTATTGCTGGTGCTGCTGAACGGTGATCAAGTCCACCATTAACCTACTTGCCTGTGGCTACTATCTTCTACGGCTACTTGTTACAACTTCACCCCAATCATTTACCACCTTCGACGGCTAGCTCCTAAAAGGTTACTCCACCGGCTTGGGTGTTACAAACTCTCGTGGTGACGGGCGGTGTGTACAACCGGGAACATATCACCAGTAGCATGCTGATCTACGATTACTAGCGATTCCAGCTTCAGTGTCCGTCGAGTAGACTACAATCCCGATTTAAACAACTTTATGGAATTGCTTGACCTCGCGGTTTCGCTGCCCTTTGTATTGTACACACCGCCCGTCACACCACGAGAGTTTGTAACACCCGAAGCCGTGGAGTAACCTTGCCAGAGCTAGCCGTAGAAGGTGGGACGAATCGTTGGGGTGAAGTCGTAACAAGGTAGCCGTAGAAGATCGAGCGACAGGCAGTTAGGTTGATCGGTGGACTCTGACCGTTCAGCAGCACCAGCAATACGTGGCT
+
%β€˜&)&)(()(’($&β€˜))(-’+('(-.779:79>B?<<A9827;41/1&&&+((())735622334;8@92289KCHEB879:,-477:98<;?..;2,&($#%+
’%%-4–,:;+(β€˜##$(%%(334=/β€”/669B9889A??..<46;7&’…&<>B<211<6&&<=776;;9>=98333230//201)(04%%%4965$$%$&,+,//-+β€˜β€™β€˜(’))+(&(()+0>?=>22265470346/./(…-.22989891(()&β€˜((1%%###+)()3Β±+.0211’(($$)+)&%&+832–./799;;:1228=256::BB97:BBEAAD?<@@AEBB:::<334=>?<9:;=<45520’().65576:8=?..D76@:68-'++0%&(-.01-(
’+β€˜459…/786β€™β€˜(0,><–557*()%#%%+,-,&%$%%$+.5:@=::@:7699:80(&63-0-**,8(&&-±…,+))044(&&&$&'%&##$)&&%&+1/(/0886////24001(β€™β€˜%%"#
@0a960ff5-f4a9-4850-9890-f8c52b8e70da runid=2ecfd430e213202cb6f7538082e8ad79ba5f9760 read=18 ch=409 start_time=2024-01-31T16:44:24.549452+00:00 flow_cell_id=FAX64935 protocol_group_id=16s_essai1 sample_id=no_sample barcode=barcode96 barcode_alias=barcode96 parent_read_id=0a960ff5-f4a9-4850-9890-f8c52b8e70da basecall_model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.2.0
GTTGTGTAACCTACTTGGTTCGTTGCGTATTGCTGGTGCTGCTGAACGGTCATCAAGTCCACCATTTAACCTTTCTGTTGGTGCTGATATTGCAGAGTTTGATTCTGGCTAGGATGAACGCTGGCGGCGTGCCTAATACAGTAAGTCGAGCAAACGACGAGAAGCTTGCTCTCTGATGTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTTGCCTATAAGCTGGGATAACTTGGGAAACCGGAGCTAATACCGGATAATATTTTGAACCGCATCATTAGAAAAGTAAAGACGGTCTTGCTGTTCACTTATTCAGTCGATGCGCGCTGCGTTAGGCTTAGTTGGTTCGGCCATCCATCGCTTACTAAGGCAACGGAGTGCGACGCCGACCTGAGAGGGTGATCGGCCACACTGGAACTGGAGCACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGAATCTTCCGCAATCGGCAGAAAGCCTGACGGACAACGCCGCGTGAGTGATGGTCTTCGATCGTGATTTTGTTCGAGAGAACAAGAACATAGTAACTGAACGTCCCTGACGGTATCTAACCAGAAAGCCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTATTGGGCGTAAAGCGACGCAGGCGGTTTTCTTAAGTCTGATGTGAAAGCCCCGGCTGAACCGGGGAGGGTACATGGAAACTGGAAGACTCGGTGCAGAAGAATTCCCTACTGCTGCCTCCGTAGGAGTCTGGACCGTGTCAGTTCCGTGTGGCCGTCAGCCTCAGGTCGGCTGTGGTCGGTTCCTTGAGTAAGCCGTTACCTTTACCGACTGGCTGATGCAGCGCGGTGCCGTCTATAAGACATTTTGAACCATCGTCAAGTGTATACCATTGTTGCTCCGGTTTTCCGGAAGTTTGCCGTCTTATGGTAGGTTGTCCACGGTGTCTCACCGTACCGCCGCTGAGCATGCCGAGAAGCAACTTTCTGATCCGGTTCGCTGGACTTGCAGCTCTTGACAGCGCCGCCAGCGTTGGTGTAGAGCCAAGTGAACTTGCGATGTCAGCAGGCCAACGGGAGGGGGCAATGGTCGACTGTATAACCGTACGGCCGCGCCCGGCGGTGGTGG
+
&,.β€˜&’+$##&'&)
,(0…(%’/))67;<?;;:BB@@@>>>95;;112+''%(((/,548:;<;?8767786//8,,,5;963011+-)%)-456<1/055:++779/35<;<CB==>A@?@CB112/0(&&%%+1)((;<22@=2/0&%&).+--1,))&1,8/./62'33137=:B;78F@;:;@:<>***>><<<.'''')+//7((&*)#%')780*)'+-)+;?AC<;767C?AB??5443496568<--=101018.))*(%&&4B325:@=98))<;;=91+-,)%(()9''&$%''&))-)%$)*++.4)*+((*)--'(*+,)('%'(.,*&$%&(+*)2=,+.,,-2/+((*%&&&(+&#$%(+13/0B:6**+9;:?>>4,.2+3’)9101.β€˜+.37687<AA:70+&&)$,9:9535++<82β€™β€˜β€™.')/0.–0124>4120&&Β±678211A=9=<101’'-4572&2*((&$#$&&%((%$&--/+&)(+&%&$#%&β€˜%%$#’&β€˜β€™%%',2<??ACA?9868:@A@@IB???>:+&/3?0/-.&%%+9;;=<>114><==>;657211:9>>=89:5456+,)-((++(&&%&)75,.255)(((()0,++9>F53*&+,((%%β€˜1*&(3::76=E:88D0*,-1)<>=2…-$$%)9?FA>>503(’&#$$)+333:;;850&';?>8667CFDFBC12/2’'(3454949:=:3212*%$&+('&22((++%&&15)β€˜β€™-,+/2A%%###β€˜&’&&#$')$$$((.00&+%%&),β€˜&’&%%)$##&$$$),(().0/-β€˜&%&$%)β€˜(%&&’%$$$1,)/±’&"$&%$%)(&'&,$$%$###$$%$&+.+./5+()’%-,&&&$&($$%,.)β€˜β€™/2159:)(()$$%$β€˜&#%&&&),%(&&%%$()%&’(,&&%((&+/-.,%%%β€˜β€™&'($#$#$+&/((),)%&%β€˜%%$%$$%)#$’%%β€˜&’)/-0.-.((+&$%($$$%β€˜+1%$β€™β€˜β€™)/-.-%%"&,/.)&%(%%&).β€˜&’+(β€˜%&%&’&#$')+%$$$%%&&-*,1**&%#$β€˜-’%,016-,-'#$&$$$$

Should I use another program to convert the files to fasta?
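If a conversion outside mothur is ever needed, a plain FASTQ to FASTA conversion is short enough to write by hand. The sketch below is only an illustration: it assumes strict 4-line records (no wrapped sequence or quality lines), and it keeps only the read ID from the long Nanopore header, which is a choice made here, not something mothur requires. The function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_path, fasta_path):
    """Write each 4-line FASTQ record as a 2-line FASTA record; return the read count.

    Assumption: records are exactly 4 lines (header, sequence, '+', quality).
    """
    count = 0
    with open(fastq_path) as fq, open(fasta_path, "w") as fa:
        while True:
            header = fq.readline().rstrip("\n")
            if not header:
                break  # end of file (or a blank line)
            seq = fq.readline().rstrip("\n")
            plus = fq.readline().rstrip("\n")
            qual = fq.readline().rstrip("\n")
            if (not header.startswith("@") or not plus.startswith("+")
                    or len(seq) != len(qual)):
                raise ValueError(
                    f"malformed record near read {count + 1}: {header[:40]}"
                )
            # Keep only the read ID, dropping runid=..., ch=..., etc.
            name = header[1:].split()[0]
            fa.write(f">{name}\n{seq}\n")
            count += 1
    return count
```

The length check between sequence and quality lines also catches files where the quality string was broken across lines.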

My general plan for the analysis is the following, feel free to comment on it.

fastq.info (to get fasta files)
make.group (to create the group file)
merge.files (to get 1 single fasta file)
unique.seqs (to get unique sequences and start the analysis)
…
and so on (when I get this working I will toy with different parameters and see how our mock community and the samples we already sequenced with MiSeq V4 behave).

Kind regards,

Just a quick update: I ran FastQC on one of the fastq files and the analysis ran fine, so the fastq file itself does not appear to be the problem.

Kind regards,

Another update: I changed the format and it still did not work.
My problem was with my input file: I had made it with column 1 as "group" and column 2 as "fastq". I replaced it with a separate run of fastq.info per fastq file, and the fasta files are now being generated correctly.

Fingers crossed for the rest of the pipeline. I will post in another thread if I encounter a problem with another command.

Kind regards,

fastq.info is expecting a fastq file as input. What does nanopore_test1.file look like?
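A quick way to tell the two kinds of file apart before running the command is to look at the first four lines. This is a rough Python sketch, not anything mothur does internally; `looks_like_fastq` is a hypothetical helper name.

```python
def looks_like_fastq(path):
    """Return True if the first four lines of path form a plausible FASTQ record."""
    with open(path) as handle:
        header, seq, plus, qual = (
            handle.readline().rstrip("\n") for _ in range(4)
        )
    return (
        header.startswith("@")        # read name line
        and plus.startswith("+")      # separator line
        and len(seq) > 0
        and len(seq) == len(qual)     # one quality character per base
    )
```

A tab-delimited listing of group names and file names fails the first check, since its first line does not start with `@`.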

Hello! Thank you for answering back.

The file looks like this (tab-delimited):

barcode73 barcode73.fastq
barcode74 barcode74.fastq
…
barcode96 barcode96.fastq

I worked around it by using a separate fastq.info call for each file. It did not take too long to write.
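In case it helps someone with many barcodes, the per-file calls can be generated from the same two-column listing instead of being typed by hand. This is a small Python sketch under the assumption that the listing has exactly two whitespace-separated columns (group, fastq file name); `write_fastq_info_batch` and the batch file name are made up for the example.

```python
def write_fastq_info_batch(listing_path, batch_path):
    """Write one fastq.info(fastq=...) command per listed fastq file.

    Assumption: each useful line of the listing is '<group> <fastq file>'.
    Lines with any other number of fields (blank lines, '…') are skipped.
    Returns the list of commands written.
    """
    commands = []
    with open(listing_path) as listing:
        for line in listing:
            fields = line.split()
            if len(fields) != 2:
                continue
            _group, fastq_name = fields
            commands.append(f"fastq.info(fastq={fastq_name})")
    with open(batch_path, "w") as batch:
        batch.write("\n".join(commands) + "\n")
    return commands
```

The resulting file can then be passed to mothur as a batch file, for example `mothur fastq_info.batch` (hypothetical file name).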

I am still toying with my first dataset (from the raw nanopore fastq). I am putting the pipeline together while I redo the basecalling on my analysis computer to increase the quality of the fastq. From what I saw and read, you cannot really do the basecalling on the nanopore sequencer (we have a MK1C); you have to run the basecaller on your own computer to use the "super accurate" model and to adjust the barcode trimming options, which are not offered when you basecall on the sequencer. So I am doing two things in parallel.

The positive control looks OK on Epi2Me (the Nanopore analysis platform), but I like having control over my sequencing analysis, so I am building something along those lines.

My gut feeling is that species-level taxonomic inference won't be great, but that full-length 16S will correctly resolve a lot of genera that were misclassified or unclassified, and that it probably won't change the overall results of the alpha or beta diversity analyses.

I will post my best options when I have something that works.

Thanks again for reading.
