Hello everyone! We just finished our first nanopore sequencing run. I am exploring my options for the analysis and, being a long-time mothur user, I would like to use mothur for it.
I took the FASTQ files that passed Nanopore basecalling, as generated by their MinKNOW software.
I concatenated all the files for each barcode to get a single FASTQ file per barcode, and then moved on to mothur.
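For reference, the concatenation step can be sketched in Python roughly as below. This is only a sketch under assumptions: the directory layout (one sub-directory per barcode containing *.fastq or *.fastq.gz chunks), the function name and the output naming are all hypothetical and not necessarily what MinKNOW produced for this run.

```python
import gzip
from pathlib import Path


def concat_fastq_per_barcode(pass_dir, out_dir):
    """Concatenate the FASTQ chunks of each barcode sub-directory into one file.

    Assumed (hypothetical) layout: pass_dir/<barcode>/*.fastq or *.fastq.gz.
    Returns the list of merged files that were written.
    """
    pass_dir, out_dir = Path(pass_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for barcode_dir in sorted(p for p in pass_dir.iterdir() if p.is_dir()):
        chunks = sorted(barcode_dir.glob("*.fastq")) + sorted(barcode_dir.glob("*.fastq.gz"))
        if not chunks:
            continue
        out_path = out_dir / f"{barcode_dir.name}.fastq"
        with open(out_path, "wb") as out:
            for chunk in chunks:
                opener = gzip.open if chunk.suffix == ".gz" else open
                with opener(chunk, "rb") as fh:
                    data = fh.read()
                out.write(data)
                # Guard against a chunk without a final newline, which would
                # glue its last quality line to the next read's header.
                if data and not data.endswith(b"\n"):
                    out.write(b"\n")
        written.append(out_path)
    return written
```

Calling concat_fastq_per_barcode("fastq_pass", "merged") would then give one merged file per barcode (both paths are placeholders).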
My plan is to use fastq.info as a first step to generate the FASTA files. Unfortunately, mothur does not recognize the file as FASTQ.
"
mothur > fastq.info(file=nanopore_test1.file)
[WARNING]: Blank fasta name, ignoring read.
[WARNING]: missing sequence for , ignoring.
[WARNING]: expected a name with + as a leading character, ignoring.
[WARNING]: missing quality for , ignoring.
[WARNING]: Blank fasta name, ignoring read.
"
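To narrow down where such warnings come from, one option is to check the file against the basic 4-line FASTQ layout outside of mothur. The sketch below is only that, a sketch: the function name and the messages are made up for illustration, and it does not reproduce mothur's own parser. It reports blank or malformed headers, empty sequence lines, a missing "+" line, a quality line whose length differs from the sequence, and quality characters outside printable ASCII.

```python
from itertools import islice


def check_fastq(path, max_problems=10):
    """Return up to max_problems (line_number, message) tuples for a FASTQ file."""
    problems = []
    line_no = 0
    with open(path, "rb") as fh:  # bytes, so non-ASCII characters stay visible
        while len(problems) < max_problems:
            record = [line.rstrip(b"\r\n") for line in islice(fh, 4)]
            if not record:
                break  # clean end of file
            start = line_no + 1
            line_no += len(record)
            if len(record) < 4:
                problems.append((start, "truncated record (fewer than 4 lines)"))
                break
            header, seq, plus, qual = record
            if not header.startswith(b"@") or len(header) == 1:
                problems.append((start, "header does not start with '@' or has no name"))
            if not seq:
                problems.append((start + 1, "empty sequence line"))
            if not plus.startswith(b"+"):
                problems.append((start + 2, "third line does not start with '+'"))
            if len(qual) != len(seq):
                problems.append((start + 3, "quality length differs from sequence length"))
            if any(b < 33 or b > 126 for b in qual):
                problems.append((start + 3, "quality has characters outside ASCII 33-126"))
    return problems[:max_problems]
```

An empty list means every record has four well-formed lines. A problem reported at line 1 would suggest that the file handed to mothur does not start with a read header at all.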
This is an example of what the file looks like (copy-pasted):
@38676c8c-7575-4957-a161-5ce1fa18506a runid=2ecfd430e213202cb6f7538082e8ad79ba5f9760 read=16 ch=390 start_time=2024-01-31T16:44:23.549452+00:00 flow_cell_id=FAX64935 protocol_group_id=16s_essai1 sample_id=no_sample barcode=barcode96 barcode_alias=barcode96 parent_read_id=38676c8c-7575-4957-a161-5ce1fa18506a basecall_model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.2.0
TGTTATGTCCTATTTACTTCGTTCAGTTACGTATTGCTGGTGCTGCTGAACGGTCATAAGAGTCCACCATTTAACCTTTCTGTTGGTGCTGAATATTGCAGAGTTTGATTATGGCTCAGGATGAACGCTGGCGGCGTGCCTTAATCATGCCAAGTCGAGCGAACGGACAGAAGCTTGCTCTTCTGATGTTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTACCTATAAGACTGGGATAACTCGGGAACCGGAGCTAATACCGGATAGTATTTTGAACCGCATGGTTCAAAATATTATCGGTATTGGGTTCCGAAGTTATGCCGGTCTTATAGGTAGGTTATCACGTCTACTGGCCCGTTCCGCCGCTAACGTCCGGAGGGCTCCTCGTCCATTTCGCTCGACTTGCAGTGTATTAGGCACACCGCAGCGCTGATCCTGAGCCATGAAACTCTGCCGATATCAGCACCGACGGAAAGGTTAAATGATCAACTCTATAACGTTCGCGGCACCACTGGATA
+
$$β(&((+%%%$&$((,.)>9+)310068311887568++'(39<<<<=<<<1,)22>4==@CDM=E???FCB;7;4427;A@@432200.+1003.731589DBA89=C<989EMBABA>>?<<>A@C777>;96&6912@<=52=:<:<;?;33>>876:&(-9/,+/+34/30.85:<>::7>@>==B?@8687;00094599922254A,/34034C872(-))%%7771101&/2=;<67:=@545β.234599;7&&%-,./.,9677822-/00/))/)/1-,14.β$#&(-%%&&,)%&%&%((%%(β+()$$,/β%)(ββ&&%)($$)##%.0-+)-(()0,/$$$&(#$&ββββ$$#$&$%$%,β&'33223)(),3)β$&$β&####%)),+%&&,β%,++β%#&',)ββ)0&&%%.+.ββ##&ββ()()($$%%)())))0+0,%$β((%&$$&)&&#β$$%&$$β$&&(())&##%β)%
@c0f18f2d-ca5d-48e6-9091-3abb99f64e94 runid=2ecfd430e213202cb6f7538082e8ad79ba5f9760 read=26 ch=443 start_time=2024-01-31T16:44:24.549452+00:00 flow_cell_id=FAX64935 protocol_group_id=16s_essai1 sample_id=no_sample barcode=barcode96 barcode_alias=barcode96 parent_read_id=c0f18f2d-ca5d-48e6-9091-3abb99f64e94 basecall_model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.2.0
ATGTAACCTACTGGTTCAGTTGCGTATTGCTGGTGCTGCTGAACGGTGATCAAGTCCACCATTAACCTACTTGCCTGTGGCTACTATCTTCTACGGCTACTTGTTACAACTTCACCCCAATCATTTACCACCTTCGACGGCTAGCTCCTAAAAGGTTACTCCACCGGCTTGGGTGTTACAAACTCTCGTGGTGACGGGCGGTGTGTACAACCGGGAACATATCACCAGTAGCATGCTGATCTACGATTACTAGCGATTCCAGCTTCAGTGTCCGTCGAGTAGACTACAATCCCGATTTAAACAACTTTATGGAATTGCTTGACCTCGCGGTTTCGCTGCCCTTTGTATTGTACACACCGCCCGTCACACCACGAGAGTTTGTAACACCCGAAGCCGTGGAGTAACCTTGCCAGAGCTAGCCGTAGAAGGTGGGACGAATCGTTGGGGTGAAGTCGTAACAAGGTAGCCGTAGAAGATCGAGCGACAGGCAGTTAGGTTGATCGGTGGACTCTGACCGTTCAGCAGCACCAGCAATACGTGGCT
+
%β&)&)(()(β($&β))(-β+('(-.779:79>B?<<A9827;41/1&&&+((())735622334;8@92289KCHEB879:,-477:98<;?..;2,&($#%+β%%-4β,:;+(β##$(%%(334=/β/669B9889A??..<46;7&ββ¦&<>B<211<6&&<=776;;9>=98333230//201)(04%%%4965$$%$&,+,//-+βββ(β))+(&(()+0>?=>22265470346/./(β¦-.22989891(()&β((1%%###+)()3Β±+.0211β(($$)+)&%&+832β./799;;:1228=256::BB97:BBEAAD?<@@AEBB:::<334=>?<9:;=<45520β().65576:8=?..D76@:68-'++0%&(-.01-(β+β459β¦/786ββ(0,><β557*()%#%%+,-,&%$%%$+.5:@=::@:7699:80(&63-0-**,8(&&-Β±β¦,+))044(&&&$&'%&##$)&&%&+1/(/0886////24001(ββ%%"#
@0a960ff5-f4a9-4850-9890-f8c52b8e70da runid=2ecfd430e213202cb6f7538082e8ad79ba5f9760 read=18 ch=409 start_time=2024-01-31T16:44:24.549452+00:00 flow_cell_id=FAX64935 protocol_group_id=16s_essai1 sample_id=no_sample barcode=barcode96 barcode_alias=barcode96 parent_read_id=0a960ff5-f4a9-4850-9890-f8c52b8e70da basecall_model_version_id=dna_r10.4.1_e8.2_400bps_fast@v4.2.0
GTTGTGTAACCTACTTGGTTCGTTGCGTATTGCTGGTGCTGCTGAACGGTCATCAAGTCCACCATTTAACCTTTCTGTTGGTGCTGATATTGCAGAGTTTGATTCTGGCTAGGATGAACGCTGGCGGCGTGCCTAATACAGTAAGTCGAGCAAACGACGAGAAGCTTGCTCTCTGATGTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTTGCCTATAAGCTGGGATAACTTGGGAAACCGGAGCTAATACCGGATAATATTTTGAACCGCATCATTAGAAAAGTAAAGACGGTCTTGCTGTTCACTTATTCAGTCGATGCGCGCTGCGTTAGGCTTAGTTGGTTCGGCCATCCATCGCTTACTAAGGCAACGGAGTGCGACGCCGACCTGAGAGGGTGATCGGCCACACTGGAACTGGAGCACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGAATCTTCCGCAATCGGCAGAAAGCCTGACGGACAACGCCGCGTGAGTGATGGTCTTCGATCGTGATTTTGTTCGAGAGAACAAGAACATAGTAACTGAACGTCCCTGACGGTATCTAACCAGAAAGCCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTATTGGGCGTAAAGCGACGCAGGCGGTTTTCTTAAGTCTGATGTGAAAGCCCCGGCTGAACCGGGGAGGGTACATGGAAACTGGAAGACTCGGTGCAGAAGAATTCCCTACTGCTGCCTCCGTAGGAGTCTGGACCGTGTCAGTTCCGTGTGGCCGTCAGCCTCAGGTCGGCTGTGGTCGGTTCCTTGAGTAAGCCGTTACCTTTACCGACTGGCTGATGCAGCGCGGTGCCGTCTATAAGACATTTTGAACCATCGTCAAGTGTATACCATTGTTGCTCCGGTTTTCCGGAAGTTTGCCGTCTTATGGTAGGTTGTCCACGGTGTCTCACCGTACCGCCGCTGAGCATGCCGAGAAGCAACTTTCTGATCCGGTTCGCTGGACTTGCAGCTCTTGACAGCGCCGCCAGCGTTGGTGTAGAGCCAAGTGAACTTGCGATGTCAGCAGGCCAACGGGAGGGGGCAATGGTCGACTGTATAACCGTACGGCCGCGCCCGGCGGTGGTGG
+
&,.β&β+$##&'&),(0β¦(%β/))67;<?;;:BB@@@>>>95;;112+''%(((/,548:;<;?8767786//8,,,5;963011+-)%)-456<1/055:++779/35<;<CB==>A@?@CB112/0(&&%%+1)((;<22@=2/0&%&).+--1,))&1,8/./62'33137=:B;78F@;:;@:<>***>><<<.'''')+//7((&*)#%')780*)'+-)+;?AC<;767C?AB??5443496568<--=101018.))*(%&&4B325:@=98))<;;=91+-,)%(()9''&$%''&))-)%$)*++.4)*+((*)--'(*+,)('%'(.,*&$%&(+*)2=,+.,,-2/+((*%&&&(+&#$%(+13/0B:6**+9;:?>>4,.2+3β)9101.β+.37687<AA:70+&&)$,9:9535++<82βββ.')/0.β0124>4120&&Β±678211A=9=<101β'-4572&2*((&$#$&&%((%$&--/+&)(+&%&$#%&β%%$#β&ββ%%',2<??ACA?9868:@A@@IB???>:+&/3?0/-.&%%+9;;=<>114><==>;657211:9>>=89:5456+,)-((++(&&%&)75,.255)(((()0,++9>F53*&+,((%%β1*&(3::76=E:88D0*,-1)<>=2β¦-$$%)9?FA>>503(β&#$$)+333:;;850&';?>8667CFDFBC12/2β'(3454949:=:3212*%$&+('&22((++%&&15)ββ-,+/2A%%###β&β&&#$')$$$((.00&+%%&),β&β&%%)$##&$$$),(().0/-β&%&$%)β(%&&β%$$$1,)/Β±β&"$&%$%)(&'&,$$%$###$$%$&+.+./5+()β%-,&&&$&($$%,.)ββ/2159:)(()$$%$β&#%&&&),%(&&%%$()%&β(,&&%((&+/-.,%%%ββ&'($#$#$+&/((),)%&%β%%$%$$%)#$β%%β&β)/-0.-.((+&$%($$$%β+1%$βββ)/-.-%%"&,/.)&%(%%&).β&β+(β%&%&β&#$')+%$$$%%&&-*,1**&%#$β-β%,016-,-'#$&$$$$
Should I use another program to convert the file into FASTA?
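If the conversion had to happen outside mothur, a minimal Python sketch could look like the one below. It assumes strict 4-line FASTQ records and keeps only the read ID (the text before the first space in the header), dropping the runid=, ch=, barcode= and other fields. Both the function name and that naming choice are assumptions for illustration, not something mothur requires.

```python
from itertools import islice


def fastq_to_fasta(fastq_path, fasta_path):
    """Write one FASTA record per 4-line FASTQ record; return the number of reads."""
    n_reads = 0
    with open(fastq_path, "r", encoding="ascii", errors="replace") as fq, \
            open(fasta_path, "w") as fa:
        while True:
            record = [line.rstrip("\r\n") for line in islice(fq, 4)]
            if len(record) < 4:
                break  # end of file (an incomplete trailing record is skipped)
            header, seq = record[0], record[1]
            if not header.startswith("@") or not header[1:].split():
                raise ValueError(f"record {n_reads + 1}: bad header {header!r}")
            # Assumption: the read ID alone is enough as the sequence name.
            name = header[1:].split()[0]
            fa.write(f">{name}\n{seq}\n")
            n_reads += 1
    return n_reads
```

The quality scores are discarded here, so this only makes sense if the downstream steps work from the FASTA file alone.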
My general plan for the analysis is as follows; feel free to comment on it.
fastq.info (to get the FASTA files)
make.group (to create a count file)
merge.files (to get a single FASTA file)
unique.seqs (to get the unique sequences and start the analysis)
...
and so on. Once I get this working, I will experiment with different parameters and see how our mock community and the samples we already sequenced on MiSeq (V4) behave.
Kind regards,