My sequences coming out of shhh.flows are mostly 1-5 bases longer than the input ones, except for a fraction that got filtered out. I was wondering why this happens and whether it is an artifact.
Thanks a lot,
Sorry, but I’m not sure what you’re asking. Also, nothing gets “filtered” out: you see a reduction in the number of sequences because the method effectively dereplicates the flowgrams and sequences. The redundant sequence names are in the shhh.names file.
Sorry for not being clear. When I look at shhh.fasta, the sequences are generally 1-5 nucleotides longer:
fusion35. - 2839 seqs - max length 284 / min length 221
fusion35.shhh - 2395 seqs - max length 286 / min length 223
I don’t understand why this happens.
Thanks a lot for your help,
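To compare the input and the shhh.flows output the way the numbers above do, here is a minimal sketch that counts sequences and reports the minimum and maximum length from FASTA-formatted lines. mothur's own summary.seqs command reports similar statistics; the function names `fasta_lengths`, `summarize`, and `summarize_file` are hypothetical helpers, and the sketch assumes unaligned sequences (no gap characters).

```python
from typing import Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence lines may be wrapped, so accumulate their lengths.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths: List[int]) -> Tuple[int, int, int]:
    """Return (number of sequences, min length, max length)."""
    return len(lengths), min(lengths), max(lengths)


def summarize_file(path: str) -> Tuple[int, int, int]:
    """Convenience wrapper for a FASTA file on disk (path is a placeholder)."""
    with open(path) as handle:
        return summarize(fasta_lengths(handle))
```

Running `summarize_file` on the input FASTA and on the shhh.fasta output side by side gives the sequence counts and length ranges to compare.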
Not sure, but it could be because the flows (with flow order A) come in cycles of four bases at a time. The default trimming in the fasta file produced by sffinfo is based on the quality scores (I’m not sure exactly how that’s done), whereas the fasta file produced by shhh.flows is trimmed to complete cycles of four flows.
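A minimal sketch of the idea above, assuming the flow order cycles through T, A, C, G and that each flow intensity is simply rounded to a homopolymer length (the actual denoising in shhh.flows is far more involved). The helpers `flows_to_sequence` and `trim_to_complete_cycles`, the flow values, and the clip positions are all hypothetical. It only shows how cutting a flowgram at a whole-cycle boundary can keep a base more than a clip at a fixed base position.

```python
from typing import List

FLOW_ORDER = "TACG"  # assumption: flows cycle through T, A, C, G


def flows_to_sequence(flows: List[float], order: str = FLOW_ORDER) -> str:
    """Decode a flowgram: each rounded intensity is the homopolymer length for that flow's base."""
    bases = []
    for i, intensity in enumerate(flows):
        n = int(round(intensity))
        bases.append(order[i % len(order)] * n)
    return "".join(bases)


def trim_to_complete_cycles(flows: List[float], n_flows: int, cycle: int = len(FLOW_ORDER)) -> List[float]:
    """Keep at most n_flows flows, rounded down to a whole number of flow cycles."""
    keep = (n_flows // cycle) * cycle
    return flows[:keep]


# Hypothetical flowgram: 12 flows = 3 complete TACG cycles.
flows = [1.02, 0.05, 0.98, 2.01, 0.03, 1.10, 0.07, 0.95, 1.03, 0.02, 2.05, 0.04]

full = flows_to_sequence(flows)
base_clipped = full[:5]  # clip at a fixed base position (e.g. one derived from quality scores)
cycle_trimmed = flows_to_sequence(trim_to_complete_cycles(flows, 11))  # keeps 8 flows

print(len(base_clipped), len(cycle_trimmed))  # prints "5 6"
```

Here the base-position clip keeps 5 bases while the cycle-based trim keeps 6, so the two approaches can disagree by a base or more depending on where each cut falls.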
Thanks a lot. I am re-running PyroNoise (shhh.flows) on the files generated by trim.flows (I had previously trimmed with the RDP pipeline) just to see if I get the same result. However, I got this message:
Processing fusion35_good.scrap.flow (file 1 of 1) <<<<<
[ERROR]: St9bad_alloc has occurred in the ShhherCommand class function getFlowData. Please contact Pat Schloss at , and be sure to include the mothur.logFile with your inquiry.
So you probably don’t want to be processing fusion35_good.scrap.flow. Those are the bad sequences that trim.flows rejected. You probably want to use the files option as described here:
Unless they’ve updated it in the last year, the RDP pipeline does very little to improve the quality of the output data, so you should expect some differences.
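For the files option mentioned above, here is a minimal sketch of building a listing of flow files that leaves out the scrap files. It assumes the listing is plain text with one flow file path per line and that the good reads are in files ending in `.flow` other than `.scrap.flow` (for example a hypothetical `fusion35_good.trim.flow`); check the mothur documentation for the exact format shhh.flows expects. The helpers `select_flow_files` and `write_file_list` are hypothetical.

```python
from pathlib import Path
from typing import List


def select_flow_files(paths: List[str]) -> List[str]:
    """Keep flow files but drop *.scrap.flow, which holds the rejected reads."""
    return [p for p in paths if p.endswith(".flow") and not p.endswith(".scrap.flow")]


def write_file_list(paths: List[str], out_path: str) -> None:
    """Write one flow file path per line (assumed listing format)."""
    Path(out_path).write_text("".join(p + "\n" for p in paths))
```

The scrap file is filtered out by name, so only the trimmed reads would be passed on to shhh.flows.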