I am running the shhh.flows command on a new dataset, but mothur cannot open LookUp_Titanium.pat. I have analyzed several datasets with this command before and it worked just fine. The LookUp_Titanium.pat file is in the same folder as mothur, and it looks fine when opened in TextEdit. I am running this on a Mac.
Mothur > shhh.flows(file=hui/hui.flow.files)
Unable to open LookUp_Titanium.pat. Trying mothur’s executable location /Users/mpat-group/hui/LookUp_Titanium.pat
Hi Pat,
I really have no idea what was wrong with the folder, but anyway, it works again now. Thanks.
Still one more question about the following steps: after shhh.flows, the number of reads dropped drastically, and it dropped again after trim.flows and unique.seqs. Is this because the reads are from fungi, or just bad sequence quality? For bacteria, the drop after shhh.flows is not so drastic…
Before trim.flows: 431595 total sequences
After shhh.flows: 18550 total sequences / 1192 unique sequences
After unique.seqs: 18550 total sequences / 679 unique sequences
The shhh.flows/unique.seqs numbers look consistent. shhh.flows outputs the unique denoised sequences within each group, and unique.seqs then uniques across groups, so your total number of sequences after shhh.flows and after unique.seqs is the same. shhh.flows does not remove sequences, so you are going from 431595 to 18550 sequences in the trim.flows step. Either your primer/barcode sequences are messed up or your flowgrams are short. If you look at the scrap.flow file, you will see codes indicating why your sequences are being discarded. Look there for your answer…
You are right that the flowgrams are too short. I should have noticed that my sequences are much shorter (about 200-250 bp for ITS2). Do you think it is reasonable to set minflows=150 and maxflows=450?
What do you mean by "minflows needs to be the same as maxflows"? Does that mean I cannot set a range of flowgram lengths, such as 150 (min) to 450 (max)?
What would you suggest for the flow settings in this case, since too few sequences are left at 450?
Take a look at Figure 1 of http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0027310. See the column that says "shhh.flows (360-720)"? That shows the error rate using minflows=360, maxflows=720. You'll notice that the columns to the right of that show a much better error reduction when minflows and maxflows have the same value. So while you can set them to different values, it's really a bad idea.
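In trim.flows syntax, matched values look like this (the file names and the cutoff of 450 below are just stand-ins; use your own files and pick a cutoff that fits your read lengths):

```
mothur > trim.flows(flow=hui.flow, oligos=hui.oligos, minflows=450, maxflows=450)
```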
I've been thinking about the minflows/maxflows question for some time, and I just don't get why they have to be the same size to work.
(Yes, yes, I get that's what the data is showing), but WHY does this happen? What is the probability of getting most of your sequences to be a minimum size of 450 and a maximum size of 450?
This is of course presuming that the minflows/maxflows value refers to the length of the sequence in bp…
Does this step trim longer sequences down to that size and exclude shorter ones?
(And if so, can't you just take care of this after you align?)
Here's what I think is the reason… If the reads vary in length between 360 and 720 flows, then some of the 360 reads come from the same templates as the 720 reads; you just don't have the data for the rest of those reads. So when the distances are calculated between sequences, they may be treated differently because of their difference in length (the same reason we make all our 16S sequences the same length before clustering). So they don't get denoised together, and the denoising doesn't work as well because the errors accumulate in the longer reads.
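Here is a toy numeric sketch of that length effect (plain Python, not mothur; the sequences and the simple end-gap distance are made up for illustration): two reads of the same template, one truncated, agree perfectly over their shared prefix but look far apart once the missing tail counts toward the distance.

```python
template = "ACGTACGTACGTACGTACGT"   # the true underlying sequence (20 bases)
read_short = template[:10]          # a read that stopped early (like a 360-flow read)
read_long = template                # a full-length read (like a 720-flow read)

# Over the region where both reads have data, they agree perfectly:
prefix_mismatches = sum(a != b for a, b in zip(read_short, read_long))
print(prefix_mismatches)            # 0

# But a distance that penalizes the missing tail (end gaps) treats the
# two reads of the SAME template as 50% different:
dist = (prefix_mismatches + len(read_long) - len(read_short)) / len(read_long)
print(dist)                         # 0.5
```

This is only a caricature, but it shows how unequal lengths inflate distances, so reads that should be denoised together end up being treated as different.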