hi all,
some questions once more … :oops:
I’m working through some data distributed over 8 .sff files. They contain 52 samples, with forward and reverse sequences mixed randomly.
I need to extract 9 samples (which our lab is working on) and, as you suggested in an earlier thread, analyse them separately (F and R).
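For reference, this is roughly how I extract the flows and pull out our samples per orientation (the file names, primer and barcode sequences below are placeholders, not our real ones):

mothur > sffinfo(sff=lane1.sff, flow=T)
mothur > trim.flows(flow=lane1.flow, oligos=forward.oligos, pdiffs=2, bdiffs=1)

where forward.oligos lists only the forward primer and the barcodes of our 9 samples, along the lines of:

forward	ATTAGATACCCNGGTAG
barcode	ACGAGTGCGT	sampleA
barcode	ACGCTCGACA	sampleB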
When setting minflows and maxflows to 0 and 800 respectively, I assume I get the full raw data. This gives about 55600 F sequences out of >700k (F+R), which seems a reasonable number.
Now, using the default flow settings (minflows=450, maxflows=450), I retain only 27091 sequences. Too bad, but it seems plausible.
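Concretely, the two runs look something like this (same placeholder names as above):

mothur > trim.flows(flow=lane1.flow, oligos=forward.oligos, minflows=0, maxflows=800)
mothur > trim.flows(flow=lane1.flow, oligos=forward.oligos, minflows=450, maxflows=450)

with the first giving the ~55600 “raw” F sequences and the second the 27091.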
Subsequently running them through shhh.flows dramatically reduces this number (about 9000 with the default trim settings, 16000 with the “raw” data). OK, I assume this could be due to some really bad-quality data. (What is a normal reduction rate when denoising? The SOP appeared to show a far smaller reduction in sequence numbers.)
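For denoising I just feed trim.flows’ output straight to shhh.flows, roughly:

mothur > shhh.flows(file=lane1.flow.files, processors=8)

(processors=8 is simply what I used on our machine.)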
The problem, however, is that the pyro data was already analysed by another lab using Pyro/AmpliconNoise, some custom scripts, and BWA. Because I was a little suspicious about the quality, and since we need to be able to process this kind of data ourselves, I wanted to re-analyse it with mothur. However, the original analysis retained 55200 F sequences after AmpliconNoise. How is this discrepancy possible when the same algorithm is used?
This eventually leaves me with about 700-1000 sequences/sample using mothur, versus 1500-3000 in the original analysis, after further removal of low-quality sequences (too short, …).
Secondly, when creating OTUs, my number of sequences increases again, up to 20k+. Now, during the pre-analysis I renamed some files a couple of times. I read somewhere that mothur somehow "remembers" links to previous files? Is this always the case, or is it reset every time you exit mothur? In other words, can you just rename files and then start mothur up again to continue with the new names?
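For example, I can see what mothur considers the current files with:

mothur > get.current()

but I’m not sure whether those “current” files persist after quitting or get reset on each start.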
Thanks