Hi everyone, I am pretty new to this sequencing/mothur world, so I am sorry if my question sounds too basic. What I am trying to understand is the difference between make.shared files created with label=unique and label=0.03. I thought I would get more OTUs using 0.03 than with unique, but it seems to be the opposite, right?
Can I just pick the label that gives me more OTUs? Does it really matter whether I use unique, 0.01, or 0.03? I am quite confused by those numbers!
Thanks in advance
It all matters. label=0.03 is the same as a 97% similarity cutoff. label=unique means that sequences have to be identical to be in the same OTU, which I doubt is what you want. You’ll have more OTUs at label=unique than you will at label=0.03.
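If it helps to see why the OTU count drops as the label goes up, here is a toy Python sketch. It is not how mothur clusters: it assumes equal-length, already aligned reads, uses a plain mismatch fraction as the distance, and merges reads by simple single linkage. The names `distance`, `count_otus`, and the example reads are made up for illustration.

```python
from itertools import combinations


def distance(a, b):
    """Fraction of mismatched positions between two equal-length, aligned reads."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_otus(seqs, cutoff):
    """Toy single-linkage clustering: reads within `cutoff` end up in one OTU."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if distance(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


base = "ACGT" * 25  # hypothetical 100 bp read
reads = [
    base,
    base,                          # exact duplicate of the first read
    "T" + base[1:],                # 1 mismatch  -> distance 0.01
    base[:50] + "AA" + base[52:],  # 2 mismatches -> distance 0.02
    "G" * 100,                     # unrelated read
]

print(count_otus(reads, 0.0))   # "unique": only identical reads merge -> 4
print(count_otus(reads, 0.01))  # -> 3
print(count_otus(reads, 0.03))  # -> 2
```

The looser the cutoff, the more reads collapse into each OTU, so the count can only stay the same or go down. That is why picking the label with the most OTUs just means picking the strictest definition, not the best one.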
When I trim my reads using keepfirst=300, I always get weird results and never get a good BLAST match for my fungal community. I then tried trimming to 250 and 200 instead, and the results were much better. My best result was actually with 200, and my diversity was much higher with that setting. So why am I getting good results with 250 or 200 but not with 300? The mean length of my reads is 360 bp. Am I missing something? Is that normal?
Here is the summary from my data:
mothur > summary.seqs(fasta=SEED.fasta, processors=2)
             Start  End      NBases   Ambigs    Polymer  NumSeqs
Minimum:     1      247      247      0         3        1
2.5%-tile:   1      317      317      0         4        313646
25%-tile:    1      359      359      0         4        3136454
Median:      1      359      359      0         4        6272908
75%-tile:    1      359      359      0         4        9409361
97.5%-tile:  1      430      430      2         13       12232169
Maximum:     1      501      500      67        250      12545814
Mean:        1      360.094  360.093  0.283635  4.5508
# of Seqs:   12545814
Here is what I used for trimming:
trim.seqs(fasta=SEED.fasta, oligos=SEED.oligos, maxambig=0, maxhomop=8, flip=T, bdiffs=0, pdiffs=2, minlength=250, maxlength=400, keepfirst=250, processors=2)
The keepfirst parameter in trim.seqs removes all bases after the position you set keepfirst to. My best guess is that the part of each sequence between base 200 and the end contains more errors. These errors are preventing you from getting a good match with BLAST and good results in your downstream analysis, so removing that section of the sequence improved your results. Do you have quality data for these sequences? If you do, you could run something like what Pat recommends in the SOP:
trim.seqs(fasta=SEED.fasta, oligos=SEED.oligos, qfile=SEED.qual, maxambig=0, maxhomop=8, flip=T, bdiffs=0, pdiffs=2, qwindowaverage=35, qwindowsize=50, processors=2)
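To make the difference between the two approaches concrete, here is a rough Python sketch of the idea, not mothur's actual code. `keep_first` cuts every read at the same fixed position, while `window_trim` slides a window along the quality scores and cuts where the window average first drops below a threshold. The function names and the toy read are invented, and cutting at the start of the first failing window is my assumption; mothur may place the cut differently.

```python
def keep_first(seq, n):
    """Keep only the first n bases, regardless of quality."""
    return seq[:n]


def window_trim(seq, quals, window=50, min_avg=35):
    """Cut the read at the start of the first window whose mean quality < min_avg.

    Assumption: the cut is placed at the window start; this is a sketch,
    not a reimplementation of trim.seqs.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return seq
    # Slide the window one base at a time; short reads get a single window.
    for start in range(0, max(len(seq) - window, 0) + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / len(chunk) < min_avg:
            return seq[:start]
    return seq


# Hypothetical 300 bp read: good quality for 200 bases, poor for the last 100.
seq = "A" * 200 + "C" * 100
quals = [38] * 200 + [20] * 100

print(len(keep_first(seq, 250)))      # 250: still carries 50 low-quality bases
print(len(window_trim(seq, quals)))   # 159: cut as soon as a window average fails
```

The point is that a fixed keepfirst treats every read the same, whereas the quality window lets each read keep as much sequence as its own scores support.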