I wanted to confirm that, before preclustering, it is necessary to do the following:
1) unique
2) align
3) filter the sequences (i.e., stopping at the “mothur > filter.seqs(fasta=sogin.unique.align, vertical=T)” command in the Sogin instructions).
If I wanted to cluster the preclustered sequences afterwards to determine Chao richness, I would return to the Sogin Data Analysis page and calculate a distance matrix (mothur > dist.seqs(fasta=sogin.unique.filter.fasta, cutoff=0.10)), but based on the *precluster.fasta file. Then I would cluster the sequences using the read.dist and read.otu commands.
I hope this makes sense. Any input is appreciated. I am a beginner, as is probably apparent.
1) Do you suggest that I start from scratch using the Schloss SOP page (starting from the shhh.flows command to denoise, as this is my goal) and then align, screen, filter, unique and precluster?
2) After preclustering, how do I determine species richness and diversity, as was done on the Sogin data analysis page? Which commands should I use, as these are not listed on the Schloss SOP page?
It is going smoothly; however, I am stumped at the moment.
I do not have the *.oligos file, although I do have the *.sff file. Therefore, I cannot complete the mothur > trim.flows(flow=SW.flow, oligos=SW.oligos, pdiffs=2, bdiffs=1, processors=2) step.
[ERROR]: SW.flow.files is blank, aborting.
values for either flow or file must be provided for the shhh.flows command.
Unable to open LookUp_Titanium.pat. Trying mothur’s executable location /Applications/mothur/LookUp_Titanium.pat
Unable to open /Applications/mothur/LookUp_Titanium.pat.
Using 2 processors.
[ERROR]: did not complete shhh.flows.
A) What is the exact command to screen sequences? I could not find it.
B) How do I continue after preclustering to determine Chao1 richness, the Simpson and Shannon indices, etc. (i.e., at what point)?
If you’re having trouble with your oligos file, could you copy/paste the first few lines here? We’ve had problems in the past because the file is extremely sensitive to having the wrong characters in the wrong place (spaces instead of tabs, or vice versa), which aren’t always obvious to the eye.
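Since stray whitespace is hard to spot by eye, a small script can point out suspicious lines before you post them. This is only a minimal sketch in Python, not part of mothur: the function names are made up, the sample sequences are invented for illustration, and treating lines that start with "#" as comments is an assumption. It does not decide whether tabs or spaces are correct; it only reports lines that mix the two, files where lines disagree with each other, carriage returns, and non-printable or non-ASCII characters.

```python
def describe_whitespace(line):
    """Classify which separator characters appear in a line."""
    has_tab = "\t" in line
    has_space = " " in line
    if has_tab and has_space:
        return "mixed"
    if has_tab:
        return "tab"
    if has_space:
        return "space"
    return "none"


def check_oligos_text(text):
    """Return a list of (line_number, problem) pairs; line 0 means whole file."""
    problems = []
    separators = {}
    for number, line in enumerate(text.split("\n"), start=1):
        # Skip blank lines and (assumed) comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if "\r" in line:
            problems.append((number, "carriage return (Windows or old Mac line ending)"))
            line = line.replace("\r", "")
        # Anything outside printable ASCII, other than a tab, is suspicious.
        odd = sorted({ch for ch in line if ord(ch) > 126 or (ord(ch) < 32 and ch != "\t")})
        if odd:
            problems.append((number, "unexpected characters: " + ", ".join(repr(ch) for ch in odd)))
        kind = describe_whitespace(line)
        if kind == "mixed":
            problems.append((number, "tabs and spaces mixed on one line"))
        separators.setdefault(kind, []).append(number)
    if "tab" in separators and "space" in separators:
        problems.append((0, "some lines use tabs, others use spaces"))
    return problems


if __name__ == "__main__":
    # Invented example content, not a real oligos file.
    sample = "forward\tACGT\nbarcode AACC sample1\nbarcode\tGGTT sample2\r\n"
    for number, problem in check_oligos_text(sample):
        print(number, problem)
```

To check a real file, read it with `open(path, newline="").read()` so that Python does not silently convert the line endings before the check sees them.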
For your questions:
A) The command to screen sequences is screen.seqs.
B) Assuming you’ve completed the Schloss SOP, which includes chimera checking and removing suspected contaminants, the commands you would use to calculate your estimators are:
dist.seqs
cluster
make.shared
collect.single
These make up the alpha diversity section of the SOP, as Pat said. You’ll need to check each command to see what parameters are required.
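To make clear what the estimators at the end of that pipeline measure, here is a rough Python sketch that computes them from a list of OTU abundances (the number of sequences in each OTU). It uses common textbook forms: the bias-corrected Chao1, the Shannon index with natural logarithms, and Simpson's index for sampling without replacement. The function names and the example counts are made up, and mothur's own calculators may differ in details, so use mothur's output for real results.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1))."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # observed OTUs
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i))."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def simpson(counts):
    """Simpson's index D = sum(n_i*(n_i-1)) / (N*(N-1))."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two sequences")
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))


if __name__ == "__main__":
    # Invented abundances: 7 OTUs, 3 singletons, 2 doubletons.
    otu_counts = [10, 5, 2, 2, 1, 1, 1]
    print("Chao1:", chao1(otu_counts))
    print("Shannon:", shannon(otu_counts))
    print("Simpson:", simpson(otu_counts))
```

Chao1 depends only on the observed OTU count and on how many OTUs are singletons and doubletons, which is why denoising and preclustering (both of which remove spurious rare sequences) can change the estimate so much.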
Also, I could not get through the Schloss SOP. I am stuck at the screen.seqs step, where I lose all my sequences. I started off with 133000 sequences, and when I screen I get the following:
I also get this message every time at the align step:
“Some of you sequences generated alignments that eliminated too many bases, a list is provided in SW.trim.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.”
When I try the reverse complement, it gets worse: I am left with something like 8 sequences at the screen.seqs step.
Keep in mind that I am using the Ion Torrent, which gives 100 bp reads instead of the 400 bp reads from 454.
If this is because of bad-quality sequences, I am considering skipping a step so that I have something left for analysis, because this is all I have.
FWIW, these are IonTorrent data that are bad quality (that may be redundant) and are getting culled because many sequences have 4 blank flows in a row early in the sequence.
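To illustrate why a run of blank flows early in a read is so costly, here is a minimal Python sketch of that kind of check. It is not mothur's implementation: the function name, the noise threshold of 0.5, and the example intensities are assumptions for illustration. It returns how many flows come before the first run of four consecutive flows below the threshold.

```python
def usable_flows(flows, noise=0.5, run_length=4):
    """Count the flows before the first run of `run_length` blank flows.

    A flow counts as blank when its intensity is below `noise`
    (an assumed threshold). If no such run exists, every flow is usable.
    """
    run = 0
    for index, value in enumerate(flows):
        if value < noise:
            run += 1
            if run == run_length:
                # Cut at the position where the blank run started.
                return index - run_length + 1
        else:
            run = 0
    return len(flows)


if __name__ == "__main__":
    # Invented flowgram: the blank run starts at position 3.
    flowgram = [1.02, 0.05, 0.98, 0.10, 0.03, 0.07, 0.02, 1.10]
    print(usable_flows(flowgram))
```

A read cut after only a handful of flows is far shorter than any minimum length requirement, so under a rule like this it would be discarded rather than trimmed.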