pre.cluster

I wanted to confirm that for preclustering it is necessary to do the following beforehand:

1) unique
2) align
3) filter the sequences (i.e., stopping at the “mothur > filter.seqs(fasta=sogin.unique.align, vertical=T)” command, according to the Sogin instructions).

If I wanted to cluster the preclustered sequences afterwards to determine Chao richness, this means that I would return to the Sogin Data Analysis page and calculate a distance matrix (mothur > dist.seqs(fasta=sogin.unique.filter.fasta, cutoff=0.10)), but based on the *precluster.fasta file. Then I would cluster the sequences using the read.dist and read.otu commands.

Hope this makes sense. Any input is appreciated. I am a beginner, as may well be apparent.

Thank you!

You’ll want to do…

align.seqs
screen.seqs
filter.seqs
unique.seqs
pre.cluster

I would strongly encourage you not to follow the Sogin analysis and instead to follow the Schloss SOP wiki page.

Hi,

Thank you for the response.

1) Do you suggest that I start from scratch using the Schloss SOP page (starting from the shhh.flows command to denoise, as this is my goal) and then align, screen, filter, unique and precluster?
2) After preclustering, how do I determine species richness and diversity, as was done on the Sogin data analysis page? Which commands should I use, as this is not listed on the Schloss SOP page?

Thank you.

Yes. See the sections on alpha diversity.

Thank you for the help.

It is going smoothly, however, I am stumped at the moment.

I do not have the *.oligos file. I do have the *.sff file. Therefore, I cannot proceed at the mothur > trim.flows(flow=SW.flow, oligos=SW.oligos, pdiffs=2, bdiffs=1, processors=2) step.

Where can I find this *.oligos file?

Thank you for your time!

You need to get it from your sequence provider. This is how mothur knows what sequence goes into each group.
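
As a rough illustration of what that means, here is a simplified Python sketch of barcode-based group assignment. It is not mothur's actual algorithm: it assumes the barcode sits at the very start of the read and counts plain mismatches against a bdiffs-style limit. The function name assign_group is hypothetical, and the barcodes are the ones posted later in this thread.

```python
def assign_group(read, barcodes, bdiffs=1):
    """Return the group whose barcode best matches the start of the read.

    Simplified illustration only: mismatches are counted position by
    position, and a read that matches two barcodes equally well is
    treated as unassignable (returns None).
    """
    best_group = None
    best_diffs = bdiffs + 1  # anything above bdiffs is rejected
    tie = False
    for barcode, group in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        diffs = sum(a != b for a, b in zip(prefix, barcode))
        if diffs < best_diffs:
            best_group, best_diffs, tie = group, diffs, False
        elif diffs == best_diffs and diffs <= bdiffs:
            tie = True
    return None if tie else best_group


barcodes = {"ACGAG": "MID1", "ACGCT": "MID2", "AGACG": "MID3"}
print(assign_group("ACGAGGATTAGATACCC", barcodes))  # exact match to MID1
print(assign_group("TTTTTGATTAGATACCC", barcodes))  # no barcode within bdiffs
```

A read that matches no barcode within the allowed differences has no group, which is why a faulty oligos file can send every read to the scrap file.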

I was trying to run shhh.flows and for some reason I am getting the following message:

mothur > trim.flows(flow=SW.flow, oligos=SW.oligos.txt, pdiffs=2, bdiffs=1, processors=2)


Using 2 processors. 10000 10000 20000 20000 30000 30000 40000 40000 50000 50000 60000 60000 66665 66688

Appending files from process 7149

Output File Names:
SW.trim.flow
SW.scrap.flow
SW.flow.files


mothur > shhh.flows(file=SW.flow.files, processors=2)

[ERROR]: SW.flow.files is blank, aborting.
values for either flow or file must be provided for the shhh.flows command.
Unable to open LookUp_Titanium.pat. Trying mothur’s executable location /Applications/mothur/LookUp_Titanium.pat
Unable to open /Applications/mothur/LookUp_Titanium.pat.

Using 2 processors.
[ERROR]: did not complete shhh.flows.


Do you know what I am doing wrong?

Thank you.

Yeah, this is probably because all of your sequences are going to the scrap.flow file, most likely because there’s a problem with your oligos file.

Hi,

If you have some time to answer, this would be great.

Since the Schloss analysis is impossible for me to do because of sequencing error, if I continue with the following:

align.seqs
screen.seqs
filter.seqs
unique.seqs
pre.cluster

A) What is the exact command for screening sequences? I could not find it.
B) How do I continue after preclustering to determine Chao1 richness, Simpson and Shannon indices, etc. (i.e., at what point)?

Thanks for all your help. Very appreciated.

Thanks.

If you’re having trouble with your oligos file, could you copy/paste the first few lines here? We’ve had problems in the past, because it is extremely sensitive to having the wrong characters in the wrong place (spaces instead of tabs, or vice versa) which aren’t always obvious to the eye.
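
A minimal Python sketch of the kind of check that can catch this, assuming the file is meant to be tab-separated as described above. The function name check_oligos_lines and its messages are hypothetical and not part of mothur; it only flags lines that look suspicious.

```python
def check_oligos_lines(lines):
    """Return (line_number, problem) tuples for lines that look malformed.

    Assumes fields should be separated by single tabs. Blank lines and
    lines starting with '#' are skipped.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        if not line.isascii():
            problems.append((number, "non-ASCII character present"))
            continue
        if "\t" not in line:
            problems.append((number, "no tab found; fields may be separated by spaces"))
            continue
        fields = line.split("\t")
        if any(field == "" or field != field.strip() for field in fields):
            problems.append((number, "empty field or stray space next to a tab"))
    return problems


example = [
    "forward\tGATTAGAWACCCBDGTAGTCC",
    "barcode ACGAG MID1",      # spaces instead of tabs
    "barcode\tACGCT\tMID2",
    "barcode\tAGACG \tMID3",   # trailing space inside a field
]
for number, problem in check_oligos_lines(example):
    print(number, problem)
```

To run it on a real file, pass the result of open("SW.oligos").readlines(), using whatever name your oligos file has.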

For your questions:

A) The command to screen sequences is screen.seqs.

B) Assuming you’ve completed the Schloss SOP, which includes chimera checking and removing suspected contaminants, the commands you would use to calculate your estimators are
dist.seqs
cluster
make.shared
collect.single

These make up the alpha diversity section of the SOP, as Pat said. You’ll need to check each command to see what parameters are required.
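
For a sense of what those estimators measure, here is a small Python sketch that computes them from a list of OTU abundances. It uses the standard textbook forms (bias-corrected Chao1, Shannon with natural log, and Simpson's index without replacement); treat it as an illustration, not a reproduction of mothur's output, and check the calculator documentation for the exact definitions mothur uses. The function names are hypothetical.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1: S_obs + n1*(n1-1) / (2*(n2+1))."""
    s_obs = sum(1 for c in counts if c > 0)
    n1 = sum(1 for c in counts if c == 1)  # singletons
    n2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + n1 * (n1 - 1) / (2 * (n2 + 1))


def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i))."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index D = sum(n_i*(n_i-1)) / (N*(N-1))."""
    total = sum(counts)
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))


# Made-up abundances for seven OTUs, for illustration only.
otu_counts = [10, 5, 2, 2, 1, 1, 1]
print(chao1(otu_counts), shannon(otu_counts), simpson(otu_counts))
```

Chao1 depends heavily on the number of singletons and doubletons, which is why denoising and preclustering before this step change the richness estimate so much.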

Thank you for your response, DWAITE.

Here are some of my barcodes from my file. Keep in mind I had no idea what to name the barcodes and if that affects anything.

forward GATTAGAWACCCBDGTAGTCC toto
barcode ACGAG MID1
barcode ACGCT MID2
barcode AGACG MID3
barcode AGCAC MID4
barcode ATCAG MID5

Also, I could not get through the Schloss SOP. I am stuck at the screen.seqs step where I lose all my sequences. I started off with 133000 sequences and when I screen I get the following:

Using 2 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 40960 43116 3 0 1 1
2.5%-tile: 42533 43116 4 0 2 358
25%-tile: 42624 43116 12 0 2 3573
Median: 43031 43116 21 0 3 7146
75%-tile: 43061 43116 47 0 4 10719
97.5%-tile: 43112 43116 93 0 4 13934
Maximum: 43112 43117 169 0 6 14291
Mean: 42906.9 43116 29.2364 0 2.93534

# of unique seqs: 12006

total # of seqs: 14291

Output File Name:
SW.trim.unique.good.summary


I get an error message all the time at the align step:

“Some of you sequences generated alignments that eliminated too many bases, a list is provided in SW.trim.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.”

When I try the reverse complement, it gets worse: I end up with something like 8 sequences left at the screen.seqs step.

Keep in mind that I am using Ion Torrent, which gives 100 bp reads instead of the 400 bp reads from 454.

If it’s because of bad quality sequences, I am considering skipping a step so that I have something left for analysis, because this is all I have.


Thanks a lot!

FWIW, these are IonTorrent data that are bad quality (that may be redundant) and are getting culled because many sequences have 4 blank flows in a row early in the sequence.
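
To show what "4 blank flows in a row" looks like in practice, here is a minimal Python sketch that finds the first such run in a list of flow signal values. The 0.5 noise threshold and the function name first_blank_run are assumptions for illustration; this is not mothur's trimming code.

```python
def first_blank_run(flows, noise=0.5, run_length=4):
    """Return the index where the first run of `run_length` consecutive
    flows below `noise` starts, or None if there is no such run."""
    run = 0
    for i, signal in enumerate(flows):
        if signal < noise:
            run += 1
            if run == run_length:
                return i - run_length + 1
        else:
            run = 0  # a real signal resets the count
    return None


# Made-up flow values, for illustration only.
flows = [1.02, 0.05, 0.98, 0.10, 0.03, 0.20, 0.07, 1.10]
print(first_blank_run(flows))
```

If that index falls early in the read, very few usable flows remain before it, which is consistent with most reads being culled.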