Cluster split only gives unique and 0.01 distance

Hi,
Please help, I don’t understand what is happening…
I ran the cluster.split command on 16S rRNA sequences (amplicon: MiSeq 2x250 bp, region V4-V5) and after a couple of days (I am running it on my personal laptop!) I get back a .list file with only the unique and 0.01 distance labels…
We ran the same command using 20 processors and it crashes at the end, when it has to do the cutoffs:
Clustering /home1/scratch/mlegac/Dyneco.bac.precluster.pick.pick.fasta.4.dist
Cutoff was 0.15 changed cutoff to 0.07
Cutoff was 0.15 changed cutoff to 0.08
Cutoff was 0.15 changed cutoff to 0.09
Cutoff was 0.15 changed cutoff to 0.09

This is the command we have used:

mothur > cluster.split(fasta=Dyneco.bac.precluster.pick.pick.fasta, count=Dyneco.bac.precluster.pick.pick.count_table, taxonomy=Dyneco.bac.precluster.pick.pick.wang.pick.taxonomy, splitmethod=classify, taxlevel=5, cutoff=0.15)

I have done the same with other data and had no issues.


The versions of mothur used are: mothur v.1.38.1 (supercomputer) and mothur v.1.37.0 (my computer).

Thank you!

Hi there,

I would strongly encourage you to get the most recent version of mothur - 1.39.5. That has the opticlust algorithm, so you only need to use a cutoff of 0.03, and there won’t be any of the business about changing the cutoff within the clustering commands. The SOP has been updated to show how we are now clustering the data.
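
For your files, a sketch of the call might look like the following. It is your original command with only the cutoff changed. I am assuming the file names stay the same, and that in 1.39.5 cluster.split uses opticlust when no method is given (add method=opti if you want to be explicit).

```
mothur > cluster.split(fasta=Dyneco.bac.precluster.pick.pick.fasta, count=Dyneco.bac.precluster.pick.pick.count_table, taxonomy=Dyneco.bac.precluster.pick.pick.wang.pick.taxonomy, splitmethod=classify, taxlevel=5, cutoff=0.03)
```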

Pat

Thanks Pat,
I will download it now.
However, I might have identified the problem: my sequences still had the primers!
So I ran make.contigs again, this time adding an oligos file as follows:
mothur > make.contigs(file=fileList.paired.file, oligos=Pn_bac.oligos)

The fileList.paired.file looks like this:
KLS1X1TF1 KLS1X1TF1_CTACTA_L001_R1.fastq KLS1X1TF1_CTACTA_L001_R2.fastq
KLS1X1TF2 KLS1X1TF2_GGCTTG_L001_R1.fastq KLS1X1TF2_GGCTTG_L001_R2.fastq
KLS1X1TF3 KLS1X1TF3_GCCGCG_L001_R1.fastq KLS1X1TF3_GCCGCG_L001_R2.fastq
KLS1X2TF1 KLS1X2TF1_TCATGT_L001_R1.fastq KLS1X2TF1_TCATGT_L001_R2.fastq
KLS1X2TF2 KLS1X2TF2_GAACAC_L001_R1.fastq KLS1X2TF2_GAACAC_L001_R2.fastq
KLS1X2TF3 KLS1X2TF3_CGCACA_L001_R1.fastq KLS1X2TF3_CGCACA_L001_R2.fastq
KLS1X9TF1 KLS1X9TF1_AATGAA_L001_R1.fastq KLS1X9TF1_AATGAA_L001_R2.fastq
KLS1X9TF2 KLS1X9TF2_AACTTA_L001_R1.fastq KLS1X9TF2_AACTTA_L001_R2.fastq
KLS1X9TF3 KLS1X9TF3_GGAGGT_L001_R1.fastq KLS1X9TF3_GGAGGT_L001_R2.fastq
KLS2X1TF1 KLS2X1TF1_TGCGCT_L001_R1.fastq KLS2X1TF1_TGCGCT_L001_R2.fastq
ETC…

And my oligos file:

primer GTGYCAGCMGCCGCGGTA CCCCGYCAATTCMTTTRAGT

This time I have about 300,000 fewer sequences, and the sequence names now carry an extension with fbdiffs, rbdiffs, etc., which I didn’t have the first time, when I did not use the oligos option.

M00629_39_000000000-APEBN_1_2107_9597_1632 fbdiffs=0(match), rbdiffs=0(match) fpdiffs=0(match), rpdiffs=0(match)

So I am wondering whether this name extension is a problem, and why I am getting fewer sequences… Can we improve this somehow?

Thanks

You’re probably losing sequences because they don’t match your primers. You could try pdiffs=2 or 3 and give it a shot. I’d still upgrade and use opticlust.
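
Something along these lines, reusing the file names from your post (a sketch, not tested on your data). pdiffs sets how many mismatches to the primer are tolerated, and as far as I recall it defaults to 0 when an oligos file is given:

```
mothur > make.contigs(file=fileList.paired.file, oligos=Pn_bac.oligos, pdiffs=2)
```

Comparing the number of sequences you get with pdiffs=2 against your current run should tell you whether primer mismatches account for the roughly 300,000 missing reads.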

Pat