Pre.cluster 18S

Hello,

I’m working with 18S sequences from Illumina sequencing (only the R1 reads are of good quality). I used the fastq.info command to extract the fasta sequences and then ran all the commands in the Mothur tutorial except make.file and make.contigs.

When I reach the pre.cluster step, sequences are removed and the output fasta file is empty.

I increased the RAM to 150 GB and tried setting the allowed nucleotide differences (diffs) to 2, 3, and then 4. I also varied the number of processors (between 10 and 64).

I’ve also run the chimera.vsearch command but it’s been running for about 10 hours.

Could you please help me?

Thanks

mothur > summary.seqs(fasta=current, count=current)
Using 18S_R1.good.unique.good.filter.count_table as input file for the count parameter.
Using 18S_R1.good.unique.good.filter.unique.fasta as input file for the fasta parameter.

Using 64 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	737	196	0	3	1
2.5%-tile:	1	828	264	0	4	66656
25%-tile:	1	828	274	0	4	666558
Median: 	1	828	275	0	5	1333115
75%-tile:	1	828	279	0	5	1999672
97.5%-tile:	1	828	280	0	6	2599574
Maximum:	3	828	282	0	8	2666229
Mean:	1	827	275	0	4
# of unique seqs:	1153321
total # of seqs:	2666229

It took 75 secs to summarize 2666229 sequences.

Output File Names:
18S_R1.good.unique.good.filter.unique.summary

mothur > pre.cluster(fasta=18S_R1.good.unique.good.filter.unique.fasta, count=18S_R1.good.unique.good.filter.count_table, diffs= 3)

Using 10 processors.
When using running without group information mothur can only use 1 processor, continuing.
0	1026717	126604
1000	712377	440944


Total number of sequences before precluster was 1153321.
pre.cluster removed 484995 sequences.

/******************************************/
[WARNING]: 18S_R1.good.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from 18S_R1.good.unique.good.filter.unique.fasta.

Output File Names:
18S_R1.good.unique.good.filter.unique.precluster.fasta

/******************************************/
Done.
It took 1455 secs to cluster 1153321 sequences.

Using 10 processors.

Output File Names: 
18S_R1.good.unique.good.filter.unique.precluster.fasta
18S_R1.good.unique.good.filter.unique.precluster.count_table
18S_R1.good.unique.good.filter.unique.precluster.map

vsearch v2.15.2_linux_x86_64, 1007.8GB RAM, 64 cores
https://github.com/torognes/vsearch



Fatal error: Unable to read from file (18S_R1.good.unique.good.filter.unique.precluster.temp)

mothur > chimera.vsearch(fasta=current, count=current, dereplicate=t)
Using 18S_R1.good.unique.good.filter.unique.precluster.count_table as input file for the count parameter.
Using 18S_R1.good.unique.good.filter.unique.precluster.fasta as input file for the fasta parameter.
[ERROR]: 18S_R1.good.unique.good.filter.unique.precluster.fasta is blank, aborting.
Using 18S_R1.good.unique.good.filter.unique.precluster.fasta as input file for the fasta parameter.

Using 10 processors.
Unable to open vsearch. Trying mothur's executable directory vsearch.
Unable to open vsearch.
vsearch file does not exist. Checking path... 
Found vsearch in your path, using /beegfs/data/hgbaguidi/miniconda3/envs/mothur148/bin//vsearch
Using vsearch version v2.15.2.
Checking sequences from 18S_R1.good.unique.good.filter.unique.precluster.fasta ...
When using template=self, mothur can only use 1 processor, continuing.
[ERROR]: 18S_R1.good.unique.good.filter.unique.precluster.fasta is blank. Please correct.

It took 2 secs to check your sequences. 0 chimeras were found.

Hi there - do all of your sequences belong to a single sample, or are they from multiple samples? mothur appears to be treating them as if they belong to a single sample. I would need to see the previous steps to know how you created your count file and whether you were using the barcode information to assign sequences to samples. With only a single read, you wouldn’t want to use make.contigs, but use trim.seqs instead.
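
One quick way to see whether a count file carries any sample information is to look at its header. The sketch below is only an illustration in Python, not a mothur command. It assumes the usual tab-delimited count_table layout (Representative_Sequence, total, then one column per group), and the helper name count_table_groups is made up for this example. An empty result would mean every sequence is being treated as one sample, which would fit the "without group information" message from pre.cluster.

```python
import io


def count_table_groups(handle):
    """Return the group (sample) names found in a count_table header line."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and any '#' comment lines written before the header.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # Assumed layout: Representative_Sequence, total, then one column per group.
        return [c for c in columns[2:] if c]
    return []


# Hypothetical two-line tables, for illustration only (not the poster's data).
no_groups = io.StringIO("Representative_Sequence\ttotal\nseq1\t5\n")
with_groups = io.StringIO(
    "Representative_Sequence\ttotal\tLib_5\tLib_10\nseq1\t5\t3\t2\n"
)

print(count_table_groups(no_groups))    # []
print(count_table_groups(with_groups))  # ['Lib_5', 'Lib_10']
```

To try it on a real file, pass an open file handle, e.g. `count_table_groups(open("18S_R1.good.unique.good.filter.count_table"))`.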

Pat

Hello

Below are the commands I ran in Mothur.

Also, I would like to know how to make Mothur understand that each fasta file is a group containing several sequences. I ran make.group and have a merge.count_table file, and I combined the fasta files with: cat Lib*fasta > 18S_R1.fasta

Example of fasta files:

Lib_5_04_24_S5_R1_18S.fasta

Lib_10_04_24_S10_R1_18S.fasta
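
To make the "one fasta file = one group" idea concrete, here is a rough Python sketch of the two-column mapping (sequence name, tab, group name) that a group file holds. It is not mothur code and is not meant to replace make.group. The function name write_group_file, the read names, and the choice to use the file name without ".fasta" as the group label are all assumptions made for illustration. The sequence names written here would have to match the names in the concatenated 18S_R1.fasta.

```python
import os
import tempfile


def write_group_file(fasta_paths, group_path):
    """Write 'sequence_name<TAB>group' lines, using one group per FASTA file."""
    written = 0
    with open(group_path, "w") as out:
        for path in fasta_paths:
            # Assumption: the group label is the file name without '.fasta'.
            group = os.path.basename(path)
            if group.endswith(".fasta"):
                group = group[: -len(".fasta")]
            with open(path) as fasta:
                for line in fasta:
                    if not line.startswith(">"):
                        continue
                    # The sequence name is the header text up to the first whitespace.
                    parts = line[1:].split()
                    if parts:
                        out.write(f"{parts[0]}\t{group}\n")
                        written += 1
    return written


# Demo with tiny made-up reads; only the file names come from the post above.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, fname in enumerate(
        ["Lib_5_04_24_S5_R1_18S.fasta", "Lib_10_04_24_S10_R1_18S.fasta"]
    ):
        path = os.path.join(tmp, fname)
        with open(path, "w") as fh:
            fh.write(f">read{i}_1\nACGT\n>read{i}_2\nACGA\n")
        paths.append(path)

    group_path = os.path.join(tmp, "18S_R1.groups")
    print(write_group_file(paths, group_path), "sequences assigned to groups")
    with open(group_path) as fh:
        print(fh.read(), end="")
```

Whatever tool produces the mapping, the key point is that every sequence name in the combined fasta file appears exactly once with its sample label.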

Thanks

#fastq.info
#cat Lib*fasta > 18S_R1.fasta
#summary.seqs(fasta=18S_R1.fasta)

#screen.seqs(fasta=18S_R1.fasta, summary=18S_R1.summary, maxambig=0, maxhomop=8)
#summary.seqs(fasta=current)

#unique.seqs(fasta=18S_R1.good.fasta)

#summary.seqs(fasta=current, count= current)

#align.seqs(fasta=current, reference= silva.nr_v138_2.align)
#summary.seqs(fasta=current, count= current)

#screen.seqs(fasta=18S_R1.good.unique.align, count=18S_R1.good.count_table, start=2040, end=6100, minlength=250)
#summary.seqs(fasta=current, count= current)

#filter.seqs(fasta=current, vertical=T, trump= . )
#unique.seqs(fasta=current, count=current)

#summary.seqs(fasta=18S_R1.good.unique.good.filter.unique.fasta, count=18S_R1.good.unique.good.filter.count_table)

#summary.seqs(fasta=current, count=current)

#pre.cluster(fasta=current, count=current, diffs=2)