Hello, I hope the community can help me with this:
I am working with mothur v1.48.0 on data that I first need to demultiplex and then run through the complete mothur pipeline.
The demultiplexing itself went well, but the pre.cluster command is taking forever. I checked the threads and realized that pre.cluster is using only one, instead of the full capacity of the PC.
I wonder whether, by any chance, it is not analyzing the samples in group mode, in which case not only will the pre.cluster step take a long time, but the samples also won't be separated as they should. In other experiments, where I did not need to demultiplex, I always prepared the stability.files file as part of the script so that the sequences were separated into groups and analyzed together. I cannot do that now, because I only have one forward and one reverse fastq file.
This is the script I’m using:
# Prepare Mothur batch file
MOTHUR_BATCH_FILE="$MOTHUR_OUTPUT_DIR/mothur_commands.batch"
{
echo "make.contigs(ffastq=LU_L_2_forward_paired.fq, rfastq=LU_L_2_reverse_paired.fq, oligos=$BASE_DIR/barcode_map/LU_L_2_barcode_map.tsv, processors=15, checkorient=t, pdiffs=3, bdiffs=2, tdiffs=4);"
echo "summary.seqs(fasta=current, processors=$CORES);"
echo "screen.seqs(fasta=current, count=current, maxambig=0, minlength=$MIN_LENGTH, maxlength=$MAX_LENGTH, maxhomop=8, processors=15);"
echo "unique.seqs(fasta=current, count=current);"
echo "pre.cluster(fasta=current, count=current, diffs=2, processors=15);"
echo "chimera.vsearch(fasta=current, count=current, dereplicate=t, processors=15);"
echo "classify.seqs(fasta=current, count=current, reference=$EZ_DATABASE, taxonomy=$EZ_TAXONOMY, cutoff=60, processors=15);"
echo "remove.lineage(fasta=current, count=current, taxonomy=current, taxon='unknown-Protista');"
echo "summary.tax(taxonomy=current, count=current, processors=15);"
echo "make.shared(count=current, label=ASV);"
echo "classify.otu(list=current, count=current, taxonomy=current, label=ASV);"
} > "$MOTHUR_BATCH_FILE"
# Execute Mothur with batch file and direct logs to the Mothur log directory
$MOTHUR_EXECUTABLE "$MOTHUR_BATCH_FILE" | tee "$MOTHUR_LOG_DIR/mothur.logfile"
And these are the logs:
Linux version
Using ReadLine,Boost,GSL
mothur v.1.48.0
Last updated: 5/20/22
by
Patrick D. Schloss
Department of Microbiology & Immunology
University of Michigan
http://www.mothur.org
When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.
Distributed under the GNU General Public License
Type 'help()' for information on the commands that are available
For questions and analysis support, please visit our forum at https://forum.mothur.org
Type 'quit()' to exit program
[NOTE]: Setting random seed to 19760620.
Batch Mode
mothur > make.contigs(ffastq=/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.fq, rfastq=/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_reverse_paired.fq, oligos=/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/barcode_map/LU_L_2_barcode_map.tsv, processors=15, checkorient=t, pdiffs=3, bdiffs=2, tdiffs=4);
Using 15 processors.
Making contigs...
Done.
Group count:
LU_L_2_1.V5_V7 93935
LU_L_2_10.V5_V7 125363
LU_L_2_11.V5_V7 39409
LU_L_2_12.V5_V7 54061
LU_L_2_13.V5_V7 6523
LU_L_2_14.V5_V7 31704
LU_L_2_15.V5_V7 127507
LU_L_2_16.V5_V7 57839
LU_L_2_17.V5_V7 61705
LU_L_2_18.V5_V7 170966
LU_L_2_19.V5_V7 50277
LU_L_2_2.V5_V7 169554
LU_L_2_20.V5_V7 173857
LU_L_2_21.V5_V7 61362
LU_L_2_22.V5_V7 70759
LU_L_2_23.V5_V7 148012
LU_L_2_24.V5_V7 91537
LU_L_2_25.V5_V7 53512
LU_L_2_26.V5_V7 123139
LU_L_2_27.V5_V7 198013
LU_L_2_28.V5_V7 197719
LU_L_2_29.V5_V7 156640
LU_L_2_3.V5_V7 137612
LU_L_2_30.V5_V7 128871
LU_L_2_31.V5_V7 293816
LU_L_2_32.V5_V7 166654
LU_L_2_33.V5_V7 195727
LU_L_2_34.V5_V7 170883
LU_L_2_35.V5_V7 93765
LU_L_2_36.V5_V7 153331
LU_L_2_37.V5_V7 200890
LU_L_2_38.V5_V7 84394
LU_L_2_39.V5_V7 174023
LU_L_2_4.V5_V7 87426
LU_L_2_40.V5_V7 203069
LU_L_2_41.V5_V7 126804
LU_L_2_42.V5_V7 147330
LU_L_2_43.V5_V7 131324
LU_L_2_44.V5_V7 66301
LU_L_2_45.V5_V7 118774
LU_L_2_46.V5_V7 204762
LU_L_2_47.V5_V7 107987
LU_L_2_48.V5_V7 212448
LU_L_2_49.V5_V7 204668
LU_L_2_5.V5_V7 155856
LU_L_2_50.V5_V7 262382
LU_L_2_51.V5_V7 178735
LU_L_2_52.V5_V7 135582
LU_L_2_53.V5_V7 152664
LU_L_2_54.V5_V7 273555
LU_L_2_55.V5_V7 297188
LU_L_2_56.V5_V7 228081
LU_L_2_57.V5_V7 300399
LU_L_2_58.V5_V7 220290
LU_L_2_6.V5_V7 196192
LU_L_2_7.V5_V7 22751
LU_L_2_8.V5_V7 35444
LU_L_2_9.V5_V7 83346
Total of all groups is 8216717
It took 753 secs to process 12910226 sequences.
Output File Names:
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.fasta
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.scrap.contigs.fasta
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs_report
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs.count_table
mothur > summary.seqs(fasta=current, processors=15);
Using /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.fasta as input file for the fasta parameter.
Using 15 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 38 38 0 2 1
2.5%-tile: 1 68 68 0 4 205418
25%-tile: 1 372 372 0 5 2054180
Median: 1 372 372 0 5 4108359
75%-tile: 1 378 378 0 5 6162538
97.5%-tile: 1 380 380 9 5 8011300
Maximum: 1 410 410 67 205 8216717
Mean: 1 332 332 0 4
# of Seqs: 8216717
It took 7 secs to summarize 8216717 sequences.
Output File Names:
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.summary
mothur > screen.seqs(fasta=current, count=current, maxambig=0, minlength=330, maxlength=440, maxhomop=8, processors=15);
Using /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs.count_table as input file for the count parameter.
Using /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.fasta as input file for the fasta parameter.
Using 15 processors.
It took 8 secs to screen 8216717 sequences, removed 1456520.
/******************************************/
Running command: remove.seqs(accnos=/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.bad.accnos.temp, count=/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs.count_table)
Removed 1456520 sequences from /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs.count_table.
Output File Names:
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs.pick.count_table
/******************************************/
Output File Names:
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.good.fasta
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.bad.accnos
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs.good.count_table
It took 37 secs to screen 8216717 sequences.
mothur > unique.seqs(fasta=current, count=current);
Using /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.contigs.good.count_table as input file for the count parameter.
Using /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.good.fasta as input file for the fasta parameter.
6760197 2603500
Output File Names:
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.good.unique.fasta
/media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.good.count_table
mothur > pre.cluster(fasta=current, count=current, diffs=2, processors=15);
Using /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.good.count_table as input file for the count parameter.
Using /media/shot89_1000/Ubuntu_data/mtp_librerias_maite_05_24/output_LU_L_2/trimmomatic_output/LU_L_2_forward_paired.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using 15 processors.
/******************************************/
Splitting by sample:
Using 15 processors.
Selecting sequences for groups LU_L_2_12.V5_V7-LU_L_2_13.V5_V7-LU_L_2_14.V5_V7
Selecting sequences for groups LU_L_2_6.V5_V7-LU_L_2_7.V5_V7-LU_L_2_8.V5_V7-LU_L_2_9.V5_V7
Selecting sequences for groups LU_L_2_1.V5_V7-LU_L_2_10.V5_V7-LU_L_2_11.V5_V7
Selecting sequences for groups LU_L_2_22.V5_V7-LU_L_2_23.V5_V7-LU_L_2_24.V5_V7-LU_L_2_25.V5_V7
Selecting sequences for groups LU_L_2_15.V5_V7-LU_L_2_16.V5_V7-LU_L_2_17.V5_V7-LU_L_2_18.V5_V7
Selecting sequences for groups LU_L_2_44.V5_V7-LU_L_2_45.V5_V7-LU_L_2_46.V5_V7-LU_L_2_47.V5_V7
Selecting sequences for groups LU_L_2_33.V5_V7-LU_L_2_34.V5_V7-LU_L_2_35.V5_V7-LU_L_2_36.V5_V7
Selecting sequences for groups LU_L_2_19.V5_V7-LU_L_2_2.V5_V7-LU_L_2_20.V5_V7-LU_L_2_21.V5_V7
Selecting sequences for groups LU_L_2_26.V5_V7-LU_L_2_27.V5_V7-LU_L_2_28.V5_V7-LU_L_2_29.V5_V7
Selecting sequences for groups LU_L_2_40.V5_V7-LU_L_2_41.V5_V7-LU_L_2_42.V5_V7-LU_L_2_43.V5_V7
Selecting sequences for groups LU_L_2_37.V5_V7-LU_L_2_38.V5_V7-LU_L_2_39.V5_V7-LU_L_2_4.V5_V7
Selecting sequences for groups LU_L_2_48.V5_V7-LU_L_2_49.V5_V7-LU_L_2_5.V5_V7-LU_L_2_50.V5_V7
Selecting sequences for groups LU_L_2_3.V5_V7-LU_L_2_30.V5_V7-LU_L_2_31.V5_V7-LU_L_2_32.V5_V7
Selecting sequences for groups LU_L_2_51.V5_V7-LU_L_2_52.V5_V7-LU_L_2_53.V5_V7-LU_L_2_54.V5_V7
Selecting sequences for groups LU_L_2_55.V5_V7-LU_L_2_56.V5_V7-LU_L_2_57.V5_V7-LU_L_2_58.V5_V7
Selected 17101 sequences from LU_L_2_12.V5_V7.
Selected 1121 sequences from LU_L_2_13.V5_V7.
Selected 4052 sequences from LU_L_2_14.V5_V7.
Selected 18956 sequences from LU_L_2_6.V5_V7.
Selected 7862 sequences from LU_L_2_7.V5_V7.
Selected 7815 sequences from LU_L_2_8.V5_V7.
Selected 222 sequences from LU_L_2_9.V5_V7.
Selected 40178 sequences from LU_L_2_1.V5_V7.
Selected 22554 sequences from LU_L_2_10.V5_V7.
Selected 2113 sequences from LU_L_2_11.V5_V7.
Selected 32169 sequences from LU_L_2_22.V5_V7.
Selected 63134 sequences from LU_L_2_23.V5_V7.
Selected 40749 sequences from LU_L_2_24.V5_V7.
Selected 22962 sequences from LU_L_2_25.V5_V7.
Selected 53046 sequences from LU_L_2_15.V5_V7.
Selected 17389 sequences from LU_L_2_16.V5_V7.
Selected 21379 sequences from LU_L_2_17.V5_V7.
Selected 69062 sequences from LU_L_2_18.V5_V7.
Selected 34513 sequences from LU_L_2_44.V5_V7.
Selected 41316 sequences from LU_L_2_45.V5_V7.
Selected 67755 sequences from LU_L_2_46.V5_V7.
Selected 51110 sequences from LU_L_2_47.V5_V7.
Selected 16048 sequences from LU_L_2_19.V5_V7.
Selected 57193 sequences from LU_L_2_2.V5_V7.
Selected 84622 sequences from LU_L_2_20.V5_V7.
Selected 34681 sequences from LU_L_2_21.V5_V7.
Selected 91767 sequences from LU_L_2_33.V5_V7.
Selected 71676 sequences from LU_L_2_34.V5_V7.
Selected 38830 sequences from LU_L_2_35.V5_V7.
Selected 69779 sequences from LU_L_2_36.V5_V7.
Selected 85556 sequences from LU_L_2_40.V5_V7.
Selected 42430 sequences from LU_L_2_41.V5_V7.
Selected 60831 sequences from LU_L_2_42.V5_V7.
Selected 65871 sequences from LU_L_2_43.V5_V7.
Selected 39917 sequences from LU_L_2_26.V5_V7.
Selected 64720 sequences from LU_L_2_27.V5_V7.
Selected 71739 sequences from LU_L_2_28.V5_V7.
Selected 70263 sequences from LU_L_2_29.V5_V7.
Selected 89706 sequences from LU_L_2_37.V5_V7.
Selected 37617 sequences from LU_L_2_38.V5_V7.
Selected 82795 sequences from LU_L_2_39.V5_V7.
Selected 44227 sequences from LU_L_2_4.V5_V7.
Selected 66376 sequences from LU_L_2_3.V5_V7.
Selected 45250 sequences from LU_L_2_30.V5_V7.
Selected 117114 sequences from LU_L_2_31.V5_V7.
Selected 65855 sequences from LU_L_2_32.V5_V7.
Selected 81832 sequences from LU_L_2_48.V5_V7.
Selected 67028 sequences from LU_L_2_49.V5_V7.
Selected 30904 sequences from LU_L_2_5.V5_V7.
Selected 95079 sequences from LU_L_2_50.V5_V7.
Selected 71140 sequences from LU_L_2_51.V5_V7.
Selected 66475 sequences from LU_L_2_52.V5_V7.
Selected 67566 sequences from LU_L_2_53.V5_V7.
Selected 114121 sequences from LU_L_2_54.V5_V7.
Selected 148688 sequences from LU_L_2_55.V5_V7.
Selected 99130 sequences from LU_L_2_56.V5_V7.
Selected 159087 sequences from LU_L_2_57.V5_V7.
Selected 80353 sequences from LU_L_2_58.V5_V7.
It took 35 seconds to split the dataset by sample.
/******************************************/
Processing group LU_L_2_12.V5_V7:
Processing group LU_L_2_15.V5_V7:
Processing group LU_L_2_19.V5_V7:
Processing group LU_L_2_22.V5_V7:
Processing group LU_L_2_26.V5_V7:
Processing group LU_L_2_3.V5_V7:
Processing group LU_L_2_33.V5_V7:
Processing group LU_L_2_37.V5_V7:
Processing group LU_L_2_40.V5_V7:
Processing group LU_L_2_44.V5_V7:
Processing group LU_L_2_48.V5_V7:
Processing group LU_L_2_51.V5_V7:
Processing group LU_L_2_55.V5_V7:
Processing group LU_L_2_6.V5_V7:
Processing group LU_L_2_1.V5_V7:
LU_L_2_19.V5_V7 16048 5374 10674
Total number of sequences before pre.cluster was 16048.
pre.cluster removed 10674 sequences.
It took 4352 secs to cluster 16048 sequences.
Processing group LU_L_2_2.V5_V7:
LU_L_2_12.V5_V7 17101 8127 8974
Total number of sequences before pre.cluster was 17101.
pre.cluster removed 8974 sequences.
It took 10346 secs to cluster 17101 sequences.
…(pre.cluster is still running, so there are many more per-group outputs like the ones above)
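In case it helps anyone follow the progress, here is a minimal sketch for tallying how many groups pre.cluster has finished and how much time they took, based only on the "It took N secs to cluster M sequences." lines in the log. The function name summarize_precluster is made up for illustration, and the sketch assumes the log lines keep exactly the format shown above.

```shell
# Summarize finished pre.cluster groups from a mothur log file.
# Assumption: each finished group writes a line of the form
#   "It took 4352 secs to cluster 16048 sequences."
# summarize_precluster is a hypothetical helper, not part of mothur.
summarize_precluster() {
    awk '
        # Match only the per-group clustering lines; other "It took" lines
        # (split, screen, summarize) use different wording and are skipped.
        /^[ \t]*It took [0-9]+ secs to cluster [0-9]+ sequences\./ {
            groups++        # one finished group per matching line
            secs += $3      # third field: seconds spent on that group
            seqs += $7      # seventh field: unique sequences in that group
        }
        END {
            printf "groups_done=%d total_secs=%d total_seqs=%d\n", groups, secs, seqs
        }
    ' "$1"
}

# Example call, using the log path written by tee in the script above:
# summarize_precluster "$MOTHUR_LOG_DIR/mothur.logfile"
```

Run on only the two finished groups shown in the log above, this would print groups_done=2 total_secs=14698 total_seqs=33149. Note that total_secs adds up per-group times, so it is not wall-clock time when several groups are processed in parallel.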