Hello,
I am processing a small dataset (16 samples) using the MiSeq protocol. The sequencing facility left primers attached to the seqs, so I am running trim.seqs after make.contigs. I lose group membership after the trim.seqs step.
I provided the important segments of the logfile below:
Linux version
Using ReadLine,Boost,HDF5,GSL
mothur v.1.48.5
Last updated: 1/13/26
Script Mode
mothur > make.contigs(file=soil.files)
Total of all groups is 22158881
It took 4558 secs to process 22158881 sequences.
Output File Names:
soil.trim.contigs.fasta
soil.scrap.contigs.fasta
soil.contigs_report
soil.contigs.count_table
mothur > summary.seqs(fasta=current, count=current)
Using soil.contigs.count_table as input file for the count parameter.
Using soil.trim.contigs.fasta as input file for the fasta parameter.
Using 28 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 35 35 0 3 1
2.5%-tile: 1 291 291 0 4 553973
25%-tile: 1 292 292 0 5 5539721
Median: 1 292 292 0 5 11079441
75%-tile: 1 292 292 0 5 16619161
97.5%-tile: 1 301 301 1 6 21604909
Maximum: 1 602 602 141 301 22158881
Mean: 1 292 292 0 4
# of unique seqs: 22158881
total # of seqs: 22158881
It took 531 secs to summarize 22158881 sequences.
Output File Names:
soil.trim.contigs.summary
mothur > trim.seqs(fasta=current, count=current, oligos=oligos.oligos)
Using soil.contigs.count_table as input file for the count parameter.
Using soil.trim.contigs.fasta as input file for the fasta parameter.
Using 28 processors.
Creating count files...
Output File Names:
soil.trim.contigs.trim.fasta
soil.trim.contigs.scrap.fasta
soil.contigs.trim.count_table
soil.contigs.scrap.count_table
soil.contigs.trim.count_table
soil.contigs.scrap.count_table
mothur > summary.seqs(fasta=current, count=current)
Using soil.contigs.trim.count_table as input file for the count parameter.
Using soil.trim.contigs.trim.fasta as input file for the fasta parameter.
Using 28 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 5 5 0 3 1
2.5%-tile: 1 252 252 0 4 505247
25%-tile: 1 253 253 0 4 5052464
Median: 1 253 253 0 5 10104927
75%-tile: 1 253 253 0 5 15157390
97.5%-tile: 1 253 253 1 6 19704606
Maximum: 1 550 550 113 237 20209852
Mean: 1 252 252 0 4
# of unique seqs: 20209852
total # of seqs: 20209852
It took 501 secs to summarize 20209852 sequences.
The count table generated after make.contigs shows group membership:
head soil.contigs.count_table
#Compressed Format: groupIndex,abundance. For example 1,6 would mean the read has an abundance of 6 for group N01_B_1.
#1,N01_B_1 2,N01_B_2 3,N03_B_2 4,NO3_B_1 5,W06_B_1 6,W06_B_2 7,W21_B_1 8,W21_B_2
Representative_Sequence total N01_B_1 N01_B_2 N03_B_2 NO3_B_1 W06_B_1 W06_B_2 W21_B_1 W21_B_2
VH01318_173_AAFNLT7M5_1_1101_10013_11980 1 8,1
VH01318_173_AAFNLT7M5_1_1101_10013_12586 1 5,1
VH01318_173_AAFNLT7M5_1_1101_10013_12965 1 6,1
VH01318_173_AAFNLT7M5_1_1101_10013_13268 1 6,1
VH01318_173_AAFNLT7M5_1_1101_10013_13495 1 5,1
VH01318_173_AAFNLT7M5_1_1101_10013_13609 1 8,1
VH01318_173_AAFNLT7M5_1_1101_10013_14593 1 5,1
The updarted count table generated after trim.seqs has no group membership:
head soil.contigs.trim.count_table
Representative_Sequence total
VH01318_173_AAFNLT7M5_1_1101_10013_11980 1
VH01318_173_AAFNLT7M5_1_1101_10013_12586 1
VH01318_173_AAFNLT7M5_1_1101_10013_12965 1
VH01318_173_AAFNLT7M5_1_1101_10013_13268 1
VH01318_173_AAFNLT7M5_1_1101_10013_13495 1
VH01318_173_AAFNLT7M5_1_1101_10013_13609 1
VH01318_173_AAFNLT7M5_1_1101_10013_14858 1
VH01318_173_AAFNLT7M5_1_1101_10013_14896 1
VH01318_173_AAFNLT7M5_1_1101_10013_14972 1
I am not sure if this is important, but I noticed that some output files created by trim.seqs are reported twice in the logfile (repeated from logfile snippet above):
mothur > trim.seqs(fasta=current, count=current, oligos=oligos.oligos)
Using soil.contigs.count_table as input file for the count parameter.
Using soil.trim.contigs.fasta as input file for the fasta parameter.
Using 28 processors.
Creating count files...
Output File Names:
soil.trim.contigs.trim.fasta
soil.trim.contigs.scrap.fasta
soil.contigs.trim.count_table
soil.contigs.scrap.count_table
soil.contigs.trim.count_table
soil.contigs.scrap.count_table
I am reluctant to report this as a bug because I am often guilty of overlooking details. However, I cannot track down what I have done incorrectly. I also noticed that the trim.seqs command has been updated in mothur v.1.48. I am not sure if that is relevant.
If anyone has advice, it would be greatly appreciated.
Thank you,
Jim