Trim.seqs removes group membership

Hello,

I am processing a small dataset (16 samples) using the MiSeq protocol. The sequencing facility left primers attached to the seqs, so I am running trim.seqs after make.contigs. I lose group membership after the trim.seqs step.

I provided the important segments of the logfile below:

Linux version
Using ReadLine,Boost,HDF5,GSL
mothur v.1.48.5
Last updated: 1/13/26
Script Mode


mothur > make.contigs(file=soil.files)
Total of all groups is 22158881

It took 4558 secs to process 22158881 sequences.

Output File Names: 
soil.trim.contigs.fasta
soil.scrap.contigs.fasta
soil.contigs_report
soil.contigs.count_table


mothur > summary.seqs(fasta=current, count=current)
Using soil.contigs.count_table as input file for the count parameter.
Using soil.trim.contigs.fasta as input file for the fasta parameter.

Using 28 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	35	35	0	3	1
2.5%-tile:	1	291	291	0	4	553973
25%-tile:	1	292	292	0	5	5539721
Median: 	1	292	292	0	5	11079441
75%-tile:	1	292	292	0	5	16619161
97.5%-tile:	1	301	301	1	6	21604909
Maximum:	1	602	602	141	301	22158881
Mean:	1	292	292	0	4
# of unique seqs:	22158881
total # of seqs:	22158881

It took 531 secs to summarize 22158881 sequences.

Output File Names:
soil.trim.contigs.summary


mothur > trim.seqs(fasta=current, count=current, oligos=oligos.oligos)
Using soil.contigs.count_table as input file for the count parameter.
Using soil.trim.contigs.fasta as input file for the fasta parameter.

Using 28 processors.

Creating count files...


Output File Names: 
soil.trim.contigs.trim.fasta
soil.trim.contigs.scrap.fasta
soil.contigs.trim.count_table
soil.contigs.scrap.count_table
soil.contigs.trim.count_table
soil.contigs.scrap.count_table


mothur > summary.seqs(fasta=current, count=current)
Using soil.contigs.trim.count_table as input file for the count parameter.
Using soil.trim.contigs.trim.fasta as input file for the fasta parameter.

Using 28 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	5	5	0	3	1
2.5%-tile:	1	252	252	0	4	505247
25%-tile:	1	253	253	0	4	5052464
Median: 	1	253	253	0	5	10104927
75%-tile:	1	253	253	0	5	15157390
97.5%-tile:	1	253	253	1	6	19704606
Maximum:	1	550	550	113	237	20209852
Mean:	1	252	252	0	4
# of unique seqs:	20209852
total # of seqs:	20209852

It took 501 secs to summarize 20209852 sequences.


The count table generated after make.contigs shows group membership:

head soil.contigs.count_table

#Compressed Format: groupIndex,abundance. For example 1,6 would mean the read has an abundance of 6 for group N01_B_1.
#1,N01_B_1	2,N01_B_2	3,N03_B_2	4,NO3_B_1	5,W06_B_1	6,W06_B_2	7,W21_B_1	8,W21_B_2	
Representative_Sequence	total	N01_B_1	N01_B_2	N03_B_2	NO3_B_1	W06_B_1	W06_B_2	W21_B_1	W21_B_2
VH01318_173_AAFNLT7M5_1_1101_10013_11980	1	8,1
VH01318_173_AAFNLT7M5_1_1101_10013_12586	1	5,1
VH01318_173_AAFNLT7M5_1_1101_10013_12965	1	6,1
VH01318_173_AAFNLT7M5_1_1101_10013_13268	1	6,1
VH01318_173_AAFNLT7M5_1_1101_10013_13495	1	5,1
VH01318_173_AAFNLT7M5_1_1101_10013_13609	1	8,1
VH01318_173_AAFNLT7M5_1_1101_10013_14593	1	5,1

The updarted count table generated after trim.seqs has no group membership:

head soil.contigs.trim.count_table

Representative_Sequence	total
VH01318_173_AAFNLT7M5_1_1101_10013_11980	1
VH01318_173_AAFNLT7M5_1_1101_10013_12586	1
VH01318_173_AAFNLT7M5_1_1101_10013_12965	1
VH01318_173_AAFNLT7M5_1_1101_10013_13268	1
VH01318_173_AAFNLT7M5_1_1101_10013_13495	1
VH01318_173_AAFNLT7M5_1_1101_10013_13609	1
VH01318_173_AAFNLT7M5_1_1101_10013_14858	1
VH01318_173_AAFNLT7M5_1_1101_10013_14896	1
VH01318_173_AAFNLT7M5_1_1101_10013_14972	1

I am not sure if this is important, but I noticed that some output files created by trim.seqs are reported twice in the logfile (repeated from logfile snippet above):

mothur > trim.seqs(fasta=current, count=current, oligos=oligos.oligos)
Using soil.contigs.count_table as input file for the count parameter.
Using soil.trim.contigs.fasta as input file for the fasta parameter.

Using 28 processors.

Creating count files...


Output File Names: 
soil.trim.contigs.trim.fasta
soil.trim.contigs.scrap.fasta
soil.contigs.trim.count_table
soil.contigs.scrap.count_table
soil.contigs.trim.count_table
soil.contigs.scrap.count_table

I am reluctant to report this as a bug because I am often guilty of overlooking details. However, I cannot track down what I have done incorrectly. I also noticed that the trim.seqs command has been updated in mothur v.1.48. I am not sure if that is relevant.

If anyone has advice, it would be greatly appreciated.

Thank you,

Jim

Instead of trim.seqs, you might try pcr.seqs .

The pcr.seqs command will scan the sequences and remove the primers. It will also return a .bad.accnos file with any sequence names that the primers were not found in. You can use that accnos file to remove the “bad” reads from your count file.

mothur > pcr.seqs(fasta=current, oligos=oligos.oligos, keepdots=f, …. and pdiffs or rdiffs options…)

Removing “bad” reads from the count_table

mothur > remove.seqs(count=current, accnos= current)

You can also provide an oligos file to make.contigs to remove the primers in that function

I tried pcr.seqs. That solved the problem. Thank you!

Jim

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.