pcr.seqs removing many sequences

Hello Pat and team

I am using pcr.seqs to trim my alignment. Since this is COI, the alignment is noisier (some groups have longer or shorter sequences), but I did not expect it to remove any sequences. I ran it as:

pcr.seqs(fasta=current, count=current, start=27, end=447)

but then it removes A LOT of sequences for no apparent reason (they are not empty: the minimum length is 250, and it is removing close to 40% of the reads). Many do not start at 27, and many do not reach 447, but that is OK: this is COI, and some taxa have gaps there.

mothur > pcr.seqs(fasta=current, count=current, start=27, end=447)
Using COIcontigs.good.good.good.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.align as input file for the fasta parameter.

Using 20 processors.
/******************************************/
Running command: remove.seqs(accnos=COItrim.contigs.good.good.good.bad.accnos, count=COIcontigs.good.good.good.count_table)
Removed 7230896 sequences from COIcontigs.good.good.good.count_table.

Is pcr.seqs now working like screen.seqs for start and end? Is this something new from a recent update? I think I used pcr.seqs in the past and it did not remove sequences unless they were empty; it just trimmed them, like a PCR.

I am using mothur on a Unix server managed by Slurm.

Linux version

Using ReadLine,Boost,HDF5,GSL
mothur v.1.48.0
Last updated: 5/20/22

Leo


Pat, I am answering in the same post since the post is locked due to inactivity:

Summary before and after

mothur > summary.seqs(fasta=current, count=current)
Using COIcontigs.good.good.good.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.align as input file for the fasta parameter.

Using 20 processors.

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      345  250     0       3        1
2.5%-tile:   26     439  305     0       4        492435
25%-tile:    27     444  310     0       6        4924349
Median:      27     447  313     0       6        9848698
75%-tile:    27     447  313     0       7        14773046
97.5%-tile:  30     447  315     0       9        19204960
Maximum:     138    473  339     0       10       19697394
Mean:        27     445  311     0       6
# of unique seqs: 19697394
total # of seqs:  19697394

It took 319 secs to summarize 19697394 sequences.

Output File Names:
COItrim.contigs.good.good.good.summary

mothur > pcr.seqs(fasta=current, count=current, start=27, end=447)
Using COIcontigs.good.good.good.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.align as input file for the fasta parameter.

Using 20 processors.
/******************************************/
Running command: remove.seqs(accnos=COItrim.contigs.good.good.good.bad.accnos, count=COIcontigs.good.good.good.count_table)
Removed 7230896 sequences from COIcontigs.good.good.good.count_table.

Output File Names:
COIcontigs.good.good.good.pick.count_table

/******************************************/
It took 197 secs to screen 19697394 sequences.

Output File Names:
COItrim.contigs.good.good.good.pcr.align
COItrim.contigs.good.good.good.bad.accnos
COItrim.contigs.good.good.good.scrap.pcr.align
COIcontigs.good.good.good.pcr.count_table

mothur > summary.seqs(fasta=current, count=current)
Using COIcontigs.good.good.good.pcr.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.pcr.align as input file for the fasta parameter.

Using 20 processors.

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     27     446  250     0       3        1
2.5%-tile:   27     447  310     0       4        311663
25%-tile:    27     447  313     0       6        3116625
Median:      27     447  313     0       6        6233250
75%-tile:    27     447  313     0       7        9349874
97.5%-tile:  27     447  316     0       8        12154836
Maximum:     29     447  339     0       10       12466498
Mean:        27     446  312     0       6
# of unique seqs: 12466498
total # of seqs:  12466498

It took 206 secs to summarize 12466498 sequences.

Output File Names:
COItrim.contigs.good.good.good.pcr.summary
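
As a quick sanity check on the numbers in the logs above, here is a small Python sketch that only redoes the arithmetic. The three counts are copied from the mothur output in this post; the variable names are my own.

```python
# Counts copied from the mothur log output in this post.
total_before = 19697394  # summary.seqs before pcr.seqs
removed = 7230896        # reported by remove.seqs inside pcr.seqs
total_after = 12466498   # summary.seqs after pcr.seqs

remaining = total_before - removed
fraction_removed = removed / total_before

print(remaining == total_after)   # True
print(f"{fraction_removed:.1%}")  # 36.7%
```

So the sequences missing from the second summary are exactly the ones listed for removal, about 36.7% of the input.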

I am not sure why pcr.seqs is creating a bad.accnos file…
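
One way I could test whether the scrapped sequences are simply the ones that do not span positions 27 to 447 is sketched below in Python. This is only a hypothesis about what pcr.seqs is doing, not something I have confirmed. The sketch assumes the .summary file is tab-separated with a header row containing the columns seqname, start and end; the function name and the toy rows are made up for illustration and are not taken from my files.

```python
import csv

def names_not_spanning(summary_lines, start=27, end=447):
    """Return names of sequences that start after `start` or end before `end`.

    Assumes a tab-separated summary with a header row that includes the
    columns seqname, start and end (my assumption about the .summary layout).
    """
    reader = csv.DictReader(summary_lines, delimiter="\t")
    flagged = set()
    for row in reader:
        if int(row["start"]) > start or int(row["end"]) < end:
            flagged.add(row["seqname"])
    return flagged

# Toy rows for illustration only, not real data.
toy_summary = [
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs",
    "seqA\t27\t447\t313\t0\t6\t1",
    "seqB\t30\t447\t310\t0\t6\t1",
    "seqC\t27\t444\t310\t0\t5\t1",
    "seqD\t1\t473\t339\t0\t7\t1",
]
flagged = names_not_spanning(toy_summary)
print(sorted(flagged))  # ['seqB', 'seqC']
```

On the real data, the same function could be given an open handle to COItrim.contigs.good.good.good.summary, and the resulting set compared with the names listed in COItrim.contigs.good.good.good.bad.accnos. If the two sets match, that would suggest pcr.seqs is scrapping every sequence that does not cover the full start-to-end window.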

Thank you!!

Can you post the output of running summary.seqs on the fasta file?

Pat
