Hello Pat and team
I am using pcr.seqs to trim my alignment. This is COI, so it is noisier, and some groups have longer or shorter sequences, but I do not want to remove any sequences. I ran it as:
pcr.seqs(fasta=current, count=current, start=27, end=447)
But then it removes A LOT of sequences for no apparent reason. They are not empty (the minimum length is 250), and it is removing close to 40% of the reads. Many sequences do not start at 27, and many do not reach 447, but that is OK: this is COI, and some taxa have gaps there.
mothur > pcr.seqs(fasta=current, count=current, start=27, end=447)
Using COIcontigs.good.good.good.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.align as input file for the fasta parameter.
Using 20 processors.
/******************************************/
Running command: remove.seqs(accnos=COItrim.contigs.good.good.good.bad.accnos, count=COIcontigs.good.good.good.count_table)
Removed 7230896 sequences from COIcontigs.good.good.good.count_table.
So is pcr.seqs now working like screen.seqs for start and end? Is this new in a recent update? I think I used pcr.seqs in the past and it did not remove sequences unless they were empty; it just behaved like a PCR.
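One way to test the screen.seqs idea, as a rough sketch only: count how many sequences in the summary.seqs output start after a given position or end before another, and compare that number with what pcr.seqs removed. I am assuming the summary file is tab-delimited with `start`, `end` and `numSeqs` columns in its header; the function name `count_not_spanning` and the three demo rows are made up for illustration and are not real data.

```python
import csv
import io


def count_not_spanning(summary_lines, start, end):
    """Count sequences that begin after `start` or finish before `end`.

    Assumes a tab-delimited summary with a header row that contains
    'start', 'end' and 'numSeqs' columns (an assumption about the layout).
    Returns (number failing the test, total number of sequences).
    """
    reader = csv.DictReader(summary_lines, delimiter="\t")
    total = 0
    failing = 0
    for row in reader:
        n = int(row["numSeqs"])  # copies represented by this row
        total += n
        if int(row["start"]) > start or int(row["end"]) < end:
            failing += n
    return failing, total


# Tiny made-up example (not real data): seqB starts late, seqC ends early.
demo = io.StringIO(
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs\n"
    "seqA\t27\t447\t313\t0\t6\t1\n"
    "seqB\t30\t447\t310\t0\t6\t1\n"
    "seqC\t27\t439\t305\t0\t5\t1\n"
)
failing, total = count_not_spanning(demo, start=27, end=447)
print(f"{failing} of {total} sequences do not span 27-447")
```

On the real data, the same function could be given an open handle to the `.summary` file with `start=27, end=447`. If the count comes out close to the number pcr.seqs removed, that would support the idea that it is screening on start and end; if not, something else is going on.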
I am running mothur on a Unix server managed by Slurm.
Linux version
Using ReadLine,Boost,HDF5,GSL
mothur v.1.48.0
Last updated: 5/20/22
Leo
Pat, I am answering in the same post since the post is locked due to inactivity:
Summary before and after
mothur > summary.seqs(fasta=current, count=current)
Using COIcontigs.good.good.good.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.align as input file for the fasta parameter.
Using 20 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 345 250 0 3 1
2.5%-tile: 26 439 305 0 4 492435
25%-tile: 27 444 310 0 6 4924349
Median: 27 447 313 0 6 9848698
75%-tile: 27 447 313 0 7 14773046
97.5%-tile: 30 447 315 0 9 19204960
Maximum: 138 473 339 0 10 19697394
Mean: 27 445 311 0 6
# of unique seqs: 19697394
total # of seqs: 19697394
It took 319 secs to summarize 19697394 sequences.
Output File Names:
COItrim.contigs.good.good.good.summary
mothur > pcr.seqs(fasta=current, count=current, start=27, end=447)
Using COIcontigs.good.good.good.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.align as input file for the fasta parameter.
Using 20 processors.
/******************************************/
Running command: remove.seqs(accnos=COItrim.contigs.good.good.good.bad.accnos, count=COIcontigs.good.good.good.count_table)
Removed 7230896 sequences from COIcontigs.good.good.good.count_table.
Output File Names:
COIcontigs.good.good.good.pick.count_table
/******************************************/
It took 197 secs to screen 19697394 sequences.
Output File Names:
COItrim.contigs.good.good.good.pcr.align
COItrim.contigs.good.good.good.bad.accnos
COItrim.contigs.good.good.good.scrap.pcr.align
COIcontigs.good.good.good.pcr.count_table
mothur > summary.seqs(fasta=current, count=current)
Using COIcontigs.good.good.good.pcr.count_table as input file for the count parameter.
Using COItrim.contigs.good.good.good.pcr.align as input file for the fasta parameter.
Using 20 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 27 446 250 0 3 1
2.5%-tile: 27 447 310 0 4 311663
25%-tile: 27 447 313 0 6 3116625
Median: 27 447 313 0 6 6233250
75%-tile: 27 447 313 0 7 9349874
97.5%-tile: 27 447 316 0 8 12154836
Maximum: 29 447 339 0 10 12466498
Mean: 27 446 312 0 6
# of unique seqs: 12466498
total # of seqs: 12466498
It took 206 secs to summarize 12466498 sequences.
Output File Names:
COItrim.contigs.good.good.good.pcr.summary
I am not sure why pcr.seqs is creating a bad.accnos file…
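To see what the removed sequences have in common, a rough sketch like the one below could look up the names listed in the bad.accnos file in the summary made before pcr.seqs and tally their start and end positions. I am assuming the accnos file has one sequence name per line and that the summary is tab-delimited with `seqname`, `start` and `end` in its header; `tally_removed` and the demo rows are made up for illustration.

```python
from collections import Counter


def tally_removed(summary_lines, removed_names):
    """Tally (start, end) pairs for sequences named in `removed_names`.

    Assumes tab-delimited lines with a header row that contains
    'seqname', 'start' and 'end' (an assumption about the layout).
    """
    tally = Counter()
    header = None
    for line in summary_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if header is None:
            # Map column name -> position using the first non-blank line.
            header = {name: i for i, name in enumerate(fields)}
            continue
        if fields[header["seqname"]] in removed_names:
            key = (int(fields[header["start"]]), int(fields[header["end"]]))
            tally[key] += 1
    return tally


# Tiny made-up example (not real data).
demo_summary = [
    "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs\n",
    "seqA\t27\t447\t313\t0\t6\t1\n",
    "seqB\t30\t447\t310\t0\t6\t1\n",
    "seqC\t27\t439\t305\t0\t5\t1\n",
]
demo_removed = {"seqB", "seqC"}  # names as they would appear in an accnos file

tally = tally_removed(demo_summary, demo_removed)
for (start, end), n in tally.most_common():
    print(start, end, n)
```

With the real files, `removed_names` would be a set built from the lines of the bad.accnos file and `summary_lines` an open handle to the summary from before pcr.seqs. If nearly every removed sequence starts after 27 or ends before 447, that would point to start/end screening as the reason.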
Thank you!!