screen.seqs removes most sequences

Could you post your specific problem to a new thread? This one is a year old and it’s not clear what your specific issue might be.

Pat

Dear Pat,

My data are very similar to this topic (16S V4, primers 515F/806R), so rather than create a new topic I am following up here in the hope that you can help.

I am following the guideline you wrote, which is great and very detailed.

Below is my summary:
summary.seqs(fasta=all.fasta, processors=16)

            Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:    1      35   35      0       2        1
2.5%-tile:  1      291  291     0       4        331195
25%-tile:   1      292  292     0       4        3311945
Median:     1      292  292     0       5        6623889
75%-tile:   1      332  332     0       7        9935833
97.5%-tile: 1      373  373     16      14       12916583
Maximum:    1      502  502     95      251      13247777
Mean:       1      314  314     1       6

# of Seqs: 13247777

It took 132 secs to summarize 13247777 sequences.

screen.seqs(fasta=all.fasta, group=all.groups, maxambig=0, minlength=230, maxlength=275)

It took 179 secs to screen 13247777 sequences, removed 13228652.
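
As a rough check on the count above, the length window can be compared with the NBases percentiles from the summary.seqs output. This is only a minimal Python sketch and not part of mothur: the function name `rows_inside_window` and the dictionary layout are made up for this example, and the numbers are copied from the output above.

```python
# NBases values copied from the summary.seqs output above.
nbases = {
    "Minimum": 35,
    "2.5%-tile": 291,
    "25%-tile": 292,
    "Median": 292,
    "75%-tile": 332,
    "97.5%-tile": 373,
    "Maximum": 502,
}

# Counts copied from the screen.seqs log line above.
total_seqs = 13247777
removed_seqs = 13228652


def rows_inside_window(percentiles, minlength, maxlength):
    """Return the labels whose length lies within [minlength, maxlength].

    Hypothetical helper for illustration; it only inspects the summary
    rows, not the individual sequences.
    """
    return [
        label
        for label, length in percentiles.items()
        if minlength <= length <= maxlength
    ]


inside = rows_inside_window(nbases, minlength=230, maxlength=275)
kept = total_seqs - removed_seqs

print("Percentile rows inside the 230-275 window:", inside)
print("Sequences kept:", kept, f"({kept / total_seqs:.2%} of the input)")
```

None of the listed percentile lengths falls between 230 and 275 (the 2.5%-tile is already 291), which is consistent with only 19125 sequences surviving the screen.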

I tried to continue, but I got warnings and errors:

align.seqs(fasta=all.good.unique.fasta, reference=silva.138_v4.fasta)
Reading in the silva.138_v4.fasta template sequences… DONE.
It took 83 to read 146796 sequences.
summary.seqs(fasta=all.good.unique.align, count=all.good.count_table)

Using 16 processors.

            Start  End    NBases  Ambigs  Polymer  NumSeqs
Minimum:    0      0      0       0       1        1
2.5%-tile:  1      1231   3       0       1        479
25%-tile:   3      10651  3       0       2        4782
Median:     13391  13425  3       0       3        9563
75%-tile:   13422  13425  12      0       3        14344
97.5%-tile: 13422  13425  35      0       4        18647
Maximum:    13425  13425  275     0       15       19125
Mean:       9859   10428  9       0       2

# of unique seqs: 15458
total # of seqs: 19125

It took 1 secs to summarize 19125 sequences.

screen.seqs(fasta=all.good.unique.align, count=all.good.count_table, summary=all.good.unique.summary, start=9859, end=10428, maxhomop=8)

Using 16 processors.

It took 1 secs to screen 15458 sequences, removed 15365.
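
The start=9859 and end=10428 values used above are the Mean row of the alignment summary. The sketch below compares them with the percentile rows, assuming (my understanding of screen.seqs, not confirmed in this thread) that a sequence is kept only if it starts at or before `start` and ends at or after `end`. The function name `passes_screen` is made up for this example. The Start and End columns may be summarized independently, so a row is not necessarily one real sequence, and the result is only indicative.

```python
# (Start, End) pairs copied from the alignment summary above.
rows = {
    "Minimum": (0, 0),
    "2.5%-tile": (1, 1231),
    "25%-tile": (3, 10651),
    "Median": (13391, 13425),
    "75%-tile": (13422, 13425),
    "97.5%-tile": (13422, 13425),
    "Maximum": (13425, 13425),
}


def passes_screen(seq_start, seq_end, start, end):
    """Assumed rule: keep a sequence that starts at or before `start`
    and ends at or after `end`. Hypothetical helper for illustration."""
    return seq_start <= start and seq_end >= end


# Thresholds taken from the Mean row, as in the screen.seqs call above.
passing = [
    label
    for label, (seq_start, seq_end) in rows.items()
    if passes_screen(seq_start, seq_end, start=9859, end=10428)
]

print("Rows that would pass:", passing)
```

Under that assumed rule only the 25%-tile row passes. The Median and higher rows start after position 9859, and the lower rows end before position 10428, so a mean-based window matches very few of the aligned coordinates.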

/******************************************/
Running command: remove.seqs(accnos=all.good.unique.bad.accnos.temp, count=all.good.count_table)
Removed 19005 sequences from your count file.

unique.seqs(fasta=all.good.unique.good.filter.fasta, count=all.good.good.count_table)
Removing group: VHT100 because all sequences have been removed.

[ERROR]: M00868_223_000000000-J6LB9_1_2109_7994_21648 is not in your count table. Please correct

It is so strange that almost all of the sequences were removed!
I really need your help to go further.

Kind regards,
Truong

Can you post this to its own thread? This thread hadn’t been touched in 6 years.
