screen.seqs: removal of a high percentage of sequences

Hi all,

I’m suspicious about the following step in the MiSeq SOP:

screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=2, end=7448, maxhomop=8)

After this step, the total number of sequences dropped from 1,376,092 to 454,912, so roughly two thirds of the sequences were removed. That seems like a lot to me. Could it be due to a mistake in the filtering step, or to poor quality in the dataset?

Many thanks,

Juanjo

You can look at the bad.accnos file that screen.seqs generates and see which tags are appended to the names of the sequences that were rejected.
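
If the file is large, tallying the tags is quicker than reading it by eye. The following is only a minimal sketch: it assumes each line holds a sequence name, then whitespace, then one or more tags joined by "|". Check that layout against your own file first. The function name and the example tags are made up for illustration.

```python
from collections import Counter


def tally_rejection_tags(lines):
    """Count how often each tag appears in accnos-style lines.

    Assumed layout (verify against your file): name, whitespace, tag1|tag2|...
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a name with no tag attached
        for tag in fields[1].split("|"):
            counts[tag] += 1
    return counts


# Made-up example lines, not real mothur output
example = [
    "seq_001\tstart|end",
    "seq_002\tend",
    "seq_003\tstart",
]
for tag, n in tally_rejection_tags(example).most_common():
    print(tag, n)
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `with open(path_to_bad_accnos) as fh: tally_rejection_tags(fh)`. If one tag dominates, that criterion is the one removing most of your sequences.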

Hi,

What is a “normal” percentage of sequences remaining after trimming by screen.seqs (based on start and end)?

In addition, do you recommend using the “optimize” option? I may be mistaken, but as I understand it, it forces a fixed fraction of the sequences to be retained… On our side, we take the 97.5th percentile of the start positions and the 2.5th percentile of the end positions from the summary and use them as the start and end cutoffs in screen.seqs.
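
The percentile rule described above can be sketched as follows. This is an illustration only, not mothur's own implementation: the helper `weighted_percentile`, the coordinates, and the abundances are all made up, and the percentile definition used (smallest value covering at least the requested share of reads) is an assumption.

```python
def weighted_percentile(values, weights, pct):
    """Smallest value v such that at least pct% of the total weight is <= v."""
    pairs = sorted(zip(values, weights))
    threshold = sum(weights) * pct / 100.0
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= threshold:
            return value
    return pairs[-1][0]


# Made-up alignment coordinates and abundances for four unique sequences
starts = [1, 2, 2, 5]
ends = [7448, 7448, 7400, 7000]
counts = [10, 80, 8, 2]

start_cutoff = weighted_percentile(starts, counts, 97.5)
end_cutoff = weighted_percentile(ends, counts, 2.5)

# A sequence passes if it starts at or before the start cutoff
# and ends at or after the end cutoff.
kept = sum(
    c for s, e, c in zip(starts, ends, counts)
    if s <= start_cutoff and e >= end_cutoff
)
print(start_cutoff, end_cutoff, kept, sum(counts))
```

Under this definition, less than 2.5% of the reads can fail each cutoff, so at least about 95% are kept regardless of how good the data are. That is the sense in which a percentile-based cutoff fixes the fraction retained rather than testing the reads against a known amplicon position.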

Looking forward to hearing from you.

Seb

It’s hard to say what to expect. If you’re using MiSeq with paired reads, you shouldn’t lose a lot of sequences. You should also be able to be pretty specific about your start / end positions. Can you post the output of running summary.seqs on the input to screen.seqs?

Pat