Screen.seqs getting rid of most sequences

Hello!
I wondered if you would be willing to help me: my screen.seqs is removing 1528936 of 1529102 sequences. What's happening here? See below:

mothur > summary.seqs(fasta=/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/align/bps.trim.contigs.good.unique.align, count=/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/bps.trim.contigs.good.count_table)

Using 28 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1044	1044	1	0	1	1
2.5%-tile:	13862	23444	252	0	3	302545
25%-tile:	13862	23444	253	0	4	3025444
Median: 	13862	23444	253	0	4	6050888
75%-tile:	13862	23444	253	0	5	9076332
97.5%-tile:	13862	23444	253	0	6	11799231
Maximum:	43113	43116	254	0	8	12101775
Mean:	**13861**	**23442**	252	0	4
No. of unique seqs:	1529102
total # of seqs:	12101775

It took 251 secs to summarize 12101775 sequences.

Output File Names:
/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/summary/bps.trim.contigs.good.unique.summary

mothur > screen.seqs(fasta=/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/align/bps.trim.contigs.good.unique.align, count=/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/bps.trim.contigs.good.count_table, summary=home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/summary/bps.trim.contigs.good.unique.summary, start=**13861**, end=**23442**)

Using 28 processors.

It took 265 secs to screen 1529102 sequences, removed 1528936.

/******************************************/
Running command: remove.seqs(accnos=/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/new/new/bps.trim.contigs.good.unique.bad.accnos.temp, count=/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/bps.trim.contigs.good.count_table)

Removing group: B.10.raw because all sequences have been removed.

Removing group: B.2.raw because all sequences have been removed.

Removing group: B.3.raw because all sequences have been removed.

Removing group: B.5.raw because all sequences have been removed.

Removing group: B.6.raw because all sequences have been removed.

Removing group: B.7.raw because all sequences have been removed.

Removing group: B.8.raw because all sequences have been removed.

Removing group: C.13.1.raw because all sequences have been removed.

Removing group: C.13.2.raw because all sequences have been removed.

Removing group: C.2.2.raw because all sequences have been removed.

Removing group: C.2.3.raw because all sequences have been removed.

Removing group: C.3.2.raw because all sequences have been removed.

Removing group: C.4.2.raw because all sequences have been removed.

Removing group: C.4.3.raw because all sequences have been removed.

Removing group: C.5.3.raw because all sequences have been removed.

Removing group: C.6.1.raw because all sequences have been removed.

Removing group: C.6.2.raw because all sequences have been removed.

Removing group: C.6.3.raw because all sequences have been removed.

Removing group: C.7.1.raw because all sequences have been removed.

Removing group: C.7.3.raw because all sequences have been removed.

Removing group: C.8.3.raw because all sequences have been removed.

Removing group: C.9.3.raw because all sequences have been removed.

Removing group: F.10.1.raw because all sequences have been removed.

Removing group: F.13.2.raw because all sequences have been removed.

Removing group: F.13.3.raw because all sequences have been removed.

Removing group: F.2.3.raw because all sequences have been removed.

Removing group: F.3.1.raw because all sequences have been removed.

Removing group: F.3.2.raw because all sequences have been removed.

Removing group: F.3.3.raw because all sequences have been removed.

Removing group: F.5.1.raw because all sequences have been removed.

Removing group: F.6.3.raw because all sequences have been removed.

Removing group: F.7.3.raw because all sequences have been removed.

Removed 12101186 sequences from your count file.

Output File Names:
/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/new/new/bps.trim.contigs.good.pick.count_table

/******************************************/

Output File Names:
/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/new/new/bps.trim.contigs.good.unique.good.summary
/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/new/new/bps.trim.contigs.good.unique.good.align
/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/new/new/bps.trim.contigs.good.unique.bad.accnos
/home/n/nb326/miniconda3/envs/bpsenv/03_preprocess/new/new/bps.trim.contigs.good.good.count_table


It took 546 secs to screen 1529102 sequences.

If you look at your log, most of your (good) sequences start at 13862. The mean is pulled down by the very short, and likely erroneous, 1 bp reads at position 1044. So with start=13861 you remove all your good sequences, because they start after 13861, and you also remove all the 1 bp noise reads, because they do not reach the end.
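
The start/end rule described above can be sketched in Python. The sketch assumes the behavior documented for mothur's screen.seqs: a sequence is removed if it starts after `start` or ends before `end`. The class, the function names and the toy reads are hypothetical, and `early_start` stands in for the 166 unique sequences that survived the screen.

```python
from dataclasses import dataclass


@dataclass
class AlignedSeq:
    name: str
    start: int  # first aligned position (the Start column of summary.seqs)
    end: int    # last aligned position (the End column)


def passes_start_end(seq: AlignedSeq, start: int, end: int) -> bool:
    # Assumed rule: removed if it starts after `start` or ends before `end`
    return seq.start <= start and seq.end >= end


def screen(seqs, start, end):
    """Return the names of sequences that survive the start/end filter."""
    return [s.name for s in seqs if passes_start_end(s, start, end)]


# Hypothetical toy data shaped like the summary above, not the real reads
seqs = [
    AlignedSeq("good_1", 13862, 23444),
    AlignedSeq("good_2", 13862, 23444),
    AlignedSeq("noise_1bp", 1044, 1044),
    AlignedSeq("early_start", 13000, 23444),
]

print(screen(seqs, start=13861, end=23442))  # → ['early_start']
print(screen(seqs, start=13862, end=23444))  # → ['good_1', 'good_2', 'early_start']
```

With the cutoffs copied from the mean, only the read that happens to start early survives; with the cutoffs set where the bulk of reads start and end, the good reads are kept and only the 1 bp noise is dropped.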

Why are you using the mean? Look at where the fragment you are interested in starts and ends, and cut there. Do not use the mean.
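
One simple way to act on this advice, sketched in Python on hypothetical data: weight each unique sequence's start and end by its abundance and take the most common value instead of the mean. This is a swapped-in heuristic for illustration, not mothur's own method, and the function names are made up. In the summary above, every percentile from 2.5% to 97.5% starts at 13862 and ends at 23444, which is where this heuristic would place the cutoffs.

```python
from collections import Counter

# Hypothetical (start, end, count) rows; count is each unique sequence's
# abundance, as a count_table would supply. Not the real data.
rows = [
    (13862, 23444, 1000),
    (13862, 23444, 500),
    (1044, 1044, 50),
    (43113, 43116, 5),
]


def weighted_mean(pairs):
    """Count-weighted mean of (value, count) pairs."""
    total = sum(c for _, c in pairs)
    return sum(v * c for v, c in pairs) / total


def weighted_mode(pairs):
    """Value carried by the most reads across (value, count) pairs."""
    tally = Counter()
    for v, c in pairs:
        tally[v] += c
    return tally.most_common(1)[0][0]


starts = [(s, c) for s, _, c in rows]
ends = [(e, c) for _, e, c in rows]

print("mean start/end:", round(weighted_mean(starts)), round(weighted_mean(ends)))
print("mode start/end:", weighted_mode(starts), weighted_mode(ends))
```

The mean is dragged around by a handful of outlier reads, while the mode tracks where most reads actually sit, so it is less likely to place a cutoff just inside the real fragment.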

Understood, thank you for your help!