filter.seqs: how many columns should be removed?

Hello, I am analyzing V4 MiSeq data. I'm not sure how many columns it is appropriate to remove at this step. The number removed seems very high; is this reasonable? Thanks in advance!

mothur > summary.seqs(fasta=current, count=current)
Using /home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.good.good.count_table as input file for the count parameter.
Using /home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.good.unique.good.align as input file for the fasta parameter.

Using 28 processors.

              Start   End     NBases  Ambigs  Polymer  NumSeqs
Minimum:      10357   23444   252     0       3        1
2.5%-tile:    13862   23444   252     0       3        302276
25%-tile:     13862   23444   253     0       4        3022754
Median:       13862   23444   253     0       4        6045507
75%-tile:     13862   23444   253     0       5        9068260
97.5%-tile:   13862   23444   253     0       6        11788737
Maximum:      13862   25319   254     0       8        12091012
Mean:         13861   23444   252     0       4
Number of unique seqs: 1525842
total # of seqs: 12091012

It took 242 secs to summarize 12091012 sequences.

Output File Names:
/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.good.unique.good.summary
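As I read the summary.seqs table, NumSeqs is the rank in the sorted, count-weighted list of all 12,091,012 reads (every read in the count table counts, not only the 1,525,842 unique ones) at which each percentile row is taken. The printed values are consistent with the rule floor(p × total) + 1. The sketch below checks that; the formula and the helper name `percentile_position` are my reconstruction from these numbers, not taken from mothur's source code.

```python
import math
from fractions import Fraction

# "total # of seqs" reported by summary.seqs (count-weighted via the count table)
TOTAL = 12_091_012

# Percentile rows from the output above: (fraction p, NumSeqs value mothur printed)
REPORTED = {
    "2.5%-tile": (Fraction(25, 1000), 302276),
    "25%-tile": (Fraction(25, 100), 3022754),
    "Median": (Fraction(50, 100), 6045507),
    "75%-tile": (Fraction(75, 100), 9068260),
    "97.5%-tile": (Fraction(975, 1000), 11788737),
}


def percentile_position(p: Fraction, total: int) -> int:
    """Hypothetical rule: 1-based rank floor(p * total) + 1 in the sorted read list."""
    # Fraction keeps p * total exact, so floor() is not affected by float rounding
    return math.floor(p * total) + 1


for label, (p, reported) in REPORTED.items():
    predicted = percentile_position(p, TOTAL)
    print(f"{label:<11} predicted={predicted:>9} reported={reported:>9} match={predicted == reported}")
```

Every row prints `match=True`, which suggests the percentile rows are weighted by the counts rather than by unique sequences.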

mothur > filter.seqs(fasta=current, vertical=T, trump=.)
Using /home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.good.unique.good.align as input file for the fasta parameter.

Using 28 processors.
Creating Filter…
It took 278 secs to create filter for 1525842 sequences.

Running Filter…
It took 206 secs to filter 1525842 sequences.

Length of filtered alignment: 641
Number of columns removed: 49359
Length of the original alignment: 50000
Number of sequences used to construct filter: 1525842

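That looks pretty good. With vertical=T, filter.seqs removes every column that contains only gap characters, and with trump=. it also removes every column in which any sequence has a '.'. The original alignment is 50,000 columns wide and covers the full-length 16S gene, while nearly all of your V4 reads start at position 13862 and end at 23444, so the vast majority of columns are empty and removing 49,359 of them is expected. The 641 columns that remain are a sensible width for reads of about 253 bases plus the internal gaps added during alignment.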
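To see why so many columns disappear, here is a toy version of the two options used above, based on the mothur wiki's description of filter.seqs: vertical=T drops a column when every sequence has a gap character ('-' or '.') there, and trump=. drops a column when any sequence has a '.' there. This is a simplified sketch, not mothur's implementation (it ignores soft, hard and the other options), and `build_filter` / `apply_filter` are hypothetical helper names.

```python
from typing import List, Optional

GAP_CHARS = "-."  # mothur alignments use '-' for internal gaps and '.' for unaligned ends


def build_filter(seqs: List[str], vertical: bool = True, trump: Optional[str] = None) -> List[bool]:
    """Return one flag per alignment column; True means the column is kept."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        column = [s[col] for s in seqs]
        if trump is not None and trump in column:
            keep.append(False)  # trump: a single sequence with the trump character removes the column
        elif vertical and all(c in GAP_CHARS for c in column):
            keep.append(False)  # vertical: the column is a gap in every sequence
        else:
            keep.append(True)
    return keep


def apply_filter(seqs: List[str], keep: List[bool]) -> List[str]:
    """Drop the filtered-out columns from every sequence."""
    return ["".join(c for c, k in zip(s, keep) if k) for s in seqs]


# Toy 10-column alignment standing in for the 50,000-column reference
seqs = [
    "..AC--GT..",
    "..ACT-G-..",
    ".TAC--GT-.",
]

keep = build_filter(seqs, vertical=True, trump=".")
print("Length of filtered alignment:", sum(keep))           # 5
print("Number of columns removed:", len(keep) - sum(keep))  # 5
print(apply_filter(seqs, keep))                             # ['AC-GT', 'ACTG-', 'AC-GT']
print("Without trump:", sum(build_filter(seqs, vertical=True)))  # 6
```

Scaled up, and assuming the unaligned ends are padded with '.' as in mothur's aligned output, only columns inside the span every read covers (roughly 13862 to 23444 here) can survive trump=., and within that span only columns where at least one read has a base survive vertical=T. That is how 50,000 columns shrink to 641 (50,000 − 641 = 49,359 removed).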
