So many sequences removed with maxambig=0

Hi, I performed the alignment without any prior screening. After alignment (with align.seqs) I get this sequence distribution:

Using 16 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	11550	231	0	3	1
2.5%-tile:	1968	11550	252	0	3	261901
25%-tile:	1968	11552	253	0	4	2619004
Median: 	1968	11552	254	0	4	5238007
75%-tile:	1968	11552	254	5	4	7857010
97.5%-tile:	1968	11552	254	9	6	10214113
Maximum:	1968	13424	284	25	126	10476013
Mean:	1967	11553	253	2	3
# of unique seqs:	3276106
total # of seqs:	10476013
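Because the input is a unique fasta plus a count table, every statistic in the summary above is weighted by abundance: each unique sequence is counted as many times as it appears. The NumSeqs values in the table are consistent with the percentile row sitting at rank floor(p × total) + 1. For example, 0.025 × 10476013 = 261900.3, and the table reports 261901. That rule is inferred from these numbers, not taken from mothur's source. The sketch below assumes it and computes a count-weighted percentile. The function names are hypothetical.

```python
from typing import List, Tuple


def percentile_rank(fraction: float, total: int) -> int:
    """1-based rank of the read at this percentile (assumed rule: floor(p*N) + 1)."""
    return int(fraction * total) + 1


def weighted_percentile(values_with_counts: List[Tuple[int, int]], fraction: float) -> int:
    """Value at `fraction` of the count-weighted distribution.

    values_with_counts: (value, abundance) pairs, e.g. (NBases, count) for each
    unique sequence. Each unique sequence contributes `abundance` reads.
    """
    ordered = sorted(values_with_counts)
    total = sum(count for _, count in ordered)
    target = percentile_rank(fraction, total)
    running = 0
    for value, count in ordered:
        running += count  # cumulative number of reads seen so far
        if running >= target:
            return value
    return ordered[-1][0]  # fraction == 1.0 falls past the last read


# Toy data: three unique sequences with 2, 3 and 5 reads
uniques = [(252, 2), (253, 3), (254, 5)]
print(weighted_percentile(uniques, 0.25))  # 253
print(weighted_percentile(uniques, 0.5))   # 254
```

This also explains why the median can equal the 75%-tile (254 bases here): a few very abundant uniques occupy a large share of the ranks.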

Now I am screening sequences with the following command, trying different maxambig values:

mothur > screen.seqs(fasta=stability.trim.contigs.unique.align, count=stability.trim.contigs.count_table, start=1968, end=11550, maxambig=0)

maxambig=0 removes 45% of sequences
maxambig=1 removes 41% of sequences
maxambig=2 removes 36% of sequences
maxambig=3 removes 34% of sequences
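To see how much of the loss comes from the ambiguity filter alone, you can weight it by the count table. In the command above, screen.seqs also applies start=1968 and end=11550, and a sequence failing any criterion is removed. So the percentages above likely mix both filters. The sketch below isolates the maxambig criterion. It assumes, as a simplification, that any character other than A/C/G/T or the alignment gap characters '-' and '.' counts as ambiguous. The function names and toy data are hypothetical, not mothur's API.

```python
from typing import Dict

# Characters treated as unambiguous (bases plus alignment gap characters).
UNAMBIGUOUS = set("ACGTacgt-.")


def count_ambiguous(seq: str) -> int:
    """Number of characters that are not A/C/G/T or gaps (e.g. N, R, Y)."""
    return sum(1 for base in seq if base not in UNAMBIGUOUS)


def fraction_removed(seqs: Dict[str, str], counts: Dict[str, int], maxambig: int) -> float:
    """Count-weighted fraction of reads whose ambiguity count exceeds maxambig."""
    total = sum(counts[name] for name in seqs)
    removed = sum(counts[name] for name, seq in seqs.items()
                  if count_ambiguous(seq) > maxambig)
    return removed / total


# Toy data: unique sequence -> aligned sequence, and unique -> abundance
seqs = {"a": "ACGT-ACGT", "b": "ACNT-ACGT", "c": "NNGT..ACGT"}
counts = {"a": 5, "b": 3, "c": 2}
for m in range(3):
    print(m, fraction_removed(seqs, counts, m))
# 0 0.5
# 1 0.2
# 2 0.0
```

Running screen.seqs once with only start/end, and then again adding maxambig=0, would give the real split for your data.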

Can you please suggest what maxambig value I should use in this case?

Thanks,
DC7

I would never suggest using any maxambig other than 0. I suspect that your cluster density was too high on the sequencing run, which would cause lower quality sequence data and more ambiguous base calls.

Pat


Maybe you could trim your sequences before running the make.contigs command. It helps a bit for me.

FWIW - whenever we trim before assembly, we get worse-quality contigs than when we don't trim. This is measured using mock community data.
Pat


Interesting point! Do you have any reference to support your comment? I was also curious, and I found this paper discussing trimming. What do you think?

Peter

My observation is based on my own unpublished analysis of mock community data within the MiSeq pipeline.

pat
