Interpreting seq.error output

Hi everyone,

I’m currently analyzing my institution’s first V4 MiSeq run. We included a mock community in a single plate (95 samples + 1 mock). I’ve followed the current MiSeq SOP to a T, and I’m getting interesting seq.error output, which I think may indicate some contamination in our samples. The overall error rate is high (about 0.53%):

Multiply error rate by 100 to obtain the percent sequencing errors.
Overall error rate: 0.00528679
Errors Sequences
0 50869
1 9
2 29
3 224
4 41
5 17
6 5
7 4
8 1
9 0
10 0
11 0
12 0
13 0
14 2
15 0
16 0
17 4265
18 1
19 1
20 4
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
30 0
31 2
32 0
33 0
34 5
35 0
36 0
37 0
38 0
39 0
40 0
41 1
42 0
43 0
44 0
45 0
46 0
47 0
48 5
It took 6 secs to check 678 sequences

The very large group of sequences (4265) with exactly 17 errors, alongside almost no sequences with nearby error counts, makes me think this is contamination with a species that differs by ~17 bp from a reference in the mock community.
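
A quick way to sanity-check that reading is to tally the histogram directly. The Python sketch below is only an illustration: the counts are copied from the seq.error output above (zero-count rows omitted), while the function names and the thresholds in `isolated_peaks` (`min_errors`, `window`, `ratio`) are arbitrary assumptions, not anything mothur provides.

```python
# Error histogram copied from the seq.error output above: {errors: sequences}.
# Error counts with zero sequences are omitted.
hist = {0: 50869, 1: 9, 2: 29, 3: 224, 4: 41, 5: 17, 6: 5, 7: 4, 8: 1,
        14: 2, 17: 4265, 18: 1, 19: 1, 20: 4, 31: 2, 34: 5, 41: 1, 48: 5}


def summarize(hist, peak):
    """Report how much of the data and of the errors sits in one bin."""
    total_seqs = sum(hist.values())
    total_errors = sum(k * n for k, n in hist.items())
    peak_seqs = hist.get(peak, 0)
    return {
        "total_seqs": total_seqs,
        "total_errors": total_errors,
        "peak_seq_share": peak_seqs / total_seqs,
        "peak_error_share": peak * peak_seqs / total_errors,
    }


def isolated_peaks(hist, min_errors=10, window=2, ratio=100):
    """Flag high-error bins that dwarf every neighbour within +/- window.

    The thresholds are illustrative assumptions, not a standard test.
    """
    peaks = []
    for k, n in hist.items():
        if k < min_errors:
            continue
        neighbours = max(hist.get(j, 0)
                         for j in range(k - window, k + window + 1) if j != k)
        if n >= ratio * max(neighbours, 1):
            peaks.append(k)
    return sorted(peaks)


stats = summarize(hist, peak=17)
print(stats)
print(isolated_peaks(hist))
```

If the 17-error bin holds a small share of the sequences but nearly all of the errors, and it is the only isolated peak, that pattern fits a single off-target sequence better than it fits random sequencing error, which would be expected to tail off smoothly from zero.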

Does this interpretation seem reasonable? I am currently clustering and classifying, so I will have the OTUs in the mock sample momentarily.

My primary purpose for this analysis is just to “benchmark” our V4 protocol; none of these data will be used for research purposes. If this error profile seems consistent with low-level contamination, we will likely move forward with the new V4 protocol as-is (and try to identify potential sources of contamination, obviously!).

Thanks for any help,
Greg

Hi Greg,

Yeah, it looks like either there is some contamination in there or your reference is missing a sequence that went into your mock. I’d check out what that sequence is and go from there.
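
One way to track down that sequence is to pull the names of the reads with exactly 17 errors out of the per-sequence summary that seq.error writes alongside the histogram. The Python sketch below assumes a tab-delimited file with a header row; the column names `query`, `reference`, and `mismatches` are assumptions to check against the header of your own file, and the function name and toy rows are hypothetical.

```python
import csv
import io
from collections import Counter


def reads_with_n_errors(handle, n_errors, error_col="mismatches",
                        query_col="query", ref_col="reference"):
    """Return (read names, per-reference counts) for rows with n_errors errors.

    Column names are assumptions; adjust them to match the file header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    names, refs = [], Counter()
    for row in reader:
        if int(row[error_col]) == n_errors:
            names.append(row[query_col])
            refs[row[ref_col]] += 1
    return names, refs


# Toy, made-up rows purely to show the call; this is not real seq.error output.
toy = ("query\treference\tmismatches\n"
       "seqA\trefX\t0\n"
       "seqB\trefY\t17\n"
       "seqC\trefY\t17\n")
names, refs = reads_with_n_errors(io.StringIO(toy), 17)
print(names)
print(refs.most_common())
```

If the flagged reads all map to the same reference, they are probably one organism. Writing the names one per line gives a list that can be used to extract those reads (for example with get.seqs) and then classify or BLAST them.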

Pat