Groups missing after cluster command

I am analysing ITS data with pairwise.seqs followed by cluster, and I noticed that some of the groups are missing after clustering. The group file contains 179 samples, but only 154 remain in the list file after cluster.
The commands I used are:
make.contigs - pcr.seqs - screen.seqs - chimera.vsearch - trim.seqs - pre.cluster - pairwise.seqs - cluster

The summary.seqs output after pre.cluster (179 samples):

mothur > summary.seqs(fasta=stability.trim.contigs.pcr.good.unique.pick.trim.precluster.fasta, count=stability.trim.contigs.pcr.good.unique.pick.trim.precluster.count_table)

Using 40 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	230	230	0	3	1
2.5%-tile:	1	230	230	0	4	315717
25%-tile:	1	230	230	0	4	3157163
Median: 	1	230	230	0	5	6314325
75%-tile:	1	230	230	0	6	9471487
97.5%-tile:	1	230	230	0	7	12312932
Maximum:	1	230	230	0	8	12628648
Mean:	1	230	230	0	4
# of unique seqs:	44522
total # of seqs:	12628648

It took 2 secs to summarize 12628648 sequences.


mothur > pairwise.seqs(fasta=stability.trim.contigs.pcr.good.unique.pick.trim.precluster.fasta,cutoff=0.10,processors=10)

Using 10 processors.

Sequence	Time	Num_Dists_Below_Cutoff
0	0	0
100	3	68
200	11	244
300	26	524
400	48	980
500	76	1709
600	107	2508
........
.......
14078	57393	1504650

It took 57397 secs to find distances for 44522 sequences. 12457754 distances below cutoff 0.1.


Output File Names: 
stability.trim.contigs.pcr.good.unique.pick.trim.precluster.dist

mothur > cluster(column=stability.trim.contigs.pcr.good.unique.pick.trim.precluster.dist,count=stability.trim.contigs.pcr.good.unique.pick.trim.precluster.count_table)

Using 40 processors.

You did not set a cutoff, using 0.03.

Clustering stability.trim.contigs.pcr.good.unique.pick.trim.precluster.dist


iter	time	label	num_otus	cutoff	tp	tn	fp	fn	sensitivity	specificity	ppv	npv	fdr	accuracy	mcc	f1score

0.03
0	0	0.03	44522	0.03	0	9.91078e+08	0	3913	0	1	0	0.999996	1	0.999996	0	0	
1	0	0.03	44288	0.03	596	9.91078e+08	63	3317	0.152313	1	0.904401	0.999997	0.904401	0.999997	0.371149	0.260717	
2	0	0.03	44291	0.03	624	9.91078e+08	66	3289	0.159468	1	0.904348	0.999997	0.904348	0.999997	0.379756	0.271128	
3	0	0.03	44293	0.03	624	9.91078e+08	64	3289	0.159468	1	0.906977	0.999997	0.906977	0.999997	0.380307	0.271245	
4	0	0.03	44293	0.03	624	9.91078e+08	64	3289	0.159468	1	0.906977	0.999997	0.906977	0.999997	0.380307	0.271245	


It took 1 seconds to cluster

Output File Names: 
stability.trim.contigs.pcr.good.unique.pick.trim.precluster.opti_mcc.list
stability.trim.contigs.pcr.good.unique.pick.trim.precluster.opti_mcc.steps
stability.trim.contigs.pcr.good.unique.pick.trim.precluster.opti_mcc.sensspec

mothur > make.shared(list=stability.trim.contigs.pcr.good.unique.pick.trim.precluster.opti_mcc.list,count=stability.trim.contigs.pcr.good.unique.pick.trim.precluster.count_table,label=0.03)

Output File Names:

stability.trim.contigs.pcr.good.unique.pick.trim.precluster.opti_mcc.shared

mothur > count.groups(shared=current)

Using stability.trim.contigs.pcr.good.unique.pick.trim.precluster.opti_mcc.shared as input file for the shared parameter.

1 contains 70988.

10 contains 76729.

100 contains 74455.

101 contains 77634.

102 contains 71252.

103 contains 70535.

104 contains 51048.

105 contains 73197.

106 contains 61454.

107 contains 44071.

108 contains 57442.
.............................
.............................
82 contains 81138.

83 contains 80754.

84 contains 81059.

85 contains 79033.

Size of smallest group: 3869.

Total seqs: 11479878.

Output File Names:

stability.trim.contigs.pcr.good.unique.pick.trim.precluster.opti_mcc.count.summary

I suspect that by the time you had gone through all of the steps, there were samples where all of the reads had been removed. The steps responsible would have been things like screen.seqs and chimera.uchime. If you run count.groups on the group file created by make.contigs, I suspect you’ll find samples with not many reads in them.
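A quick way to check this without reading the listings by eye is to parse the "N contains M." lines that count.groups prints and compare the group names from an early step with those from a later one. The snippet below is only a minimal sketch: the helper names and the min_reads threshold are hypothetical (not part of mothur), the input is assumed to be screen output saved as text, and the groups A, B, C are made-up toy data.

```python
import re

# Matches lines such as "108 contains 57442." as printed by count.groups.
# Summary lines ("Size of smallest group: ...", "Total seqs: ...") do not match.
LINE_RE = re.compile(r"^\s*(\S+) contains (\d+)\.\s*$")


def parse_count_groups(text):
    """Return {group_name: read_count} parsed from count.groups screen output."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def missing_groups(before, after):
    """Groups present in the earlier step but absent from the later one."""
    return sorted(set(before) - set(after))


def low_read_groups(counts, min_reads):
    """Groups with fewer than min_reads reads, smallest first."""
    low = [group for group, n in counts.items() if n < min_reads]
    return sorted(low, key=lambda group: counts[group])


# Toy data (not taken from the real run).
early = parse_count_groups("A contains 120.\n\nB contains 5.\n\nC contains 300.\n")
late = parse_count_groups("A contains 100.\n\nC contains 250.\n")

print(missing_groups(early, late))   # ['B']
print(low_read_groups(early, 200))   # ['B', 'A']
```

Running it on the count.groups output from right after make.contigs and on the output from the shared file should list exactly which samples dropped out and whether they were small to begin with.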

Pat

Hi Pat,

Thanks for the suggestion. Actually, I have been running count.groups after each step to catch errors. I just realised that the problem seems to start after chimera.vsearch, with stability.trim.contigs.pcr.good.denovo.vsearch.pick.count_table. If I pass the fasta file generated by remove.seqs (using the vsearch accnos file) together with the count_table generated by chimera.vsearch to the downstream trim.seqs, the output contains an error saying that the fasta and count_table do not match. The number of reads in each sample also differs between the two count_tables. When I instead generate a new count_table with remove.seqs, at the same time as the fasta file and with the same vsearch accnos file, trim.seqs works fine, but then there is a problem in the cluster command! Any ideas on how to explain this and what the problem might be? I have attached the commands used and the differences in the output (179 samples in total).

mothur > chimera.vsearch(fasta=stability.trim.contigs.pcr.good.unique.fasta, count=stability.trim.contigs.pcr.good.count_table, dereplicate=t, processors=4)

Using 4 processors.
Unable to open vsearch. Trying mothur’s executable location vsearch.
Unable to open vsearch. Trying mothur’s tools location vsearch.
Unable to open vsearch.
vsearch file does not exist. Checking path…
Found vsearch in your path, using /appl/soft/bio/mothur/mothur-1.44.3//vsearch
Using vsearch version v2.13.3.
Checking sequences from stability.trim.contigs.pcr.good.unique.fasta …

/******************************************/
Running command: split.groups(groups=1-10-100-101-102-103-104-105-106-107-108-109-11-110-111-112-113-114-115-116-117-118-119-12-120-121-122-123-124-125-126-127-128-129-13-130-131-132-133-134-135-136-137-138-139-14-140-141-142-143-144-145-146-147-148-149-15-150-151-152-153-154-155-156-157-158-16-160-161-162-163-164-165-166-167-168-169-17-170-171-172-173-174-175-176-177-178-179-18-180-19-2-20-21-22-23-24-25-26-27-28-29-3-30-31-32-33-34-35-36-37-38-39-4-40-41-42-43-44-45-46-47-48-49-5-50-51-52-53-54-55-56-57-58-59-6-60-61-62-63-64-65-66-67-68-69-7-70-71-72-73-74-75-76-77-78-79-8-80-81-82-83-84-85-86-87-88-89-9-90-91-92-93-94-95-96-97-98-99,

Output File Names:
stability.trim.contigs.pcr.good.denovo.vsearch.pick.count_table
stability.trim.contigs.pcr.good.unique.denovo.vsearch.chimeras
stability.trim.contigs.pcr.good.unique.denovo.vsearch.accnos

mothur > count.groups(count=stability.trim.contigs.pcr.good.denovo.vsearch.pick.count_table)

1 contains 76358.

10 contains 81401.

100 contains 80033.

101 contains 80456.

102 contains 75144.

103 contains 80636.

104 contains 80988.

105 contains 77963.

106 contains 80226.

107 contains 80260.

108 contains 75069.

109 contains 78252.

11 contains 80800.

110 contains 80373.

111 contains 81548.

112 contains 78760.

113 contains 81318.

114 contains 76988.

115 contains 80866.

116 contains 78231.

117 contains 81069.

118 contains 81129.

119 contains 79373.

12 contains 81314.

120 contains 81085.

121 contains 81601.

122 contains 81456.

123 contains 77940.

124 contains 81260.

125 contains 80925.

126 contains 81194.

127 contains 81175.

128 contains 81117.

129 contains 81244.

13 contains 77099.

130 contains 47121.

131 contains 46390.

132 contains 78697.

133 contains 65285.

134 contains 81187.

135 contains 79309.

136 contains 80979.

137 contains 79127.

138 contains 79187.

139 contains 81023.

14 contains 78924.

140 contains 80065.

141 contains 80594.

142 contains 80067.

143 contains 77352.

144 contains 81148.

145 contains 80748.

146 contains 77385.

147 contains 81655.

148 contains 79563.

149 contains 80905.

15 contains 81229.

150 contains 81449.

151 contains 81157.

152 contains 79263.

153 contains 80239.

154 contains 80623.

155 contains 80066.

156 contains 80884.

157 contains 80123.

158 contains 78785.

16 contains 80451.

160 contains 80942.

161 contains 80593.

162 contains 80391.

163 contains 80331.

164 contains 80454.

165 contains 80682.

166 contains 81302.

167 contains 80749.

168 contains 81114.

169 contains 79839.

17 contains 81951.

170 contains 80887.

171 contains 81009.

172 contains 81227.

173 contains 80895.

174 contains 81185.

175 contains 80376.

176 contains 74950.

177 contains 78507.

178 contains 75393.

179 contains 80610.

18 contains 81615.

180 contains 77561.

19 contains 81542.

2 contains 81213.

20 contains 79090.

21 contains 81659.

22 contains 70592.

23 contains 81508.

24 contains 81385.

25 contains 8219.

26 contains 81444.

27 contains 80112.

28 contains 80856.

29 contains 80233.

3 contains 81462.

30 contains 81360.

31 contains 81610.

32 contains 80612.

33 contains 82010.

34 contains 78283.

35 contains 81193.

36 contains 81248.

37 contains 81857.

38 contains 81427.

39 contains 77649.

4 contains 81366.

40 contains 75115.

41 contains 81467.

42 contains 78521.

43 contains 81446.

44 contains 81040.

45 contains 78270.

46 contains 81853.

47 contains 81454.

48 contains 81337.

49 contains 79653.

5 contains 81269.

50 contains 80927.

51 contains 81560.

52 contains 81900.

53 contains 81266.

54 contains 81801.

55 contains 81006.

56 contains 81453.

57 contains 81766.

58 contains 81228.

59 contains 81596.

6 contains 81526.

60 contains 69855.

61 contains 81578.

62 contains 81028.

63 contains 81681.

64 contains 75640.

65 contains 81732.

66 contains 58175.

67 contains 81538.

68 contains 81783.

69 contains 81257.

7 contains 80226.

70 contains 81138.

71 contains 81978.

72 contains 79521.

73 contains 82083.

74 contains 80329.

75 contains 80876.

76 contains 81599.

77 contains 80734.

78 contains 81202.

79 contains 81897.

8 contains 81619.

80 contains 81882.

81 contains 80832.

82 contains 81347.

83 contains 81083.

84 contains 81430.

85 contains 80525.

86 contains 80535.

87 contains 81409.

88 contains 81904.

89 contains 81263.

9 contains 80714.

90 contains 81529.

91 contains 81021.

92 contains 81227.

93 contains 81253.

94 contains 80748.

95 contains 77592.

96 contains 80662.

97 contains 80846.

98 contains 79048.

99 contains 80040.

Size of smallest group: 8219.

Total seqs: 14197937.

mothur > remove.seqs(fasta=stability.trim.contigs.pcr.good.unique.fasta,accnos=stability.trim.contigs.pcr.good.unique.denovo.vsearch.accnos,count=stability.trim.contigs.pcr.good.count_table)
Removed 24635 sequences from your fasta file.
Removed 49231 sequences from your count file.

Output File Names:
stability.trim.contigs.pcr.good.unique.pick.fasta
stability.trim.contigs.pcr.good.pick.count_table

mothur > count.groups(count=stability.trim.contigs.pcr.good.pick.count_table)

1 contains 76883.

10 contains 81435.

100 contains 80137.

101 contains 80497.

102 contains 75318.

103 contains 80708.

104 contains 81044.

105 contains 78012.

106 contains 80380.

107 contains 80294.

108 contains 75312.

109 contains 78466.

11 contains 80834.

110 contains 80659.

111 contains 81597.

112 contains 78839.

113 contains 81385.

114 contains 77063.

115 contains 81019.

116 contains 78349.

117 contains 81109.

118 contains 81213.

119 contains 79513.

12 contains 81331.

120 contains 81186.

121 contains 81821.

122 contains 81490.

123 contains 78928.

124 contains 81341.

125 contains 80962.

126 contains 81268.

127 contains 81249.

128 contains 81196.

129 contains 81266.

13 contains 77167.

130 contains 47143.

131 contains 46430.

132 contains 78828.

133 contains 65353.

134 contains 81293.

135 contains 79655.

136 contains 81031.

137 contains 79553.

138 contains 79253.

139 contains 81086.

14 contains 78979.

140 contains 80158.

141 contains 80657.

142 contains 80124.

143 contains 77768.

144 contains 81211.

145 contains 81130.

146 contains 77434.

147 contains 81697.

148 contains 79612.

149 contains 80964.

15 contains 81283.

150 contains 81565.

151 contains 81216.

152 contains 79367.

153 contains 80488.

154 contains 80705.

155 contains 80125.

156 contains 81026.

157 contains 80215.

158 contains 79063.

16 contains 80457.

160 contains 81128.

161 contains 80698.

162 contains 80537.

163 contains 80412.

164 contains 80665.

165 contains 80721.

166 contains 81406.

167 contains 80872.

168 contains 81173.

169 contains 79880.

17 contains 81999.

170 contains 80935.

171 contains 81118.

172 contains 81264.

173 contains 80982.

174 contains 81266.

175 contains 80519.

176 contains 75046.

177 contains 81033.

178 contains 78251.

179 contains 81499.

18 contains 81636.

180 contains 77714.

19 contains 81562.

2 contains 81223.

20 contains 79147.

21 contains 81724.

22 contains 70970.

23 contains 81675.

24 contains 81436.

25 contains 8248.

26 contains 81507.

27 contains 80817.

28 contains 80936.

29 contains 80327.

3 contains 81473.

30 contains 81391.

31 contains 81629.

32 contains 81130.

33 contains 82015.

34 contains 78493.

35 contains 81291.

36 contains 81305.

37 contains 81862.

38 contains 81466.

39 contains 77833.

4 contains 81398.

40 contains 75221.

41 contains 81504.

42 contains 80355.

43 contains 81507.

44 contains 81290.

45 contains 79729.

46 contains 81866.

47 contains 81498.

48 contains 81344.

49 contains 79702.

5 contains 81300.

50 contains 80942.

51 contains 81580.

52 contains 81953.

53 contains 81364.

54 contains 81836.

55 contains 81028.

56 contains 81670.

57 contains 81772.

58 contains 81239.

59 contains 81617.

6 contains 81534.

60 contains 69859.

61 contains 81580.

62 contains 81589.

63 contains 81695.

64 contains 75956.

65 contains 81751.

66 contains 58385.

67 contains 81577.

68 contains 81805.

69 contains 81317.

7 contains 80243.

70 contains 81160.

71 contains 82020.

72 contains 80736.

73 contains 82132.

74 contains 80424.

75 contains 80989.

76 contains 81665.

77 contains 80744.

78 contains 81255.

79 contains 81961.

8 contains 81629.

80 contains 81897.

81 contains 81131.

82 contains 81357.

83 contains 81104.

84 contains 81476.

85 contains 80595.

86 contains 80613.

87 contains 81429.

88 contains 81965.

89 contains 81280.

9 contains 80753.

90 contains 81529.

91 contains 81045.

92 contains 81256.

93 contains 81271.

94 contains 80888.

95 contains 77827.

96 contains 80719.

97 contains 80953.

98 contains 79191.

99 contains 80065.

Size of smallest group: 8248.

Total seqs: 14226419.

Output File Names:

stability.trim.contigs.pcr.good.pick.count.summary
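To make the difference between the two count.groups listings above concrete, the per-group counts can be subtracted programmatically. This is a hedged sketch rather than a mothur feature: the function names are hypothetical, and only four groups (1, 10, 25 and 90) are copied from the two listings to keep it short.

```python
import re

# Matches lines such as "1 contains 76358." as printed by count.groups.
LINE_RE = re.compile(r"^\s*(\S+) contains (\d+)\.\s*$")


def parse_count_groups(text):
    """Return {group_name: read_count} parsed from count.groups screen output."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def count_differences(first, second):
    """Return {group: second - first} for shared groups whose counts differ."""
    return {
        group: second[group] - first[group]
        for group in first
        if group in second and first[group] != second[group]
    }


# Excerpt of the count_table written by chimera.vsearch.
vsearch_log = """
1 contains 76358.
10 contains 81401.
25 contains 8219.
90 contains 81529.
"""

# Excerpt of the count_table written by remove.seqs.
remove_log = """
1 contains 76883.
10 contains 81435.
25 contains 8248.
90 contains 81529.
"""

diffs = count_differences(parse_count_groups(vsearch_log),
                          parse_count_groups(remove_log))
print(diffs)        # {'1': 525, '10': 34, '25': 29}

# Difference between the two reported "Total seqs" values.
total_gap = 14226419 - 14197937
print(total_gap)    # 28482
```

In this excerpt the remove.seqs table keeps more reads in three of the four groups, and overall it keeps 28482 more reads than the chimera.vsearch table, so the two tables cannot be used interchangeably.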

mothur > summary.seqs(fasta=stability.trim.contigs.pcr.good.unique.pick.fasta,count=stability.trim.contigs.pcr.good.denovo.vsearch.pick.count_table)

Using 40 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	18	18	0	2	1
2.5%-tile:	1	198	198	0	4	354949
25%-tile:	1	237	237	0	4	3549485
Median: 	1	248	248	0	5	7098969
75%-tile:	1	273	273	0	6	10648453
97.5%-tile:	1	314	314	0	7	13842989
Maximum:	1	360	360	0	8	14197937
Mean:	1	255	255	0	5
# of unique seqs:	279961
total # of seqs:	14197937

It took 88 secs to summarize 14197937 sequences.

mothur > summary.seqs(fasta=current,count=current)
Using stability.trim.contigs.pcr.good.pick.count_table as input file for the count parameter.
Using stability.trim.contigs.pcr.good.unique.pick.fasta as input file for the fasta parameter.

Using 40 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	18	18	0	2	1
2.5%-tile:	1	198	198	0	4	355661
25%-tile:	1	237	237	0	4	3556605
Median: 	1	248	248	0	5	7113210
75%-tile:	1	272	272	0	6	10669815
97.5%-tile:	1	314	314	0	7	13870759
Maximum:	1	360	360	0	8	14226419
Mean:	1	255	255	0	5
# of unique seqs:	279961
total # of seqs:	14226419

It took 19 secs to summarize 14226419 sequences.
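Since both summaries report the same number of unique sequences, it may also be worth confirming that a fasta file and a count_table really list the same sequence names before passing them to the next command. The sketch below is an illustration only, with hypothetical helper names. It assumes a tab-delimited count_table whose header row starts with Representative_Sequence, and it skips any line beginning with "#". It compares names only, not abundances.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (text after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # ignore a bare ">" with no name
                names.add(parts[0])
    return names


def count_table_names(lines):
    """Collect names from the first column of a count_table.

    Assumption: tab-delimited rows, a header row starting with
    'Representative_Sequence', and optional '#' comment lines.
    """
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        first = line.split("\t")[0].strip()
        if first == "Representative_Sequence":
            continue
        names.add(first)
    return names


def compare_names(fasta_lines, count_lines):
    """Return (names only in the fasta, names only in the count_table)."""
    in_fasta = fasta_names(fasta_lines)
    in_count = count_table_names(count_lines)
    return sorted(in_fasta - in_count), sorted(in_count - in_fasta)


# Toy input; with real files, pass open file handles instead of lists.
toy_fasta = [">seq1\n", "ACGT\n", ">seq2 extra\n", "GGCC\n", ">seq3\n", "TTAA\n"]
toy_count = [
    "Representative_Sequence\ttotal\tA\tB\n",
    "seq1\t5\t3\t2\n",
    "seq2\t1\t1\t0\n",
    "seq4\t2\t0\t2\n",
]
print(compare_names(toy_fasta, toy_count))   # (['seq3'], ['seq4'])
```

If both returned lists are empty, the two files name the same sequences, and any remaining disagreement would have to lie in the per-sample counts.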

Many thanks,

Hui

Can you try running remove.seqs without count=stability.trim.contigs.pcr.good.count_table? The count table is already updated in the chimera checking step, so the one to use downstream is stability.trim.contigs.pcr.good.denovo.vsearch.pick.count_table.

Pat