
Mothur not generating sample-wise classification


#1

Mothur user

Kindly help me solve this issue.
I am using the following command to classify my sequences.
Windows version

Running 64Bit Version

mothur v.1.38.0

Last updated: 7/20/2016

mothur >
classify.seqs(fasta=merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.fasta, count=merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.count_table, template=trainset14.fasta, taxonomy=trainset14.tax,cutoff=80, processors=2, iters=1000)

Using 2 processors.
Reading template taxonomy… DONE.
Reading template probabilities… DONE.
It took 21 seconds get probabilities.
Classifying sequences from merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.fasta …
Reading template taxonomy… DONE.
Reading template probabilities… DONE.
It took 21 seconds get probabilities.
[WARNING]: M00384_276_000000000-AN4FU_1_1111_14009_6307 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

[WARNING]: mothur reversed some your sequences for a better classification. If you would like to take a closer look, please check merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.trainset14.wang.flip.accnos for the list of the sequences.

It took 267917 secs to classify 100228 sequences.

It took 5 secs to create the summary file for 100228 sequences.

Output File Names:
merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.trainset14.wang.taxonomy
merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.trainset14.wang.tax.summary
merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.trainset14.wang.flip.accnos

However, when I look at the output summary file, the classification is in this format:

taxlevel rankID taxon daughterlevels total
0 0 Root 3 769213
1 0.1 Archaea 2 2
2 0.1.1 Archaea_unclassified 1 1
3 0.1.1.1 Archaea_unclassified 1 1
4 0.1.1.1.1 Archaea_unclassified 1 1
5 0.1.1.1.1.1 Archaea_unclassified 1 1
6 0.1.1.1.1.1.1 unclassified 0 1
2 0.1.2 Euryarchaeota 1 1
3 0.1.2.1 Thermoplasmata 1 1
4 0.1.2.1.1 Methanomassiliicoccales 1 1

And so on…

Previously, the summary file gave me the counts for each sample, as below:

taxlevel rankID taxon daughterlevels total 799_1 799_10 799_100 799_101 799_102 799_103 799_104 799_105 799_106 799_107 799_108 799_109 799_11 799_110 799_112 799_113 799_114 799_115 799_116 799_117 799_118 799_119 799_12 799_120 799_121 799_122 799_123 799_124 799_125 799_126 799_127 799_128 799_129 799_13 799_130 799_131 799_132 799_133 799_134 799_135 799_136 799_137 799_138 799_139 799_14 799_140 799_141 799_142 799_143 799_144 799_145 799_146 799_147 799_148 799_149 799_15 799_150 799_151 799_152 799_153 799_154 799_155 799_156 799_157 799_158 799_159 799_16 799_160 799_161 799_163 799_164 799_165 799_166 799_167 799_168 799_169 799_17 799_170 799_171 799_18 799_19 799_2 799_20 799_21 799_22 799_23 799_24 799_241 799_25 799_26 799_27 799_28 799_29 799_3 799_30 799_31 799_32 799_33 799_34 799_35 799_36 799_37 799_38 799_39 799_4 799_40 799_41 799_42 799_43 799_44 799_45 799_46 799_47 799_48 799_49 799_5 799_50 799_51 799_52 799_53 799_54 799_55 799_56 799_57 799_58 799_59 799_6 799_60 799_61 799_62 799_63 799_64 799_65 799_66 799_67 799_68 799_69 799_7 799_70 799_71 799_72 799_73 799_74 799_75 799_76 799_77 799_78 799_79 799_8 799_80 799_81 799_82 799_83 799_84 799_85 799_86 799_87 799_88 799_89 799_9 799_90 799_91 799_92 799_93 799_94 799_95 799_96 799_97 799_98 799_99
0 0 Root 2 200517 3272 4190 1772 1803 1002 540 609 1010 1065 1063 1106 591 3909 473 2310 2830 2660 2938 2607 2448 1706 1420 5281 2061 2918 4093 3030 2975 1916 2587 1709 2064 3849 4067 3402 2545 1909 1041 1441 1103 1451 3241 2665 2032 3108 1951 1449 2706 823 1886 3959 3206 2114 1175 1051 1388 1255 1420 1761 3604 2358 1819 2762 1739 1844 686 2475 1067 858 752 720 395 744 557 461 1030 3330 728 1048 3541 1986 4031 3870 3901 3983 1551 2020 1298 4226 4050 2719 4331 5861 1863 3595 2369 3504 3390 3709 3327 2063 2632 2802 2259 4189 1217 4712 11293 5212 2560 3715 2506 1544 1890 4492 5116 4157 2527 6195 5140 4028 1576 2279 2648 2120 4059 2483 3693 3905 2653 1840 2266 4906 3359 4913 3878 5578 1972 1735 1610 2277 3385 4253 3840 3524 3804 1929 1955 1560 1922 4271 4024 2882 5386 4431 2032 1815 2328 3103 4816 3581 2094 5377 4062 2437 2309 1603 3347 2851 2403
1 0.1 Archaea 2 46 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 2 1 0 0 3 1 4 0 0 0 0 0 0 1 0 0 1 1 2 0 3 1 1 0 0 0 0 3 2 1 0 0 1 0 0 1 0 1 0 3 2 0 0 0 0 0 0 0 1 0 0 0 0 4 0 2 0 1 1 0 0 0 0 5 1 2 0 1 1 0 0 0 1 0 2 0 0 0 0 0 0 0 1 2 2 1 0 0 0 0 0
2 0.1.1 “Crenarchaeota” 1 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
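The two layouts differ in their header: the first stops at "total", while the second adds one column per sample after it. A minimal Python sketch for telling them apart, assuming whitespace-separated fields (mothur writes tabs, and no field here contains spaces) and the five fixed columns shown above. The helper name summary_sample_columns is hypothetical and not part of mothur.

```python
from typing import List

# The five fixed columns at the start of a tax.summary header, as shown in the thread.
FIXED_COLUMNS = ["taxlevel", "rankID", "taxon", "daughterlevels", "total"]


def summary_sample_columns(header_line: str) -> List[str]:
    """Return the per-sample column names in a tax.summary header line.

    Hypothetical helper. Assumes whitespace-separated fields and that the
    header begins with the five fixed columns. An empty list means the
    summary carries no per-sample counts.
    """
    fields = header_line.split()
    if fields[:5] != FIXED_COLUMNS:
        raise ValueError("unexpected tax.summary header: %r" % header_line)
    # Everything after "total" is one column per sample (group).
    return fields[5:]
```

Run on the first header it returns an empty list; on the second it returns 799_1, 799_10, 799_100 and so on, one name per sample.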

Could you please explain the reason for this? I need sample-wise classification for my dataset.
Thank you
Gurdeep


#2

When you look at the contents of merge.trim.good.unique.good.good.filter.unique.precluster.pick.pick.count_table, do you see the group columns? I suspect there aren’t any groups and that you lost them somewhere upstream of this command.
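The check suggested above can be scripted. A minimal Python sketch, assuming the plain tab-delimited count_table layout where the first two header columns are the representative sequence name and "total", and every further column is a group. The helper name count_table_groups is hypothetical, and the sketch does not handle other count_table layouts that newer mothur versions may write.

```python
from typing import Iterable, List


def count_table_groups(lines: Iterable[str]) -> List[str]:
    """Return the group (sample) names from a count_table header.

    Hypothetical helper. Accepts any iterable of lines, such as an open
    file. An empty list means the table has no group columns, so
    classify.seqs has nothing to split the summary by.
    """
    for line in lines:
        header = line.rstrip("\r\n").split("\t")
        if len(header) < 2 or header[1] != "total":
            raise ValueError("unexpected count_table header: %r" % line)
        # Columns after "total" are the per-group abundances.
        return header[2:]
    raise ValueError("empty count_table")
```

Opening the count_table from the command above and passing the file handle to count_table_groups would show directly whether the group columns survived the upstream steps; an empty list matches the symptom in the first post.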

A couple of other things…

  1. You are using a version that is 2.5 years old. A number of bugs have been fixed in the intervening time.
  2. It would probably be best to follow the phylotype section of the MiSeq SOP (i.e., phylotype, make.shared, classify.otu) to get what you want. That data would likely be easier to work with for downstream analyses.
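To make concrete what per-sample counts at a taxonomic level look like, here is a rough Python illustration. It is not mothur's implementation of phylotype/make.shared and does not replace them. It collapses a classify.seqs .taxonomy file and a grouped count_table into per-sample counts at one level. It assumes taxonomy lines of the form name, tab, then "Bacteria(100);Firmicutes(98);..." with optional bootstrap values in parentheses, and the plain count_table layout. The function names are hypothetical, and labelling sequences with too few levels as "unclassified" is a choice made only for this sketch.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Strips a trailing bootstrap value, e.g. "Firmicutes(98)" -> "Firmicutes".
_CONFIDENCE = re.compile(r"\(\d+\)$")


def parse_taxonomy(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each sequence name to its taxa, root first (hypothetical helper)."""
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        name, lineage = line.split("\t", 1)
        taxa = [t for t in lineage.strip().split(";") if t]
        taxonomy[name] = [_CONFIDENCE.sub("", t) for t in taxa]
    return taxonomy


def counts_by_taxon(taxonomy: Dict[str, List[str]],
                    count_lines: Iterable[str],
                    level: int) -> Dict[str, Dict[str, int]]:
    """Sum each group's counts per taxon at a 1-based level (hypothetical helper)."""
    lines = iter(count_lines)
    header = next(lines).rstrip("\r\n").split("\t")
    groups = header[2:]  # columns after "total" are the groups
    if not groups:
        raise ValueError("count_table has no group columns")
    table = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue
        taxa = taxonomy[fields[0]]
        # Sketch-only assumption: too few levels counts as "unclassified".
        taxon = taxa[level - 1] if level <= len(taxa) else "unclassified"
        for group, value in zip(groups, fields[2:]):
            table[taxon][group] += int(value)
    return {taxon: dict(per_group) for taxon, per_group in table.items()}
```

Note that counts_by_taxon refuses a count_table without group columns, which is the same situation that produced the summary without per-sample columns in the first post.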

#3

Thank you, Dr. Schloss. You identified the right problem: I lost the group information somewhere upstream. Let me re-run the pipeline. Thank you.