Classify.seqs force classifier categories

terryhbell · January 22, 2013, 2:57pm

I love Mothur, and there are fewer and fewer functions for which I need to use outside sources.

I have one request, but I don’t know how simple it would be to implement. In classify.seqs, I generally like that the summary only contains rows for the organisms that are actually identified in your dataset. On rare occasions, however, I have wanted to combine multiple classification summaries, and this was not trivial because their rows could not be matched up one to one (e.g., if one dataset has a Caulobacter and another does not, that row is omitted from the second summary).

Would it be possible to include an option to print all rows in the summary? Or are the rows derived from the output taxonomy file rather than the input .tax file?

Thanks for all your hard work!

pschloss · January 22, 2013, 6:13pm

We can work on this for you… Also, you should note that regardless of whether Caulobacter is present in your dataset, the numbering (e.g., 1.2.3.1) will be the same. That may allow you to merge data files in R or something else.
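Pat's workaround, matching rows on the rank numbering rather than on row position, might look roughly like this in Python. The sketch assumes each summary has already been parsed into a dict of {rankID: (taxon, count)} for one sample; the function name, the data layout, and the rank IDs in the example are hypothetical, and mothur's own output file is not parsed here.

```python
def merge_tax_summaries(summaries):
    """Outer-join per-sample summaries on rank ID (hypothetical helper).

    summaries: {sample_name: {rank_id: (taxon, count)}}
    Returns a list of row dicts sorted by rank ID, with a 0 count
    for any taxon a sample did not contain.
    """
    # Collect every rank ID seen in any sample, remembering its taxon name.
    taxon_by_rank = {}
    for rows in summaries.values():
        for rank_id, (taxon, _count) in rows.items():
            taxon_by_rank.setdefault(rank_id, taxon)

    def rank_key(rank_id):
        # Compare numerically so "0.1.10" sorts after "0.1.9".
        return [int(part) for part in rank_id.split(".")]

    merged = []
    for rank_id in sorted(taxon_by_rank, key=rank_key):
        row = {"rankID": rank_id, "taxon": taxon_by_rank[rank_id]}
        for sample, rows in summaries.items():
            # A taxon missing from a sample's summary had zero hits there.
            row[sample] = rows.get(rank_id, (None, 0))[1]
        merged.append(row)
    return merged


# Illustrative counts only: Caulobacter appears in "run1" but not "run2".
example = merge_tax_summaries({
    "run1": {"0.1": ("Bacteria", 100), "0.1.2": ("Caulobacter", 5)},
    "run2": {"0.1": ("Bacteria", 80)},
})
for row in example:
    print(row)
```

Because every sample gets a value for every rank ID, the merged rows line up one to one, which is the property the original request was after.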

Pat

terryhbell · January 22, 2013, 7:43pm

Thanks for the quick reply, and great suggestion for the workaround! I will use that for now.

westcott · April 30, 2013, 4:10pm

You might try this: http://www.mothur.org/wiki/Merge.taxsummary

fibar · May 16, 2013, 8:05pm

Hi, helpful information! Thanks.
I have a question/suggestion related to this…

Isn’t subsampling important prior to classification? I mean, classifying a sample with 2000 sequences is not the same as classifying one with 8000 sequences, especially if you are using those numbers to compare samples later. Is there a way to do this within the suggested mothur pipeline? At which step would it be advisable?

Cheers,
Fred

pschloss · May 16, 2013, 9:23pm

Sub-sampling is necessary because we aren’t sure of the frequencies when the number of sequences differs between samples. So I wouldn’t be concerned about running classify.otu before subsampling, because it shouldn’t affect the % of different taxa in each OTU. In classify.otu, you actually want more information so you can get a better idea of the consensus classification. Practically speaking, I would be surprised if there were OTUs that changed their consensus classification with and without subsampling first.
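Pat's point that a consensus depends on the proportion of sequences agreeing at each level, not on their absolute number, can be illustrated with a simple majority-consensus sketch. It assumes semicolon-separated taxonomy strings with any confidence scores already removed, and uses a hypothetical 51% agreement threshold; it is not mothur's classify.otu code.

```python
from collections import Counter


def consensus_taxonomy(taxonomies, cutoff=0.51):
    """Deepest lineage on which at least `cutoff` of the sequences agree.

    taxonomies: one semicolon-separated lineage per sequence in the OTU.
    cutoff: assumed agreement threshold; keep it above 0.5 so the
    winning label at each level is unique.
    """
    if not taxonomies:
        raise ValueError("an OTU needs at least one sequence")
    # Split each lineage into its levels, ignoring the trailing semicolon.
    lineages = [tax.strip(";").split(";") for tax in taxonomies]
    consensus = ()
    for level in range(max(len(lin) for lin in lineages)):
        # Count full prefixes so a label is only credited under its parent.
        prefixes = Counter(tuple(lin[: level + 1]) for lin in lineages if len(lin) > level)
        prefix, count = prefixes.most_common(1)[0]
        # Agreement is measured against all sequences in the OTU.
        if count / len(lineages) < cutoff:
            break
        consensus = prefix
    return ";".join(consensus)


full = ["Bacteria;Proteobacteria;Caulobacter;"] * 8 + ["Bacteria;Firmicutes;Bacillus;"] * 2
half = ["Bacteria;Proteobacteria;Caulobacter;"] * 4 + ["Bacteria;Firmicutes;Bacillus;"]
print(consensus_taxonomy(full))
print(consensus_taxonomy(half))
```

The 8:2 OTU and the 4:1 OTU have the same proportions, so they reach the same consensus; more sequences mainly make those proportions better estimated.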

fibar · May 17, 2013, 2:49pm

Thanks Pat for your reply. I realize my question was not clear. What I meant is:
Sample 1: 6000 sequences
Sample 2: 2000 sequences
If you run classify.seqs, I think the odds of detecting rare taxa are higher in Sample 1 than in Sample 2. If you sub.sample both to 2000 sequences, you kind of balance those odds. Is that right?
I found this command is possible: sub.sample(fasta=X.fasta, group=X.groups, size=2000, persample=T, name=X.names)
Please let me know what you think. Thanks!
Best,
Fred
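Fred's intuition about rare taxa can be made concrete with a small probability sketch. It assumes reads are drawn independently for sequencing depth (a binomial approximation) and without replacement when an existing sample is subsampled (hypergeometric); the function names and the 0.05% abundance used below are illustrative assumptions, not measured values. This concerns how many taxa show up in the counts, not how any individual sequence is classified.

```python
import math


def p_detect_sequenced(rel_abundance, depth):
    """Chance a taxon at `rel_abundance` yields at least one of `depth`
    independently drawn reads (binomial approximation)."""
    return 1 - (1 - rel_abundance) ** depth


def p_detect_subsampled(taxon_reads, total_reads, size):
    """Chance a taxon with `taxon_reads` of `total_reads` reads keeps at
    least one read after subsampling `size` reads without replacement."""
    if size > total_reads:
        raise ValueError("subsample is larger than the sample")
    # Hypergeometric: probability that none of the taxon's reads is drawn.
    p_missed = math.comb(total_reads - taxon_reads, size) / math.comb(total_reads, size)
    return 1 - p_missed


# Hypothetical taxon at 0.05% relative abundance.
print(round(p_detect_sequenced(0.0005, 6000), 3))  # Sample 1 depth
print(round(p_detect_sequenced(0.0005, 2000), 3))  # Sample 2 depth
# Same taxon with 3 reads in Sample 1, after subsampling to 2000 reads.
print(round(p_detect_subsampled(3, 6000, 2000), 3))
```

With these assumed numbers the deeper sample is clearly more likely to contain the rare taxon, and subsampling it to 2000 reads pulls its detection chance back toward that of the shallower sample.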

pschloss · May 20, 2013, 1:11pm

The classification of a sequence doesn’t depend on its abundance. So, I wouldn’t subsample the classifications - I would subsample the phylotype or OTU tables.
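Pat's recommendation (classify every sequence, then subsample the OTU or phylotype table) can be sketched as a plain rarefaction of a count table. This is a generic illustration that assumes the table is already a {sample: {taxon: count}} dict; the function names are hypothetical and this is not mothur's sub.sample implementation.

```python
import random
from collections import Counter


def subsample_counts(counts, size, rng):
    """Rarefy one sample's {taxon: count} to `size` reads without replacement."""
    total = sum(counts.values())
    if size > total:
        raise ValueError(f"sample has only {total} reads, cannot subsample to {size}")
    # Expand counts into one entry per read, then draw `size` of them.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = Counter(rng.sample(reads, size))
    # Keep every original taxon (zeros included) so rows stay aligned.
    return {taxon: drawn.get(taxon, 0) for taxon in counts}


def subsample_table(table, size=None, seed=None):
    """Rarefy every sample to a common depth (default: the smallest sample)."""
    if size is None:
        size = min(sum(c.values()) for c in table.values())
    rng = random.Random(seed)
    return {sample: subsample_counts(c, size, rng) for sample, c in table.items()}


# Made-up phylotype counts matching the 6000 vs 2000 example above.
table = {
    "Sample1": {"Caulobacter": 3, "Bacillus": 3997, "Pseudomonas": 2000},
    "Sample2": {"Caulobacter": 0, "Bacillus": 1500, "Pseudomonas": 500},
}
print(subsample_table(table, seed=1))
```

The classifications themselves are untouched; only the counts per taxon are rarefied, so every sample ends up with the same number of reads before comparison.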
