Getting a matrix of OTU counts across samples

I am trying to achieve the following:

I would like to create an N by M table, where each of the N rows is an OTU and each of the M columns is a sample. Each value in the table is the sequence abundance of that OTU in that sample. I also want to obtain a representative sequence for each OTU.

I have been following the MiSeq SOP and I have generated the consensus taxonomy and the summary taxonomy. I have also generated a consensus taxonomy on a per-group basis (so one OTU classification file per group; persample = T). Then I ran get.oturep to find the representative sequence for each OTU.

Putting those files together is where I have some doubts. Here’s why:

  1. When I classify OTUs on a per-group basis, do the OTU numbers line up across files? I.e., is OTU 001 the same OTU across all my files, and if an OTU doesn’t show up in some sample, would it have a count of 0? It seems like the OTU names are consistent, but I am using a mock community and don’t know if I just got lucky.

  2. get.oturep outputs an N by 2 matrix. Column 1 holds the representative sequences, and I think column 2 contains the sequences that the one in column 1 represents. So, if row 1 represents OTU 1, and OTU 1 has 50 sequences in it, then when I split column 2 on commas I should get 50 names… except I don’t. So I assume I have done something wrong.

  3. Are there easier ways to do this? It seems like it would be a common task, and I suspect I am making this harder than it should be.

Thanks kindly!

It sounds like you are looking for something like what the create.database command makes, but with less info. http://www.mothur.org/wiki/Create.database.

  1. When I classify OTUs on a per-group basis, do the OTU numbers line up across files? I.e., is OTU 001 the same OTU across all my files, and if an OTU doesn’t show up in some sample, would it have a count of 0? It seems like the OTU names are consistent, but I am using a mock community and don’t know if I just got lucky.

Mothur is smart enough to know the OTU labels that are used or not used for groups. When persample=t, if a group does not have any sequences in that OTU, then mothur does not find a consensus taxonomy for it.
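
If you do end up combining the per-group files by hand, the zero-filling could be sketched in Python roughly as below. This is only an illustration, not something mothur provides: the function names are made up, and it assumes each per-group consensus taxonomy file is tab-separated with an "OTU / Size / Taxonomy" header and that Size is the count you want, so check that against your own files first.

```python
def parse_cons_taxonomy(text):
    """Return {otu_label: size} from tab-separated text.

    Assumed layout (verify against your files): a header line starting
    with 'OTU', then one line per OTU as 'label<TAB>size<TAB>taxonomy'.
    """
    sizes = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] == "OTU":
            continue  # skip the header and any malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def build_otu_table(per_group):
    """Turn {group: {otu: count}} into (otus, groups, rows).

    rows[i][j] is the count of otus[i] in groups[j]; an OTU that is
    absent from a group's file is filled in as 0.
    """
    groups = sorted(per_group)
    otus = sorted({otu for sizes in per_group.values() for otu in sizes})
    rows = [[per_group[g].get(otu, 0) for g in groups] for otu in otus]
    return otus, groups, rows
```

Reading each group's file into a string and passing the parsed dictionaries to build_otu_table gives one row per OTU and one column per group.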

  2. get.oturep outputs an N by 2 matrix. Column 1 holds the representative sequences, and I think column 2 contains the sequences that the one in column 1 represents. So, if row 1 represents OTU 1, and OTU 1 has 50 sequences in it, then when I split column 2 on commas I should get 50 names… except I don’t. So I assume I have done something wrong.

That seems odd. Could you post the command you ran and version you are using?
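
In the meantime, a quick way to double-check the counts is a small Python sketch like the one below. It assumes the file has two whitespace-separated columns (representative name, then a comma-separated list of names), as described in the question. The second function covers one possible explanation that is not confirmed here: if column 2 lists unique sequence names, each name may stand for several reads, so the number of names would be smaller than the OTU size. The reads_per_name mapping is hypothetical and would have to be built from your own names or count file.

```python
def parse_rep_lines(text):
    """Yield (representative, [member names]) for each two-column line."""
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        rep, members = parts
        yield rep, [name for name in members.split(",") if name]


def count_listed_names(text):
    """Number of comma-separated names listed for each representative."""
    return {rep: len(members) for rep, members in parse_rep_lines(text)}


def total_reads(text, reads_per_name):
    """Sum the reads behind each listed name.

    reads_per_name is a hypothetical {name: read_count} mapping; names
    missing from it are counted as a single read.
    """
    return {
        rep: sum(reads_per_name.get(name, 1) for name in members)
        for rep, members in parse_rep_lines(text)
    }
```

If count_listed_names gives fewer than the expected OTU size but total_reads matches it, the difference would simply be dereplication rather than an error in the run.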

I have tried to run create.database, but it seems to segfault at the get.oturep step:

mothur > get.oturep(list=MockCommunity_16S.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, label=0.03, fasta=MockCommunity_16S.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, phylip=MockCommunity_16S.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.phylip.dist,count=MockCommunity_16S.trim.contigs.good.unique.good.filter.unique.precluster.pick.count_table)

********************#****#****#****#****#****#****#****#****#****#****#
Reading matrix:     ||||||||||||||||||||Segmentation fault

currently using
mothur v.1.32.1
Last updated: 10/16/2013

UPDATE

Using a column-formatted distance file instead of a phylip-formatted one fixed the segmentation fault.

When I try to run create.database, it now gives an error saying that my repnames and my fasta file don’t line up. I checked the rep count_table and the rep fasta, and they indeed do not contain the same OTU sequences… any suggestions?

Update 2
Okay, I added the count parameter to remove.seqs following chimera removal and now it works perfectly and this is exactly the output I want! Thank you! This is excellent!