Increase in sample-specific OTU abundance after remove.seqs?

Hi,

I selected a number of unique OTUs with the get.otulabels command using the list option, and subsequently their corresponding sequences with list.seqs.

Then I removed these sequences from the 97% OTU list file and the group file. When I reconstructed my shared file from these two input files to generate a 97% OTU table, I observed a consistent decrease in total OTU abundance across most OTU labels, which is presumably correct, BUT also a differential increase and/or decrease in sample-specific OTU abundance within individual OTU labels, which is somewhat puzzling to me.

Might this be a bug?

Thank you.

Daniel

Sorry, I’m not following what you’re doing (or why…). Could you post the commands you’re running along with an example of what you’re getting?

Thanks,
Pat

Hi again Pat,

My apologies for the late response.

We’re interested in investigating the prevalence of OTU sequences that can be confidently classified to certain bacterial strains.

Here are the commands I used with a new dataset:

To generate non-selected/original shared file…
make.shared(list=final.an.list,group=final.groups,label=0.03)

To generate re-selected shared file…
get.seqs(accnos=classified_sequences.accnos,list=final.an.list,name=final.names,group=final.groups,label=0.03)
make.shared(list=final.an.0.03.pick.list,group=final.pick.groups)

Because I was only interested in a certain group of OTU labels that are found in both of my shared files (i.e., original vs. selected), I made an accnos file for filtering:
get.otulabels(accnos=common.otulabels,shared=final.an.original.shared)#renamed file name
get.otulabels(accnos=common.otulabels,shared=final.an.selected.shared)#renamed file name

Output:
Section of my original shared table (final.an.original.0.03.pick.shared)…
          Otu0001  Otu0002  Otu0003  Otu0004  Otu0005  Otu0006  Otu0007  Otu0008  Otu0009  Otu0010  Otu0011  Otu0012
Sample A      148        0       14        0        0        0        0        0        0        0        0        0
Sample B      631      343      310        0       33      885       23        0        0        5        0        0
Sample C     1231      339        0        0      375        0      113        0        0        0        0        0
Sample D      980      371       99        0       16       56        5        0        0        2        0        0
Sample E      215      406        0        0        0       42       21        1        0        0        0       27
Sample F       73        0        0    14083        0        0        0        0        0        0      653        0

Section of my selected shared table (final.an.selected.0.03.pick.shared)…
          Otu0001  Otu0002  Otu0003  Otu0004  Otu0005  Otu0006  Otu0007  Otu0008  Otu0009  Otu0010  Otu0011  Otu0012
Sample A       13        0        0        0        0        0        0        0        0        0        0        0
Sample B       12      322        1       33      885        0        0        5        0        0       41        0
Sample C       45      339        0      375        0        0        0        0        0        0        0        0
Sample D       11      352        1       16       56        0        0        2        0        0        4        0
Sample E       53      406        0        0       42        1        0        0        0       27      708        0
Sample F        0        0        0        0        0        0        0        0      653        0       23        0

In Otu0004, mothur appears to over-select sequences, but I thought that sequences originally absent from Otu0004 could never show up under it in a selected shared table. When I look closely at Otu0004 in the selected table, its abundance values seem to be those of Otu0005 in the original table (i.e., the column immediately to the right). I believe the numbers of selected sequences in my table are correct. But could mothur have overwritten the zero abundances, shifting the downstream values to the left, column by column, when I reconstructed my shared file?
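One way to test the column-shift idea on the two excerpts above is to look for selected columns whose per-sample counts exactly match a differently labeled original column. Here is a minimal Python sketch of that check. It is not a mothur feature: the dictionaries just hard-code the twelve columns shown above, and the names `columns` and `matches` are made up for illustration. Exact matching can only catch OTUs that lost no sequences in the selection, and all-zero columns are skipped because they would match each other trivially.

```python
# Counts copied from the two excerpts above (Otu0001..Otu0012, in order).
labels = [f"Otu{i:04d}" for i in range(1, 13)]

original = {
    "Sample A": [148, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "Sample B": [631, 343, 310, 0, 33, 885, 23, 0, 0, 5, 0, 0],
    "Sample C": [1231, 339, 0, 0, 375, 0, 113, 0, 0, 0, 0, 0],
    "Sample D": [980, 371, 99, 0, 16, 56, 5, 0, 0, 2, 0, 0],
    "Sample E": [215, 406, 0, 0, 0, 42, 21, 1, 0, 0, 0, 27],
    "Sample F": [73, 0, 0, 14083, 0, 0, 0, 0, 0, 0, 653, 0],
}

selected = {
    "Sample A": [13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "Sample B": [12, 322, 1, 33, 885, 0, 0, 5, 0, 0, 41, 0],
    "Sample C": [45, 339, 0, 375, 0, 0, 0, 0, 0, 0, 0, 0],
    "Sample D": [11, 352, 1, 16, 56, 0, 0, 2, 0, 0, 4, 0],
    "Sample E": [53, 406, 0, 0, 42, 1, 0, 0, 0, 27, 708, 0],
    "Sample F": [0, 0, 0, 0, 0, 0, 0, 0, 653, 0, 23, 0],
}


def columns(table):
    """Turn a sample -> row mapping into a label -> per-sample tuple mapping."""
    samples = sorted(table)
    return {
        label: tuple(table[sample][i] for sample in samples)
        for i, label in enumerate(labels)
    }


orig_cols = columns(original)
sel_cols = columns(selected)

# For every non-empty selected column, list original columns with identical counts.
matches = {}
for sel_label, counts in sel_cols.items():
    if not any(counts):
        continue  # all-zero columns carry no information
    hits = [o_label for o_label, o_counts in orig_cols.items() if o_counts == counts]
    if hits:
        matches[sel_label] = hits

for sel_label, hits in matches.items():
    print(f"selected {sel_label} == original {', '.join(hits)}")
```

On this excerpt, every match lands on a later original label (for example, selected Otu0004 pairs with original Otu0005), and no selected column matches the original column carrying its own label, which fits the shift I describe.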

Or could I have done things incorrectly?

Thank you.

Daniel

I suspect that we are generating the OTU labels in make.shared and so the labeling isn’t consistent between your datasets.
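If that suspicion is right, one way to sidestep the relabeling is to count the selected sequences against the OTU labels of the original clustering, rather than rebuilding labels from the picked files. The Python sketch below only illustrates the idea on toy data; it does not read mothur files, and `shared_counts`, the sequence names, and the sample names are all hypothetical. It also assumes each OTU already lists every redundant sequence name, which a real list file plus names file would need to be expanded into first.

```python
from collections import Counter, defaultdict


def shared_counts(otu_to_seqs, seq_to_group, keep=None):
    """Count sequences per (sample, OTU label).

    If `keep` is given, only sequences in that set are counted, but the
    OTU labels are always the ones from the original clustering.
    """
    table = defaultdict(Counter)
    for label, seqs in otu_to_seqs.items():
        for seq in seqs:
            if keep is None or seq in keep:
                table[seq_to_group[seq]][label] += 1
    return table


# Toy stand-ins for the list file, the group file, and the accnos file.
otu_to_seqs = {
    "Otu0001": ["s1", "s2", "s3"],
    "Otu0002": ["s4"],
    "Otu0003": ["s5", "s6"],
}
seq_to_group = {"s1": "A", "s2": "B", "s3": "B", "s4": "A", "s5": "B", "s6": "A"}
keep = {"s1", "s3", "s5"}

full = shared_counts(otu_to_seqs, seq_to_group)
picked = shared_counts(otu_to_seqs, seq_to_group, keep=keep)

for sample in sorted(full):
    for label in otu_to_seqs:
        print(sample, label, full[sample][label], picked[sample][label])
```

Here Otu0002 ends up empty after selection, yet the sequence kept from Otu0003 is still reported under Otu0003 instead of sliding into the vacated label, so the original and selected counts stay comparable label by label.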