Hi again Pat,
My apologies for the late response.
We’re interested in investigating the prevalence of OTU sequences that can be confidently classified to certain bacterial strains.
Here are the commands I used with a new dataset:
To generate the non-selected (original) shared file…
make.shared(list=final.an.list,group=final.groups,label=0.03)
To generate the re-selected shared file…
get.seqs(accnos=classified_sequences.accnos,list=final.an.list,name=final.names,group=final.groups,label=0.03)
make.shared(list=final.an.0.03.pick.list,group=final.pick.groups)
Because I was only interested in the OTU labels that are found in both of my shared files (i.e., original and selected), I made an accnos file and used it to filter each one:
get.otulabels(accnos=common.otulabels,shared=final.an.original.shared)#renamed file name
get.otulabels(accnos=common.otulabels,shared=final.an.selected.shared)#renamed file name
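For reference, a list of shared labels like common.otulabels could be built along these lines. This is only a minimal Python sketch, not the procedure actually used here. It assumes the shared files are tab-delimited with label, Group and numOtus as the first three columns, and the function names are made up for illustration.

```python
def otu_labels(shared_text):
    """Return the OTU labels from the header row of a shared file.

    Assumes a tab-delimited header whose first three columns are
    label, Group and numOtus, followed by one column per OTU.
    """
    header = shared_text.splitlines()[0].split("\t")
    # Skip the three leading columns and ignore empty fields
    # left behind by a trailing tab.
    return [name for name in header[3:] if name]


def common_labels(original_text, selected_text):
    """Return OTU labels present in both files, in the original file's order."""
    selected = set(otu_labels(selected_text))
    return [otu for otu in otu_labels(original_text) if otu in selected]


def accnos_text(labels):
    """Format labels one per line, the layout used for an accnos file."""
    return "".join(label + "\n" for label in labels)
```

The text returned by accnos_text could then be written to a file and passed to get.otulabels through its accnos option.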
Output:
Section of my original shared table (final.an.original.0.03.pick.shared)…
Sample    Otu0001 Otu0002 Otu0003 Otu0004 Otu0005 Otu0006 Otu0007 Otu0008 Otu0009 Otu0010 Otu0011 Otu0012
Sample A  148     0       14      0       0       0       0       0       0       0       0       0
Sample B  631     343     310     0       33      885     23      0       0       5       0       0
Sample C  1231    339     0       0       375     0       113     0       0       0       0       0
Sample D  980     371     99      0       16      56      5       0       0       2       0       0
Sample E  215     406     0       0       0       42      21      1       0       0       0       27
Sample F  73      0       0       14083   0       0       0       0       0       0       653     0
Section of my selected shared table (final.an.selected.0.03.pick.shared)…
Sample    Otu0001 Otu0002 Otu0003 Otu0004 Otu0005 Otu0006 Otu0007 Otu0008 Otu0009 Otu0010 Otu0011 Otu0012
Sample A  13      0       0       0       0       0       0       0       0       0       0       0
Sample B  12      322     1       33      885     0       0       5       0       0       41      0
Sample C  45      339     0       375     0       0       0       0       0       0       0       0
Sample D  11      352     1       16      56      0       0       2       0       0       4       0
Sample E  53      406     0       0       42      1       0       0       0       27      708     0
Sample F  0       0       0       0       0       0       0       0       653     0       23      0
In Otu0004, mothur appears to over-select sequences: the selected table shows counts in samples where the original table shows zero. I thought that sequences absent from Otu0004 in the original table could never be present in Otu0004 in a selected shared table.

When I look closely at Otu0004 in the selected table, its abundance values seem to be the ones that belonged to Otu0005 in the original table (that is, the column immediately to its right). I believe the numbers of selected sequences that I see in my table are correct. But could mothur have overwritten the zero abundances and, in doing so, shifted the downstream values one column to the left when I reconstructed my shared file?
Or could I have done things incorrectly? :?
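One way to check this outside of mothur would be to compare the two tables cell by cell and list every sample and OTU where the selected count is higher than the original count, which should not happen if the selection only removes sequences. The following is a minimal Python sketch under stated assumptions. The shared files are assumed to be tab-delimited with label, Group and numOtus as the first three columns. A given OTU label is also assumed to refer to the same OTU in both files, which may not hold if labels were reassigned when the second shared file was made. The function names are hypothetical.

```python
def parse_shared(text):
    """Parse shared-file text into {group: {otu_label: count}}.

    Assumes tab-delimited rows whose first three columns are
    label, Group and numOtus, followed by one count per OTU.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    otus = [name for name in lines[0].split("\t")[3:] if name]
    table = {}
    for line in lines[1:]:
        fields = line.split("\t")
        # zip stops at the shorter sequence, so a trailing empty
        # field after the last count is ignored.
        counts = [int(value) for value, _ in zip(fields[3:], otus)]
        table[fields[1]] = dict(zip(otus, counts))
    return table


def impossible_counts(original, selected):
    """List (group, otu, original_count, selected_count) where selected > original."""
    problems = []
    for group, counts in selected.items():
        for otu, after in counts.items():
            # A group or OTU missing from the original table counts as zero.
            before = original.get(group, {}).get(otu, 0)
            if after > before:
                problems.append((group, otu, before, after))
    return problems
```

Running impossible_counts(parse_shared(original_text), parse_shared(selected_text)) on the two files would flag, for example, Sample B in Otu0004 (0 in the original, 33 in the selected table). An empty result would mean no selected count exceeds its original count.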
Thank you.
Daniel