I’m nearly finished with the analysis of my 16S data. My initial struggle with Mothur was absolutely worth it. Thanks for all the help I got here, and for the Schloss SOP!
Today I wanted to look at the abundant sequences per sample. I processed the sequences of my 6 samples together in one file and then ran split.abund with the group parameter and groups=all. So far so good: I got the abundant sequences for each sample in a separate file.
BUT now I want to know which of these abundant OTUs are shared by all 6 samples. I believe I can’t just run get.sharedseqs, because the abundant sequences are spread over separate list and group files.
So I merged these files first. That still didn’t work: merging the list files just put the 6 lists one after another, each starting with its own number of OTUs:
45 seqname 1 seqname 2
36 seqname 1b seqname 2b
32 seqname 1c seqname 2c
When reading this list file, Mothur seems to use only the first list, the one with 45 OTUs. So make.shared gives an error saying that the number of sequences doesn’t match between the list and group files. OK, I get that; it seems logical.
Then I deleted the OTU counts at the start of each of the six lists and put the adjusted total at the top:
113 seqname 1 seqname 2 seqname 1b seqname 2b seqname 1c seqname 2c
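A toy sketch of what this hand-merged line amounts to (hypothetical labels and sequence names, not my real files). It assumes the classic Mothur list layout as I understand it, with no header row: a label, the number of OTUs, then one tab-separated column per OTU, with the sequence names in that OTU separated by commas. Merging the per-sample lines keeps every column as its own OTU, and each of those OTUs only contains names from a single sample.

```python
# Toy illustration with hypothetical data: concatenating per-sample
# abundant list lines keeps every OTU column separate.
# Assumed list layout: label <tab> OTU count <tab> one column per OTU,
# sequence names inside an OTU separated by commas.

def parse_list_line(line):
    """Return (label, otus), where otus is a list of sets of sequence names."""
    fields = line.rstrip("\n").split("\t")
    label, num_otus, columns = fields[0], int(fields[1]), fields[2:]
    otus = [set(col.split(",")) for col in columns]
    if len(otus) != num_otus:
        raise ValueError("OTU count does not match the number of columns")
    return label, otus

def naive_merge(lines):
    """Mimic the hand edit: keep every OTU column and put the summed count up front."""
    label = None
    all_columns = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        label = fields[0]
        all_columns.extend(fields[2:])
    return "\t".join([label, str(len(all_columns))] + all_columns)

# Hypothetical per-sample abundant lists; names carry a sample prefix.
sample_a = "0.03\t2\tA_seq1,A_seq2\tA_seq3"
sample_b = "0.03\t1\tB_seq1,B_seq2,B_seq3"

merged = naive_merge([sample_a, sample_b])
label, otus = parse_list_line(merged)
print(len(otus))  # 3: the counts just add up, like 45 + 36 + 32 = 113

# Which samples contribute to each merged OTU? Always exactly one.
samples_per_otu = [{name.split("_")[0] for name in otu} for otu in otus]
print(samples_per_otu)  # [{'A'}, {'A'}, {'B'}]
```

Even if `A_seq3` and the `B_seq…` names belonged to the same OTU in the full clustering, the merged line has no way of expressing that, so nothing can come out as shared.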
But in the end this was useless, since now every name in that list is treated as a separate OTU (113 OTUs in the example). So when I run make.shared and then get.sharedseqs, I think I will always get the output that there are no shared OTUs.

Maybe there really are 113 different OTUs, but I doubt it: when I run get.sharedseqs and split.abund on all my data as a whole, more than half of the resulting OTUs are both abundant and shared. Of course, it is possible that they are shared in low numbers (say 2 reads per sample), so that together they are abundant (12, which is above my threshold of 10) but separately they are not. But many of them are very abundant (>6000 sequences), so I would really expect at least some of them to be shared in higher numbers. It would be very intriguing if all these >6000-sequence OTUs were present in only one or a few of my samples, but I’m sceptical about that. It seems more likely that I did something wrong in Mothur. I’m not confident about the merging step or the manual edits I made to the merged list file.
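The pooled-versus-per-sample point is plain arithmetic. A tiny sketch with a hypothetical OTU that has 2 reads in each of the 6 samples, using my threshold of 10 and treating "abundant" as strictly greater than the cutoff:

```python
CUTOFF = 10  # abundance threshold I used with split.abund

def is_abundant(count, cutoff=CUTOFF):
    # "Abundant" here means strictly more reads than the cutoff.
    return count > cutoff

# Hypothetical OTU with 2 reads in each of the 6 samples.
per_sample = {"S1": 2, "S2": 2, "S3": 2, "S4": 2, "S5": 2, "S6": 2}

pooled = sum(per_sample.values())
print(pooled, is_abundant(pooled))  # 12 True: abundant when pooled
print(any(is_abundant(c) for c in per_sample.values()))  # False: rare in every sample
```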
I was now thinking of running get.sharedseqs on all my data and comparing the overall shared OTUs with the 6 lists of abundant OTUs. But then I would have to run get.sharedseqs on two files to make this comparison, and I’m stuck at merging files again.
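What I’m after is essentially a set intersection: for each sample, take the set of OTUs above the threshold, then intersect those sets over all samples. A minimal sketch, assuming all samples are counted against one clustering in a single table with one row per sample and one column per OTU (roughly the layout of a Mothur .shared file as I understand it: `label`, `Group`, `numOtus`, then the OTU columns). The table, sample names and counts below are made up.

```python
import csv
import io

CUTOFF = 10  # per-sample abundance threshold

def abundant_otus_by_sample(shared_text, cutoff=CUTOFF):
    """Map each Group to the set of OTUs with more than `cutoff` reads.

    Assumes a tab-separated table whose first three columns are
    label, Group and numOtus, followed by one count column per OTU.
    """
    reader = csv.DictReader(io.StringIO(shared_text), delimiter="\t")
    otu_cols = [c for c in reader.fieldnames if c not in ("label", "Group", "numOtus")]
    result = {}
    for row in reader:
        result[row["Group"]] = {otu for otu in otu_cols if int(row[otu]) > cutoff}
    return result

def shared_by_all(sets_by_sample):
    """OTUs that appear in every sample's set (empty set if there are no samples)."""
    sets = list(sets_by_sample.values())
    return set.intersection(*sets) if sets else set()

# Hypothetical table with 3 samples and 3 OTUs.
table = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\tOtu003\n"
    "0.03\tS1\t3\t6200\t15\t2\n"
    "0.03\tS2\t3\t4100\t3\t40\n"
    "0.03\tS3\t3\t900\t22\t0\n"
)

abundant = abundant_otus_by_sample(table)
print(sorted(shared_by_all(abundant)))  # ['Otu001']
```

The point of the single table is that every sample is counted against the same OTU definitions, so an OTU label means the same thing in every row, which the hand-merged list could not provide.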
So, any hints on how to get this done? There must be a way to compare OTUs between separate files, no? I get the feeling I’m (again) overlooking something very basic…