Hi,
I’m unsure if I have the correct location for this question (forgive me if I’ve posted wrongly)
I have alignments (ergo a set of OTUs with the counts for each OTU already).
I need to do rarefaction curves and was wondering if I need to go back and import all my fasta files into Mothur and have it do the alignments or if I can break in half way with a file.
The easily exportable format for me is: (a random number example to show format and not real data)
OTU_Name Sample1TotalReads Sample2TotalReads Sample3Reads
1 30 2 10001
2 100 34 4023
etc
but as long as I know how to format my data I can turn it into what ever format I need.
I’m also not too sure how I would write the rarefaction.single command line in the case of where I’m not using a fasta file.
I’m working with Illumina Hi-Seq data, and need to generate some quick rarefaction plots.
If you have this in one of your tutorials already, a pointer to where would be appreciated.
Thanks.
You would need to generate a shared file. And you could then enter that into rarefaction.single with the shared option. You can see what the shared files look like through the various tutorials. Briefly, the first column is some label that’s the same for all of your samples, the second column is the name of the same, the third column is the number of OTUs and then the subsequent columns are that many long and each value is the number of reads in that OTU and sample.
Thanks for the quick reply Pat. I’m only now back at it and trying to get it to work, but the program seems to be stalling, or maybe I’m not that patient enough.
the commands I’ve tried are
rarefaction.single(shared=nameoffile.shared,calc=ace,freq=100,groupmode=F)
rarefaction.single(shared=nameoffile.shared,calc=ace,freq=1,groupmode=F)
The sharedfile looks like
0.1B11928021023…etc for all 1928 OTUs…it’s an illumina data run…
0.1B21928…OTU counts…
etc
Mothur spits out a set of rabund files and then I get
Processing group B1
0.1
and then it just sits there for a really really long time… a few hours.
Am I not being patient enough? I seem to recall getting things working once on a different computer and it ran faster but it was a 454 data set so the numbers weren’t nearly as large.
I can probably produce a list with a smaller number of OTUs for the data if that’s the problem.
My options for computer platforms here are Windows or Ubuntu (Linux). I’m currently trying it on the windows system first - so if that’s the problem I can switch to the Ubuntu platform if needed.
(My IT department are windows fiends and have a small freak out when ever I start the Linux platform up on their system… so I don’t get internet access with it, and can’t look for help files as easily; which means if I can sort out the issue and get it working on the windows system I’ll try that first.)
Thanks for any help you can be with this.
The number of OTUs isn’t the issue, rather it’s the number of sequences. How many sequences do you have? Can you try calc=sobs and pick a freq that’s 10% of the number of sequences you have?
[VV11, VV12 etc. - there are 22 samples resampled down to 182 sqs/sample]
My question is indeed simple: What is this third column without a header, and why does it get smaller towards the end?
I guessed that if lci and hci are the low/high conf-ints (as I have read), then this should be the “mean/central” estimate. But if so, then why is it smaller than the hci? If I plot this value, all samples strongly appear to have reached saturation - which is good, though a bit surprising for a mere 182 sqs. The same, though less clear, holds true if i plot the other values. I hesitate to draw conclusions from this before I get a better understanding of this.
I remained somewhat puzzled and found no conclusive information from the mothur-wiki on the rarefaction.single command.
The rabund file for VV10 looks like this in its entirety:
0.03 12 152 7 5 4 4 4 1 1 1 1 1 1
My main puzzlement concerns how a rarefaction value for a simple species count can decrease. I might understand it if it is some kind of uncertainty expression, but that is hard to know since there is no header or other indication as to what this string of numbers mean.
But to be sure, thank you for an otherwise great programme, and a great support service. Best Regards, Christoffer Bugge Harder
I don’t think it’s a big deal, but let me describe what’s happening. The current way we calcualte the lci and hci is by getting the standard error and converting that to the confidence level so that the confidence intervals are symmetrical. That’s what you’re seeing here. Clearly the confidence interval shouldn’t be symmetrical at the ends of the sampling curve. The next release (next weekish) will get the confidence interval empirically based on the iterations. The mean value won’t change. The reason I don’t think it’s a big deal is that I’m not really sure that anyone uses/needs the confidence interval from rarefaction.single.
Thanks for the fast reply. Then I understand a little better. And no, I don´t need the confidence interval from the species count, but I did need to be sure about which numbers represent what. Thus, in order to be certain that the hard of understanding has understood the numbers correctly: The three columns to the right besides 1) sample and 2) group represent
mean 4) low ci 5) high ci, right? I.e. to spell it out with the first rows from the above file:
lci/Hci appears as the header to columns 3-4 on my screen, while the fifth column has no header. Obviously, that hardly fits with the numbers, but then again, the 5th column iappeared quite mysterious ndeed , and then I got confused. Have I got it about right with the sequence of the columns? Best regards, CBH