pca timeout

Zak38 · December 13, 2017, 12:01am

I am running the pca command on 4 different data sets. Each data set contains 14 groups. Two data sets use the V4 Illumina tag sequences and two use the V6 Illumina tag sequences. For each region (V4 and V6), one data set has 10,000 sequences per group and the other has 230,000 sequences per group. At 10,000 sequences per group, pca works well for both the V4 and V6 data sets. At 230,000 sequences per group, however, only the V4 set works. The V6 run has timed out twice now, once after 5 days and once after 10 days, while the V4 run took only a few hours. This is the opposite of everything else I have run in mothur, where the V4 runs are consistently longer than the V6 runs because the V4 reads average about 250 bp versus roughly 60 bp for the V6 tag. Is there an algorithmic reason for this (more difficulty with shorter reads?), or is there some other problem I haven't found yet (a typo in my batch file)?

Thanks,
Zak

pschloss · December 18, 2017, 3:00pm

I would strongly discourage the use of PCA. Instead, calculate a distance matrix using dist.shared with a calculator such as Bray-Curtis or ThetaYC, and run that distance matrix through PCoA. There is an example of this on the MiSeq SOP wiki page.
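A minimal sketch of the two steps Pat describes, written in Python with NumPy (an assumption; the thread itself only uses mothur commands): build a Bray-Curtis distance matrix between samples, then run a classical (Gower) PCoA on that matrix. The function names `bray_curtis_matrix` and `pcoa` and the toy count table are hypothetical, and this is not mothur's implementation of dist.shared or pcoa; it only shows what the calculation looks like.

```python
import numpy as np


def bray_curtis_matrix(counts: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between the rows (samples) of an OTU count table."""
    n = counts.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            d[i, j] = d[j, i] = num / den if den > 0 else 0.0
    return d


def pcoa(d: np.ndarray, k: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Classical (Gower/Torgerson) PCoA; returns k-axis coordinates and all eigenvalues."""
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n   # double-centring matrix
    b = -0.5 * centre @ (d ** 2) @ centre      # Gower-centred matrix
    vals, vecs = np.linalg.eigh(b)             # symmetric eigendecomposition (ascending order)
    order = np.argsort(vals)[::-1]             # reorder axes from largest to smallest eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    scale = np.sqrt(np.clip(vals[:k], 0.0, None))  # negative eigenvalues (non-Euclidean part) are dropped
    return vecs[:, :k] * scale, vals


# Hypothetical table: 3 samples x 4 OTUs, each sample subsampled to 20 sequences.
counts = np.array([
    [10, 0, 5, 5],
    [8, 2, 5, 5],
    [0, 10, 0, 10],
])
d = bray_curtis_matrix(counts)
coords, eigenvalues = pcoa(d, k=2)
print(np.round(d, 3))
print(np.round(coords, 3))
```

Here the first two samples are close (Bray-Curtis 0.1) and the third is far from both (0.75 and 0.65). Because PCoA accepts any distance matrix, the choice of calculator (Bray-Curtis, ThetaYC, ...) is made at the distance step rather than inside the ordination.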

Pat

Zak38 · December 20, 2017, 6:52pm

Thanks Pat. I will try that. Could you be a little more specific as to why you discourage PCA?

Zak

pschloss · December 21, 2017, 2:11pm

PCA essentially uses R² as a distance between samples. This weights double zeros the same as double ones: if an OTU is missing from both samples being compared, that shared absence inflates the similarity between them. Other metrics that are widely used in ecology (e.g., Bray-Curtis and ThetaYC) ignore these double zeros. PCA is more appropriate for comparing communities based on their metadata.
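A small illustration of the double-zero effect, as a sketch in Python with NumPy. It assumes that Pearson correlation between two samples' OTU count vectors is a fair stand-in for the correlation-type similarity described above; it is not a description of how mothur's pca command is implemented. The counts and the 96 padding OTUs are hypothetical.

```python
import numpy as np


def bray_curtis(x: np.ndarray, y: np.ndarray) -> float:
    """Bray-Curtis dissimilarity; an OTU absent from both samples adds 0 to numerator and denominator."""
    return float(np.abs(x - y).sum() / (x + y).sum())


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two samples' OTU count vectors."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical counts for two samples across four OTUs.
a = np.array([4, 1, 0, 3])
b = np.array([2, 0, 3, 3])

# Append 96 OTUs that are absent from both samples (double zeros).
pad = np.zeros(96, dtype=int)
a_pad = np.concatenate([a, pad])
b_pad = np.concatenate([b, pad])

print(f"r:  {pearson_r(a, b):.3f} -> {pearson_r(a_pad, b_pad):.3f}")
print(f"BC: {bray_curtis(a, b):.3f} -> {bray_curtis(a_pad, b_pad):.3f}")
```

Nothing about the four observed OTUs changes, yet the correlation rises from about 0.13 to about 0.70 purely because of the shared absences, while the Bray-Curtis value stays at 0.375.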

Pat

Zak38 · December 21, 2017, 3:26pm

Thank you very much for that, Pat.

Zak

Related topics

Topic	Category	Replies	Views	Activity
pca running out of memory	mothur bugs	7	7679	May 10, 2014
pca commands	Commands in mothur	15	10985	May 3, 2013
Distance matrix issues - still running	Commands in mothur	6	722	October 14, 2019
duration of analyses	Commands in mothur	7	5250	May 30, 2014
Problem with OTU classification	mothur bugs	5	5588	April 19, 2010