Problem with split.abund and group file

svazquez · September 3, 2014, 2:03pm

Hi

I am working with 454 data and have to remove the OTUs with only 1, 2 or 3 seqs (yes, I know it is not the best choice, but I was asked to do it like this, I’m sorry). I also have to get the ordination (PCoA) based on unifrac distances. Of course, the ordination and the other results (which will be based on the OTU approach) have to be based on the same dataset (same fasta file, and same type and number of sequences per sample).

I followed the 454 SOP up to the cluster step and got the final4.an.list, together with my final4.fasta, final4.groups and final4.names files.

Then,
split.abund(fasta=final4.fasta, list=final4.an.list, group=final4.groups, cutoff=3, label=0.03)
Output File Names:
final4.an.0.03.rare.list
final4.an.0.03.abund.list
final4.0.03.rare.groups
final4.0.03.abund.groups
final4.0.03.rare.fasta
final4.0.03.abund.fasta
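To make the goal concrete, here is a minimal Python sketch of what an abundance split does to the OTUs at one label. It is not mothur's code; it assumes an OTU counts as "rare" when it holds at most `cutoff` sequences (which matches dropping OTUs with 1, 2 or 3 seqs), and the OTUs and the helper `split_otus` are hypothetical. The last step shows why every downstream file (fasta, groups and names) has to be restricted to the same set of names.

```python
# Minimal sketch of an abundance split, NOT mothur's source code.
# Assumption: an OTU counts as "rare" when it holds at most `cutoff` sequences.
def split_otus(otus, cutoff=3):
    """Partition OTUs (lists of sequence names) into (rare, abundant)."""
    rare, abund = [], []
    for otu in otus:
        # Each OTU goes to exactly one of the two output lists
        (rare if len(otu) <= cutoff else abund).append(otu)
    return rare, abund


# Hypothetical OTUs from one label (e.g. 0.03) of a list file
otus = [["s1"], ["s2", "s3"], ["s4", "s5", "s6", "s7"], ["s8", "s9", "s10"]]
rare, abund = split_otus(otus, cutoff=3)
print(len(rare), len(abund))  # 3 1

# Everything downstream (fasta, groups, names) should then be restricted
# to the names found in the abundant OTUs
keep = {name for otu in abund for name in otu}
print(sorted(keep))  # ['s4', 's5', 's6', 's7']
```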

summary.seqs(fasta=final4.0.03.abund.fasta, name=final4.names)

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1571 493 0 3 1
2.5%-tile: 1 1573 500 0 4 4608
25%-tile: 1 1573 505 0 4 46073
Median: 1 1573 520 0 5 92145
75%-tile: 1 1573 527 0 5 138217
97.5%-tile: 1 1573 529 0 7 179681
Maximum: 3 1573 574 0 8 184288
Mean: 1.00001 1573 517.172 0 4.94459

# of unique seqs: 38482

total # of seqs: 184288

make.shared(list=final4.an.0.03.abund.list, group=final4.0.03.abund.groups, label=0.03)
count.groups()

10A_11 contains 7563
11A_10 contains 7117
11B_10 contains 6154
11C_10 contains 6696
12A_10 contains 5950
12B_10 contains 8900
1A_11 contains 6508
1B_11 contains 7614
2A_11 contains 8065
2B_11 contains 7924
3B_11 contains 6144
4A_11 contains 5278
4B_11 contains 6270
5A_10 contains 6590
5A_11 contains 6302
6A_11 contains 7735
6B_11 contains 7408
7A_10 contains 5736
7A_11 contains 8435
7B_10 contains 7686
8A_10 contains 6151
8A_11 contains 6781
8B_10 contains 5481
9A_10 contains 6463
9A_11 contains 6509
S1B1S_10 contains 5767
S1B1S_11 contains 7061
Total seqs: 184288

sub.sample(shared=final4.an.0.03.abund.shared, size=5278)
classify.otu(list=final4.an.0.03.abund.list, name=final4.names, taxonomy=final4.taxonomy, label=0.03)

Up to here, everything seems to have worked well!! I could get the rarefaction curves, diversity indexes, heatmap and so on from the subsampled shared file.

Then, when I tried to do the PCoA based on unifrac distances:
dist.seqs(fasta=final4.0.03.abund.fasta, output=phylip, processors=10)
clearcut(phylip=final4.0.03.abund.phylip.dist)

I tried:
unifrac.unweighted(tree=final4.0.03.abund.phylip.tre, name=final4.names, group=final4.0.03.abund.groups, distance=lt, processors=10, random=F, subsample=T)

It didn’t work: mothur listed a lot of sequence names, saying that those sequences were not in my groups file.
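One likely explanation (an assumption, not something the error message states) is that final4.names still lists sequences that split.abund placed in rare OTUs, while final4.0.03.abund.groups only contains sequences from abundant OTUs, so reading both together leaves names without a group. A quick Python consistency check between a names file ("rep<TAB>name1,name2,...") and a groups file ("name<TAB>group"); the function name and sample lines are hypothetical:

```python
def names_missing_from_groups(names_lines, groups_lines):
    """Return names listed in a names file that have no entry in a groups file."""
    in_groups = {line.split()[0] for line in groups_lines if line.strip()}
    missing = []
    for line in names_lines:
        if not line.strip():
            continue
        # Second column is a comma-separated list that includes the representative
        _rep, members = line.split()
        for name in members.split(","):
            if name not in in_groups:
                missing.append(name)
    return missing


# Hypothetical data: seq3 sat in a rare OTU, so it was dropped from the groups file
names_lines = ["seq1\tseq1,seq2", "seq3\tseq3", "seq4\tseq4,seq5,seq6"]
groups_lines = ["seq1\tA", "seq2\tA", "seq4\tB", "seq5\tB", "seq6\tB"]
print(names_missing_from_groups(names_lines, groups_lines))  # ['seq3']
```

With real files, the two lists of lines would come from reading final4.names and final4.0.03.abund.groups.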
Then I tried with the original groups file (from before split.abund):
unifrac.unweighted(tree=final4.0.03.abund.phylip.tre, name=final4.names, group=final4.groups, distance=lt, processors=10, random=F, subsample=T)

Setting subsample size to 5830

Now it worked, but the subsample size (5830) was larger than the one I had used to subsample the shared file (5278). :roll:

Why doesn’t it work, if the fasta file is the same and the groups file should match it, since both were outputs of split.abund??

Sorry, but I cannot see what I am doing wrong :oops:

Thanks!!

pschloss · September 10, 2014, 5:51pm

You might want to include the names file in the split.abund command.
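In practice, that suggestion would presumably mean rerunning split.abund(fasta=final4.fasta, list=final4.an.list, name=final4.names, group=final4.groups, cutoff=3, label=0.03), so that it also writes abund/rare names files, and then passing the abund names file to unifrac.unweighted instead of final4.names. As a rough illustration of what such a file contains, this Python sketch keeps only the names-file lines whose representative appears in the abund fasta. It assumes all members of a names line fall in the same OTU as their representative; the helper names and data are hypothetical, and this is not mothur's implementation.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs from FASTA header lines (text after '>' up to whitespace)."""
    return {line[1:].split()[0] for line in fasta_lines if line.startswith(">")}


def subset_names(names_lines, keep_reps):
    """Keep only names-file lines whose representative is in keep_reps."""
    kept = []
    for line in names_lines:
        if not line.strip():
            continue
        rep = line.split()[0]
        if rep in keep_reps:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical abund fasta (seq3 was in a rare OTU) and full names file
fasta_lines = [">seq1", "ACGTACGT", ">seq4", "ACGAACGA"]
names_lines = ["seq1\tseq1,seq2", "seq3\tseq3", "seq4\tseq4,seq5"]
print(subset_names(names_lines, fasta_ids(fasta_lines)))
# ['seq1\tseq1,seq2', 'seq4\tseq4,seq5']
```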
