Help with create.database on sub.sample OTUs!

Dear Pat,

I need your help! I would like to build a database of the normalized OTUs by group, with taxonomy. To get there I’ve tried two different approaches:
A.

  1. First, I counted my groups:
    mothur > count.groups(group=3sitesall.final.groups)
    com519Fbar1 contains 8494.
    com519Fbar2 contains 7485.
    com519Fbar3 contains 5955.
    com519Fbar5 contains 8761.
    com519Fbar6 contains 8324.
    com519bar10 contains 3637.
    com519bar11 contains 9146.
    com519bar12 contains 740.
    com519bar15 contains 7417.
    com519bar16 contains 5870.
    com519bar19 contains 5109.
    com519bar20 contains 6834.
    com519bar4 contains 3211.
    com519bar9 contains 6479.

Total seqs: 87462.

Output File Names:
3sitesall.final.count.summary
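
To see which groups would be dropped at a given subsample size, the counts above can be checked with a small script. This is only a minimal Python sketch, not a mothur feature: the helper names parse_group_counts and groups_below are made up for illustration, and the counts come from 3sitesall.final.groups, so they may differ slightly from the woCyano files used later.

```python
import re

# Output of count.groups, copied from the log above.
COUNT_GROUPS_LOG = """\
com519Fbar1 contains 8494.
com519Fbar2 contains 7485.
com519Fbar3 contains 5955.
com519Fbar5 contains 8761.
com519Fbar6 contains 8324.
com519bar10 contains 3637.
com519bar11 contains 9146.
com519bar12 contains 740.
com519bar15 contains 7417.
com519bar16 contains 5870.
com519bar19 contains 5109.
com519bar20 contains 6834.
com519bar4 contains 3211.
com519bar9 contains 6479.
"""


def parse_group_counts(log_text):
    """Return {group: count} from lines like 'name contains 123.'."""
    counts = {}
    for line in log_text.splitlines():
        match = re.match(r"\s*(\S+) contains (\d+)\.", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def groups_below(counts, size):
    """Return the sorted names of groups with fewer than `size` sequences."""
    return sorted(group for group, n in counts.items() if n < size)


counts = parse_group_counts(COUNT_GROUPS_LOG)
print("total seqs:", sum(counts.values()))
print("groups below 2686:", groups_below(counts, 2686))
```

With these counts, only com519bar12 falls below 2686, which agrees with the group that sub.sample eliminates further down.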


2. I created the repfasta, repname and constaxonomy files from the '.list' file, using my whole dataset:

mothur > get.oturep(list=3sitesall.final.woCyano.an.list, label=0.03, fasta=3sitesall.final.woCyano.fasta, column=3sitesall.final.woCyano.dist, name=3sitesall.final.woCyano.names)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||


0.03 5860

Output File Names:
3sitesall.final.woCyano.an.0.03.rep.names
3sitesall.final.woCyano.an.0.03.rep.fasta


mothur > classify.otu(list=3sitesall.final.woCyano.an.list, name=3sitesall.final.woCyano.names, taxonomy=3sitesall.final.woCyano.taxonomy, label=0.03)
reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
0.03 5860

Output File Names:
3sitesall.final.woCyano.an.0.03.cons.taxonomy
3sitesall.final.woCyano.an.0.03.cons.tax.summary



3. Then I created my subsampled '.shared' file, using 2686 seqs:

mothur > sub.sample(shared=3sitesall.final.woCyano.an.shared, size=2686)
com519bar12 contains 701. Eliminating.
Sampling 2686 from each group.
0.03

Output File Names:
3sitesall.final.woCyano.an.0.03.subsample.shared

4. Then I tried to create the database:

mothur > create.database(shared=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.subsample.2686.shared, label=0.03, repfasta=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.rep.fasta, repname=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.rep.names, constaxonomy=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.cons.taxonomy, group=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.groups)

But I received a lot of warnings:
[WARNING]: OTU Otu03241 contains 1 sequence, but the rep and taxonomy files indicated this OTU should have 2. Make sure you are using files for the same distance.
[WARNING]: OTU Otu03242 contains 1 sequence, but the rep and taxonomy files indicated this OTU should have 2. Make sure you are using files for the same distance.
[WARNING]: OTU Otu03244 contains 1 sequence, but the rep and taxonomy files indicated this OTU should have 2. Make sure you are using files for the same distance.

Output File Names:
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.subsample.2686.database
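
The warnings seem to say that the per-OTU totals in the shared file no longer match the sizes recorded in the rep and taxonomy files. A minimal Python sketch of that kind of comparison, on made-up toy data, is below. It assumes the usual tab-delimited layouts (shared: label, Group, numOtus, then one column per OTU; cons.taxonomy: OTU, Size, Taxonomy). The function names are hypothetical, and whether create.database performs exactly this check is an assumption.

```python
import csv
import io

# Hypothetical toy data in the assumed mothur layouts (tab-delimited).
SHARED_TEXT = (
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\n"
    "0.03\tgroupA\t3\t5\t1\t0\n"
    "0.03\tgroupB\t3\t4\t0\t2\n"
)
CONS_TAXONOMY_TEXT = (
    "OTU\tSize\tTaxonomy\n"
    "Otu1\t9\tBacteria(100);\n"
    "Otu2\t2\tBacteria(100);\n"
    "Otu3\t2\tBacteria(100);\n"
)


def shared_totals(shared_text, label):
    """Sum each OTU column over all groups for the given label."""
    reader = csv.reader(io.StringIO(shared_text), delimiter="\t")
    header = next(reader)
    otu_names = header[3:]  # columns after label, Group, numOtus
    totals = {name: 0 for name in otu_names}
    for row in reader:
        if not row or row[0] != label:
            continue
        for name, value in zip(otu_names, row[3:]):
            totals[name] += int(value)
    return totals


def taxonomy_sizes(cons_text):
    """Read the Size column of a cons.taxonomy-style table."""
    reader = csv.DictReader(io.StringIO(cons_text), delimiter="\t")
    return {row["OTU"]: int(row["Size"]) for row in reader}


def size_mismatches(shared_text, cons_text, label):
    """Return {otu: (count_in_shared, size_in_taxonomy)} where they differ."""
    totals = shared_totals(shared_text, label)
    sizes = taxonomy_sizes(cons_text)
    return {
        otu: (totals.get(otu, 0), size)
        for otu, size in sizes.items()
        if totals.get(otu, 0) != size
    }


mismatches = size_mismatches(SHARED_TEXT, CONS_TAXONOMY_TEXT, "0.03")
print(mismatches)
```

In the toy data, Otu2 has 1 sequence in the shared file but size 2 in the taxonomy table, the same pattern as in the warnings above. This would be expected whenever the shared file is subsampled while the rep and taxonomy files are built from the full list file.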

B.
I followed these instructions that I found in the forum:

  1. sub.sample(list=final.an.list, group=final.groups, persample=t) - selects the same number of seqs from each group
  2. list.seqs(list=current) - lists the seqs in the subsample

-workaround to avoid issues with sequence names, this will be resolved in the next release
  3. remove.seqs(list=final.an.list, accnos=current) - creates a list file with the seqs not in the subsample
  4. list.seqs(list=current) - lists the seqs not in the subsample
  5. remove.seqs(taxonomy=yourTaxonomyFile, name=yourNameFile, dups=f, accnos=current) - removes all seqs not in the subsampled list file
  6. summary.tax(taxonomy=current, name=current)
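
The remove.seqs workaround in the steps above amounts to a double complement: first collect the sequences that are not in the subsample, then remove exactly those from the taxonomy and name files, which leaves only the subsampled sequences. A minimal Python sketch with sets and hypothetical sequence names; it only models which names survive, not mothur's file handling.

```python
def names_not_in(all_names, subsample_names):
    """Model of remove.seqs on the list file: keep names outside the subsample."""
    return set(all_names) - set(subsample_names)


def remove_names(file_names, names_to_remove):
    """Model of remove.seqs on the taxonomy/name files."""
    return set(file_names) - set(names_to_remove)


# Hypothetical sequence names; real ones would come from the list file.
all_seqs = {"seq1", "seq2", "seq3", "seq4", "seq5", "seq6"}
subsample = {"seq2", "seq5"}

not_in_subsample = names_not_in(all_seqs, subsample)
remaining = remove_names(all_seqs, not_in_subsample)

print(sorted(not_in_subsample))
print(sorted(remaining))
```

After both removals, the remaining names are exactly the subsampled ones.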

Since I didn’t want all the groups, I specified the ones to keep:

mothur > sub.sample(list=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.list, group=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.groups, groups=com519Fbar1-com519Fbar2-com519Fbar3-com519bar4-com519Fbar5-com519Fbar6-com519bar9-com519bar10-com519bar11-com519bar15-com519bar16-com519bar19-com519bar20)
Sampling 7973 from 79739.
unique
0.01
0.02
0.03
0.04
0.05

Output File Names:
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.subsample.groups
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.unique.subsample.list
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.01.subsample.list
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.02.subsample.list
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.subsample.list
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.04.subsample.list
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.05.subsample.list

list.seqs(list=/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.subsample.list)

Output File Names:
/media/zm1/TrekStor/OTU analysis&stats/3sitesall.final.woCyano.an.0.03.subsample.accnos


I worked all the way down the instructions with the '3sitesall.final.woCyano.an.0.03.subsample.list' file. Then I removed the OTUs outside the subsample from the group, fasta and name files and continued to get.oturep, but I couldn't get any further from there. I tried get.groups and moving forward, but there were always mismatches in the number of OTUs between my fasta and name files :(

Any suggestions, please!
Many thanks!

Astrid