Just when I thought it was all over... how to group data after classify.seqs?

Hello! Thank you all for your previous help.

I've got quite far in my pipeline, but now I'm stuck on how to assign my samples to treatment groups.

Here are the scripts I've run so far:

BATCH I

mothur "#set.dir(input=/scratch/micro/nb326/temp_bpsenvs/01_rawdata, output=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess); make.file(inputdir=/scratch/micro/nb326/temp_bpsenvs/01_rawdata, type=gz, prefix=tbps); make.contigs(file=current, oligos=/scratch/micro/nb326/temp_bpsenvs/01_rawdata/abps.oligos, pdiffs=1); summary.seqs(fasta=current)"

BATCH II

mothur "#set.dir(output=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess); screen.seqs(fasta=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.fasta, group=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.contigs.groups, summary=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.summary, maxambig=0, minlength=252, maxlength=254, maxhomop=8); summary.seqs(fasta=current); unique.seqs(fasta=current); count.seqs(name=current, group=current); summary.seqs(fasta=current, count=current); align.seqs(fasta=current, reference=/home/n/nb326/miniconda3/envs/bpsenv/silva/silva.nr_v138/silva.nr_v138.align); summary.seqs(fasta=current, count=current)"

BATCH III

mothur "#set.dir(input=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess, output=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/trains); screen.seqs(fasta=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.good.unique.align, count=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/tbps.trim.contigs.good.count_table, start=13862, end=23444, maxhomop=8); summary.seqs(fasta=current, count=current); filter.seqs(fasta=current, vertical=T, trump=.); unique.seqs(fasta=current, count=current); pre.cluster(fasta=current, count=current, diffs=2); chimera.uchime(fasta=current, count=current, dereplicate=t); remove.seqs(fasta=current, accnos=current); summary.seqs(fasta=current, count=current)"

Then I changed the names of the final output fasta and count file generated in this batch job to tbps.train.fasta and tbps.train.count.

BATCH IV

mothur "#set.dir(output=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/trains); classify.seqs(fasta=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/trains/tbps.train.fasta, count=/home/n/nb326/miniconda3/envs/tbatch/03_preprocess/trains/tbps.train.count_table, reference=/home/n/nb326/miniconda3/envs/tbatch/trainset18_062020.rdp/trainset18_062020.rdp.fasta, taxonomy=/home/n/nb326/miniconda3/envs/tbatch/trainset18_062020.rdp/trainset18_062020.rdp.tax, cutoff=80); remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-Eukaryota); remove.groups(count=current, fasta=current, taxonomy=current, groups=SAM1.raw-SAM3.raw)"

Right now I have 96 groups, one for each of the 96 samples I sent for sequencing. If I want to assign these samples to treatment groups for UniFrac analysis, what do I need to do?

Thanks in advance!

Hi,

You would need to create a design file that has the name of the sample in the first column and the treatment group the sample belongs to in the second column. If you look at the SOP, we do this with the early/late groups.

Pat
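
For anyone landing here later, the design file described above can be typed by hand, but with 96 samples a small script may be less error-prone. Below is a minimal Python sketch, not part of mothur itself. The function name write_design_file, the sample names and the treatment labels are all made up for illustration, and the sample names must match the group names in your count file exactly. Whether a header row is expected depends on the mothur version, so the header is optional here; please check the design file documentation for the version you run.

```python
import tempfile
from pathlib import Path


def write_design_file(assignments, path, header=None):
    """Write a tab-separated design file: sample name, then treatment group.

    assignments: iterable of (sample, treatment) pairs.
    header: optional tuple of column names; left out by default because
            whether a header row is expected is an assumption to verify
            against your mothur version.
    Returns the number of samples written.
    """
    seen = set()
    lines = []
    if header is not None:
        lines.append("\t".join(header))
    for sample, treatment in assignments:
        for field in (sample, treatment):
            # Whitespace inside a field would break the two-column layout.
            if not field or any(ch.isspace() for ch in field):
                raise ValueError(f"empty or whitespace-containing field: {field!r}")
        if sample in seen:
            raise ValueError(f"sample listed twice: {sample}")
        seen.add(sample)
        lines.append(f"{sample}\t{treatment}")
    Path(path).write_text("\n".join(lines) + "\n")
    return len(seen)


if __name__ == "__main__":
    # Hypothetical sample names and treatments; replace with your own.
    example = [
        ("SAM4", "control"),
        ("SAM5", "control"),
        ("SAM6", "treated"),
    ]
    out = Path(tempfile.mkdtemp()) / "tbps.design"
    n = write_design_file(example, out)
    print(f"wrote {n} samples to {out}")
    print(out.read_text(), end="")
```

The resulting file is then supplied to the downstream commands that compare treatment groups, in the same way the SOP does for its early/late comparison. The duplicate and whitespace checks are there only to catch the most common typing mistakes before mothur sees the file.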