Hello!
I’m running into some difficulties in my analysis and decided to ask for help here on the forum.
Mothur makes my job much easier and I’ve already resolved many doubts by reading this forum without posting, but this time I can’t find another way.
I’m processing HiSeq V6 sequences from freshwater samples and I’m getting strange subsampling results.
This is how I’ve been doing it in mothur v1.33:
make.contigs(ffastq=cce1.fq, rfastq=cce2.fq)
trim.seqs(fasta=current, oligos=V6.oligos, pdiffs=2)
reverse.seqs(fasta=cce1.trim.contigs.scrap.fasta)
trim.seqs(fasta=current, oligos=V6.oligos, pdiffs=2)
merge.files(input=cce1.trim.contigs.scrap.rc.trim.fasta-cce1.trim.contigs.trim.fasta, output=cce.fasta)
I do this for all 9 of my samples, then I merge them:
make.group(fasta=cce.fasta-cch.fasta-ccm.fasta-cse.fasta-csh.fasta-gce.fasta-gch.fasta-gse.fasta-gsh.fasta, groups=cce-cch-ccm-cse-csh-gce-gch-gse-gsh)
merge.files(input=cce.fasta-cch.fasta-ccm.fasta-cse.fasta-csh.fasta-gce.fasta-gch.fasta-gse.fasta-gsh.fasta, output=all.fasta)
summary.seqs(fasta=all.fasta)
screen.seqs(fasta=all.fasta, minlength=56, maxlength=64, maxhomop=8, maxambig=0, summary=all.summary)
unique.seqs(fasta=all.good.fasta)
align.seqs(fasta=current, reference=V6_vamps.align, ksize=9)
screen.seqs(fasta=current, name=current, start=31189, end=33183)
filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, name=current)
list.seqs(name=current)
get.seqs(group=all.groups, accnos=current)
chimera.uchime(fasta=current, name=current, dereplicate=t, group=all.pick.groups)
remove.seqs(accnos=current, fasta=current, name=current)
classify.seqs(fasta=current, name=current, reference=gg_13_5_99.fasta, taxonomy=gg_13_5_99.gg.tax, cutoff=70)
remove.lineage(fasta=current, name=current, taxonomy=current, taxon=unknown-Chloroplast-mitochondria)
list.seqs(name=current)
get.seqs(accnos=current, group=all.groups)
cluster.split(fasta=current, name=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.15)
make.shared(list=current, group=current, label=0.03)
classify.otu(list=current, group=current, name=current, taxonomy=current, persample=t, label=0.03)
mothur > count.groups()
Using all.good.unique.good.filter.unique.pick.pick.an.shared as input file for the shared parameter.
cce contains 2325437.
cch contains 2247329.
ccm contains 2222988.
cse contains 682916.
csh contains 651511.
gce contains 2331413.
gch contains 2359810.
gse contains 654156.
gsh contains 520838.
Total seqs: 13996398.
Since some samples have four times more sequences than others (they came from different sequencing runs), I need to equalize the group sizes before the alpha and beta diversity analyses:
sub.sample(list=current, group=current, label=0.03, persample=t)
list.seqs(list=current)
make.shared(list=current, group=current)
mothur > count.groups(group=all.pick.subsample.groups)
cce contains 520838.
cch contains 520838.
ccm contains 520838.
cse contains 520838.
csh contains 520838.
gce contains 520838.
gch contains 520838.
gse contains 520838.
gsh contains 520838.
Total seqs: 4687542.
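To check my intuition that random subsampling should not reshape a community, I tried it outside mothur. This is a minimal Python sketch of drawing reads without replacement (which is, as far as I understand, what sub.sample does within each group when persample=t); the mock abundance vector is invented for illustration, not my real data:

```python
import random

def subsample(counts, n, seed=1):
    """Draw n reads without replacement from an OTU abundance vector
    (conceptually what sub.sample does within one group)."""
    rng = random.Random(seed)
    # expand the abundance vector into one entry per read
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    picked = rng.sample(pool, n)
    sub = [0] * len(counts)
    for otu in picked:
        sub[otu] += 1
    return sub

# invented mock community: a few abundant OTUs plus many rare ones
community = [5000, 1000, 1000, 500] + [1] * 200   # 7700 reads total
sub = subsample(community, 1000)

# relative abundance of the dominant OTU before vs. after subsampling
before = community[0] / sum(community)
after = sub[0] / sum(sub)
print(round(before, 2), round(after, 2))  # these should be close
```

In this toy case the relative abundances of the abundant OTUs are roughly preserved (only rare OTUs drop out or become singletons), which is why the collapse of hundreds of abundant OTUs in my real data surprises me.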
The problem arises when I compare the .shared files before and after sub.sampling.
Sub.sampling strongly changed the OTU accumulation curves of the samples that had more sequences.
After sub.sampling, these samples (which before were composed of hundreds of abundant OTUs) exhibit fewer than ten abundant OTUs and an increased frequency of singletons.
Is it normal or expected for a random process to cause such a deep change in sample diversity?
Before sub.sampling, the relative rank-abundance curves were very similar across all samples.
If I’m not making any mistake in the commands, what other strategy could I use to get more reliable results?
I hope someone can help me.
Thanks in advance!