Losing sequences at classify.seqs

Pretty broad conceptual question here about how reads are being used. I am running 16S sequencing data through mothur for 78 samples across conditions. I want to preserve abundance so I can eventually create relative abundance plots. Before running classify.seqs I have around 7 million total reads and 400,000 unique reads, but after classifying taxonomy it seems that subsequent analysis uses only the unique reads. Am I losing species abundance by using unique reads instead of total reads? Thank you in advance for the help. I did not think it necessary to include commands or log data, but if you need them to answer the question I can provide them.

Likely you are not including some option (name, count…) in a command, so you lose the counts per unique sequence. Classify uses the uniques. So, yes, I am afraid that without the command lines you are using, we will not be able to help :slight_smile:

Thank you for the reply. I am currently trying to run a customized fasta and count file after running deunique.seqs. This method is having its own set of issues though. Will running classify.seqs without a count file preserve total counts or just not even include counts at all? Below is my first try using the files generated by following the SOP.

#ran summary.seqs here

# of unique seqs: 395056

total # of seqs: 7012656
It took 22 secs to summarize 7012656 sequences.

classify.seqs(fasta=stability.trim.contigs.good.unique.good.precluster.pick.pick.align, count=stability.trim.contigs.good.unique.good.precluster.pick.denovo.vsearch.pick.count_table, reference=trainset9_032012.pds.fasta,, cutoff=80)

It took 835 secs to classify 395056 sequences.

The abundance data will come back in later when you make the shared file.
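To illustrate why classifying only the uniques does not throw away abundance, here is a minimal Python sketch. It is not mothur code and does not reproduce mothur's file formats. The sequence names, sample names, taxa, and counts are all made up, and `relative_abundance` is a hypothetical helper. The idea is only that taxonomy is assigned once per unique sequence, while the count table still records how many reads each unique represents in each sample, so the two can be joined later.

```python
from collections import defaultdict

# Hypothetical toy data, not taken from the thread.
# Count table: unique sequence name -> number of reads it represents per sample.
count_table = {
    "seq1": {"sampleA": 500, "sampleB": 20},
    "seq2": {"sampleA": 30, "sampleB": 400},
    "seq3": {"sampleA": 5, "sampleB": 45},
}

# Taxonomy is assigned once per unique sequence (the classification step).
taxonomy = {
    "seq1": "Bacteroides",
    "seq2": "Lactobacillus",
    "seq3": "Bacteroides",
}


def relative_abundance(count_table, taxonomy):
    """Join per-unique counts with per-unique taxonomy, then normalize per sample."""
    totals = defaultdict(lambda: defaultdict(int))
    for seq, per_sample in count_table.items():
        taxon = taxonomy[seq]
        for sample, n_reads in per_sample.items():
            # Every read behind this unique sequence inherits its taxon.
            totals[sample][taxon] += n_reads

    result = {}
    for sample, taxa in totals.items():
        sample_total = sum(taxa.values())
        result[sample] = {t: n / sample_total for t, n in taxa.items()}
    return result


abund = relative_abundance(count_table, taxonomy)
for sample, taxa in sorted(abund.items()):
    print(sample, {t: round(frac, 3) for t, frac in sorted(taxa.items())})
```

In this toy case only three sequences are "classified", yet all 1,000 reads contribute to the per-sample fractions, because the counts travel separately from the taxonomy.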

Thank you for clarifying what is happening to the reads, Pat. It’s great to be able to use a tool with such a helpful community.
