Dear Mothur users,
I’ve been running mothur on two files by following the MiSeq SOP, and everything went well. I could perform all the steps in the guide, since my files contain data from an Illumina MiSeq run.
My problems started when I tried to compare the Illumina files with other biological data. I’m using two samples from a MiSeq run, and I want to compare them with metagenome samples (bovine rumen, cecum and gut samples, which are related to mine).
I created a topic earlier because I was unable to perform the analysis, and I then discovered that if I want to run mothur with all the samples, I need to create a single group file and a single fasta file (using make.contigs, merge.files and make.group).
I followed the MiSeq SOP (not all the steps, because the libraries were built differently), and when I ran the classify.seqs command, I couldn’t classify a single sequence other than those from my own samples.
If I run only my two samples, following the MiSeq SOP, I get:
M00988_41_000000000-ACE44_1_1101_12783_9498 Bacteria(100);Firmicutes(95);Clostridia(94);Clostridiales(93);unclassified;unclassified;
M00988_41_000000000-ACE44_1_1101_12769_26602 Bacteria(100);Firmicutes(84);Clostridia(81);Clostridiales(81);unclassified;unclassified;
M00988_41_000000000-ACE44_1_1101_12175_15467 Bacteria(100);Firmicutes(100);Clostridia(100);Clostridiales(100);Ruminococcaceae(100);unclassified;
M00988_41_000000000-ACE44_1_1101_10469_7046 Bacteria(100);Firmicutes(100);Clostridia(99);Clostridiales(99);Ruminococcaceae(95);unclassified;
M00988_41_000000000-ACE44_1_1101_10021_12449 Bacteria(100);Firmicutes(98);Clostridia(98);Clostridiales(98);Ruminococcaceae(98);unclassified;
And if I also include the other samples (metagenome samples obtained from the MG-RAST web server), I get:
M00988_41_000000000-ACE44_1_2114_23954_25084 Bacteria(100);Verrucomicrobia(100);Verrucomicrobiae(100);Verrucomicrobiales(100);Verrucomicrobiaceae(100);Akkermansia(100);
M00988_41_000000000-ACE44_1_2114_18925_25608 Bacteria(99);Firmicutes(97);Clostridia(71);Clostridiales(70);unclassified;unclassified;
M00988_41_000000000-ACE44_1_2114_16242_26215 Bacteria(100);Firmicutes(70);unclassified;unclassified;unclassified;unclassified;
FTJKNNL02FWWIT Bacteria(82);unclassified;unclassified;unclassified;unclassified;unclassified;
FTJKNNL02HH0RK Bacteria(87);unclassified;unclassified;unclassified;unclassified;unclassified;
FTJKNNL02JJ1VG Bacteria(85);unclassified;unclassified;unclassified;unclassified;unclassified;
FTJKNNL02HA9V3 Bacteria(87);unclassified;unclassified;unclassified;unclassified;unclassified;
FTJKNNL02H12DN Bacteria(85);unclassified;unclassified;unclassified;unclassified;unclassified;
FTJKNNL02HDIRT Bacteria(76);unclassified;unclassified;unclassified;unclassified;unclassified;
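To make the difference easier to see, here is a minimal Python sketch (not part of mothur) that counts how many ranks are classified for each read before the first "unclassified" and groups the counts by where the read came from. It uses a few of the lines quoted above. The `source_of` helper and its rule (read names starting with `M00988` are my MiSeq reads, everything else is a downloaded metagenome read) are assumptions made for illustration only.

```python
from collections import defaultdict

# A few lines copied from the classify.seqs output quoted above.
TAXONOMY_LINES = """\
M00988_41_000000000-ACE44_1_2114_23954_25084 Bacteria(100);Verrucomicrobia(100);Verrucomicrobiae(100);Verrucomicrobiales(100);Verrucomicrobiaceae(100);Akkermansia(100);
M00988_41_000000000-ACE44_1_2114_16242_26215 Bacteria(100);Firmicutes(70);unclassified;unclassified;unclassified;unclassified;
FTJKNNL02FWWIT Bacteria(82);unclassified;unclassified;unclassified;unclassified;unclassified;
FTJKNNL02HH0RK Bacteria(87);unclassified;unclassified;unclassified;unclassified;unclassified;
"""


def classified_depth(taxonomy):
    """Count the ranks that appear before the first 'unclassified' entry."""
    ranks = [rank for rank in taxonomy.strip().split(";") if rank]
    depth = 0
    for rank in ranks:
        if rank.startswith("unclassified"):
            break
        depth += 1
    return depth


def source_of(read_id):
    """Hypothetical rule: read names starting with 'M00988' are the MiSeq reads."""
    return "miseq" if read_id.startswith("M00988") else "other"


def depth_by_source(lines):
    """Group the classified depth of every read by its assumed source."""
    depths = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The read name and the taxonomy string are separated by whitespace.
        read_id, taxonomy = line.split(None, 1)
        depths[source_of(read_id)].append(classified_depth(taxonomy))
    return dict(depths)


summary = depth_by_source(TAXONOMY_LINES.splitlines())
for source, depths in summary.items():
    print(source, depths)
```

A depth of 1 means only the domain was assigned, which is what I see for every read that did not come from my own MiSeq samples.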
So, my first question is: what am I missing?
The other samples contain 16S information. I ran the classify.seqs and remove.lineage commands, so if a sequence has nothing that can be matched against the RDP files, it should be removed, right (unknown…)?
I ran the command with the default settings from the MiSeq SOP, and also with an updated RDP taxonomy file, trainset14.
I didn’t perform all the commands in the guide, but I was able to refine the files and remove duplicated entries from them. I performed these analyses using all 12 samples at once, then split them into 3 files containing 4 samples each, and also ran mothur on a single file. The results are always the same.
I understand that running only classify.seqs wouldn’t give me high specificity, but I don’t understand why I can classify nothing beyond the domain.
Finally, I also changed the default cutoff value, even lowering it to 20%, but nothing changed. Can you help me?
Thanks for your attention,
Rafael.