I am analysing an environmental sample, focusing on one particular species. Before DNA extraction, I confirmed by culturing that this species is present in the sample. I sequenced the ITS1 region on the Illumina platform, then used the mothur pipeline with the most recent UNITE database to identify the OTUs in the sample. More than 1000 OTUs were identified, but not the species of interest. I repeated the experiment, assuming that something had gone wrong during either DNA extraction or sequencing, but I got the same result. I then inserted the ITS1 sequences of the species of interest into the fastq files to see whether the problem lies in the data processing. To my surprise, the species remained undetected. If I BLAST this sequence against the UNITE database, the species comes up as the first hit, so it should be identified in the sample after data processing.
Have you tried qPCR on your original sample to see how abundant it is? The other thing to check is how well the primers hit it: do an in silico PCR on your target organism and see whether there are many mismatches.
Oh, and since it's ITS: how long is ITS2 for your target? Both 454 and Illumina sequencing are biased towards shorter sequences (454 was really biased; I'm not sure how biased Illumina is).
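In case it helps, here is a minimal Python sketch of the in silico primer check. It is not a full in silico PCR tool: it only slides a forward primer along the target sequence and reports the site with the fewest mismatches, treating IUPAC degenerate codes in the primer as matching any base they allow. The function names and the toy sequences are made up for illustration; substitute your real primer and the ITS sequence of your target.

```python
# IUPAC nucleotide codes mapped to the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def best_primer_site(primer, template):
    """Slide the primer along the template.

    Returns (start, mismatches) for the earliest site with the fewest
    mismatches, or None if the template is shorter than the primer.
    """
    primer, template = primer.upper(), template.upper()
    best = None
    for start in range(len(template) - len(primer) + 1):
        mm = count_mismatches(primer, template[start:start + len(primer)])
        if best is None or mm < best[1]:
            best = (start, mm)
    return best


# Toy example (hypothetical primer and template, not real ITS primers)
print(best_primer_site("ACGTRC", "TTTACGTACGG"))
```

For the reverse primer, run the same check against the reverse complement of the target. Mismatches near the 3' end of the primer usually matter most for amplification.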
Thanks for your reply. I agree that there could be a problem with the DNA extraction, amplification, sequencing, etc., but my concern is more with the data processing. As I mentioned, I inserted the ITS sequences in abundance into a “created” fastq file before data processing, but they were still not identified. So even if they are present in the sample, they remain undetected for some reason.
Any idea what could cause this?
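For anyone who wants to reproduce this spike-in test, a minimal Python sketch of generating the artificial reads is below. It assumes Phred+33 quality encoding (where "I" corresponds to Q40) and a hypothetical read-name prefix; it is not necessarily how the fastq file in this thread was built. Note that spiked reads may need to look like real reads (primers or barcodes where the pipeline expects them, and a matching mate for paired-end data), otherwise they could be discarded during quality filtering before clustering.

```python
def make_fastq_records(seq, n_copies, prefix="spike", qual_char="I"):
    """Return FASTQ text with n_copies identical reads of seq.

    Every base gets the same quality character (Phred+33 assumed).
    """
    records = []
    for i in range(1, n_copies + 1):
        records.append(f"@{prefix}_{i}\n{seq}\n+\n{qual_char * len(seq)}\n")
    return "".join(records)


# Toy example: two copies of a short placeholder sequence
print(make_fastq_records("ACGTACGT", 2), end="")
```

Appending the returned text to a copy of the real fastq file, then following the spiked read names through each processing step, shows at which step the reads drop out.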
Look at the number of bases that differ between the inserted sequence, the representative sequence in the database, and the sequence that the inserted sequence is hitting. If they are all very similar, you may not have enough resolution to differentiate the species that you want from the species that is in the database.
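A minimal Python sketch of that comparison is below. It assumes the sequences have already been aligned to the same length (for example with an external aligner) and simply counts the columns that differ, skipping columns where either sequence has a gap. The function name and the toy sequences are illustrative only.

```python
def count_differences(a, b, gap_chars="-."):
    """Compare two aligned sequences of equal length.

    Returns (differences, columns_compared); columns with a gap in
    either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs, compared


# Toy example with one mismatch and one gapped column
print(count_differences("ACGT-A", "ACCTTA"))
```

Running it for each pair (inserted vs. expected reference, inserted vs. the reference actually hit, and the two references against each other) gives the three numbers to compare.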
I have been using artificial data. The clustering goes well: all the sequences I am interested in are clustered together into one OTU. However, I am having problems classifying them. I tried the distance and BLAST methods too, but my sequence is wrongly identified; a closely related species comes up instead. When I check the distance matrix, the K2P distance between the OTU and the identified sequence is 0.066, which should be enough for separation. Moreover, if I BLAST the OTU against any database, it comes up as the species I want. So theoretically the OTU should be identified correctly. Do you have any suggestions for how to refine or adjust the classify.seqs command to get the correct ID?
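For reference, the K2P (Kimura two-parameter) distance quoted above can be recomputed by hand with a short Python sketch like the one below. It assumes two sequences already aligned to the same length and ignores any column containing a gap or an ambiguous base. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln((1 - 2P - Q) * sqrt(1 - 2Q)). The function name is illustrative, and other tools may treat gaps differently, so small discrepancies with a program's distance matrix are possible.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        # Skip gaps and ambiguous bases
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition in ten compared sites
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Computing this for the OTU against both the expected reference and the reference it was assigned to makes it easier to see whether the classifier's choice really contradicts the distances.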