cluster vs cluster.split with many unclassifieds

Hi,

I am currently processing amplicon sequencing data (Illumina MiSeq) from piezophilic organisms cultivated on long-chain alkanes under very high pressure (up to 200 bar). The original inoculum came from a deep-sea core sample, so it seems likely to me that not all 16S sequences can be classified, even with the RDP trainset 10 (pds version).
I am inclined not to kick out these “unknowns”. I don’t want to include them in any elaborate analysis, but I would like to see whether they relate to my design, i.e. whether more “unknown” 16S sequences occur at higher pressures (the incubations were done at a series of increasing pressures).
So my question is how cluster.split handles “unknown” sequences, and whether in this case it is better to use the regular cluster command.
If I am making a major mistake by not kicking out the unknowns, please let me know. I am aware that they are not necessarily a proxy for unknown diversity and could also be sequencing errors. But if they correlate systematically with the experimental design, that might indicate we are picking up something unknown as the pressure increases, no?
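A minimal sketch of the check described above, assuming per-sample lists of mothur-style taxonomy strings (e.g. `Bacteria(100);Firmicutes(98);`) in which a read unclassified at kingdom level starts with `unknown`. The function names and the toy input are hypothetical illustrations, not mothur output. It computes the fraction of unknown reads per sample and a Spearman rank correlation with pressure; with only a handful of pressure levels, the result is descriptive rather than a formal test.

```python
def fraction_unknown(taxonomies):
    """Fraction of reads whose kingdom-level label is 'unknown'.
    Assumes mothur-style strings; confidence values in parentheses are ignored."""
    if not taxonomies:
        return 0.0
    unknown = 0
    for tax in taxonomies:
        kingdom = tax.split(";")[0].split("(")[0].strip().lower()
        if kingdom == "unknown":
            unknown += 1
    return unknown / len(taxonomies)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks.
    Returns NaN if either variable is constant."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


# Hypothetical toy input: sample -> (pressure in bar, taxonomy strings of its reads)
samples = {
    "s1": (1, ["Bacteria(100);Firmicutes(97);"] * 4),
    "s2": (50, ["Bacteria(100);Firmicutes(97);"] * 3 + ["unknown;unclassified;"]),
    "s3": (100, ["Bacteria(100);Firmicutes(97);"] * 2 + ["unknown;unclassified;"] * 2),
    "s4": (200, ["Bacteria(100);Firmicutes(97);"] + ["unknown;unclassified;"] * 3),
}
pressures = [p for p, _ in samples.values()]
fractions = [fraction_unknown(t) for _, t in samples.values()]
print(fractions, spearman(pressures, fractions))
```

A rank correlation is used here because the pressures form a few ordered levels and there is no reason to assume a linear relationship.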

Thanks in advance for your input.

An unknown is a read that doesn’t classify at the kingdom level. I wouldn’t trust them, but you can always kick them out later. In the splitting algorithm they would form their own group and be clustered separately from everything else. We generally see these when non-specific amplification products somehow make it through the pipeline.
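To illustrate the grouping idea, here is a minimal sketch, not mothur’s actual implementation. As I understand it, the taxonomy-based split corresponds to cluster.split with `splitmethod=classify` and a `taxlevel`. The sketch assumes mothur-style taxonomy strings and a hypothetical `split_by_taxon` helper. All reads unclassified at kingdom level land in a single `unknown` group, and each group would then be clustered on its own, so no OTU can mix unknown and classified reads.

```python
from collections import defaultdict


def split_by_taxon(taxonomy, level):
    """Group sequence names by their taxon label at `level` (1 = kingdom).

    `taxonomy` maps sequence name -> mothur-style string such as
    'Bacteria(100);Firmicutes(98);'. Confidence values in parentheses are
    stripped. Reads whose kingdom is 'unknown' all share one 'unknown' group.
    """
    groups = defaultdict(list)
    for name, tax in taxonomy.items():
        ranks = [t.split("(")[0].strip() for t in tax.split(";") if t.strip()]
        if not ranks or ranks[0].lower() == "unknown":
            key = "unknown"
        else:
            # Join the full lineage down to `level` so identically named
            # taxa under different parents are kept apart.
            key = ";".join(ranks[:level])
        groups[key].append(name)
    return dict(groups)


# Hypothetical example input
taxonomy = {
    "seq1": "Bacteria(100);Firmicutes(98);",
    "seq2": "Bacteria(100);Proteobacteria(99);",
    "seq3": "unknown;unclassified;",
    "seq4": "Bacteria(100);Firmicutes(90);",
}
print(split_by_taxon(taxonomy, 2))
```

Each resulting group would be handed to the clustering step independently. That is why the unknowns do not affect how the classified reads are clustered, but it is also why they can never be merged into an OTU with a classified read.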