I am running mothur on FASTQ files from two separate datasets. I am using one dataset to train an ML model and the other to validate it. To do this, I need the features the model is trained on (OTUs) to be only the OTUs that both runs have in common. How should I go about this? Should I do one run for everything and then split the OTU table afterwards? Or is there a method to merge two OTU tables, keeping only the OTUs they have in common?
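A minimal sketch of the "one run, then split" option. One caveat, as far as I understand it: OTU labels from two independent mothur runs are assigned per run, so `Otu0001` in one table need not be the same OTU as `Otu0001` in the other. Intersecting by label is therefore only meaningful when both datasets were clustered together. The sketch assumes a single mothur `.shared` table (tab-delimited: `label`, `Group`, `numOtus`, then one column per OTU) at a single distance label, and a hypothetical sample-naming convention (`A_…` vs `B_…`) that tells you which dataset each sample came from. The embedded table is made-up example data.

```python
import csv
import io

# Made-up example of a mothur-style .shared table (one distance label only).
SHARED_TEXT = (
    "label\tGroup\tnumOtus\tOtu0001\tOtu0002\tOtu0003\tOtu0004\n"
    "0.03\tA_s1\t4\t10\t0\t5\t0\n"
    "0.03\tA_s2\t4\t3\t0\t0\t2\n"
    "0.03\tB_s1\t4\t7\t4\t0\t0\n"
    "0.03\tB_s2\t4\t0\t1\t6\t0\n"
)


def read_shared(handle):
    """Parse a .shared table into (otu_names, {group: [counts]}).

    Assumes the file holds a single distance label; real .shared files
    can contain several, which would need filtering on column 0 first.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    otu_names = header[3:]  # skip label, Group, numOtus
    counts = {}
    for row in reader:
        if row:
            counts[row[1]] = [int(x) for x in row[3:]]
    return otu_names, counts


def shared_otus(otu_names, counts, dataset_of):
    """Return OTUs seen (count > 0) in at least one sample of every dataset."""
    seen = {}  # dataset -> set of OTU names observed in that dataset
    for group, row in counts.items():
        present = {name for name, c in zip(otu_names, row) if c > 0}
        seen.setdefault(dataset_of(group), set()).update(present)
    common = set.intersection(*seen.values())
    return [name for name in otu_names if name in common]  # keep table order


def subset(otu_names, counts, keep):
    """Restrict every sample's count vector to the OTUs in `keep`."""
    idx = [otu_names.index(name) for name in keep]
    return {group: [row[i] for i in idx] for group, row in counts.items()}


if __name__ == "__main__":
    otus, table = read_shared(io.StringIO(SHARED_TEXT))
    # Hypothetical convention: prefix before "_" names the dataset.
    keep = shared_otus(otus, table, lambda g: g.split("_")[0])
    print(keep)
    print(subset(otus, table, keep))
```

Here `Otu0002` (only in B) and `Otu0004` (only in A) are dropped, leaving `Otu0001` and `Otu0003` as the common feature set; rows for the training and validation samples can then be separated by the same prefix.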
To complicate things further, one dataset consists of 100 bp single-end reads, while the other consists of 250 bp paired-end reads.
Any ideas?
Thanks