I have been working through the MiSeq SOP (great work btw) and I use the subsampling command for normalisation. However, after subsampling I obviously lose some OTUs from the resulting shared file. Is there a way to redo the taxonomy file so only the OTUs corresponding to the shared file are included? As it stands I fear I will have to do a lot of cross-referencing with my original taxonomy file, which will be less than ideal…
This shouldn’t be necessary since the OTU names don’t change. The column headings from the subsampled data will still correspond to those in the cons.taxonomy file.
As I mentioned, I would like to avoid cross-referencing with the taxonomy file. For example, I tend to insert the taxonomy IDs into a column in my shared file next to the OTUs, so that I can easily cross-reference in downstream analysis. If I cannot create the corresponding taxonomy file once I subsample, then I cannot simply copy and paste the taxonomy file, as the rows will obviously be out of alignment and there will be many more IDs (from the original taxonomy file) than remaining OTUs in the subsampled.shared file.
If it is not possible then cross-referencing it is, but hopefully you can appreciate why I would like to avoid this…
I’m not sure what you mean by “If I cannot create the corresponding taxonomy file once I subsample”. The corresponding taxonomy file is not affected by subsampling. You can very easily do this with some simple R commands. Alternatively, you can always use create.database [mothur.org/wiki/Create.database] to do what you propose.
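For anyone who would rather not do this by hand, here is a minimal sketch of the filtering step, written in Python rather than R. It assumes the usual tab-delimited layouts: a shared file whose header is label, Group, numOtus followed by the OTU names, and a cons.taxonomy file with the columns OTU, Size, Taxonomy. The function names `otus_in_shared` and `filter_taxonomy` are made up for this example and are not part of mothur.

```python
def otus_in_shared(shared_path):
    """Return the OTU names listed in the header of a mothur .shared file."""
    with open(shared_path) as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    # Assumed layout: label, Group, numOtus, then one column per OTU.
    # Empty names are skipped in case the header ends with a trailing tab.
    return [name for name in header[3:] if name]


def filter_taxonomy(taxonomy_path, shared_path, out_path):
    """Copy only the cons.taxonomy rows whose OTU is in the shared file.

    Returns the number of OTU rows written (the header is not counted).
    """
    keep = set(otus_in_shared(shared_path))
    kept = 0
    with open(taxonomy_path) as src, open(out_path, "w") as dst:
        dst.write(src.readline())  # copy the OTU / Size / Taxonomy header
        for line in src:
            otu = line.split("\t", 1)[0]  # first column is the OTU name
            if otu in keep:
                dst.write(line)
                kept += 1
    return kept
```

Calling `filter_taxonomy` with the original cons.taxonomy file, the subsampled shared file and a new output path should give a taxonomy file with one row per remaining OTU. Rows keep the order of the taxonomy file, so check that this matches the column order of the shared file before pasting the two side by side.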
Sorry if I am not clear. I know the OTU numbers still correspond after subsampling, but several OTUs will be missing after subsampling, so the rows of OTUs before and after won't align. This is why I was wondering if a new taxonomy file could be created after subsampling which contains only the OTUs present in the subsampled shared file. It seems this is not possible, and as I am no dab hand at R I will continue to do this manually.
You can generate a subsampled taxonomy file (and others) by following these three steps. The “name of file”.subsample.0.03.cons.taxonomy file produced in step 2 is the OTU taxonomy list. Caution: a .list file is used for subsampling.
1)
mothur > sub.sample(count=“name of file”.count_table, list=“name of file”.list, taxonomy=“name of file”.taxonomy, persample=T, size=5000)
Sampling 5000 from each group.
Output File Names:
“name of file”.subsample.count_table
“name of file”.subsample.taxonomy
“name of file”.subsample.list
2)
mothur > classify.otu(taxonomy=“name of file”.subsample.taxonomy, list=“name of file”.subsample.list, label=0.03, count=“name of file”.subsample.count_table, basis=otu)
Output File Names:
“name of file”.subsample.0.03.cons.taxonomy
“name of file”.subsample.0.03.cons.tax.summary
3)
mothur > make.shared(list=“name of file”.subsample.list, count=“name of file”.subsample.count_table, label=0.03)
Output File Names:
“name of file”.subsample.shared
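Once a shared file and a cons.taxonomy file are in hand, the taxonomy can be put in a column next to each OTU by looking it up by name instead of copying and pasting. The Python below is only a rough sketch of that idea. It assumes tab-delimited files with the usual headers (label, Group, numOtus, then OTU names in the shared file; OTU, Size, Taxonomy in the taxonomy file) and a shared file holding a single label such as 0.03. The function names and the "NA" placeholder for OTUs without a taxonomy entry are invented for this example.

```python
def read_taxonomy(taxonomy_path):
    """Map each OTU name to its taxonomy string from a cons.taxonomy file."""
    taxonomy = {}
    with open(taxonomy_path) as handle:
        handle.readline()  # skip the OTU / Size / Taxonomy header
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) >= 3:
                taxonomy[fields[0]] = fields[2]
    return taxonomy


def shared_with_taxonomy(shared_path, taxonomy_path, out_path):
    """Write one row per OTU: name, taxonomy, then one count per sample."""
    taxonomy = read_taxonomy(taxonomy_path)
    with open(shared_path) as handle:
        rows = [line.rstrip("\r\n").split("\t") for line in handle if line.strip()]
    header, samples = rows[0], rows[1:]
    with open(out_path, "w") as out:
        # Column 1 of each sample row is the Group (sample) name.
        out.write("\t".join(["OTU", "Taxonomy"] + [s[1] for s in samples]) + "\n")
        for col in range(3, len(header)):
            otu = header[col]
            if not otu:
                continue  # tolerate a trailing tab at the end of the header
            counts = [s[col] for s in samples]
            # "NA" is a placeholder chosen here for OTUs missing from the taxonomy file.
            out.write("\t".join([otu, taxonomy.get(otu, "NA")] + counts) + "\n")
```

Because the lookup is by OTU name, this works the same whether the taxonomy file is the original one or the subsampled one, and OTUs that were dropped by subsampling never appear in the output.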
I agree with you 100% (I added a caution label). Just for clarification, these steps are only a quick example (to be used as a reference in his analysis) and address only his particular case.