I used remove.groups to remove a large number of groups from my dataset, and as a result my count table no longer contains all of the sequences in my fasta file (or at least that is what I think is happening). When I try to run pre.cluster on the newly pared-down dataset, I get the error below:
[ERROR]: M02149_396_000000000-AR0A9_1_1101_18200_1755 is not in your count table. Please correct.
Segmentation fault (core dumped)
Is there a way to update my fasta file so that it only contains sequences in my count table? Or is there another workaround? Thanks!
output:
412_HU45M_SRF_MG_svx_na is not a valid group, and will be disregarded.
412_MI27M_SRF_MG_svx_na is not a valid group, and will be disregarded.
413_SU17M_B10_MG_f1_141 is not a valid group, and will be disregarded.
812_HUFE_DCL_16S_fw_2 is not a valid group, and will be disregarded.
812_ON33M_SRF_MG_f1_071 is not a valid group, and will be disregarded.
Removed 7762648 sequences from your count file.
In order to eliminate file mismatches you should include related files on any remove / get commands. In other words if you remove sequences from your count file, you want to also remove them from your fasta and taxonomy files. You can do this with the remove.groups command as follows:
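A minimal sketch of what such a call might look like. The file names (mydata.fasta, mydata.count_table, mydata.taxonomy) and group names (groupA, groupB) are placeholders and not taken from this thread, so substitute your own. Group names are separated by dashes.

```
# Remove the same groups from the fasta, count, and taxonomy files in one call,
# so that all three stay in sync. All file and group names here are hypothetical.
remove.groups(fasta=mydata.fasta, count=mydata.count_table, taxonomy=mydata.taxonomy, groups=groupA-groupB)
```

If the count table has already been trimmed on its own, one possible workaround is to list the sequence names that remain in it and keep only those sequences in the fasta file. The accnos file name below is an assumption about what list.seqs writes; use whatever name mothur actually reports.

```
# List the sequence names still present in the trimmed count table.
list.seqs(count=mydata.pick.count_table)
# Keep only those sequences in the fasta file (accnos name is assumed).
get.seqs(accnos=mydata.pick.accnos, fasta=mydata.fasta)
```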