remove.groups - downstream issues caused by removed sequences

I used remove.groups to remove a large number of groups from my dataset, and as a result some sequences are no longer found in my count table. My count table now seems to be missing some of the sequences that are still in my fasta file (or at least that is what I think is happening). When I try to run pre.cluster on my newly pared-down dataset, I get the error below:

[ERROR]: M02149_396_000000000-AR0A9_1_1101_18200_1755 is not in your count table. Please correct.
Segmentation fault (core dumped)

Is there a way to update my fasta file so that it only contains sequences in my count table? Or is there another work around? Thanks!
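If you want to check or fix this outside mothur, here is a minimal Python sketch (not a mothur feature) that reads the sequence names from a count table and keeps only the FASTA records whose names appear there. It assumes the count table is tab-separated with a header row starting with `Representative_Sequence`, that lines starting with `#` can be skipped, and that a FASTA record's name is the header text up to the first whitespace. `filter_fasta_file` and its arguments are hypothetical names for illustration.

```python
import io


def count_table_names(lines):
    """Return the set of sequence names (first column) in a count table.

    Assumes a tab-separated table whose header row starts with
    'Representative_Sequence'; lines starting with '#' are skipped.
    """
    names = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#") or line.startswith("Representative_Sequence"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def filter_fasta(fasta_lines, keep):
    """Split FASTA records into kept lines and dropped names.

    A record's name is the header text after '>' up to the first whitespace.
    Returns (kept_lines, dropped_names); kept_lines keep their line endings.
    """
    kept, dropped = [], []
    keep_record = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep_record = name in keep
            if not keep_record:
                dropped.append(name)
        if keep_record:
            kept.append(line)
    return kept, dropped


def filter_fasta_file(count_path, fasta_path, out_path):
    """Hypothetical helper: write only FASTA records named in the count table."""
    with open(count_path) as f:
        keep = count_table_names(f)
    with open(fasta_path) as f:
        kept, dropped = filter_fasta(f, keep)
    with open(out_path, "w") as out:
        out.writelines(kept)
    return dropped


if __name__ == "__main__":
    # Tiny in-memory demo: seq2 is in the FASTA but not in the count table.
    table = io.StringIO(
        "Representative_Sequence\ttotal\tA\tB\n"
        "seq1\t5\t5\t0\n"
        "seq3\t2\t1\t1\n"
    )
    fasta = io.StringIO(">seq1\nACGT\n>seq2\nACGA\n>seq3\nTTGA\n")
    kept, dropped = filter_fasta(fasta, count_table_names(table))
    print("".join(kept), end="")
    print("not in count table:", dropped)
```

The list of dropped names is also a quick way to see how many FASTA sequences would trigger the "is not in your count table" error.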

Hi,

Did you include your count file in the remove.groups command? Can you post the command?

Pat

Hi Pat,

I did include the count file in the remove.groups command. Here are the remove.groups command and the command I ran after it:

mothur > remove.groups(count=par_mgl.good.unique.good.count_table, accnos=gl.remove.accnos)

output:
412_HU45M_SRF_MG_svx_na is not a valid group, and will be disregarded.
412_MI27M_SRF_MG_svx_na is not a valid group, and will be disregarded.
413_SU17M_B10_MG_f1_141 is not a valid group, and will be disregarded.
812_HUFE_DCL_16S_fw_2 is not a valid group, and will be disregarded.
812_ON33M_SRF_MG_f1_071 is not a valid group, and will be disregarded.
Removed 7762648 sequences from your count file.

Output File names:
par_mgl.good.unique.good.pick.count_table

mothur > pre.cluster(fasta=par_mgl.good.unique.good.unique.align, count=par_mgl.good.unique.good.pick.count_table, diffs=2, processors=16)

Using 16 processors.
[ERROR]: M02149_396_000000000-AR0A9_1_1101_18200_1755 is not in your count table. Please correct.

Sara

To avoid file mismatches, include all related files in any remove or get command. In other words, if you remove sequences from your count file, you also want to remove them from your fasta and taxonomy files. You can do this in the same remove.groups command:

mothur > remove.groups(count=par_mgl.good.unique.good.count_table, fasta=yourFastaFile, taxonomy=yourTaxonomyFile, accnos=gl.remove.accnos)
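To illustrate why the fasta file falls out of sync, here is a conceptual Python sketch of the bookkeeping, assuming that removing groups from a count table drops those group columns and then drops any sequence left with zero reads in the remaining groups. This is my assumption about the behavior, not mothur's actual implementation, and it ignores mothur's compressed count table format. Sequences that occurred only in removed groups disappear from the count table but stay in the fasta file unless the fasta is passed to the same command.

```python
def remove_groups(header, rows, groups_to_remove):
    """Conceptually drop group columns from a parsed count table.

    header: column names, e.g. ["Representative_Sequence", "total", "A", "B"]
    rows:   one list of strings per sequence, laid out like the header
    Returns (new_header, new_rows, removed_names), where removed_names are
    sequences with zero reads left in the remaining groups.
    """
    drop = set(groups_to_remove)
    # Group columns start after the name and total columns.
    keep_idx = [i for i in range(2, len(header)) if header[i] not in drop]
    new_header = header[:2] + [header[i] for i in keep_idx]
    new_rows, removed = [], []
    for row in rows:
        counts = [int(row[i]) for i in keep_idx]
        total = sum(counts)
        if total == 0:
            # No reads left: the sequence leaves the count table entirely.
            removed.append(row[0])
        else:
            new_rows.append([row[0], str(total)] + [str(c) for c in counts])
    return new_header, new_rows, removed


if __name__ == "__main__":
    header = ["Representative_Sequence", "total", "A", "B", "C"]
    rows = [
        ["seq1", "5", "5", "0", "0"],  # only in group A
        ["seq2", "3", "1", "2", "0"],  # in groups A and B
        ["seq3", "4", "0", "0", "4"],  # only in group C
    ]
    new_header, new_rows, removed = remove_groups(header, rows, ["A", "C"])
    print(new_header)  # ['Representative_Sequence', 'total', 'B']
    print(new_rows)    # [['seq2', '2', '2']]
    print(removed)     # ['seq1', 'seq3']
```

In this toy example, seq1 and seq3 would still be in a fasta file that was not passed along, which is the same kind of mismatch pre.cluster reported.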

Note: the remove.groups output shows that several groups in your accnos file were not valid groups and were disregarded. If all of your groups are disregarded, this could produce blank files.
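To see ahead of time which accnos entries match a group, a minimal Python sketch can compare them with the group columns in the count table header. It assumes the first line not starting with `#` is the tab-separated header (`Representative_Sequence`, `total`, then one column per group) and that the accnos file lists one group name per line. The function names and the `G1`/`G2` groups in the demo are hypothetical.

```python
import io


def count_table_groups(lines):
    """Return the group names from a count table's header row.

    Assumes the first line not starting with '#' is the tab-separated header:
    Representative_Sequence, total, then one column per group.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        return line.rstrip("\n").split("\t")[2:]
    return []


def check_accnos_groups(groups, accnos_lines):
    """Split accnos entries into groups present in the table and ones that are not."""
    valid = set(groups)
    requested = [line.strip() for line in accnos_lines if line.strip()]
    matched = [g for g in requested if g in valid]
    invalid = [g for g in requested if g not in valid]
    return matched, invalid


if __name__ == "__main__":
    table = io.StringIO("Representative_Sequence\ttotal\tG1\tG2\nseq1\t3\t1\t2\n")
    accnos = io.StringIO("G1\n812_HUFE_DCL_16S_fw_2\n")
    matched, invalid = check_accnos_groups(count_table_groups(table), accnos)
    print("will be used:", matched)
    print("not valid groups:", invalid)
```

An empty `matched` list corresponds to the case the note warns about, where every group is disregarded.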