REMOVE SEQUENCES

Hi,

I have sequences from 30 samples and have followed all the steps of the MiSeq SOP up to making OTUs and classifying them. Before running the beta diversity analyses, I would like to remove one sample that I am not interested in. I am not sure whether I can remove it directly from the OTU table or whether I have to do it at an earlier step, probably before making OTUs (I guess that makes more sense). If so, could anybody advise me on which command to use and which files I would have to change?

Thank you,

If you want to remove an entire sample group from your data, there’s the remove.groups command. It can take a shared file (OTU table) as the input, so you won’t need to redo any of your previous analysis.

If you just want to remove a few sequences, there’s the remove.seqs command, although this has to be done earlier in the analysis.

Hi dwaite,

Thanks for your quick response. I just have one more question. For the next step (classify.otu), I guess I will have to remove the group from all the other files (list, count table and .pick.taxonomy files) in order to continue my analyses, right?

Thanks

That is correct.
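As a rough illustration of what removing a group from a count table involves, here is a minimal Python sketch. It assumes the tab-separated count table layout (Representative_Sequence, total, then one column per group), and `remove_group_from_count_table` is a hypothetical helper, not part of mothur; remove.groups handles count files itself. The sketch also returns the names of sequences that occurred only in the removed group, since those are the ones that should also disappear from the list and taxonomy files.

```python
# Hypothetical helper, not part of mothur: sketches removing one group
# from a count table. Assumed layout (tab-separated):
#   Representative_Sequence  total  <group1>  <group2> ...

def remove_group_from_count_table(text: str, group: str) -> tuple[str, list[str]]:
    """Return (new count table, names of sequences found only in `group`)."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    if group not in header[2:]:
        raise ValueError(f"group {group!r} not in count table")
    drop = header.index(group)
    out = ["\t".join(h for i, h in enumerate(header) if i != drop)]
    removed = []
    for line in lines[1:]:
        fields = line.split("\t")
        # keep only the per-group counts for the remaining groups
        counts = [int(c) for i, c in enumerate(fields) if i >= 2 and i != drop]
        total = sum(counts)
        if total == 0:
            # this sequence occurred only in the removed group
            removed.append(fields[0])
            continue
        out.append("\t".join([fields[0], str(total)] + [str(c) for c in counts]))
    return "\n".join(out), removed


# Toy count table with made-up sequence names and groups A, B, C.
table = (
    "Representative_Sequence\ttotal\tA\tB\tC\n"
    "seq1\t5\t2\t3\t0\n"
    "seq2\t4\t0\t0\t4\n"
    "seq3\t6\t1\t0\t5"
)
new_table, gone = remove_group_from_count_table(table, "C")
print(new_table)
print(gone)  # ['seq2']
```

The total column is recomputed from the remaining groups, so seq3 drops from 6 to 1 once group C's reads are gone.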

Hello,
Thanks for the prompt reply. I successfully removed my group from the OTU table with the command remove.groups(shared=stability.an.shared, groups=SPAK24bR).
However, I am having trouble removing my sequences from the files
Undetermined_trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy and
Undetermined_trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.tax.summary.

I have tried different options, but I always miss something. I tried using the group option, but the last group file I have still includes sequences related to mitochondria, chloroplasts and archaea, and I haven't figured out how to create a new group file.

Thanks!

Just to clarify, I want to get a new an.cons.taxonomy (without the OTUs unique for my sample):
OTU Size Taxonomy
Otu00001 23136 Bacteria(100);“Bacteroidetes”(100);“Sphingobacteria”(100);“Sphingobacteriales”(100);Cytophagaceae(100);Microscilla(100);
Otu00002 27288 Bacteria(100);“Proteobacteria”(100);Gammaproteobacteria(100);Alteromonadales(100);Pseudoalteromonadaceae(100);Pseudoalteromonas(100);
Otu00003 14266 Bacteria(100);“Bacteroidetes”(100);“Sphingobacteria”(100);“Sphingobacteriales”(100);Chitinophagaceae(100);unclassified(94);
Otu00004 27218 Bacteria(100);“Proteobacteria”(100);Alphaproteobacteria(100);Rhodobacterales(100);Rhodobacteraceae(100);Thalassobius(85);
Otu00005 15210 Bacteria(100);“Proteobacteria”(100);Gammaproteobacteria(100);“Vibrionales”(100);Vibrionaceae(100);Vibrio(93);
Otu00006 10646 Bacteria(100);“Proteobacteria”(100);Gammaproteobacteria(100);Alteromonadales(100);Alteromonadaceae(100);Alteromonas(82);
Otu00007 8914 Bacteria(100);“Proteobacteria”(100);Alphaproteobacteria(100);Rhodobacterales(100);Rhodobacteraceae(100);unclassified(100);

I have no problem getting the taxonomy file, which looks like this:
M02149_86_000000000-A94DF_1_1101_9236_7338 Bacteria(100);“Proteobacteria”(100);Gammaproteobacteria(100);Alteromonadales(84);Alteromonadaceae(84);Haliea(84);
M02149_86_000000000-A94DF_1_1101_24689_15570 Bacteria(100);“Bacteroidetes”(100);“Sphingobacteria”(100);“Sphingobacteriales”(100);Chitinophagaceae(100);Balneola(90);
M02149_86_000000000-A94DF_1_1101_9557_10212 Bacteria(100);“Proteobacteria”(95);unclassified;unclassified;unclassified;unclassified;
M02149_86_000000000-A94DF_1_1101_15817_20805 Bacteria(100);“Proteobacteria”(100);Gammaproteobacteria(100);unclassified;unclassified;unclassified;
M02149_86_000000000-A94DF_1_1101_26118_14675 Bacteria(100);“Proteobacteria”(99);Alphaproteobacteria(99);unclassified;unclassified;unclassified;
M02149_86_000000000-A94DF_1_1101_17707_8277 Bacteria(100);“Proteobacteria”(100);Alphaproteobacteria(99);Rhodospirillales(81);Rhodospirillaceae(80);unclassified;

Thanks.

The list.otulabels and get.otulabels commands should help. http://www.mothur.org/wiki/List.otulabels and http://www.mothur.org/wiki/Get.otulabels

mothur > list.otulabels(shared=*.pick.shared) - lists the OTUs remaining in the shared file after the group is removed.
mothur > get.otulabels(constaxonomy=yourConsTaxonomyFile, accnos=current) - selects the OTUs from the constaxonomy file that are in your shared file.
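The two commands above amount to a set filter: collect the OTU labels that still have reads in the group-filtered shared file, then keep only those rows of the cons.taxonomy file. A minimal Python sketch of that idea follows. The helper names are hypothetical, it assumes whitespace-separated files with the usual header lines (label, Group, numOtus, then OTU labels for the shared file; OTU, Size, Taxonomy for cons.taxonomy), and it copies the Size column unchanged rather than recomputing it.

```python
# Hypothetical helpers illustrating list.otulabels + get.otulabels as a set filter.

def otus_in_shared(shared_text: str) -> set[str]:
    """OTU labels with at least one read in any group of a shared file."""
    lines = shared_text.strip().splitlines()
    labels = lines[0].split()[3:]  # skip label, Group, numOtus
    present = set()
    for line in lines[1:]:
        counts = [int(x) for x in line.split()[3:]]
        present.update(otu for otu, n in zip(labels, counts) if n > 0)
    return present


def filter_cons_taxonomy(tax_text: str, keep: set[str]) -> str:
    """Keep the header plus rows whose OTU label is in `keep` (Size left as-is)."""
    lines = tax_text.strip().splitlines()
    rows = [line for line in lines[1:] if line.split(maxsplit=1)[0] in keep]
    return "\n".join([lines[0]] + rows)


# Toy files: Otu002 has no reads left in the remaining groups A and B.
shared = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\tOtu003\n"
    "0.03\tA\t3\t5\t0\t2\n"
    "0.03\tB\t3\t1\t0\t0"
)
taxonomy = (
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t6\tBacteria(100);\n"
    "Otu002\t9\tBacteria(100);\n"
    "Otu003\t2\tBacteria(100);"
)
print(filter_cons_taxonomy(taxonomy, otus_in_shared(shared)))
```

Checking counts rather than only reading the header makes the sketch work whether or not all-zero OTU columns have already been dropped from the shared file.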

Dear all,

I have the same issue: I want to remove certain groups of samples from my dataset, but I have already generated the OTU table (.shared). However, before make.shared I ran “split.abund” to filter out OTUs with < 10 reads. If I now remove certain groups of sequences from my dataset (OTU table), won't I risk suddenly having OTUs with < 10 reads again? After all, the groups I am removing (and their sequences) contributed reads to my different OTUs. For example, an OTU with an abundance of 10 reads might drop to 9 reads after certain groups are removed…

If this is true, I guess I need to start over completely, removing the groups from the initial group file…? :shock:

Best,

Even

Hi Even,
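You do not need to start over. Removing groups can only lower an OTU's total abundance, so the OTUs that the first split.abund command removed would be removed again anyway: just run remove.rare on the new shared file after removing the additional groups. OTUs that drop below the cutoff because of the group removal (your 10 → 9 case) are caught by the same remove.rare step. Here's a simple example: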

File = my.shared made from my.list and my.group
label Group numOtus Otu001 Otu002 Otu003 Otu004 Otu005 Otu006 Otu007 Otu008 Otu009 Otu010 Otu011 Otu012 Otu013 Otu014
0.14 A 14 16 0 21 1 0 2 0 0 0 0 1 1 0 0
0.14 B 14 12 2 2 0 13 6 0 2 2 2 1 0 0 1
0.14 C 14 0 25 4 25 0 4 4 0 0 0 0 0 1 0

mothur > split.abund(list=my.list, group=my.group, cutoff=10)
mothur > make.shared(list=current, group=current)

File = my.0.14.abund.shared
label Group numOtus Otu001 Otu002 Otu003 Otu004 Otu005 Otu006
0.14 A 6 16 0 21 1 0 2
0.14 B 6 12 2 2 0 13 6
0.14 C 6 0 25 4 25 0 4

mothur > remove.groups(shared=my.0.14.abund.shared, groups=C)

File = my.0.14.abund.pick.shared
label Group numOtus Otu001 Otu002 Otu003 Otu004 Otu005 Otu006
0.14 A 6 16 0 21 1 0 2
0.14 B 6 12 2 2 0 13 6

mothur > remove.rare(shared=my.0.14.abund.pick.shared, nseqs=9) - removes OTUs <= nseqs

File = my.0.14.abund.pick.0.14.pick.shared
label Group numOtus Otu001 Otu003 Otu005
0.14 A 3 16 21 0
0.14 B 3 12 2 13

Compare this with removing group C from the original shared file, without running split.abund first:

File = my.shared made from my.list and my.group
label Group numOtus Otu001 Otu002 Otu003 Otu004 Otu005 Otu006 Otu007 Otu008 Otu009 Otu010 Otu011 Otu012 Otu013 Otu014
0.14 A 14 16 0 21 1 0 2 0 0 0 0 1 1 0 0
0.14 B 14 12 2 2 0 13 6 0 2 2 2 1 0 0 1
0.14 C 14 0 25 4 25 0 4 4 0 0 0 0 0 1 0

mothur > remove.groups(shared=my.shared, groups=C)

File = my.pick.shared
label Group numOtus Otu001 Otu002 Otu003 Otu004 Otu005 Otu006 Otu008 Otu009 Otu010 Otu011 Otu012 Otu014
0.14 A 12 16 0 21 1 0 2 0 0 0 1 1 0
0.14 B 12 12 2 2 0 13 6 2 2 2 1 0 1

mothur > remove.rare(shared=my.pick.shared, nseqs=9) - removes OTUs <= nseqs

File = my.pick.0.14.pick.shared
label Group numOtus Otu001 Otu003 Otu005
0.14 A 3 16 21 0
0.14 B 3 12 2 13
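Both routes agree because removing groups can only lower an OTU's total abundance, so an OTU at or below a threshold before group removal stays at or below it afterwards, provided the same threshold is used both times. The Python sketch below replays the worked example with hypothetical stand-ins for remove.groups and remove.rare (dropping OTUs whose total is <= nseqs, as noted in the commands above). It treats the initial split.abund step as "drop OTUs with fewer than 10 reads", which is an assumption about how that cutoff is applied.

```python
# Hypothetical stand-ins for remove.groups and remove.rare, operating on a
# shared table stored as {group: [count per OTU]} plus a parallel OTU list.

SHARED = {  # counts copied from the worked example (my.shared)
    "A": [16, 0, 21, 1, 0, 2, 0, 0, 0, 0, 1, 1, 0, 0],
    "B": [12, 2, 2, 0, 13, 6, 0, 2, 2, 2, 1, 0, 0, 1],
    "C": [0, 25, 4, 25, 0, 4, 4, 0, 0, 0, 0, 0, 1, 0],
}
OTUS = [f"Otu{i:03d}" for i in range(1, 15)]


def keep_columns(shared, otus, cols):
    """Restrict every group's counts and the OTU list to the given column indices."""
    return {g: [c[i] for i in cols] for g, c in shared.items()}, [otus[i] for i in cols]


def remove_groups(shared, otus, groups):
    """Drop the given groups, then drop OTUs left with zero reads."""
    kept = {g: c for g, c in shared.items() if g not in groups}
    cols = [i for i in range(len(otus)) if any(c[i] for c in kept.values())]
    return keep_columns(kept, otus, cols)


def remove_rare(shared, otus, nseqs):
    """Drop OTUs whose total abundance across all groups is <= nseqs."""
    cols = [i for i in range(len(otus)) if sum(c[i] for c in shared.values()) > nseqs]
    return keep_columns(shared, otus, cols)


# Route 1: abundance filter first (standing in for split.abund), remove C, filter again.
route1 = remove_rare(*remove_groups(*remove_rare(SHARED, OTUS, 9), {"C"}), 9)
# Route 2: remove C from the unfiltered table, then filter once.
route2 = remove_rare(*remove_groups(SHARED, OTUS, {"C"}), 9)

print(route1 == route2)  # True
print(route2)  # ({'A': [16, 21, 0], 'B': [12, 2, 13]}, ['Otu001', 'Otu003', 'Otu005'])
```

Both routes end with Otu001, Otu003 and Otu005, matching the two remove.rare outputs shown above.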

Wow, thank you for the thorough answer! :smiley: You made my day! :wink:

Best regards,

Even