Hi!
I am using mothur v.1.22.2.
I split my file like this:
mothur > split.groups(fasta=GDEG1CX02.shhh.trim.unique.pick.chop.fasta, group=GDEG1CX02.shhh.pick.groups, name=GDEG1CX02.shhh.trim.pick.names)
This file contains
% grep -c ">" GDEG1CX02.shhh.trim.unique.pick.chop.fasta
850
sequences.
Now, when I count the number of sequences in the resulting files I get
% grep -c ">" GDEG1CX02.shhh.trim.unique.pick.chop.R*.fasta | awk -F":" '{SUM += $2} END {print SUM}'
912
sequences.
What is happening here?
Karin
That seems odd. Could you send your files to mothur.bugs@gmail.com?
Hi Karin,
The total across all groups can be higher than the number of unique sequences in the original file because of the names file. Let's look at an example:
From the names file:
seq1 seq1,seq2,seq3
From the fasta file:
>seq1
ATGCATGA…
From the group file:
seq1 Group1
seq2 Group2
seq3 Group1
When mothur splits by group, it will create a new names and fasta file for each group.
*.Group1.fasta
>seq1
ATGCATGA…
*.Group1.names
seq1 seq1,seq3
*.Group2.fasta
>seq2
ATGCATGA…
*.Group2.names
seq2 seq2
The one unique sequence represents sequences from multiple groups, so each group gets a copy. Does that make sense?
Kindly,
Sarah
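The splitting behaviour Sarah describes can be sketched in a few lines of Python. This is only an illustration using the three-sequence example from her post, not mothur's actual implementation: the function name split_by_group and the rule for picking each group's representative (keep the original one if it belongs to the group, otherwise take the first member) are assumptions made for the sketch.

```python
from collections import defaultdict

def split_by_group(names, groups):
    """Hypothetical helper: split a names mapping by group.

    names:  dict of representative id -> list of all ids it stands for
    groups: dict of sequence id -> group name
    Returns a dict of group name -> {representative: [member ids]}.
    """
    result = defaultdict(dict)
    for rep, members in names.items():
        # Bucket the members of this unique sequence by their group.
        per_group = defaultdict(list)
        for member in members:
            per_group[groups[member]].append(member)
        # Each group that has at least one member gets its own copy.
        for group, ids in per_group.items():
            # Assumed rule: keep the original representative if it is in
            # this group, otherwise use the first member found.
            new_rep = rep if rep in ids else ids[0]
            result[group][new_rep] = ids
    return dict(result)

# The example from the post: one unique sequence, three reads, two groups.
names = {"seq1": ["seq1", "seq2", "seq3"]}
groups = {"seq1": "Group1", "seq2": "Group2", "seq3": "Group1"}

split = split_by_group(names, groups)

unique_before = len(names)
unique_after = sum(len(reps) for reps in split.values())
reads_before = sum(len(ids) for ids in names.values())
reads_after = sum(len(ids) for reps in split.values() for ids in reps.values())

print(split)
print("unique sequences:", unique_before, "->", unique_after)
print("total reads:", reads_before, "->", reads_after)
```

In this example the number of unique sequences (what grep -c ">" counts in the fasta files) goes from 1 to 2, while the total number of reads listed in the names files stays at 3. The same effect would explain 850 unique sequences becoming 912 across the per-group files.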