Merging Alignment/Taxonomy Reference files

Hi all,

I was wondering if anyone had thoughts about the following. For aligning sequences, is there any reason why I should not combine the <silva.archaea.fasta>, <silva.bacteria.fasta>, and <silva.eukarya.fasta> reference alignments into one master reference alignment? Relatedly, could I also use a combined master taxonomy reference file (silva.archaea.silva.tax + silva.bacteria.silva.tax + silva.eukarya.silva.tax) in conjunction with the master reference alignment for classifying sequences?

Thanks.

Jarrod

Go for it. The reason I kept them separate is that people very rarely use primers that cross all three domains or want to perform an analysis that crosses all three domains.

Hi,

Can you tell me how the merge.files step was performed? I attempted this and received an error during classify.seqs saying that there were sequences in my template file not found in my taxonomy file.

I attempted to join the following: silva.bacteria.silva.fasta-silva.archaea.silva.fasta-silva.eukarya.silva.fasta, and the same for the .tax files.

Thanks,
~Jo

Here is the code in mothur for appendFiles:

/**************************************************************************************************/
int MothurOut::appendFiles(string temp, string filename) {
    try {
        ofstream output;
        ifstream input;

        //open output file in append mode
        openOutputFileAppend(filename, output);
        int ableToOpen = openInputFile(temp, input, "no error");
        //int ableToOpen = openInputFile(temp, input);

        int numLines = 0;
        if (ableToOpen == 0) { //you opened it

            char buffer[4096];
            while (!input.eof()) {
                input.read(buffer, 4096);
                output.write(buffer, input.gcount());
                //count number of lines
                for (int i = 0; i < input.gcount(); i++) { if (buffer[i] == '\n') { numLines++; } }
            }
            input.close();
        }

        output.close();

        return numLines;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "appendFiles");
        exit(1);
    }
}

It’s a simple buffered copy: the input file is read in 4096-byte chunks and appended to the output unchanged. The issue you are having is likely due to duplicate sequence names in the templates. Have you tried running the command in debug mode? Are there other error messages in the log file, something like “sequence names must be unique”?
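If it helps to narrow this down outside of mothur, here is a rough sketch of a standalone check on the merged files. It is not mothur code: checkNames, NameCheck, and firstToken are hypothetical names made up for this example. It assumes the template is a FASTA file whose sequence name is the first token after ">", and that the taxonomy file has one record per line with the sequence name as the first whitespace-delimited field. It reports names that appear more than once in the template and names that have no taxonomy line.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper types and functions for illustration; not part of mothur.
struct NameCheck {
    std::vector<std::string> duplicateFastaNames;  // one entry per repeated occurrence
    std::vector<std::string> missingFromTaxonomy;  // template names with no taxonomy line
};

// Return the first whitespace-delimited token of a line (empty if there is none).
static std::string firstToken(const std::string& line) {
    std::istringstream fields(line);
    std::string token;
    fields >> token;
    return token;
}

NameCheck checkNames(std::istream& fasta, std::istream& taxonomy) {
    // Assumption: each taxonomy line starts with the sequence name.
    std::set<std::string> taxNames;
    std::string line;
    while (std::getline(taxonomy, line)) {
        std::string name = firstToken(line);
        if (!name.empty()) taxNames.insert(name);
    }

    NameCheck result;
    std::set<std::string> seen;
    while (std::getline(fasta, line)) {
        // Only header lines carry names; skip sequence and blank lines.
        if (line.empty() || line[0] != '>') continue;
        std::string name = firstToken(line.substr(1));
        if (name.empty()) continue;

        if (!seen.insert(name).second) {
            result.duplicateFastaNames.push_back(name);   // already seen once
        } else if (taxNames.count(name) == 0) {
            result.missingFromTaxonomy.push_back(name);   // no matching taxonomy record
        }
    }
    return result;
}
```

To use it, open the merged template and taxonomy files as std::ifstream objects, pass them to checkNames, and print the two lists. Because appendFiles copies bytes verbatim and adds no separator, an input file without a final newline may end up with its last line glued to the first line of the next file; in a merged taxonomy file that would hide the second name, and this check would list it as missing.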