We are looking at all Bacteria, Archaea and Eukarya sequences in 11 samples (see below for the primers/regions used). It was cheaper and faster for our sequencing facility to sequence all three domains at the same time for each sample, so our data are a mixture of sequences from several different amplicons. We are following the MiSeq SOP and using the silva.align database. We couldn't do the pcr.seqs step because the reads from the different amplicons are all mixed together, and now, after the alignment (align.seqs), we are running into the same problem with screen.seqs. Our question is the following (though we welcome any advice and will read any polite thoughts on anything about this!): should we use the align.seqs step to separate our Bacteria from our Archaea and from our Eukarya, and then do all the analyses from that step on one domain at a time? Is there any way to extract the Bacteria-only sequences from silva.align? Our primers are shown below. We will be using the V4-V5 region in the future, but these are the primers we had, or knew about, when we started this project. Thank you so much for your insights, a.
What is your issue with the align step? I'd keep all the seqs together through the alignment, then use screen.seqs after the alignment to separate them. You'll need to know the alignment coordinates for each primer set, then use those as the start and end for each set. So you'd run screen.seqs with the bacterial coordinates and save the resulting fasta and count files as bac, then run screen.seqs again on the original alignment with the archaeal coordinates, …
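To make the "one screen per primer set" idea concrete, here is a minimal Python sketch of the same filtering logic outside mothur. It assumes our understanding of screen.seqs' start/end behaviour: a read is kept if its first non-gap alignment column is at or before `start` and its last non-gap column is at or after `end` (1-based columns, with `.` and `-` as gap characters). The function names and the `bac`/`arch` coordinates in the usage are hypothetical placeholders, not real primer positions; in mothur itself you would simply rerun screen.seqs on the original fasta/count with each start/end pair.

```python
"""Hypothetical sketch: screen one aligned FASTA several times, once per
primer set, keeping reads that span that set's start/end alignment columns.
This mimics (under our assumptions) the start=/end= screening in mothur."""

GAP_CHARS = set(".-")  # gap symbols used in mothur-style alignments


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def start_end(aligned):
    """Return 1-based (first, last) non-gap columns, or None if all gaps."""
    cols = [i for i, c in enumerate(aligned, start=1) if c not in GAP_CHARS]
    if not cols:
        return None
    return cols[0], cols[-1]


def passes(aligned, start, end):
    """True if the read starts at/before `start` and ends at/after `end`."""
    se = start_end(aligned)
    if se is None:
        return False
    return se[0] <= start and se[1] >= end


def split_by_coords(records, coord_sets):
    """Screen the same input once per coordinate set; return kept names."""
    return {
        label: [name for name, seq in records if passes(seq, s, e)]
        for label, (s, e) in coord_sets.items()
    }


if __name__ == "__main__":
    toy = [("r1", "..ACGT--"), ("r2", "....AC.."), ("r3", "ACGTACGT")]
    # Placeholder coordinates; replace with the real positions per primer set.
    print(split_by_coords(toy, {"bac": (3, 6), "arch": (5, 6)}))
```

Because every screen starts from the original alignment, a read can pass more than one coordinate set if the amplified regions overlap (in the toy data, `r1` and `r3` pass both), so it is worth checking for names shared between the resulting sets before treating them as separate domains.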
Thank you so much. We will try that. Below is our summary.seqs output from right after align.seqs. I am guessing that all of the sequences beginning at 35141 and ending at 43116 are Bacteria sequences. I am a bit puzzled by those beginning at 43107 and ending at 43116, but maybe just taking the Bacteria ones out (almost certainly the most numerous) will help sort the other ones out. Thank you again very much. We are excited to get to the actual analysis!
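One way to check guesses like these, rather than reading them off summary.seqs percentiles, is to count how many aligned reads share each (start, end) pair; each amplicon should show up as its own large cluster. The following is a minimal Python sketch under stated assumptions: records are (name, aligned sequence) pairs from any FASTA parser, gaps are `.` or `-`, and `tally_start_end` is a hypothetical helper, not a mothur command.

```python
"""Hypothetical sketch: tally (start, end) alignment positions per read so
that clusters, e.g. one per amplicon, become visible."""

from collections import Counter

GAP_CHARS = set(".-")  # gap symbols used in mothur-style alignments


def start_end(aligned):
    """1-based first/last non-gap alignment columns, or None for all gaps."""
    cols = [i for i, c in enumerate(aligned, start=1) if c not in GAP_CHARS]
    return (cols[0], cols[-1]) if cols else None


def tally_start_end(records):
    """Count how many aligned reads share each (start, end) pair."""
    counts = Counter()
    for _name, seq in records:
        se = start_end(seq)
        if se is not None:  # skip reads that aligned to nothing
            counts[se] += 1
    return counts


if __name__ == "__main__":
    toy = [("a", "..ACG..."), ("b", "..ACG..."), ("c", "ACGT....")]
    for (start, end), n in tally_start_end(toy).most_common():
        print(start, end, n)
```

On real data the exact pairs will scatter by a few columns, so the most common pairs give candidate start/end values per amplicon, while a small cluster with unexpected coordinates, such as one ending near the same position as the bacterial reads, can then be pulled out and inspected separately.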