Custom Database Help


I posted 17 days ago to get help with creating a custom database and didn’t get a response, so I thought I would break the question down. Could someone give me an example of what the align file and the taxonomy file are supposed to look like after running the following code from the README for the SILVA v138 reference files (the part before the R code)?

#generate alignment file
mv silva.full_v138.good.pcr.pick.fasta silva.nr_v138.align

#generate taxonomy file
grep '>' silva.nr_v138.align | cut -f1,3 | cut -f2 -d'>' > silva.nr_v138.full

There are no examples for any of these files, so I don’t know where I am going wrong, but I don’t think either of these files looks correct after following the steps provided. Thanks.

For context, I am trying to create an LSU SILVA database for 23S alignment.


After the mv step, you will have a new file called silva.nr_v138.align that has the same contents as what was in silva.full_v138.good.pcr.pick.fasta.

After the grep step, you will have a file that contains two columns - one with the sequence name and a second with the taxonomy string for that sequence.
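
As a rough illustration of the grep step, here is a minimal sketch run on a made-up one-sequence file. It assumes the header line starts with '>' and that its three fields (name, identity score, taxonomy) are separated by tabs. The header text is borrowed from the sequence posted in this thread, and the file names and the short sequence line are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
tmp_dir=$(mktemp -d)

# Hypothetical one-record alignment file. ASSUMPTION: the header begins with '>'
# and its fields are tab-separated (name, identity score, taxonomy).
printf '>AB003380.FibSuc60\t100\tBacteria;Fibrobacterota;Fibrobacteria;Fibrobacterales;Fibrobacteraceae;Fibrobacter;\n' > "$tmp_dir/example.align"
printf '..........ACGU..........\n' >> "$tmp_dir/example.align"

# Same pipeline as the README step, pointed at the example file:
#   grep '>'       keeps only the header lines
#   cut -f1,3      keeps tab-separated fields 1 and 3 (drops the score)
#   cut -f2 -d'>'  strips the leading '>'
grep '>' "$tmp_dir/example.align" | cut -f1,3 | cut -f2 -d'>' > "$tmp_dir/example.full"

cat "$tmp_dir/example.full"
```

Under those assumptions the resulting file has one line per sequence: the sequence name, a tab, and the taxonomy string. The second field (the 100) is dropped by cut -f1,3.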

If you’re getting an error message, could you post it (or part of it)?



I am not getting error messages, but I’ve never successfully gone through the MiSeq SOP with the files I made, so I suspect something is wrong.

My .align file looks like this (it’s a whole row of dots for every sequence, and I don’t know if that’s right):

AB003380.FibSuc60 100 Bacteria;Fibrobacterota;Fibrobacteria;Fibrobacterales;Fibrobacteraceae;Fibrobacter;

My .tax file ends up looking like this after the R script. This file definitely seems like it’s being made incorrectly somewhere, because I don’t think it’s supposed to have NAs.
AB003380.FibSuc60 NA
CP017688.FlaCra15 NA
LS483298.Str11364 NA

Thank you!

Hi there,

I wonder if your export file from silva has a problem. I don’t think that “100” should be there. Also, you know that we provide the output of this pipeline at Silva reference files - right?
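
One possible check, offered as a sketch rather than a diagnosis: cut splits on tabs by default, so if the header fields in the export are separated by spaces instead, cut -f1,3 passes the whole header through unchanged. The snippet below lists header lines that do not have exactly three tab-separated fields. The file, sequence names, and taxonomy strings are made up for illustration, and the expectation of three fields is an assumption inferred from the cut -f1,3 step.

```shell
# Work in a throwaway directory so no real files are touched.
tmp_dir=$(mktemp -d)

# Hypothetical file with two records: the first header is tab-separated,
# the second uses spaces between its fields.
printf '>seqA\t100\tBacteria;Fibrobacterota;\n....ACGU....\n' > "$tmp_dir/check.align"
printf '>seqB 100 Bacteria;Firmicutes;\n....ACGU....\n' >> "$tmp_dir/check.align"

# Print every header line that does not split into exactly 3 tab-separated fields.
# ASSUMPTION: a well-formed header has name, score, and taxonomy separated by tabs.
bad_headers=$(awk -F'\t' '/^>/ && NF != 3 {print}' "$tmp_dir/check.align")

echo "$bad_headers"
```

To try this on a real file, replace the path with silva.nr_v138.align. If nothing is printed, every header has three tab-separated fields and the problem is likely elsewhere.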