Hi. I am doing comparative genomics work on 24 organisms. I have about 90,000 ORFs, and I would like to assign OPFs using mgcluster. How do I generate a BLAST table from my dataset? My sequences are currently in one text file, with each ORF in FASTA format. Thanks for your help.
The first step would be to do an all-vs-all BLAST of your ORFs. If they’re amino acids you can just do a blastp; otherwise I would recommend tblastx. You should set the max target sequences parameter (-max_target_seqs) to something really large, like 10000, so that hits for ORFs in large families aren't cut off. After this finishes you can use mgcluster to create the OPFs from this BLAST file. This will give you a list file, which you can then make into a shared file. The OTUs in this file will be the OPFs. You could then go on to pick a representative sequence for each of the OPFs and annotate them that way, or do some other kind of analysis.
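Before running the BLAST, it may be worth checking that the FASTA file holds the number of ORFs you expect and that the sequence identifiers are unique, since the clustering output refers to sequences by name. The following is only a rough sketch in shell: the function names are made up for illustration (they are not part of BLAST or mothur), and myfasta.fa is just an example file name.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking a FASTA file before BLAST.

# Count records: every FASTA record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print sequence IDs (the first word of each header) that occur more than once.
duplicate_fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}

# Example usage, assuming the file is called myfasta.fa:
#   count_fasta_records myfasta.fa
#   duplicate_fasta_ids myfasta.fa
```

If the second function prints anything, those identifiers are shared by more than one record and are probably worth renaming before going further.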
An example workflow with a nucleotide FASTA file:
Make the blast database:
makeblastdb -in myfasta.fa -dbtype nucl -out mydatabase
Do the blast:
tblastx -query myfasta.fa -db mydatabase -out allvall.out -evalue 1e-5 -outfmt 6 -max_target_seqs 10000
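Once the BLAST has finished, a quick look at the table can show whether any query ran into the -max_target_seqs limit. With -outfmt 6 the default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore, so the first two columns are enough to count distinct subjects per query. This is a minimal sketch, not part of BLAST or mothur; hits_per_query is a made-up name, and the awk script keeps every query/subject pair in memory, which may be heavy for a very large table.

```shell
#!/bin/sh
# Hypothetical helper: count distinct subject sequences per query in a
# tab-separated BLAST table (-outfmt 6), largest count first.
hits_per_query() {
    awk -F '\t' '
        # tblastx can report several HSPs for the same query/subject pair,
        # so count each pair only the first time it is seen.
        !seen[$1 FS $2]++ { n[$1]++ }
        END { for (q in n) print n[q] "\t" q }
    ' "$1" | sort -k1,1nr
}

# Example usage, assuming the output file from the command above:
#   hits_per_query allvall.out | head
```

If the largest count is equal to the value given to -max_target_seqs, that query may have had its hit list cut off and a larger value could be worth trying.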
Cluster in mothur:
mgcluster(blast=allvall.out)
If you have a group file, you can then use make.shared to make a shared file from the list file that mgcluster produces.
I hope this helps,
Hi Kathryn, thanks for your help.
I would also like to ask the group about the names file. Is that also made using the BLAST program? I now have the BLAST database file. Is the names file really necessary?
The names file is made within mothur (see http://www.mothur.org/wiki/Name_file). What are you trying to do with the names file?