Customize my reference alignment for ITS1-ITS2 region

I am trying to customize my reference alignment for ITS1-ITS2 region. I was advised to use pairwise.seqs and I have used pairwise.seqs(fasta=yeast_ITS.fasta, reference=ITS2seq_revcom.fasta, align=needleman).

However, I saw this:

[WARNING]: reference is not a valid parameter, ignoring.
The valid parameters are: column, oldfasta, fitcalc, fasta, align, match, mismatch, gapopen, gapextend, processors, output, calc, countends, compress, cutoff, kmercutoff, ksize, seed, inputdir, and outputdir.

Using 20 processors.

Sequence Time Num_Dists_Below_Cutoff
and mothur shuts off.

My ITS2seq_revcom.fasta contains the reverse complement of ITS2seq primer: GCATCGATGAACGCAGCCTGTCTCTTATACACATCTCCGAGCCCACGAGAC

and yeast_ITS.fasta contains

NR_111007.1 Saccharomyces cerevisiae CBS 1171 ITS region; from TYPE material
AAAGAAATTTAATAATTTTGAAAATGGATTTTTTTGTTTTGGCAAGAGCATGAGAGCTTTTACTGGGCAA
GAAGACAAGAGATGGAGAGTCCAGCCGGGCCTGCGCTTAAGTGCGCGGTCTTGCTAGGCTTGTAAGTTTCTTTCTTGCTATTCCAAACGGTGAGAGATTTCTGTGCTTTTGTTATAGGACAATTAAAACCGTTTCAATACAACACACTGTGGAGTTTTCATATCTTTGCAACTTTTTCTTTGGGCATTCGAGCAATCGGGGCCCAGAGGTAACAAACACAAACAATTTTATCTATTCATTAAATTTTTGTCAAAAACAAGAATTTTCGTAACTGGAAATTTTAAAAATATTAAAAACTTTCAACAACGGATCTCTTGGTTCTCGCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCACATTGCCCCCTTGGTATTCCAGGGGGCATGCCTGTTTGAGCGTCATTTCCTTCTCAAACATTCTGTTTGGTAGTGAGTGATACTCTTTGGAGTTAACTTGAAATTGCTGGCCTTTTCATTGGATGTTTTTTTTCCAAAGAGAGGTTTCTCTGCGTGCTTGAGGTATAATGCAAGTACGGTCGTTTTAGGTTTTACCAACTGCGGCTAATCTTTTTTATACTGAGCGTATTGGAACGTTATCGATAAGAAGAGAGCGTCTAGGCAACAATGTTCTTAAAGT

I need to modify
align.seqs(fasta=ecoli_v3.fasta, reference=silva.seed_v123.align),

summary.seqs(fasta=ecoli_v3.align) and

pcr.seqs(fasta=silva.seed_v123.align, start=6388, end=13861, keepdots=FALSE) commands to get count and taxonomy tables from fungal species for my samples.

Can you please help me modifying my commands? I need to understand where I am doing wrong and how can I achieve my goal.

I also tried running

align.seqs(fasta=yeast_ITS.fasta, reference=UNITEv10_sh_99.fasta) and found:

mothur > align.seqs(fasta=yeast_ITS.fasta, reference=UNITEv10_sh_99.fasta)

Using 20 processors.

Reading in the UNITEv10_sh_99.fasta template sequences… [ERROR]: template is not aligned, aborting.
DONE.
It took 0 to read 0 sequences.

I am using the current version of mothur. Please let me know. Can anyone help me please.

Hi,

As I mentioned elsewhere, you cannot have a reference alignment for ITS sequences. They are not homologous like 16S or 18S sequences are. You would use pairwise.seqs to align the sequences you are analyzing. Say you have 1000 ITS sequences in a file called yeast_ITS.fasta. You would need to run the following…

pairwise.seqs(fasta=yeast_ITS.fasta)

That’s it. There’s no need for align.seqs or pcr.seqs. You can take the file that is generated directly to clustering.

Pat

Hi, I just ran pairwise.seqs(fasta=yeast_ITS.fasta) as you instructed. However, I am seeing this:

mothur > pairwise.seqs(fasta=yeast_ITS.fasta)

Using 20 processors.

Sequence Time Num_Dists_Below_Cutoff

and then mothur shuts off.

The yeast_ITS.fasta file contains:

NR_111007.1 Saccharomyces cerevisiae CBS 1171 ITS region; from TYPE material
AAAGAAATTTAATAATTTTGAAAATGGATTTTTTTGTTTTGGCAAGAGCATGAGAGCTTTTACTGGGCAAGAAGACAAGAGATGGAGAGTCCAGCCGGGCCTGCGCTTAAGTGCGCGGTCTTGCTAGGCTTGTAAGTTTCTTTCTTGCTATTCCAAACGGTGAGAGATTTCTGTGCTTTTGTTATAGGACAATTAAAACCGTTTCAATACAACACACTGTGGAGTTTTCATATCTTTGCAACTTTTTCTTTGGGCATTCGAGCAATCGGGGCCCAGAGGTAACAAACACAAACAATTTTATCTATTCATTAAATTTTTGTCAAAAACAAGAATTTTCGTAACTGGAAATTTTAAAAATATTAAAAACTTTCAACAACGGATCTCTTGGTTCTCGCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCACATTGCCCCCTTGGTATTCCAGGGGGCATGCCTGTTTGAGCGTCATTTCCTTCTCAAACATTCTGTTTGGTAGTGAGTGATACTCTTTGGAGTTAACTTGAAATTGCTGGCCTTTTCATTGGATGTTTTTTTTCCAAAGAGAGGTTTCTCTGCGTGCTTGAGGTATAATGCAAGTACGGTCGTTTTAGGTTTTACCAACTGCGGCTAATCTTTTTTATACTGAGCGTATTGGAACGTTATCGATAAGAAGAGAGCGTCTAGGCAACAATGTTCTTAAAGT

I am not getting why this is happening. The yeast_ITS.fasta file is in the same directory where I have saved mothur. Please let me know.

The file needs to have more than one sequence in it. It should have all of the sequences you are interested in analyzing.

My samples are named trimmed_G2024-59-1-1_S1_L001_R1_trimmed_fungal_R1.fastq and trimmed_G2024-59-1-1_S1_L001_R2_trimmed_fungal_R2.fastq until trimmed_G2024-59-1-96_S96_L001_R1_trimmed_fungal_R1.fastq and trimmed_G2024-59-1-96_S96_L001_R2_trimmed_fungal_R2.fastq because I have 96 samples for 1 plate. They are all FASTQ files. I opened trimmed_G2024-59-1-1_S1_L001_R1_trimmed_fungal_R1.fastq through Notepad and it looks like this:

@M06365:33:000000000-LKMRC:1:1101:16167:2646 1:N:0:1
AAGTCATAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAATAGTGCCCTCTGATGCAAATCATTGGGTTAGATCTGCTCTCTTCGCAAGAAGAGGGTTTCCATACACACCGTGAACTGTGGCTTCGGCCATCACAAACTGTTAGTAATGAATGTAATATCATAACAAAAACAAAACTTTTAACAACGGATCTCTTGGCTCTCGTATCGATGAAGAACGCAGC
+
GGGGGFGGGGGGFDFGFGGGGGGGGGGFGGGGGGGEGFGGGGGGDFGFGGGGGGGDGGGGGGGGGGFGGGGGGGCCFFFFGFGGGGGGGFGGGGFGGGGGGGGGGGGGGFGGFGGGGGBFFDFEFFGGGGGGGGGGGGGGGGGGGDGFGFFFGGDFFGGGEFGGF;FEFGGGGGGGGGGGGFFFFFGGGGGGGGGGEEFGGCFCFGGGGGCEFCEGFFGGGGGFEEGGG
@M06365:33:000000000-LKMRC:1:1101:16182:2656 1:N:0:1
AAGTCATAACAAGGTTTCCGTAGGTGAACCTACGGAAGGATCATTAATA
+
FAFFEAF<F8E,66CEFGG;F@68;,;<FFG,C,8@B@CFFGGD@EFAF
@M06365:33:000000000-LKMRC:1:1101:20949:3585 1:N:0:1
AAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTCATAATCAAGTGTTTTTATGGCACTTTCAAAAATCCATATCCACCTTGTGTG
+

and so on.

I believe, all my other Fastq files will be like this almost. Do you recommend copying and pasting these for all 96 samples F and R reads on one page and then running pairwise.seq on it?

Or should I run all my samples as I ran for bacterial samples from the beginning until unique.seqs(fasta=stability.trim.contigs.good.fasta, count=stability.contigs.good.count_table), and then use stability.trim.contigs.good.fasta in pairwise.seqs(fasta=yeast_ITS.fasta), then run the subsequent command filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.) and replace stability.trim.contigs.good.unique.good.align with the output of pairwise.seq command? Then move on with unique.seqs, pre.cluster, chimera.vsearch and classify.seqs

You’ll probably need to use make.contigs to join the two reads and then use that as input to the downstream steps.

Pat

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.