Customize my reference alignment for ITS1-ITS2 region

debdutta.nath · December 10, 2024, 11:07pm

I am trying to customize my reference alignment for ITS1-ITS2 region. I was advised to use pairwise.seqs and I have used pairwise.seqs(fasta=yeast_ITS.fasta, reference=ITS2seq_revcom.fasta, align=needleman).

However, I saw this:

[WARNING]: reference is not a valid parameter, ignoring.
The valid parameters are: column, oldfasta, fitcalc, fasta, align, match, mismatch, gapopen, gapextend, processors, output, calc, countends, compress, cutoff, kmercutoff, ksize, seed, inputdir, and outputdir.

Using 20 processors.

Sequence Time Num_Dists_Below_Cutoff
and mothur shuts off.

My ITS2seq_revcom.fasta contains the reverse complement of ITS2seq primer: GCATCGATGAACGCAGCCTGTCTCTTATACACATCTCCGAGCCCACGAGAC

and yeast_ITS.fasta contains

NR_111007.1 Saccharomyces cerevisiae CBS 1171 ITS region; from TYPE material
AAAGAAATTTAATAATTTTGAAAATGGATTTTTTTGTTTTGGCAAGAGCATGAGAGCTTTTACTGGGCAA
GAAGACAAGAGATGGAGAGTCCAGCCGGGCCTGCGCTTAAGTGCGCGGTCTTGCTAGGCTTGTAAGTTTCTTTCTTGCTATTCCAAACGGTGAGAGATTTCTGTGCTTTTGTTATAGGACAATTAAAACCGTTTCAATACAACACACTGTGGAGTTTTCATATCTTTGCAACTTTTTCTTTGGGCATTCGAGCAATCGGGGCCCAGAGGTAACAAACACAAACAATTTTATCTATTCATTAAATTTTTGTCAAAAACAAGAATTTTCGTAACTGGAAATTTTAAAAATATTAAAAACTTTCAACAACGGATCTCTTGGTTCTCGCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCACATTGCCCCCTTGGTATTCCAGGGGGCATGCCTGTTTGAGCGTCATTTCCTTCTCAAACATTCTGTTTGGTAGTGAGTGATACTCTTTGGAGTTAACTTGAAATTGCTGGCCTTTTCATTGGATGTTTTTTTTCCAAAGAGAGGTTTCTCTGCGTGCTTGAGGTATAATGCAAGTACGGTCGTTTTAGGTTTTACCAACTGCGGCTAATCTTTTTTATACTGAGCGTATTGGAACGTTATCGATAAGAAGAGAGCGTCTAGGCAACAATGTTCTTAAAGT

I need to modify
align.seqs(fasta=ecoli_v3.fasta, reference=silva.seed_v123.align),

summary.seqs(fasta=ecoli_v3.align) and

pcr.seqs(fasta=silva.seed_v123.align, start=6388, end=13861, keepdots=FALSE) commands to get count and taxonomy tables from fungal species for my samples.

Can you please help me modifying my commands? I need to understand where I am doing wrong and how can I achieve my goal.

I also tried running

align.seqs(fasta=yeast_ITS.fasta, reference=UNITEv10_sh_99.fasta) and found:

mothur > align.seqs(fasta=yeast_ITS.fasta, reference=UNITEv10_sh_99.fasta)

Using 20 processors.

Reading in the UNITEv10_sh_99.fasta template sequences… [ERROR]: template is not aligned, aborting.
DONE.
It took 0 to read 0 sequences.

I am using the current version of mothur. Please let me know. Can anyone help me please.

pschloss · December 11, 2024, 2:16pm

Hi,

As I mentioned elsewhere, you cannot have a reference alignment for ITS sequences. They are not homologous like 16S or 18S sequences are. You would use pairwise.seqs to align the sequences you are analyzing. Say you have 1000 ITS sequences in a file called yeast_ITS.fasta. You would need to run the following…

pairwise.seqs(fasta=yeast_ITS.fasta)

That’s it. There’s no need for align.seqs or pcr.seqs. You can take the file that is generated directly to clustering.

Pat

debdutta.nath · December 11, 2024, 3:56pm

Hi, I just ran pairwise.seqs(fasta=yeast_ITS.fasta) as you instructed. However, I am seeing this:

mothur > pairwise.seqs(fasta=yeast_ITS.fasta)

Using 20 processors.

Sequence Time Num_Dists_Below_Cutoff

and then mothur shuts off.

The yeast_ITS.fasta file contains:

NR_111007.1 Saccharomyces cerevisiae CBS 1171 ITS region; from TYPE material
AAAGAAATTTAATAATTTTGAAAATGGATTTTTTTGTTTTGGCAAGAGCATGAGAGCTTTTACTGGGCAAGAAGACAAGAGATGGAGAGTCCAGCCGGGCCTGCGCTTAAGTGCGCGGTCTTGCTAGGCTTGTAAGTTTCTTTCTTGCTATTCCAAACGGTGAGAGATTTCTGTGCTTTTGTTATAGGACAATTAAAACCGTTTCAATACAACACACTGTGGAGTTTTCATATCTTTGCAACTTTTTCTTTGGGCATTCGAGCAATCGGGGCCCAGAGGTAACAAACACAAACAATTTTATCTATTCATTAAATTTTTGTCAAAAACAAGAATTTTCGTAACTGGAAATTTTAAAAATATTAAAAACTTTCAACAACGGATCTCTTGGTTCTCGCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCACATTGCCCCCTTGGTATTCCAGGGGGCATGCCTGTTTGAGCGTCATTTCCTTCTCAAACATTCTGTTTGGTAGTGAGTGATACTCTTTGGAGTTAACTTGAAATTGCTGGCCTTTTCATTGGATGTTTTTTTTCCAAAGAGAGGTTTCTCTGCGTGCTTGAGGTATAATGCAAGTACGGTCGTTTTAGGTTTTACCAACTGCGGCTAATCTTTTTTATACTGAGCGTATTGGAACGTTATCGATAAGAAGAGAGCGTCTAGGCAACAATGTTCTTAAAGT

I am not getting why this is happening. The yeast_ITS.fasta file is in the same directory where I have saved mothur. Please let me know.

pschloss · December 11, 2024, 6:32pm

The file needs to have more than one sequence in it. It should have all of the sequences you are interested in analyzing.

debdutta.nath · December 11, 2024, 7:10pm

My samples are named trimmed_G2024-59-1-1_S1_L001_R1_trimmed_fungal_R1.fastq and trimmed_G2024-59-1-1_S1_L001_R2_trimmed_fungal_R2.fastq until trimmed_G2024-59-1-96_S96_L001_R1_trimmed_fungal_R1.fastq and trimmed_G2024-59-1-96_S96_L001_R2_trimmed_fungal_R2.fastq because I have 96 samples for 1 plate. They are all FASTQ files. I opened trimmed_G2024-59-1-1_S1_L001_R1_trimmed_fungal_R1.fastq through Notepad and it looks like this:

@M06365:33:000000000-LKMRC:1:1101:16167:2646 1:N:0:1
AAGTCATAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAATAGTGCCCTCTGATGCAAATCATTGGGTTAGATCTGCTCTCTTCGCAAGAAGAGGGTTTCCATACACACCGTGAACTGTGGCTTCGGCCATCACAAACTGTTAGTAATGAATGTAATATCATAACAAAAACAAAACTTTTAACAACGGATCTCTTGGCTCTCGTATCGATGAAGAACGCAGC
+
GGGGGFGGGGGGFDFGFGGGGGGGGGGFGGGGGGGEGFGGGGGGDFGFGGGGGGGDGGGGGGGGGGFGGGGGGGCCFFFFGFGGGGGGGFGGGGFGGGGGGGGGGGGGGFGGFGGGGGBFFDFEFFGGGGGGGGGGGGGGGGGGGDGFGFFFGGDFFGGGEFGGF;FEFGGGGGGGGGGGGFFFFFGGGGGGGGGGEEFGGCFCFGGGGGCEFCEGFFGGGGGFEEGGG
@M06365:33:000000000-LKMRC:1:1101:16182:2656 1:N:0:1
AAGTCATAACAAGGTTTCCGTAGGTGAACCTACGGAAGGATCATTAATA
+
FAFFEAF<F8E,66CEFGG;F@68;,;<FFG,C,8@B@CFFGGD@EFAF
@M06365:33:000000000-LKMRC:1:1101:20949:3585 1:N:0:1
AAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTCATAATCAAGTGTTTTTATGGCACTTTCAAAAATCCATATCCACCTTGTGTG
+

and so on.

I believe, all my other Fastq files will be like this almost. Do you recommend copying and pasting these for all 96 samples F and R reads on one page and then running pairwise.seq on it?

Or should I run all my samples as I ran for bacterial samples from the beginning until unique.seqs(fasta=stability.trim.contigs.good.fasta, count=stability.contigs.good.count_table), and then use stability.trim.contigs.good.fasta in pairwise.seqs(fasta=yeast_ITS.fasta), then run the subsequent command filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.) and replace stability.trim.contigs.good.unique.good.align with the output of pairwise.seq command? Then move on with unique.seqs, pre.cluster, chimera.vsearch and classify.seqs

pschloss · December 12, 2024, 12:36am

You’ll probably need to use make.contigs to join the two reads and then use that as input to the downstream steps.

Pat

system · December 22, 2024, 12:37am

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Align.seqs command Commands in mothur	6	148	December 19, 2024
Aligned ITS reference files Commands in mothur	1	695	February 13, 2017
Reg. UNITE for 18S ITS Analysis Commands in mothur	1	876	May 11, 2017
reference alignment of ITS1 database Theory behind mothur	1	1806	May 28, 2016
Query regarding 16S rRNA ITS sequencing	6	396	November 29, 2022

Customize my reference alignment for ITS1-ITS2 region

Related topics