PCR.seqs in MiSeq SOP

Nicolas · December 29, 2020, 11:42pm

Good day

I am going over my first 16S dataset (I normally work with fungi) and am going over the Miseq SOP. I have a question about the alignment and pcr.seqs step. For the pcr.seqs command, I used the same start and end values as in the SOP, because I used the same reference file. I did however ad my primers used as an oligos file. Here is the code:

mothur > pcr.seqs(fasta=silva.bacteria.fasta, oligos=16S.oligos, keepdots=F, processors=8)

#I downloaded the silva file from mothur. It's just a general reference file for bacteria. The oligos file I made
with just the forward and reverse primers like this:

forward CCTACGGGNGGCWGCAG
reverse GACTACHVGGGTATCTAATCC



It took 36 secs to screen 14956 sequences.

Output File Names: 
silva.bacteria.pcr.fasta
silva.bacteria.bad.accnos
silva.bacteria.scrap.pcr.fasta

mothur > rename.file(input=silva.bacteria.pcr.fasta, new=silva.v4.fasta)

summary.seqs(fasta=silva.v4.fasta)

mothur > summary.seqs(fasta=silva.v4.fasta)

Using 8 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	17013	382	0	3	1
2.5%-tile:	2	17014	402	0	4	339
25%-tile:	2	17014	405	0	4	3390
Median: 	2	17014	425	0	5	6779
75%-tile:	2	17014	427	0	5	10168
97.5%-tile:	2	17014	428	1	6	13219
Maximum:	2	17063	469	5	9	13557
Mean:	1	17014	417	0	4
# of Seqs:	13557

mothur > align.seqs(fasta=stability.paired.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)


It took 4159 secs to align 3144255 sequences.

[WARNING]: 712 of your sequences generated alignments that eliminated too many bases, a list is provided in stability.paired.trim.contigs.good.unique.flip.accnos.
[NOTE]: 305 of your sequences were reversed to produce a better alignment.

It took 4160 seconds to align 3144255 sequences.

Output File Names: 
stability.paired.trim.contigs.good.unique.align
stability.paired.trim.contigs.good.unique.align.report
stability.paired.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.unique.align, count=stability.paired.trim.contigs.good.count_table)

mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.unique.align, count=stability.paired.trim.contigs.good.count_able)

Using 4 processors.

Start End NBases Ambigs Polymer NumSeqs

Minimum: 0 0 0 0 1 1

2.5%-tile: 2 17014 402 0 4 138038

25%-tile: 2 17014 403 0 4 1380373

Median: 2 17014 411 0 5 2760746

75%-tile: 2 17014 427 0 5 4141118

97.5%-tile: 2 17014 428 0 6 5383453

Maximum: 17063 17063 454 0 118 5521490

Mean: 11 17012 414 0 4

# of unique seqs: 3144255

total # of seqs: 5521490

It took 4969 secs to summarize 5521490 sequences.

Output File Names:

stability.paired.trim.contigs.good.unique.summary

Firstly, that warning message after align.seqs seems off and secondly, when looking at the summary.seqs for the output of align.seqs, the start and end of the sequences do not correspond with the set parameters in pcr.seqs. Could this be because I added the oligos file? Should I take out the oligos file in pcr.seqs and then remove the primers separately after I have done alignment?

Sorry if this is a basic question, but I haven’t done an alignment yet, so I would therefore kindly appreciate feedback regarding this matter.

Best
Nicolas

pschloss · January 4, 2021, 6:54pm

Hi Nicolas,

The warning message from align.seqs indicates that you have reads that were in the opposite direction from what was expected and/or you have data of poor quality. This would likely fit with the long (>400 nt) contigs compared to the V4 region of the 16S rRNA gene. I’m not sure what you mean by “the sequences do not correspond with the set parameters in pcr.seqs”. You didn’t give it a start or end position, only the primer sequences.

Pat

Nicolas · January 4, 2021, 7:13pm

Dear Pat
Thanks for getting back to me on this. So what I did in the meantime is: I trimmed off the primers before I ran align.seqs, with the .oligos option and then I just followed the rest of the SOP as you’ve outlined. Is that fine?
So with the pcr.seqs command, I first tried it without the oligos file and just with the start and end options, and when I ran summary.seqs after align.seqs, it didn’t match up with the same start and end functions as the command. And again, when I did the primer trimming first prior to pcr.seqs, they didn’t match up with the set parameter. Here, I’ll include my code:

(this is after the screen.seqs step)

mothur > trim.seqs(fasta=stability.paired.trim.contigs.good.fasta, oligos=16S.oligos, pdiffs=0, bdiffs=0, checkorient=T)

Output File Names: 
stability.paired.trim.contigs.good.trim.fasta
stability.paired.trim.contigs.good.scrap.fasta

mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.trim.fasta)

Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	263	263	0	3	1
2.5%-tile:	1	402	402	0	4	126003
25%-tile:	1	403	403	0	4	1260026
Median: 	1	411	411	0	5	2520052
75%-tile:	1	427	427	0	5	3780078
97.5%-tile:	1	428	428	0	6	4914101
Maximum:	1	462	462	0	81	5040103
Mean:	1	414	414	0	4
# of Seqs:	5040103

It took 714 secs to summarize 5040103 sequences.

Output File Names:
stability.paired.trim.contigs.good.trim.summary

Looking at it on geneious, not everything merged properly, but the trim.seqs worked for sure.

mothur > unique.seqs(fasta=stability.paired.trim.contigs.good.trim.fasta)

Output File Names: 
stability.paired.trim.contigs.good.trim.names
stability.paired.trim.contigs.good.trim.unique.fasta

mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.fasta, name=stability.paired.trim.contigs.good.trim.names)

Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	263	263	0	3	1
2.5%-tile:	1	402	402	0	4	126003
25%-tile:	1	403	403	0	4	1260026
Median: 	1	411	411	0	5	2520052
75%-tile:	1	427	427	0	5	3780078
97.5%-tile:	1	428	428	0	6	4914101
Maximum:	1	462	462	0	81	5040103
Mean:	1	414	414	0	4
# of unique seqs:	1848492
total # of seqs:	5040103

It took 136 secs to summarize 5040103 sequences.

Output File Names:
stability.paired.trim.contigs.good.trim.unique.summary

count.seqs(name=stability.paired.trim.contigs.good.trim.names, group=stability.paired.contigs.groups)

mothur > count.seqs(name=stability.paired.trim.contigs.good.trim.names, group=stability.paired.contigs.groups)

It took 131 secs to create a table for 5040103 sequences.

Total number of sequences: 5040103

Output File Names: 
stability.paired.trim.contigs.good.trim.count_table

mothur > pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)

It took 20 secs to screen 14956 sequences.

Output File Names: 
silva.bacteria.pcr.fasta

rename.file(input=silva.bacteria.pcr.fasta, new=silva.v4.fasta)

mothur > summary.seqs(fasta=silva.v4.fasta)

Using 8 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	13424	270	0	3	1
2.5%-tile:	1	13425	292	0	4	374
25%-tile:	1	13425	293	0	4	3740
Median: 	1	13425	293	0	4	7479
75%-tile:	1	13425	293	0	5	11218
97.5%-tile:	1	13425	294	1	6	14583
Maximum:	3	13425	351	5	9	14956
Mean:	1	13424	292	0	4
# of Seqs:	14956

It took 8 secs to summarize 14956 sequences.

Output File Names:
silva.v4.summary

mothur > align.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.fasta, reference=silva.v4.fasta)

[WARNING]: 32 of your sequences generated alignments that eliminated too many bases, a list is provided in stability.paired.trim.contigs.good.trim.unique.flip.accnos.
[NOTE]: 0 of your sequences were reversed to produce a better alignment.

It took 1697 seconds to align 1848492 sequences.

Output File Names: 
stability.paired.trim.contigs.good.trim.unique.align
stability.paired.trim.contigs.good.trim.unique.align.report
stability.paired.trim.contigs.good.trim.unique.flip.accnos

summary.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.align, count=stability.paired.trim.contigs.good.trim.count_table)
 
 
mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.align, count=stability.paired.trim.contigs.good.trim.count_table)

Using 8 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	4393	111	0	3	1
2.5%-tile:	1	11546	269	0	4	126003
25%-tile:	1	11546	270	0	4	1260026
Median: 	1	11546	270	0	4	2520052
75%-tile:	1	11546	270	0	5	3780078
97.5%-tile:	1	11546	271	0	6	4914101
Maximum:	1241	13425	297	0	17	5040103
Mean:	1	11545	270	0	4
# of unique seqs:	1848492
total # of seqs:	5040103

It took 2123 secs to summarize 5040103 sequences.

Output File Names:
stability.paired.trim.contigs.good.trim.unique.summary

screen.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.align, count=stability.paired.trim.contigs.good.trim.count_table, summary=stability.paired.trim.contigs.good.trim.unique.summary, start=1, end=11546, maxhomop=8)

I carried on with that all the way to the cluster step, but now at the cluster step, I am having errors. My count table and my .dist file does not seem to match up. But anyways, I would please like to hear your opinion on my approach to the PCR.seqs step before I get to that again.

Thanks in advance!
Best
Nicolas

pschloss · January 7, 2021, 1:39pm

Hi Nicolas,

It looks like you are using the V4 coordinates, but I don’t think you have V4 data. I suspect you really have V4-V5 data judging by their original lengths of >400 nt. I’m not totally clear what you mean by …

it didn’t match up with the same start and end functions as the command. And again, when I did the primer trimming first prior to pcr.seqs, they didn’t match up with the set parameter.

But I suspect they didn’t correspond with the coordinates on the wiki because the wiki is using the V4 region rather than the V4-V5, which is what you need.

Can you provide the steps between running screen.seqs and where you are running into problems?

Thanks,
Pat

Nicolas · January 7, 2021, 4:38pm

Hey Pat
Thanks for getting back to me again. What I meant with that statement you copied is that after the alignment step, the start and end sequences don’t start where I set the parameter for them to start. Your speculation on the V4-V5 part makes sense. So does that mean I need to set different start and end values based on V4-V5, rather than the start and end values corresponding with V4, like in the Wiki?

I am copying my workflow from after screen.seqs here:

mothur > filter.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.good.align, vertical=T, trump=.)

It took 1044 secs to filter 1845905 sequences.

Length of filtered alignment: 695
Number of columns removed: 12730
Length of the original alignment: 13425
Number of sequences used to construct filter: 1845905

Output File Names:
stability.filter
stability.paired.trim.contigs.good.trim.unique.good.filter.fasta

unique.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.good.filter.fasta, count=stability.paired.trim.contigs.good.trim.count_table)

Output File Names:
stability.paired.trim.contigs.good.trim.unique.good.filter.count_table
stability.paired.trim.contigs.good.trim.unique.good.filter.unique.fasta

pre.cluster(fasta=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.fasta, count=stability.paired.trim.contigs.good.trim.unique.good.filter.count_table, diffs=2)

stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.fasta
stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.count_table

chimera.vsearch(fasta=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.fasta, count=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.count_table, dereplicate=t)

Reading file stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.temp1.temp 100%
1953750 nt in 7235 seqs, min 248, max 280, avg 270
Masking 27%
It took 23 secs to check 7618 sequences from group 75G3.
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Detecting chimeras 100%
Found 2285 (31.6%) chimeras, 4858 (67.1%) non-chimeras,
and 92 (1.3%) borderline sequences in 7235 unique sequences.
Taking abundance information into account, this corresponds to
2735 (4.4%) chimeras, 59890 (95.4%) non-chimeras,
and 132 (0.2%) borderline sequences in 62757 total sequences.

It took 23 secs to check 7235 sequences from group 53E5.
It took 546 secs to check 556205 sequences.

Output File Names:
stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table
stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.denovo.vsearch.chimeras
stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.denovo.vsearch.accnos

remove.seqs(fasta=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.fasta, accnos=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.denovo.vsearch.accnos)

Removed 130869 sequences from your fasta file.

Output File Names:
stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pick.fasta

Try this now:

mothur > cluster(column=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pick.dist, count=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table)

That didn’t work, try this

mothur > cluster.split(fasta=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pick.fasta, count=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pds.wang.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03)

Both commands cluster and cluster.split gave me errors stating that the list values and the fasta values did not correspond to the values in the count table. It gave me one value that was not in the count.table.
More specifically, this is what happens when I try to run the cluster command:
"mothur > cluster(column=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pick.dist, count=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table)

Using 4 processors.

You did not set a cutoff, using 0.03.

Clustering stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pick.dist

iter time label num_otus cutoff tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score

0.03
AAError: Sequence ‘M00313_220_000000000-CRDLT_1_1118_4699_6126’ was not found in the name or count file, please correct"

And when I try to run the split.cluster command:
" mothur > cluster.split(fasta=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pick.fasta, count=stability.paired.trim.contigs.good.trim.unique.good.filter.uniqe.precluster.denovo.vsearch.pick.count_table, taxonomy=stability.paired.trim.contigs.good.trim.unique.good.filter.unique.precluster.pds.wang.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03)

Using 4 processors.

Splitting the file…

ERROR: M00313_220_000000000-CRDLT_1_1102_11126_19733 is missing from your fastafile. This could happen if your taxonomy file is not unique and your fastafile is, or it could indicate an error.

ERROR: M00313_220_000000000-CRDLT_1_1102_11290_9108 is missing from your fastafile. This could happen if your taxonomy file is not unique and your fastafile is, or it could indicate an error.

ERROR: M00313_220_000000000-CRDLT_1_1102_13546_14638 is missing from your fastafile. This could happen if your taxonomy file is not unique and your fastafile is, or it could indicate an error.

ERROR: M00313_220_000000000-CRDLT_1_1102_14327_13893 is missing from your fastafile. This could happen if your taxonomy file is not unique and your fastafile is, or it could indicate an error.

ERROR: M00313_220_000000000-CRDLT_1_1102_16551_3122 is missing from your fastafile. This could happen if your taxonomy file is not unique and your fastafile is, or it could indicate an erro"

This error message just repeats for many lines.

Thanks in advance.
Best
Nicolas

pschloss · January 11, 2021, 8:50pm

Hi Nicolas,

You can find instructions on customizing the reference to your region of interest by going to Customize your reference alignment for your favorite region and following the instructions there.

Can you re-run classify.seqs and dist.seqs on the output of remove.seqs? I wonder if something weird happened and perhaps you accidentally skipped it when you were running these commands before running cluster and cluster.split.

Pat

Nicolas · January 15, 2021, 5:21pm

Dear Pat
Again, thanks for all the tips! The link you sent me on customizing my reference alignment for your favorite region is useful, but I’m not 100% if I followed it correctly. Everything seemed to work fine, until I reached the pre.cluster step with my new alignments. At first I tried running pre.cluster with the count.table, but I kept on getting errors. Then I tried it with the group.file, and I got similar errors again.

Sorry if this takes up too much of your time, I’m going to try and set up a meeting with a bioinformatician to troubleshoot this with me. But again, if you have the time, I would kindly appreciate your input.

This is what I did following the link that you sent me.

mothur > align.seqs(fasta=stability.paired.trim.contigs.good.unique.fasta, reference=silva.nr_v138.align)




[WARNING]: 893 of your sequences generated alignments that eliminated too many bases, a list is provided in stability.paired.trim.contigs.good.unique.flip.accnos.
[NOTE]: 485 of your sequences were reversed to produce a better alignment.

It took 23506 seconds to align 3144255 sequences.

Output File Names: 
stability.paired.trim.contigs.good.unique.align
stability.paired.trim.contigs.good.unique.align.report
stability.paired.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.unique.align)

Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	6388	25316	440	0	4	78607
25%-tile:	6388	25316	441	0	5	786064
Median: 	6388	25316	455	0	5	1572128
75%-tile:	6388	25316	465	0	6	2358192
97.5%-tile:	6388	25316	466	0	7	3065649
Maximum:	43116	43116	500	0	136	3144255
Mean:	6406	25326	452	0	5
# of Seqs:	3144255

It took 16496 secs to summarize 3144255 sequences.

Output File Names:
stability.paired.trim.contigs.good.unique.summary

mothur > pcr.seqs(fasta=silva.nr_v138.align, start=6388, end=25316, keepdots=F, processors=8)

It took 235 secs to screen 146796 sequences.

Output File Names: 
silva.nr_v138.pcr.align

mothur > rename.file(input=silva.nr_v138.pcr.align, new=silva.v4.fasta)

mothur > summary.seqs(fasta=silva.v4.fasta)


Using 8 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	15382	256	0	3	1
2.5%-tile:	1	18928	436	0	4	3670
25%-tile:	1	18928	441	0	4	36700
Median: 	1	18928	461	0	5	73399
75%-tile:	1	18928	464	0	6	110098
97.5%-tile:	1	18928	606	1	7	143127
Maximum:	3865	18928	1666	5	16	146796
Mean:	1	18927	468	0	5
# of Seqs:	146796

It took 159 secs to summarize 146796 sequences.

Output File Names:
silva.v4.summary


mothur > align.seqs(fasta=stability.paired.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)


It took 16423 secs to align 3144255 sequences.

[WARNING]: 633 of your sequences generated alignments that eliminated too many bases, a list is provided in stability.paired.trim.contigs.good.unique.flip.accnos.
[NOTE]: 385 of your sequences were reversed to produce a better alignment.

It took 16430 seconds to align 3144255 sequences.

Output File Names: 
stability.paired.trim.contigs.good.unique.align
stability.paired.trim.contigs.good.unique.align.report
stability.paired.trim.contigs.good.unique.flip.accnos

summary.seqs(fasta=stability.paired.trim.contigs.good.unique.align)

mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.unique.align)

Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	1	18928	439	0	4	78607
25%-tile:	1	18928	440	0	5	786064
Median: 	1	18928	453	0	5	1572128
75%-tile:	1	18928	464	0	6	2358192
97.5%-tile:	1	18928	465	0	7	3065649
Maximum:	18928	18928	498	0	135	3144255
Mean:	17	18925	451	0	5
# of Seqs:	3144255

It took 6146 secs to summarize 3144255 sequences.

Output File Names:
stability.paired.trim.contigs.good.unique.summary

In the MIseqSOP they use the count.table as well. I’m just a little concerned that the start and end sequences in the summary.table does not reflect
what I set it was in the pcr.seqs command

mothur > summary.seqs(fasta=stability.paired.trim.contigs.good.unique.align, count=stability.paired.trim.contigs.good.count_table)


Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	1	18928	439	0	4	138038
25%-tile:	1	18928	440	0	5	1380373
Median: 	1	18928	448	0	5	2760746
75%-tile:	1	18928	464	0	6	4141118
97.5%-tile:	1	18928	465	0	6	5383453
Maximum:	18928	18928	498	0	135	5521490
Mean:	10	18926	451	0	5
# of unique seqs:	3144255
total # of seqs:	5521490

It took 6984 secs to summarize 5521490 sequences.

Output File Names:
stability.paired.trim.contigs.good.unique.summary


mothur > screen.seqs(fasta=stability.paired.trim.contigs.good.unique.align, count=stability.paired.trim.contigs.good.count_table, maxhomop=8)

It took 6472 secs to screen 3144255 sequences, removed 5688.

/******************************************/
Running command: remove.seqs(accnos=stability.paired.trim.contigs.good.unique.bad.accnos.temp, count=stability.paired.trim.contigs.good.count_table)
Removed 5791 sequences from your count file.

Output File Names:
stability.paired.trim.contigs.good.pick.count_table

/******************************************/

Output File Names:
stability.paired.trim.contigs.good.unique.good.align
stability.paired.trim.contigs.good.unique.bad.accnos
stability.paired.trim.contigs.good.good.count_table

mothur > filter.seqs(fasta=stability.paired.trim.contigs.good.unique.good.align, vertical=T, trump=.)

Length of filtered alignment: 0
Number of columns removed: 18928
Length of the original alignment: 18928
Number of sequences used to construct filter: 3138567

Output File Names: 
stability.filter
stability.paired.trim.contigs.good.unique.good.filter.fasta

mothur > unique.seqs(fasta=stability.paired.trim.contigs.good.unique.good.filter.fasta, count=stability.paired.trim.contigs.good.good.count_table)

Output File Names: 
stability.paired.trim.contigs.good.unique.good.filter.count_table
stability.paired.trim.contigs.good.unique.good.filter.unique.fasta

mothur > pre.cluster(fasta=stability.paired.trim.contigs.good.unique.good.filter.unique.fasta, group=stability.paired.contigs.groups, diffs=2)

And then here is where I keep on getting error messages.

pschloss · January 19, 2021, 7:07pm

You won’t get the same start/end positions after running align.seqs because you removed everything before positions 6388 and after position 25316 of the original reference file. That is why you start at position 1 and end at 18928. What you’ve done is a bit convoluted since you aligned your data to the reference, found the start/end positions, screened the reference and then realigned your data. That kind of defeats the purpose of customizing the region. You could have taken the first output of align.seqs and used screen.seqs and then filter.seqs like you are now. The page I sent you to, had you do an alignment of the E coli 16S rRNA gene sequence trimmed with your primers and then use those coordinates in pcr.seqs to generate the customized region.

The pre.cluster syntax should be…

pre.cluster(fasta=stability.paired.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.paired.trim.contigs.good.unique.good.filter.count_table, diffs=2)

Pat

Nicolas · January 23, 2021, 9:04pm

Dear Pat.
Many thanks for getting back to me on this. I’ll revisit my workflow and implement these changes.

Best
Nicolas

Babak · May 16, 2021, 3:02pm

Hello Nicolas,
Did you find a solution for issue that you mentioned,
because I have same question, did you proceed with analysis?
mothur > summary.seqs(fasta=current, count=current)
Using stability.trim.contigs.good.good.count_table as input file for the count parameter.
Using stability.trim.contigs.good.unique.good.align as input file for the fasta parameter.

Using 7 processors.

            Start   End     NBases  Ambigs  Polymer NumSeqs

Minimum: 1 21 10 0 1 1
2.5%-tile: 1 30 11 0 2 76265
25%-tile: 1 18928 441 0 4 762647
Median: 1 18928 462 0 6 1525294
75%-tile: 1 18928 464 0 6 2287941
97.5%-tile: 18895 18931 467 0 8 2974323
Maximum: 18910 18931 558 0 8 3050587
Mean: 1543 17642 395 0 5

of unique seqs: 1333654

total # of seqs: 3050587

It took 384 secs to summarize 3050587 sequences.

Nicolas · June 25, 2021, 3:41pm

Hey Baback.
Sorry for only getting back to your comment now. I haven’t look at this pipeline in a couple of months and no, I didn’t get to a solution. I am however revisiting the pipeline now and will let you know if I can come to a solution.

Best
Nicolas

Topic		Replies	Views
Miseq SOP pcr.seqs Commands in mothur	13	17340	February 25, 2016
pcr.seqs with oligos file Commands in mothur	3	1546	March 22, 2016
Silva Bacteria Aligning Position Theory behind mothur	4	6326	July 27, 2014
Pcr.seqs with Silva 138 start/end Commands in mothur	2	381	April 12, 2020
PCR.Seqs Commands in mothur	3	2462	March 30, 2015

PCR.seqs in MiSeq SOP

of unique seqs: 1333654

Related topics