Hello. I am having trouble aligning a portion of my sequences. They kept getting filtered out while the majority of my sequences passed, so I analyzed them separately to see what might be causing the issue. After alignment the number of bases per sequence is far too small, and that is even after mothur reversed many of the sequences to produce a better alignment. Any help would be greatly appreciated. Thank you.
Before screen.seqs this was the summary:
             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      35   35      0       2        1
2.5%-tile:   1      35   35      0       4        215362
25%-tile:    1      440  440     18      5        2153611
Median:      1      445  445     20      6        4307222
75%-tile:    1      465  465     22      6        6460833
97.5%-tile:  1      538  538     53      35       8399082
Maximum:     1      602  602     86      301      8614443
Mean:        1      410  410     20      8
# of unique seqs: 8614443
total # of seqs: 8614443
It took 126 secs to summarize 8614443 sequences.
I ran screen.seqs using the same parameters that I used for the rest of the samples. Should I not be doing this? Ideally I would like to process these samples in the same run as the rest of the samples.
mothur > screen.seqs(fasta = /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/HansenWithPrelimNewMethodV3V4/PreLim/prelim.trim.contigs.fasta , count = /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/HansenWithPrelimNewMethodV3V4/PreLim/prelim.contigs.count_table , maxambig = 0, minlength = 200, maxlength = 466, maxhomop = 8)
The summary.seqs after the screen.seqs was:
             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      200  200     0       3        1
2.5%-tile:   1      215  215     0       4        9489
25%-tile:    1      287  287     0       5        94882
Median:      1      288  288     0       5        189763
75%-tile:    1      290  290     0       6        284644
97.5%-tile:  1      313  313     0       7        370036
Maximum:     1      465  465     0       8        379524
Mean:        1      285  285     0       5
# of unique seqs: 379524
total # of seqs: 379524
It took 4 secs to summarize 379524 sequences.
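To make clear what I think those screen.seqs criteria are checking, here is a rough Python sketch of the per-sequence test. It is only my illustration, not mothur's actual implementation: the function names `longest_homopolymer` and `passes_screen` are made up, and I am assuming that an "ambiguous" base means anything other than A, C, G, T, or U.

```python
def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest = 0
    run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def passes_screen(seq: str, maxambig: int = 0, minlength: int = 200,
                  maxlength: int = 466, maxhomop: int = 8) -> bool:
    """Apply the same four thresholds I passed to screen.seqs (assumed semantics)."""
    seq = seq.upper()
    # Assumption: any character outside ACGTU counts as ambiguous.
    ambigs = sum(1 for base in seq if base not in "ACGTU")
    return (
        ambigs <= maxambig
        and minlength <= len(seq) <= maxlength
        and longest_homopolymer(seq) <= maxhomop
    )


if __name__ == "__main__":
    print(passes_screen("ACGT" * 60))        # 240 bases, no ambiguous bases
    print(passes_screen("ACGT" * 60 + "N"))  # one ambiguous base
```

With maxambig = 0, a single ambiguous base is enough to drop a contig, which would be consistent with most of the 8614443 sequences being removed, since the median number of ambiguous bases before screening was 20.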
mothur > align.seqs(fasta = /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/HansenWithPrelimNewMethodV3V4/PreLim/prelim.trim.contigs.good.fasta , reference = /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/silva.bacteria/silva.bacteria.fasta )
It took 483 secs to align 379524 sequences.
[WARNING]: 372581 of your sequences generated alignments that eliminated too many bases, a list is provided in /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/HansenWithPrelimNewMethodV3V4/PreLim/prelim.trim.contigs.good.flip.accnos.
[NOTE]: 207099 of your sequences were reversed to produce a better alignment.
             Start  End    NBases  Ambigs  Polymer  NumSeqs
Minimum:     0      0      0       0       1        1
2.5%-tile:   1044   1060   3       0       1        9489
25%-tile:    43029  43116  9       0       3        94882
Median:      43059  43116  13      0       3        189763
75%-tile:    43097  43116  19      0       4        284644
97.5%-tile:  43112  43116  38      0       6        370036
Maximum:     43116  43116  457     0       8        379524
Mean:        39361  39658  18      0       3
# of unique seqs: 379524
total # of seqs: 379524
It took 50 secs to summarize 379524 sequences.
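To put the warning and the note above in proportion, this short Python snippet works out what share of the aligned sequences were affected. The counts are copied from the mothur output above; the variable names are just mine.

```python
# Counts taken from the align.seqs output above.
total_seqs = 379524      # sequences aligned
flagged_seqs = 372581    # "eliminated too many bases" warning
reversed_seqs = 207099   # reversed to produce a better alignment

flagged_pct = 100 * flagged_seqs / total_seqs
reversed_pct = 100 * reversed_seqs / total_seqs

print(f"Flagged by the warning: {flagged_pct:.1f}% of aligned sequences")
print(f"Reversed by mothur:     {reversed_pct:.1f}% of aligned sequences")
```

So nearly all of the sequences that survived screen.seqs lost most of their bases in the alignment, and a little over half of them were reversed.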