Dear Pat,
Thank you sincerely for your recommendation. Yes, after I adjusted the company's raw data, which had not been trimmed, the minimum sequence length increased to 246 bases:
mothur > summary.seqs(fasta=stability.trim.contigs.fasta)
Using 56 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 246 246 0 2 1
2.5%-tile: 1 270 270 0 4 203779
25%-tile: 1 444 444 0 4 2037783
Median: 1 468 468 0 5 4075565
75%-tile: 1 470 470 0 6 6113347
97.5%-tile: 1 479 479 20 13 7947351
Maximum: 1 499 499 84 244 8151129
Mean: 1 440 440 1 5
# of Seqs: 8151129
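For reading these tables: the NumSeqs column appears to give the rank (1 to total, counting duplicates) at which each percentile row is read. As an inference from the posted numbers, and not taken from mothur's source, the rank works out to the integer part of total × fraction, plus one. The minimal sketch below reproduces the NumSeqs values of the summaries in this post from the total alone. The function name `summary_ranks` is hypothetical.

```python
from typing import Dict


def summary_ranks(total: int) -> Dict[str, int]:
    """Approximate the NumSeqs column of a summary.seqs table.

    Assumption (inferred from the posted output, not from mothur's code):
    each percentile row is read at rank int(total * fraction) + 1.
    """
    fractions = {
        "2.5%-tile": 0.025,
        "25%-tile": 0.25,
        "Median": 0.5,
        "75%-tile": 0.75,
        "97.5%-tile": 0.975,
    }
    ranks = {"Minimum": 1}
    for label, fraction in fractions.items():
        # Truncate to an integer, then shift to a 1-based rank.
        ranks[label] = int(total * fraction) + 1
    ranks["Maximum"] = total
    return ranks


if __name__ == "__main__":
    # Total taken from the first summary.seqs output above.
    for label, rank in summary_ranks(8151129).items():
        print(f"{label:>11}: {rank}")
```

The same rule also matches the later summaries, which have 5833973 and 5097887 sequences.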
For the next step, I used the 'optimize' option of screen.seqs, and it ran without errors:
mothur > screen.seqs(fasta=stability.trim.contigs.good.pick.unique.align, count=stability.trim.contigs.good.pick.count_table, optimize=start-end, criteria=90)
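As I understand it, `optimize=start-end, criteria=90` asks mothur to choose the start position at or before which 90% of sequences begin, and the end position at or after which 90% of sequences end. It then removes sequences that start after, or end before, those positions. The sketch below illustrates that logic under this assumption. It is not mothur's implementation, and the names `optimize_start_end` and `screen` are hypothetical. Each input record is `(start, end, count)`, with `count` playing the role of the count_table abundance.

```python
from typing import List, Tuple

Read = Tuple[int, int, int]  # (alignment start, alignment end, abundance)


def optimize_start_end(reads: List[Read], criteria: float = 90.0) -> Tuple[int, int]:
    """Pick start/end positions so that each one, on its own, is satisfied
    by at least `criteria` percent of the (abundance-weighted) sequences."""
    total = sum(count for _, _, count in reads)
    target = total * criteria / 100.0

    # Smallest start such that >= criteria% of sequences start at or before it.
    cumulative = 0
    opt_start = reads[0][0]
    for start, _, count in sorted(reads, key=lambda r: r[0]):
        cumulative += count
        if cumulative >= target:
            opt_start = start
            break

    # Largest end such that >= criteria% of sequences end at or after it.
    cumulative = 0
    opt_end = reads[0][1]
    for _, end, count in sorted(reads, key=lambda r: r[1], reverse=True):
        cumulative += count
        if cumulative >= target:
            opt_end = end
            break

    return opt_start, opt_end


def screen(reads: List[Read], start: int, end: int) -> List[Read]:
    """Keep sequences that start at or before `start` and end at or after `end`."""
    return [r for r in reads if r[0] <= start and r[1] >= end]


if __name__ == "__main__":
    # Toy data loosely shaped like the Start/End columns in the next summary.
    reads = [(8, 18996, 50), (12, 19118, 40), (3642, 19118, 5), (2, 13000, 5)]
    start, end = optimize_start_end(reads, criteria=90)
    kept = screen(reads, start, end)
    print(start, end, sum(count for *_, count in kept))
```

Each cutoff is satisfied by at least 90% of sequences on its own. Because both cutoffs are applied together, the fraction actually retained can fall below 90%.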
I then continued with the analysis:
mothur > summary.seqs(fasta=stability.trim.contigs.good.pick.unique.good.align, count=stability.trim.contigs.good.pick.good.count_table)
Using 56 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 2 18996 333 0 3 1
2.5%-tile: 8 18996 444 0 4 145850
25%-tile: 8 18996 453 0 4 1458494
Median: 12 18996 469 0 5 2916987
75%-tile: 12 19116 477 0 6 4375480
97.5%-tile: 12 19118 478 0 6 5688124
Maximum: 3642 19118 499 0 8 5833973
Mean: 12 19043 464 0 4
# of unique seqs: 3075028
total # of seqs: 5833973
Below is the summary of the sequences before dist.seqs, which is currently running. Could you please confirm whether these steps were executed correctly?
mothur > summary.seqs(fasta=stability.trim.contigs.good.pick.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta, count=stability.trim.contigs.good.pick.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table)
Using 56 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1749 285 0 3 1
2.5%-tile: 1 1770 330 0 4 127448
25%-tile: 1 1770 332 0 4 1274472
Median: 1 1770 342 0 4 2548944
75%-tile: 1 1770 343 0 5 3823416
97.5%-tile: 1 1770 343 0 6 4970440
Maximum: 179 1770 423 0 8 5097887
Mean: 1 1769 338 0 4
# of unique seqs: 1153966
total # of seqs: 5097887
Indeed, the samples sent to the different companies were not aliquots of the same material and were collected in different seasons. While I agree that other variables, such as the use of different primers, may be involved, I thought that by combining the data and analyzing them together I could minimize the influence of these variables and gain insight into the effect of season.
Looking forward to your feedback,
Best