chop.seqs

Hello, I am having difficulty understanding the chop.seqs command in mothur. Here is the summary.seqs output for my fasta file:

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1        447      93       0       2        1
2.5%-tile:   68       447      96       0       3        1624
25%-tile:    81       448      104      0       4        16236
Median:      84       448      113      0       4        32472
75%-tile:    121      448      117      0       4        48707
97.5%-tile:  134      449      125      0       6        63319
Maximum:     231      474      159      0       11       64942
Mean:        96.2093  447.977  111.134  0       4.13765

# of Seqs:   64942

I want to use this command: chop.seqs(fasta=SLE_alldatasetnopoints.unique.good.filter.fasta, numbases=95, keep=back), but I don't know if it's correct. Could someone help me understand how to “chop”? The concept is a little out of my league. Thank you.

I ran the chop.seqs command and am now on the pre.cluster step. However, there seems to be a discrepancy with my names file, and I got the following messages:
[ERROR]: XSYUM:01748:02605 is in your name file and not in your fasta file, please correct.
[ERROR]: XSYUM:02501:02389 is in your name file and not in your fasta file, please correct.
[ERROR]: XSYUM:01774:00268 is in your name file and not in your fasta file, please correct.
[ERROR]: XSYUM:01770:01691 is in your name file and not in your fasta file, please correct.
[ERROR]: XSYUM:01880:01119 is in your name file and not in your fasta file, please correct.

I checked my names file and my fasta file, and they both contain the same number of sequences, so I don't know what is wrong here. Does anyone know how to solve this problem?

When you run chop.seqs(fasta=SLE_alldatasetnopoints.unique.good.filter.fasta, numbases=95, keep=back), mothur keeps the last 95 bases of each sequence. Any sequence with fewer than 95 bases is removed from your dataset. Your summary.seqs output shows a minimum of 93 bases, so at least one sequence is shorter than 95 bases and would be removed. The chop.seqs command should have created an accnos file containing the names of the removed sequences. To resolve the file mismatch error you are getting from pre.cluster, run remove.seqs(name=yourNameFile, accnos=accnosFileFromChopSeqs), then run pre.cluster again.
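As a rough illustration of the behavior described above (not mothur's actual code), here is a small Python sketch for unaligned sequences. The function names `chop_back` and `drop_from_names`, the toy sequences, and the simplified name-file model (representative name mapped to its duplicates) are all made up for this example.

```python
def chop_back(seqs, numbases):
    """Keep the last `numbases` bases of each unaligned sequence.

    Returns (kept, dropped_names). Sequences shorter than `numbases`
    are dropped, mirroring the default behavior described in the reply.
    Assumes numbases is a positive integer.
    """
    kept, dropped = {}, []
    for name, bases in seqs.items():
        if len(bases) < numbases:
            dropped.append(name)          # would end up in the accnos file
        else:
            kept[name] = bases[-numbases:]
    return kept, dropped


def drop_from_names(name_records, dropped):
    """Remove name-file records whose representative sequence was dropped.

    Simplified model of a name file: representative -> list of duplicates.
    """
    dropped = set(dropped)
    return {rep: dups for rep, dups in name_records.items() if rep not in dropped}


# Hypothetical toy data, not taken from the dataset in this thread.
seqs = {"seqA": "ACGTACGTAC", "seqB": "ACGT"}
names = {"seqA": ["seqA", "seqA2"], "seqB": ["seqB"]}

kept, dropped = chop_back(seqs, numbases=6)
names_fixed = drop_from_names(names, dropped)

print(kept)         # seqA keeps its last 6 bases
print(dropped)      # seqB is too short and is removed
print(names_fixed)  # the name records now match the chopped fasta
```

Until the name records are filtered with the dropped names, `names` still lists `seqB`, which is the same kind of mismatch pre.cluster reports.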

If you want to keep the “short” sequences in chop.seqs, you can set the short parameter to true. Also, chop.seqs can handle aligned or unaligned sequences. By default, mothur “chops” based on the number of bases in the sequence. You can force mothur to count gaps as bases by setting the countgaps parameter to true.
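The two counting rules can be sketched in Python for the keep=back case. This is only an illustration of the idea under simple assumptions (gaps are any non-letter character, and a sequence without enough counted positions returns None), not a reproduction of mothur's implementation; `keep_back` is a hypothetical helper name.

```python
def keep_back(aligned, numbases, countgaps=False):
    """Return the tail of an aligned sequence holding `numbases` positions.

    countgaps=False: only letters count, so gap characters inside the
    kept tail come along for free.
    countgaps=True: every alignment column counts, gaps included.
    Returns None when the sequence has too few countable positions.
    """
    if countgaps:
        if len(aligned) < numbases:
            return None
        return aligned[-numbases:]

    counted = 0
    # Walk from the end of the sequence toward the front.
    for i in range(len(aligned) - 1, -1, -1):
        if aligned[i].isalpha():
            counted += 1
            if counted == numbases:
                return aligned[i:]
    return None


# Two made-up aligned sequences of equal length (8 columns each).
a = "AC-GT-A-"
b = "-ACG-TAC"

print(keep_back(a, 3), keep_back(b, 3))              # counts bases only
print(keep_back(a, 3, True), keep_back(b, 3, True))  # counts every column
```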

pre.cluster expects your sequences to be aligned. Even if your sequences were aligned before chop.seqs, they will most likely not be the same length after chop.seqs unless you set countgaps=true.
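One quick way to check this before running pre.cluster is to tally the sequence lengths in the chopped fasta file. The sketch below is a hypothetical standalone helper (`length_counts` is not a mothur command) and works on fasta text held in a string; more than one distinct length means the sequences no longer share the same alignment coordinates.

```python
from collections import Counter


def length_counts(fasta_text):
    """Count how many fasta records have each sequence length."""
    counts = Counter()
    length = None                      # length of the record being read
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:     # close the previous record
                counts[length] += 1
            length = 0
        elif length is not None:
            length += len(line)        # sequence may span several lines
    if length is not None:             # close the final record
        counts[length] += 1
    return counts


# Made-up example: two records with different lengths after chopping.
example = ">s1\nGT-A-\n>s2\nTAC\n"
counts = length_counts(example)
print(counts)
print("same length" if len(counts) == 1 else "lengths differ")
```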

Kindly,
Sarah

Hey, thank you! I will try that.