XXX is not in your name or count file, please correct

Hi there,
Not an expert, new to the program…

I have run several examples following the MiSOP command sequence and things have been going smoothly, but when I try to run my real samples things start to go wrong after align.seqs and screen.seqs… I have tried 3 times starting from scratch, but no luck.

I attach the relevant part of the logfile.

mothur > align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)

Using 16 processors.

Reading in the silva.v4.fasta template sequences… DONE.
It took 3 to read 14956 sequences.

Aligning sequences from stability.trim.contigs.good.unique.fasta …
It took 1475 secs to align 4115140 sequences.

[WARNING]: 5079 of your sequences generated alignments that eliminated too many bases, a list is provided in stability.trim.contigs.good.unique.flip.accnos.
[NOTE]: 3759 of your sequences were reversed to produce a better alignment.

It took 1485 seconds to align 4115140 sequences.

Output File Names:
stability.trim.contigs.good.unique.align
stability.trim.contigs.good.unique.align_report
stability.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

Using 16 processors.

             Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:     0       0       0       0       1       1
2.5%-tile:   1       18929   439     0       4       259538
25%-tile:    1       18929   440     0       5       2595376
Median:      1       18929   443     0       5       5190751
75%-tile:    1       18929   460     0       6       7786126
97.5%-tile:  2       18929   466     0       6       10121963
Maximum:     18929   18929   494     0       8       10381500
Mean:        6       18903   449     0       5

# of unique seqs: 4115140

total # of seqs: 10381500

It took 502 secs to summarize 10381500 sequences.

Output File Names:
stability.trim.contigs.good.unique.summary

mothur > screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, start=2, end=18929)

Using 8 processors.
1000
1000
1000

257196

It took 176 secs to screen 4115140 sequences, removed 124339.

/******************************************/
Running command: remove.seqs(accnos=stability.trim.contigs.good.unique.bad.accnos.temp, count=stability.trim.contigs.good.count_table)
Removed 14 sequences from stability.trim.contigs.good.count_table.

Output File Names:
stability.trim.contigs.good.pick.count_table

/******************************************/

Output File Names:
stability.trim.contigs.good.unique.good.align
stability.trim.contigs.good.unique.bad.accnos
stability.trim.contigs.good.good.count_table

It took 675 secs to screen 4115140 sequences.

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.good.count_table)

Using 16 processors.
[ERROR]: 'M07149_18_000000000-KGDT6_1_2101_17402_11050' is not in your name or count file, please correct.
[ERROR]: 'M07149_18_000000000-KGDT6_1_2112_10437_9521' is not in your name or count file, please correct.

Any help greatly appreciated.
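
For anyone hitting the same message, here is a minimal Python sketch (not part of mothur) that lists IDs present in a fasta/align file but absent from a count table. It assumes a plain tab-delimited count table whose first column is the sequence name and whose header row starts with Representative_Sequence; lines starting with # are skipped on the assumption that they are comments. The function names are made up for illustration.

```python
def fasta_ids(lines):
    """Collect sequence IDs from FASTA header lines ('>' up to first whitespace)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def count_table_ids(lines):
    """Collect sequence names from the first column of a count table.

    Assumption: tab-delimited text, header row starting with
    'Representative_Sequence', optional '#' comment lines.
    """
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        name = line.split("\t")[0]
        if name == "Representative_Sequence":
            continue  # header row
        ids.add(name)
    return ids


def missing_from_count(fasta_lines, count_lines):
    """Return sorted IDs that are in the fasta but not in the count table."""
    return sorted(fasta_ids(fasta_lines) - count_table_ids(count_lines))


# Example use with the file names from the log above:
# with open("stability.trim.contigs.good.unique.align") as fa, \
#      open("stability.trim.contigs.good.good.count_table") as ct:
#     print(missing_from_count(fa, ct)[:10])
```

In the log above, the last summary.seqs call pairs the pre-screen .align file with the post-screen .good.count_table, so that pair may be a good first thing to run this on.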

OK. After exploring various other issues, I figured out mine. Not enough disk space. Deleted files, increased size of partition and so far so good.
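
In case it helps, a small Python sketch for checking free space on the partition before starting a long run. It only uses the standard library; the helper names and the 500 GiB figure in the comment are placeholders of mine, not a mothur requirement, since how much space a run needs depends on the data.

```python
import shutil


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_bytes):
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


# Example: refuse to start if less than an (assumed) 500 GiB is free.
# if not has_room(".", 500 * 2**30):
#     raise SystemExit(f"Only {free_gib('.'):.1f} GiB free, aborting")
```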

Also, at a colleague's suggestion, I processed smaller batches of samples up to the align step and merged them afterwards.
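
I did not record exactly how the batches were made, so the following is only a sketch of one way to do it in Python. It assumes a MiSOP-style stability.files with one sample per line, and the function name and batch size are arbitrary. The merge step is not shown.

```python
def split_into_batches(lines, batch_size):
    """Group non-blank lines into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rows = [ln for ln in lines if ln.strip()]
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


# Example: write stability.batch1.files, stability.batch2.files, ...
# (output names are hypothetical)
# with open("stability.files") as fh:
#     for n, batch in enumerate(split_into_batches(fh, 20), start=1):
#         with open(f"stability.batch{n}.files", "w") as out:
#             out.writelines(batch)
```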

Hope this is useful to others. Cheers.

Glad you figured it out. The reason you are running out of space is, in part, that you sequenced a region where your reads do not fully overlap with each other. When reads do not fully overlap, the error rate is about 10x higher than when they do: