align.seqs: discrepancy between input fasta and align file

Hi Pat and Sarah,

Apologies if this has been covered already but I couldn’t find a solution in my searching.

Anyway, I just ran an archaeal alignment in mothur (1.31.2 on OSX 10.8) using the silva.archaea.fasta template. I had about 350,000 sequences in my fasta file; after aligning, the align file has only around 87,000, and the flip.accnos lists only 30,000. What happened to the rest (most of the sequences)? I am compiling an older version of mothur (1.29.2) to see whether this is a recent bug, though I doubt it.
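To compare the counts directly, a minimal Python sketch that counts records by their '>' header lines, assuming both the input fasta and the .align output are FASTA-formatted (mothur's aligned files normally are). The function names are hypothetical; from the shell, `grep -c '^>' file` gives the same number.

```python
def count_fasta_records(lines):
    """Count FASTA records in an iterable of lines by counting '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    # Stream the file line by line so multi-GB alignments are never loaded whole.
    with open(path) as handle:
        return count_fasta_records(handle)
```

Running `count_fasta_file` on the input fasta and on the .align file (hypothetical paths) shows how many sequences went missing, independent of what the command's log reports.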

Any ideas? My fasta file is pretty big, so emailing it to mothur.bugs is not possible.

Cheers,
Tris

Did you run the command with multiple processors? It looks like perhaps one of the processes died and did not align the sequences assigned to it.

I’ve tried it with multiple processors and with just one, with the same result. I suspected that was what had happened until I ran it on a single processor and got the same outcome. When I use multiple processors I can see temp files of around 2.2 GB being created, and it’s during the recombining that it fails, leaving just one file of around 4.2 GB.

OK, so I managed to get the alignment to work correctly this time. The difference is that previously I was running the analysis on an external (USB3) hard disk. There was plenty of space and all the preceding steps had worked there, but when I ran it on the main HDD it worked just fine with 8 processors.

Is there any reason it would not work? The hard disk had about 50 GB of free space and the completed alignment is ~17 GB, so even with the temp files and the completed alignment in place there should still have been room. And why didn’t it just fail with an error message?
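If you want a guard of your own before launching a large job, a pre-flight free-space check is easy to sketch with Python's `shutil.disk_usage`. The function name and the 10% safety margin are arbitrary assumptions, and the estimate of bytes needed has to come from you.

```python
import shutil


def has_room_for(path, needed_bytes, margin=1.1):
    """Return True if the filesystem containing `path` has at least
    needed_bytes * margin bytes free. `margin` is an arbitrary safety factor."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_bytes * margin
```

For example, `has_room_for("/Volumes/External", 30 * 1024**3)` (a hypothetical mount point) checks for roughly 30 GB plus margin on an external drive under macOS.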

Well, at least I know how to proceed, but I would really like to be able to run the analysis on an external drive. Is this a limitation of the drives, or something about the way mothur is coded?
Thanks,
Tris

There isn’t anything in mothur’s code that limits the drive location. Have you had any issues with other commands pulling data from the external drive?

I haven’t had any issues with the drive or with reading from or writing to it; everything else in the analysis up to the align step worked fine (sffinfo, trim.seqs, etc.). Not sure what the problem could have been. What is the maximum space mothur would need at the point where it recombines the temp files, say for a complete alignment of 20 GB?

Thanks,
Tris

Hi Pat and Sarah,

I think I may have worked out why it wasn’t working properly. I was just running dist.seqs with 8 processors and watched the temp files being generated and recombined. In this case, the program created 8 files of around 5 GB each, and while they were being recombined, additional space (around 2-3 GB) was used as each temp file was joined to the main file. Once a temp file had been recombined, that 2-3 GB was freed, and it began to be used up again as the next file was incorporated. I guess that since my alignment was around 20 GB and the external drive had only 50 GB free, the ‘buffer’ space was exceeding the available space? Is that possible?
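The append-then-delete behaviour described here can be turned into a rough peak-usage estimate. A minimal sketch, assuming each temp file is copied whole into the combined file and deleted only after its copy finishes: just before temp file i is deleted, the disk holds the combined file so far plus every temp file not yet deleted, which adds up to the total of all temp files plus temp file i. The peak is therefore the total plus the largest temp file. The function name is hypothetical.

```python
def peak_usage_during_merge(temp_sizes):
    """Peak disk usage (same units as the inputs) while temp files are appended
    one at a time to a combined file, each deleted only after its copy is done."""
    combined = 0
    remaining = sum(temp_sizes)   # all temp files exist before merging starts
    peak = 0
    for size in temp_sizes:
        combined += size                         # temp file fully copied
        peak = max(peak, combined + remaining)   # ...but not yet deleted
        remaining -= size                        # temp file deleted
    return peak
```

This models only the temp files and the combined output; the input fasta and any other files written to the same drive come on top of the figure it returns.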

Hope that makes sense.
Tris

Yes, that makes sense. When mothur combines the temp files, it reads from each temp file and writes to the combined file. Once the copy is complete, the temp file is deleted.
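The read, copy, then delete sequence described here can be sketched in Python with `shutil.copyfileobj` and `os.remove`. This illustrates the pattern only and is not mothur's actual code; the function name is hypothetical.

```python
import os
import shutil


def combine_temp_files(temp_paths, combined_path):
    """Append each temp file to combined_path, then delete that temp file."""
    with open(combined_path, "ab") as out:
        for temp_path in temp_paths:
            with open(temp_path, "rb") as src:
                # Streams in chunks rather than reading the whole file into memory.
                shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())  # ensure the copy is on disk before deleting the source
            os.remove(temp_path)
```

Because each temp file is removed only after its data has been written, the extra space in use at any moment is one temp file's worth. In this sketch a full disk surfaces as an `OSError` from the write and the temp file is left in place; the thread doesn't establish how mothur handles the same situation.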