sequencing alignment problem?

Hi,

I’ve been sent some data to analyze which is a bit different from what I normally see and I think is a candidate for the Mothur pipeline, however it is not bacterial so I need to use my own alignment reference etc. I took the training course years ago so have a general knowledge of the commends and theory of Mothur.

I have been going through the MiSeq SOP when I hit several problems.

Following the alignment to my reference (align.seq) the output was not particularly nice

Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 13 30 1 0 1 750003
25%-tile: 13 30 3 0 1 7500029
Median: 13 571 18 0 2 15000057
75%-tile: 584 584 18 0 2 22500085
97.5%-tile: 585 587 411 0 8 29250110
Maximum: 589 589 456 0 128 30000112
Mean: 251.818 311.952 46.292 0 2.03361

of unique seqs: 2580658

total # of seqs: 30000112

so when I used screen.seqs to trim to my region of interest I lost all the groups because of no data.

However looking at the previous steps fasta file in an editor it actually looks ok. It is the alignment step that is problematic.

Can anyone suggest how to proceed?
I am not sure if the problem is that the assembly has been poor (because of minimal overlap) or the alignment to a reference is the problem. Perhaps someone can tell me how to correctly interpret this table.

I decided to try simply trimming the primers using trim.seqs on the previous fasta file that had just unique haplotypes but this also looses almost all the data. The primers are there, the problem is that the sequences are not all in one orientation. Some are running from forward to reverse, others are reverse to forward. So trim.seqs can’t remove them, it’s only looking in one place and treats them almost all as missing.

This mix of directions is how the sequences came to me - I am told that is normal for a MiSeq… I wonder if that is the source of the alignment problem above. Is there an alternative for removing primers and a way to use the groups file to restore identity?

What I really want to get to is just a fasta file of my sequences with the primers removed and the group identity restored.

Thanks for any suggestions

Hi,

Have you tried flip=T in align.seqs?

Pat

To remove primers I tried PCR.seqs and provided an oligos file

I used mulitple oligo formats

e.g. primer ATTAGAWACCCBDGTAGTCC CCCGTCAATTCMTTTRAGT

As suggested on the oligo file page
or

PrimerA ATTAGAWACCCBDGTAGTCC
PrimerB CCCGTCAATTCMTTTRAGT

As was suggested in another forum post.

I tried a few other variants.

In all cases if successfully finds the first primer and removes it but sequences with the second primer are deleted because they don’t have the first primer.

So I want to find A or B and remove all that have neither. It seems to look for A.

I can do it in other programmes like cutadapt before entering mothur’s methods but i was hoping it could all be done under one framework.

I think the alignment is just fundamentally too poor to use align.seqs. Can any analysis be done without OTUs? If not I will have to move to a more simple BLAST analysis.

Thanks

I decided to try simply trimming the primers using trim.seqs on the previous fasta file that had just unique haplotypes but this also looses almost all the data. The primers are there, the problem is that the sequences are not all in one orientation. Some are running from forward to reverse, others are reverse to forward. So trim.seqs can’t remove them, it’s only looking in one place and treats them almost all as missing.

Did you try running trim.seqs with the checkorient parameter set to true?

http://www.mothur.org/wiki/Trim.seqs#checkorient

So I want to find A or B and remove all that have neither.

If the primers are not paired, then you can do that as follows:

mothur > trim.seqs(fasta=yourFastaFile, oligos=OnlyPrimerA) - trim all primer A
mothur > rename.file(fasta=current, output=trimmedA.fasta) - rename output file so it’s not overwritten
mothur > trim.seqs(fasta=yourFastaFile, oligos=OnlyPrimerB) - trim all primer B
mothur > rename.file(fasta=current, output=trimmedB.fasta)
mothur > merge.files(fasta=trimmedA.fasta-trimmedB.fasta, output=allTrimmed.fasta) - combine sequences that successfully had primers removed
mothur > list.seqs(fasta=allTrimmed.fasta)
mothur > get.seqs(fasta=current) - remove duplicate sequences added by running twice and merging

Thanks for the reply. That does seem somewhat complicated (and on large data sets will take huge amount of time I suspect). I suspect the question is why it doesn’t work as the wiki suggest it should. The command’s wiki page gives lots of information on using two primers… I just can’t get them to do anything other than find primer A and remove everything else.

Many people will have dozens of degenerate primers to deal with… making the problem much bigger.

Beth