Problems combining data from different runs

Hi,
I’ve been trying to analyze Roche 454 data following the 454 SOP. After running “trim.seqs”, I got 12 parts from the 1st run because I used 12 MID adapters.
I tried to use “merge.files” to combine the fasta/names/groups files from the 1st run (including the 16S reverse and 16S forward parts) with the same files from the 2nd run.
I followed the 454 SOP step by step. When I ran “screen.seqs”, a problem occurred: the sequences in the names file don’t match the sequences in the groups file! :o

Next, I tried using “merge.files” to combine the fasta/names/groups files from the 1st and 2nd runs separately for each adapter.
Again I followed the 454 SOP. I then used “merge.files” to combine the fasta/names/groups files from the 12 parts and tried a phylotype analysis. This time the sequences in the list file don’t match the sequences in the groups file! :?

What should I do? Please help! I’m really confused.
Thanks a lot.

Thor

Hi Thor,

This typically happens because during the merge steps the correct files aren’t used or they aren’t all used. Have you considered using sff.multiple instead?

If you can post your complete commands we can try and diagnose what’s going on.


Pat

Thank you kindly for your quick reply.
After the “Reducing sequencing error” step, I got these files: 1stRun.MID1~12.shhh.trim.fasta/1stRun.MID1~12.shhh.trim.names/1stRun.MID1~12.shhh.trim.groups/2ndRun.MID1~12.shhh.trim.fasta…etc. (MID means MID adapter; both runs used the same adapters.)

Then I tried the “Processing improved sequences” step in two different ways. In the first, I merged all the files into a single set (e.g. merged.fasta, merged.names, merged.groups). In the second, I merged the data for each MID adapter separately and processed those (e.g. MID1~12.shhh.trim.fasta/MID1~12.shhh.trim.names/MID1~12.shhh.trim.groups).

With the fully merged data, screen.seqs or pre.cluster fails with: [ERROR]: Your name file contains XXXX valid sequences, and your groupfile contains XXXX, please correct.
With the per-MID merged data, processing improved sequences works fine. But when I then merged those results and tried a phylotype or OTU analysis, the names and groups still didn’t match.
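The error message says the names file and the group file disagree about which reads exist, but it only reports two counts. A quick way to see *which* reads differ is a small script like the one below. This is a hypothetical Python sketch, not part of mothur; it assumes the usual tab-separated layouts (names file: representative, then a comma-separated list of reads; group file: read, then group) and does not reproduce mothur's own check exactly.

```python
import sys
from collections import Counter


def names_file_members(lines):
    """Every read name in a mothur-style names file.

    Assumed format: representative<TAB>name1,name2,... (one line per unique sequence).
    """
    members = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            members.extend(n for n in fields[1].split(",") if n)
    return members


def group_file_names(lines):
    """Read names in a group file (assumed format: name<TAB>group)."""
    return [line.split()[0] for line in lines if line.strip()]


def compare(names_lines, groups_lines):
    """Summarize how the names file and the group file disagree."""
    in_names = names_file_members(names_lines)
    in_groups = group_file_names(groups_lines)
    return {
        "names_total": len(in_names),
        "groups_total": len(in_groups),
        "only_in_names": sorted(set(in_names) - set(in_groups)),
        "only_in_groups": sorted(set(in_groups) - set(in_names)),
        # A read listed more than once can mean, e.g., a file was merged twice.
        "dup_in_groups": sorted(n for n, c in Counter(in_groups).items() if c > 1),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python check_names_groups.py merge.names merge.groups
    with open(sys.argv[1]) as names_fh, open(sys.argv[2]) as groups_fh:
        report = compare(names_fh, groups_fh)
    for key, value in report.items():
        if isinstance(value, int):
            print(f"{key}: {value}")
        else:
            print(f"{key}: {len(value)} (first few: {value[:5]})")
```

If `only_in_names` or `only_in_groups` is non-empty, the read names listed there usually point back to the input file that was left out or processed differently.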

I’ve tried using make.group on the fasta files, but that didn’t work.

I don’t know what happened; I just followed the SOP.

Please help me, thanks.

Thor

Hi Thor,
I’d like to help. Could you post the exact commands you have run so far so I can try to see where the issue might be coming from?
Thanks,
Sarah

After trim.seqs, I got the following files:

Run1.MID01-16SF.shhh.trim.fasta~Run1.MID12-16SF.shhh.trim.fasta
Run1.MID01-16SR.shhh.trim.fasta~Run1.MID12-16SR.shhh.trim.fasta
Run1.MID02-16SF.shhh.trim.fasta~Run2.MID12-16SF.shhh.trim.fasta
Run1.MID02-16SR.shhh.trim.fasta~Run2.MID12-16SR.shhh.trim.fasta

Run1.MID01-16SF.shhh.trim.names~Run1.MID12-16SF.shhh.trim.names
Run1.MID01-16SR.shhh.trim.names~Run1.MID12-16SR.shhh.trim.names
Run1.MID02-16SF.shhh.trim.names~Run2.MID12-16SF.shhh.trim.names
Run1.MID02-16SR.shhh.trim.names~Run2.MID12-16SR.shhh.trim.names

Run1.MID01-16SF.shhh.trim.groups~Run1.MID12-16SF.shhh.trim.groups
Run1.MID01-16SR.shhh.trim.groups~Run1.MID12-16SR.shhh.trim.groups
Run1.MID02-16SF.shhh.trim.groups~Run2.MID12-16SF.shhh.trim.groups
Run1.MID02-16SR.shhh.trim.groups~Run2.MID12-16SR.shhh.trim.groups

Then I ran “reverse.seqs” to turn the RunX.MIDXX.16SR reads into forward orientation. Next, I ran “merge.files” to combine everything into:
merge.fasta
merge.names
merge.groups

For the subsequent SOP steps, I used only the merged .fasta file, its .names file, and its .groups file.
It looks similar to this post:
http://mothur.ltcmp.net/t/error-in-pre-cluster-command/982/2

PS. I can’t merge the sff files for these two runs because the two runs were trimmed differently.
PPS. I have tried the approach from this post: http://mothur.ltcmp.net/t/name-file-and-group-file-sequence-discrepancy/1322/3
After filter.seqs, I ran the following:
list.seqs(fname=Antarctic1.trim.unique.good.filter.names)
get.seqs(accnos=current, group=mergegroupsgood)
It didn’t work.
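The list.seqs/get.seqs workaround from the linked post aims to cut the group file down to the reads that are still present in the names file. A hypothetical Python equivalent, under the same assumed file layouts (names file: representative, then comma-separated reads; group file: read, then group), makes one limitation visible: it can only drop group entries, so reads that are in the names file but never received a group stay unresolved.

```python
import sys


def accnos_from_names(lines):
    """Collect every read name in a names file (rep<TAB>n1,n2,...) as a set."""
    keep = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            keep.update(n for n in fields[1].split(",") if n)
    return keep


def subset_groups(names_lines, group_lines):
    """Keep only group-file lines whose read name appears in the names file.

    Returns (kept_lines, names_with_no_group). The second list holds reads that
    subsetting cannot fix, because they never had a group assignment.
    """
    keep = accnos_from_names(names_lines)
    kept, seen = [], set()
    for line in group_lines:
        fields = line.split()
        if fields and fields[0] in keep:
            kept.append(line.rstrip("\n") + "\n")
            seen.add(fields[0])
    return kept, sorted(keep - seen)


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage (hypothetical): python subset_groups.py in.names in.groups out.groups
    with open(sys.argv[1]) as names_fh, open(sys.argv[2]) as groups_fh:
        kept, orphans = subset_groups(names_fh, groups_fh)
    with open(sys.argv[3], "w") as out:
        out.writelines(kept)
    print(f"kept {len(kept)} group lines; {len(orphans)} names have no group")
```

A non-empty second return value suggests the mismatch started upstream (for example at merge.files), which this kind of subsetting cannot repair.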

Thanks for your help.
Thor

R1.1.fasta means “Run1.Station1.fasta” and R2.1.fasta means “Run2.Station1.fasta”. Both runs used the same samples.

merge.files(input=R1.1.fasta-R2.1.fasta-R1.2.fasta-R2.2.fasta-R1.3.fasta-R2.3.fasta-R1.4.fasta-R2.4.fasta-R1.5.fasta-R2.5.fasta-R1.6.fasta-R2.6.fasta-R1.7.fasta-R2.7.fasta-R1.8.fasta-R2.8.fasta-R1.9.fasta-R2.9.fasta-R1.10.fasta-R2.10.fasta-R1.11.fasta-R2.11.fasta-R1.12.fasta-R2.12.fasta, output=merge.fasta)

merge.files(input=R1.1.names-R2.1.names-R1.2.names-R2.2.names-R1.3.names-R2.3.names-R1.4.names-R2.4.names-R1.5.names-R2.5.names-R1.6.names-R2.6.names-R1.7.names-R2.7.names-R1.8.names-R2.8.names-R1.9.names-R2.9.names-R1.10.names-R2.10.names-R1.11.names-R2.11.names-R1.12.names-R2.12.names, output=merge.names)

merge.files(input=R1.1.groups-R2.1.groups-R1.2.groups-R2.2.groups-R1.3.groups-R2.3.groups-R1.4.groups-R2.4.groups-R1.5.groups-R2.5.groups-R1.6.groups-R2.6.groups-R1.7.groups-R2.7.groups-R1.8.groups-R2.8.groups-R1.9.groups-R2.9.groups-R1.10.groups-R2.10.groups-R1.11.groups-R2.11.groups-R1.12.groups-R2.12.groups, output=merge.groups)
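Typing the three 24-file lists by hand makes it easy to drop or misspell one entry in just one list, which would leave the merged fasta, names, and groups files describing different sets of reads. A hypothetical Python sketch that generates all three merge.files calls from one loop, assuming the R1.1 through R2.12 naming described above:

```python
# Hypothetical helper: the R1/R2 prefixes and station numbers 1-12 follow the
# naming described in the post; adjust them to your own files.
RUNS = ["R1", "R2"]
STATIONS = range(1, 13)
OUTPUTS = {"fasta": "merge.fasta", "names": "merge.names", "groups": "merge.groups"}


def merge_command(ext, output):
    """Build one merge.files(...) call, listing every run/station file exactly once."""
    inputs = [f"{run}.{station}.{ext}" for station in STATIONS for run in RUNS]
    return f"merge.files(input={'-'.join(inputs)}, output={output})"


for ext, output in OUTPUTS.items():
    print(merge_command(ext, output))
```

As far as I know, completeness matters more than order here: each of the three lists should name the same 24 samples exactly once.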

deunique.seqs(fasta=R1.1.fasta, name=R1.1.names) (repeated for R1.1 through R2.12, giving 24 *.deunique.fasta files)

make.group(fasta=R1.1.deunique.fasta-R2.1.deunique.fasta-R1.2.deunique.fasta-R2.2.deunique.fasta-R1.3.deunique.fasta-R2.3.deunique.fasta-R1.4.deunique.fasta-R2.4.deunique.fasta-R1.5.deunique.fasta-R2.5.deunique.fasta-R1.6.deunique.fasta-R2.6.deunique.fasta-R1.7.deunique.fasta-R2.7.deunique.fasta-R1.8.deunique.fasta-R2.8.deunique.fasta-R1.9.deunique.fasta-R2.9.deunique.fasta-R1.10.deunique.fasta-R2.10.deunique.fasta-R1.11.deunique.fasta-R2.11.deunique.fasta-R1.12.deunique.fasta-R2.12.deunique.fasta, groups=MID01-MID01-MID02-MID02-MID03-MID03-MID04-MID04-MID05-MID05-MID06-MID06-MID07-MID07-MID08-MID08-MID09-MID09-MID10-MID10-MID11-MID11-MID12-MID12) (rename to merge2.groups)
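make.group pairs each file in fasta= with the label in the same position of groups= and writes one line per sequence. Because deunique.seqs expands every unique sequence back into all of its reads, the resulting group file should list exactly the reads in the members column of the matching names files. A hypothetical Python sketch of that positional pairing, useful for checking merge2.groups independently:

```python
import sys


def make_group_lines(fasta_to_group):
    """Emit 'name<TAB>group' for every record in each FASTA.

    fasta_to_group: list of (fasta_lines, group_label) pairs. Pairing is
    positional, mirroring how make.group pairs its fasta= and groups= lists.
    """
    out = []
    for fasta_lines, group in fasta_to_group:
        for line in fasta_lines:
            if line.startswith(">"):
                name = line[1:].split()[0]  # read name is the first token of the header
                out.append(f"{name}\t{group}\n")
    return out


if __name__ == "__main__" and len(sys.argv) > 2 and len(sys.argv) % 2 == 1:
    # usage (hypothetical): python make_groups.py R1.1.deunique.fasta MID01 R2.1.deunique.fasta MID01 ...
    pairs = []
    for path, group in zip(sys.argv[1::2], sys.argv[2::2]):
        with open(path) as fh:
            pairs.append((fh.readlines(), group))
    sys.stdout.writelines(make_group_lines(pairs))
```

Comparing this output against the members of merge.names (for example with a names-versus-groups check) shows whether the group file and the names file were built from the same set of reads.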

unique.seqs(fasta=merge.fasta, name=merge.names)
align.seqs(fasta=merge.unique.fasta, reference=silva.bacteria.fasta, processors=2)
screen.seqs(fasta=merge.unique.align, name=merge.unique.names, group=merge.groups, start=28465, optimize=end, criteria=95, processors=2)
filter.seqs(fasta=merge.unique.good.align, vertical=T, trump=., processors=2)
unique.seqs(fasta=merge.unique.good.filter.fasta, name=merge.unique.good.names)
pre.cluster(fasta=merge.unique.good.filter.unique.fasta, name=merge.unique.good.filter.names, group=merge.good.groups, diffs=2)
[ERROR] Your name file contains XXXX valid sequences and your groupfile contains XXXXX, please correct

unique.seqs(fasta=merge.fasta, name=merge.names)
align.seqs(fasta=merge.unique.fasta, reference=silva.bacteria.fasta, processors=2)
screen.seqs(fasta=merge.unique.align, name=merge.unique.names, group=merge.groups, start=28465, optimize=end, criteria=95, processors=2)
filter.seqs(fasta=merge.unique.good.align, vertical=T, trump=., processors=2)
unique.seqs(fasta=merge.unique.good.filter.fasta, name=merge2.unique.good.names)
pre.cluster(fasta=merge.unique.good.filter.unique.fasta, name=merge.unique.good.filter.names, group=merge2.good.groups, diffs=2)
[ERROR] Your name file contains XXXX valid sequences and your groupfile contains XXXXX, please correct

I’m getting confused: initially you said it was failing at screen.seqs, and now it’s failing at pre.cluster. Which is it? Can you make one post with all of your commands, their output, and where the error occurs? Some of the things you’ve posted don’t make sense, e.g.:

Run1.MID01-16SF.shhh.trim.groups~Run1.MID12-16SF.shhh.trim.groups
Run1.MID01-16SR.shhh.trim.groups~Run1.MID12-16SR.shhh.trim.groups
Run1.MID02-16SF.shhh.trim.groups~Run2.MID12-16SF.shhh.trim.groups
Run1.MID02-16SR.shhh.trim.groups~Run2.MID12-16SR.shhh.trim.groups

I’m not sure how you get a file that ends in trim.groups.

Pat