Problems combining data from different runs

thorvs · April 27, 2014, 6:33am

Hi,
I’ve been trying to analyze Roche 454 data following the 454 SOP. After running “trim.seqs”, I got 12 parts from the first run, because we used 12 MID adapters.
I tried to use “merge.files” to combine the fasta/names/groups files from the first run (including the 16S reverse and 16S forward parts) with the same files from the second run.
I followed the 454 SOP step by step. When I ran “screen.seqs”, a problem occurred: the sequences in the names file don’t match the sequences in the groups file! :o

Next I tried using “merge.files” to combine the fasta/names/groups files from the first and second runs adapter by adapter.
I still followed the 454 SOP. I used “merge.files” to combine the fasta/names/groups files from the 12 parts and tried a phylotype analysis. Then the sequences in the list file didn’t match the sequences in the groups file! :?

What should I do? Please help! I’m really confused.
Thanks a lot.

Thor

pschloss · April 29, 2014, 1:02pm

Hi Thor,

This typically happens because during the merge steps the correct files aren’t used or they aren’t all used. Have you considered using sff.multiple instead?

If you can post your complete commands we can try and diagnose what’s going on.

Pat

thorvs · April 30, 2014, 6:51am

Thank you kindly for your quick reply.
After the “Reducing sequencing error” step, I got these files: 1stRun.MID1~12.shhh.trim.fasta/1stRun.MID1~12.shhh.trim.names/1stRun.MID1~12.shhh.trim.groups/2ndRun.MID1~12.shhh.trim.fasta…etc. (MID means MID adapter; both runs used the same adapters.)

Then I processed the improved sequences in two different ways. In the first, I merged all the files into one set (e.g. merged.fasta, merged.names, merged.groups). In the second, I merged the data from the same MID adapter and ran “Processing improved sequences” on each of those (e.g. MID1~12.shhh.trim.fasta/MID1~12.shhh.trim.names/MID1~12.shhh.trim.groups).

In the fully merged analysis, screen.seqs or pre.cluster fails with: [ERROR]: Your name file contains XXXX valid sequences, and your groupfile contains XXXX, please correct.
In the per-MID merged analysis, “Processing improved sequences” runs fine. But when I then merge those results and try a phylotype or OTU analysis, the names and groups still don’t match.
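Since the error is about the names file and the group file disagreeing, it can help to find out exactly which reads are on one side but not the other before going further. A minimal Python sketch, assuming the usual tab-separated mothur layouts (names file: representative read, then a comma-separated list of every read it stands for; group file: read name, then group). The script name `check_names_groups.py` is hypothetical:

```python
import sys
from typing import Iterable, Set, Tuple


def reads_in_names_file(lines: Iterable[str]) -> Set[str]:
    """All read names listed in the second column of a names file."""
    reads: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _representative, members = line.split("\t", 1)
        reads.update(m for m in members.split(",") if m)
    return reads


def reads_in_group_file(lines: Iterable[str]) -> Set[str]:
    """All read names in the first column of a group file."""
    reads: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line:
            reads.add(line.split("\t", 1)[0])
    return reads


def compare(name_lines: Iterable[str], group_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (reads only in the names file, reads only in the group file)."""
    in_names = reads_in_names_file(name_lines)
    in_groups = reads_in_group_file(group_lines)
    return in_names - in_groups, in_groups - in_names


def main(names_path: str, groups_path: str) -> None:
    with open(names_path) as nf, open(groups_path) as gf:
        only_names, only_groups = compare(nf, gf)
    print(f"{len(only_names)} reads only in {names_path}")
    print(f"{len(only_groups)} reads only in {groups_path}")
    # Print a few examples; their names may hint at which input file they came from.
    for read in sorted(only_names)[:10]:
        print("names-only:", read)
    for read in sorted(only_groups)[:10]:
        print("groups-only:", read)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

For example, `python check_names_groups.py merge.unique.good.filter.names merge.good.groups` on the files going into pre.cluster. If the unmatched reads all trace back to one input file, that file was probably left out of one of the merge.files calls.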

I’ve tried using make.group to build the group file from the fasta files, but that didn’t work.

I don’t know what happened; I just followed the SOP.

Please help me, thanks.

Thor

westcott · May 5, 2014, 4:42pm

Hi Thor,
I’d like to help. Could you post the exact commands you have run so far, so I can try to see where the issue might be coming from?
Thanks,
Sarah

thorvs · May 8, 2014, 8:16pm

After trim.seqs, I got files like these:

Run1.MID01-16SF.shhh.trim.fasta~Run1.MID12-16SF.shhh.trim.fasta
Run1.MID01-16SR.shhh.trim.fasta~Run1.MID12-16SR.shhh.trim.fasta
Run1.MID02-16SF.shhh.trim.fasta~Run2.MID12-16SF.shhh.trim.fasta
Run1.MID02-16SR.shhh.trim.fasta~Run2.MID12-16SR.shhh.trim.fasta

Run1.MID01-16SF.shhh.trim.names~Run1.MID12-16SF.shhh.trim.names
Run1.MID01-16SR.shhh.trim.names~Run1.MID12-16SR.shhh.trim.names
Run1.MID02-16SF.shhh.trim.names~Run2.MID12-16SF.shhh.trim.names
Run1.MID02-16SR.shhh.trim.names~Run2.MID12-16SR.shhh.trim.names

Run1.MID01-16SF.shhh.trim.groups~Run1.MID12-16SF.shhh.trim.groups
Run1.MID01-16SR.shhh.trim.groups~Run1.MID12-16SR.shhh.trim.groups
Run1.MID02-16SF.shhh.trim.groups~Run2.MID12-16SF.shhh.trim.groups
Run1.MID02-16SR.shhh.trim.groups~Run2.MID12-16SR.shhh.trim.groups

Then I ran “reverse.seqs” to turn the RunX.MIDXX.16SR sequences into the forward orientation. Next I ran “merge.files” to combine everything into:
merge.fasta
merge.names
merge.groups

For the remaining SOP steps I used just the merged .fasta file, its .names file and its .groups file.
It looks like the problem in this post:
http://mothur.ltcmp.net/t/error-in-pre-cluster-command/982/2

PS. I can’t merge the sff files from these two runs, because the two runs were trimmed differently.
PS II. I have tried the approach from this post: http://mothur.ltcmp.net/t/name-file-and-group-file-sequence-discrepancy/1322/3
After filter.seqs, I ran the following:
list.seqs(fname=Antarctic1.trim.unique.good.filter.names)
get.seqs(accnos=current, group=mergegroupsgood)
It didn’t work.
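The idea behind the list.seqs/get.seqs workaround is to keep only the group-file lines whose read still appears in the names file. The same filtering can be sketched in Python (tab-separated mothur layouts assumed; the script name `subset_groups.py` and the output file name are hypothetical). This kind of subsetting only helps when the group file has extra reads; if the names file lists reads that the group file lacks, filtering cannot add them back, and the merged inputs themselves need checking:

```python
import sys
from typing import Iterable, List, Set


def reads_in_names_file(lines: Iterable[str]) -> Set[str]:
    """All read names listed in the second column of a names file."""
    reads: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line:
            _representative, members = line.split("\t", 1)
            reads.update(m for m in members.split(",") if m)
    return reads


def subset_group_lines(group_lines: Iterable[str], keep: Set[str]) -> List[str]:
    """Keep group-file lines whose read name (first column) is in `keep`."""
    kept: List[str] = []
    for line in group_lines:
        line = line.strip()
        if line and line.split("\t", 1)[0] in keep:
            kept.append(line)
    return kept


def main(names_path: str, groups_path: str, out_path: str) -> None:
    with open(names_path) as nf:
        keep = reads_in_names_file(nf)
    with open(groups_path) as gf:
        kept = subset_group_lines(gf, keep)
    with open(out_path, "w") as out:
        out.writelines(line + "\n" for line in kept)
    # Reads that the names file needs but the group file never had.
    grouped = {line.split("\t", 1)[0] for line in kept}
    missing = keep - grouped
    if missing:
        print(f"warning: {len(missing)} reads in {names_path} have no group")


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], sys.argv[2], sys.argv[3])
```

For example: `python subset_groups.py merge.unique.good.filter.names merge.good.groups merge.good.subset.groups`. A non-zero warning count points back at the merge step rather than at the filtering.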

Thanks for your help.
Thor

thorvs · May 15, 2014, 3:06pm

R1.1.fasta means “Run1.Station1.fasta” and R2.1.fasta means “Run2.Station1.fasta”. Both runs used the same samples.

merge.files(input=R1.1.fasta-R2.1.fasta-R1.2.fasta-R2.2.fasta-R1.3.fasta-R2.3.fasta-R1.4.fasta-R2.4.fasta-R1.5.fasta-R2.5.fasta-R1.6.fasta-R2.6.fasta-R1.7.fasta-R2.7.fasta-R1.8.fasta-R2.8.fasta-R1.9.fasta-R2.9.fasta-R1.10.fasta-R2.10.fasta-R1.11.fasta-R2.11.fasta-R1.12.fasta-R2.12.fasta, output=merge.fasta)

merge.files(input=R1.1.names-R2.1.names-R1.2.names-R2.2.names-R1.3.names-R2.3.names-R1.4.names-R2.4.names-R1.5.names-R2.5.names-R1.6.names-R2.6.names-R1.7.names-R2.7.names-R1.8.names-R2.8.names-R1.9.names-R2.9.names-R1.10.names-R2.10.names-R1.11.names-R2.11.names-R1.12.names-R2.12.names, output=merge.names)

merge.files(input=R1.1.groups-R2.1.groups-R1.2.groups-R2.2.groups-R1.3.groups-R2.3.groups-R1.4.groups-R2.4.groups-R1.5.groups-R2.5.groups-R1.6.groups-R2.6.groups-R1.7.groups-R2.7.groups-R1.8.groups-R2.8.groups-R1.9.groups-R2.9.groups-R1.10.groups-R2.10.groups-R1.11.groups-R2.11.groups-R1.12.groups-R2.12.groups, output=merge.groups)
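Pat’s point that the wrong files, or not all of them, end up in the merge steps is easy to hit when three 24-file lists are typed by hand. One way to rule that out is to generate all three merge.files commands from a single list of prefixes, so the fasta, names and groups lists cannot drift apart, and to check that every expected file exists first. A minimal Python sketch, assuming the R<run>.<station>.<ext> naming used above; the helper names are hypothetical:

```python
import os
from typing import Callable, List, Sequence

EXTS = ("fasta", "names", "groups")

# R1.1, R2.1, R1.2, R2.2, ..., R1.12, R2.12: the same order as the commands above.
PREFIXES: List[str] = [f"R{run}.{station}" for station in range(1, 13) for run in (1, 2)]


def merge_command(prefixes: Sequence[str], ext: str, output: str) -> str:
    """Build one mothur merge.files command with hyphen-separated inputs."""
    inputs = "-".join(f"{p}.{ext}" for p in prefixes)
    return f"merge.files(input={inputs}, output={output})"


def missing_files(prefixes: Sequence[str], exts: Sequence[str] = EXTS,
                  exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """List every expected input file that is not present on disk."""
    return [f"{p}.{e}" for p in prefixes for e in exts if not exists(f"{p}.{e}")]


if __name__ == "__main__":
    for path in missing_files(PREFIXES):
        print("missing:", path)
    # Same prefix list for all three file types, so the inputs stay in step.
    for ext in EXTS:
        print(merge_command(PREFIXES, ext, f"merge.{ext}"))
```

The printed commands can be pasted into mothur as-is; because they come from one list, a file dropped from one command is dropped from all three.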

deunique.seqs(fasta=R1.1.fasta, name=R1.1.names) (repeated for R1.1 through R2.12, giving 24 *.deunique.fasta files)
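deunique.seqs goes the other way from unique.seqs: every read listed in the second column of the names file gets its own copy of its representative’s sequence. A small Python sketch of that expansion, handy for sanity-checking read counts; it assumes tab-separated names files and FASTA headers whose first word is the read name, and it is an illustration, not mothur’s own code:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {read name: sequence}; the name is the first header word."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def deunique(seqs: Dict[str, str], name_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Expand unique sequences back to one (read, sequence) pair per read."""
    expanded: List[Tuple[str, str]] = []
    for line in name_lines:
        line = line.strip()
        if not line:
            continue
        representative, members = line.split("\t", 1)
        sequence = seqs[representative]  # KeyError here means fasta and names disagree
        expanded.extend((m, sequence) for m in members.split(",") if m)
    return expanded
```

A KeyError from `deunique` is itself a useful signal: it means a names file refers to a representative that is not in the fasta it was paired with.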

make.group(fasta=R1.1.deunique.fasta-R2.1.deunique.fasta-R1.2.deunique.fasta-R2.2.deunique.fasta-R1.3.deunique.fasta-R2.3.deunique.fasta-R1.4.deunique.fasta-R2.4.deunique.fasta-R1.5.deunique.fasta-R2.5.deunique.fasta-R1.6.deunique.fasta-R2.6.deunique.fasta-R1.7.deunique.fasta-R2.7.deunique.fasta-R1.8.deunique.fasta-R2.8.deunique.fasta-R1.9.deunique.fasta-R2.9.deunique.fasta-R1.10.deunique.fasta-R2.10.deunique.fasta-R1.11.deunique.fasta-R2.11.deunique.fasta-R1.12.deunique.fasta-R2.12.deunique.fasta, groups=MID01-MID01-MID02-MID02-MID03-MID03-MID04-MID04-MID05-MID05-MID06-MID06-MID07-MID07-MID08-MID08-MID09-MID09-MID10-MID10-MID11-MID11-MID12-MID12) (rename to merge2.groups)
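make.group assigns every sequence in the n-th fasta file to the n-th group name, so the fasta list and the groups list must have the same length and the same order. The sketch below mimics that in Python for the 24 deunique files above, pairing each station’s two runs with one MID label. The function names are hypothetical and only FASTA header lines are read:

```python
from typing import Iterable, List, Sequence

# MID01, MID01, MID02, MID02, ..., MID12, MID12: one label per run for each station.
LABELS: List[str] = [f"MID{station:02d}" for station in range(1, 13) for _run in (1, 2)]


def fasta_read_names(lines: Iterable[str]) -> List[str]:
    """Read names (first word after '>') from FASTA text."""
    return [line[1:].split()[0] for line in lines if line.startswith(">")]


def make_group_lines(reads_per_file: Sequence[Sequence[str]], labels: Sequence[str]) -> List[str]:
    """One 'read<TAB>group' line per read, using the label at the same position."""
    if len(reads_per_file) != len(labels):
        raise ValueError("need exactly one group label per fasta file")
    lines: List[str] = []
    for reads, label in zip(reads_per_file, labels):
        lines.extend(f"{read}\t{label}" for read in reads)
    return lines
```

Raising on a length mismatch is a deliberate choice in this sketch: a silently shifted label list would assign whole files to the wrong group.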

unique.seqs(fasta=merge.fasta, name=merge.names)
align.seqs(fasta=merge.unique.fasta, reference=silva.bacteria.fasta, processors=2)
screen.seqs(fasta=merge.unique.align, name=merge.unique.names, group=merge.groups, start=28465, optimize=end, criteria=95, processors=2)
filter.seqs(fasta=merge.unique.good.align, vertical=T, trump=., processors=2)
unique.seqs(fasta=merge.unique.good.filter.fasta, name=merge.unique.good.names)
pre.cluster(fasta=merge.unique.good.filter.unique.fasta, name=merge.unique.good.filter.names, group=merge.good.groups, diffs=2)
[ERROR] Your name file contains XXXX valid sequences and your groupfile contains XXXXX, please correct

unique.seqs(fasta=merge.fasta, name=merge.names)
align.seqs(fasta=merge.unique.fasta, reference=silva.bacteria.fasta, processors=2)
screen.seqs(fasta=merge.unique.align, name=merge.unique.names, group=merge.groups, start=28465, optimize=end, criteria=95, processors=2)
filter.seqs(fasta=merge.unique.good.align, vertical=T, trump=., processors=2)
unique.seqs(fasta=merge.unique.good.filter.fasta, name=merge2.unique.good.names)
pre.cluster(fasta=merge.unique.good.filter.unique.fasta, name=merge.unique.good.filter.names, group=merge2.good.groups, diffs=2)
[ERROR] Your name file contains XXXX valid sequences and your groupfile contains XXXXX, please correct

pschloss · May 16, 2014, 8:23pm

I’m getting confused: initially you said it was failing at screen.seqs and now it’s failing at pre.cluster. Which is it? Can you make one post with all of your commands and output, and show where the error occurs? Some of the things you’ve put up don’t make sense, e.g.:

Run1.MID01-16SF.shhh.trim.groups~Run1.MID12-16SF.shhh.trim.groups
Run1.MID01-16SR.shhh.trim.groups~Run1.MID12-16SR.shhh.trim.groups
Run1.MID02-16SF.shhh.trim.groups~Run2.MID12-16SF.shhh.trim.groups
Run1.MID02-16SR.shhh.trim.groups~Run2.MID12-16SR.shhh.trim.groups

I’m not sure how you get a file that ends in trim.groups.

Pat
