I am happy to help. The filtering step is most likely why the lengths don't match: filter.seqs with vertical=t builds its filter from whichever sequences are present, so each dataset keeps a different set of alignment columns and ends up with a different length. Here's what I recommend:
mothur > merge.files(input=dataset1.fasta-dataset2.fasta-…-datasetn.fasta, output=merge1.fasta) - merge a subset of your datasets that will process in a reasonable amount of time
mothur > merge.count(count=dataset1.count_table-dataset2.count_table-…-datasetn.count_table, output=merge1.count_table) - merge the matching count files
mothur > unique.seqs(fasta=merge1.fasta, count=merge1.count_table) - merge identical reads and update count file
mothur > align.seqs(fasta=current, reference=yourReferenceFile) - align the unique reads
mothur > screen.seqs(fasta=current, count=current, …other parameters…) - remove poorly aligned reads
mothur > filter.seqs(fasta=current, vertical=t, trump=.) - filter the alignment; this also writes a filter file (e.g. merge1.filter) that later rounds will reuse
mothur > unique.seqs(fasta=current, count=current) - merge reads that became identical after filtering
mothur > pre.cluster(fasta=current, count=current, diffs=2) - combine reads with diffs<=2
mothur > chimera.vsearch(fasta=current, count=current, dereplicate=t) - remove chimeras (dereplicate=t removes a flagged read only from the samples where it is chimeric)
mothur > classify.seqs(fasta=current, count=current, …other parameters…) - classify reads
mothur > remove.lineage(fasta=current, count=current, taxonomy=current, …other parameters…) - remove contaminants
mothur > cluster.split(fasta=current, count=current, taxonomy=current, runsensspec=t) - create list file and column distance file
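At this point the merge1 run has produced the reference files every later round of fitting will need. With hypothetical names (your actual files will carry mothur's longer derived names):

merge1.filter - written by filter.seqs; reused as the hard filter for every later batch
merge1.opti_mcc.list - the reference OTU list (reflist for cluster.fit)
merge1.count_table - the reference count file (refcount)
merge1.dist - the reference column distance matrix (refcolumn)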
Now to fit another set of datasets to merge1:
mothur > merge.files(input=datasetn+1.fasta-datasetn+2.fasta-…-datasetn+m.fasta, output=merge2.fasta) - merge the next subset of your datasets
mothur > merge.count(count=datasetn+1.count_table-datasetn+2.count_table-…-datasetn+m.count_table, output=merge2.count_table) - merge the matching count files
mothur > unique.seqs(fasta=merge2.fasta, count=merge2.count_table) - merge identical reads and update count file
mothur > align.seqs(fasta=current, reference=yourReferenceFile) - align the unique reads
mothur > screen.seqs(fasta=current, count=current, …other parameters…) - remove poorly aligned reads
mothur > filter.seqs(fasta=current, hard=merge1.filter) - filter with the filter saved from the first set, so every batch keeps the same alignment columns and the same length
mothur > unique.seqs(fasta=current, count=current) - merge reads that became identical after filtering
mothur > pre.cluster(fasta=current, count=current, diffs=2) - combine reads with diffs<=2
mothur > chimera.vsearch(fasta=current, count=current, dereplicate=t) - remove chimeras
mothur > classify.seqs(fasta=current, count=current, …other parameters…) - classify reads
mothur > remove.lineage(fasta=current, count=current, taxonomy=current, …other parameters…) - remove contaminants
mothur > dist.seqs(fasta=current, cutoff=0.03) - create the distance matrix for the merge2 reads
mothur > cluster.fit(fasta=current, count=current, column=current, reflist=listFileFromMerge1, refcount=countFileFromMerge1, refcolumn=columnMatrixfromMerge1) - fit the merge2 sequences into the OTUs from merge1; any reads that cannot be fitted are clustered into new OTUs
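For example, with hypothetical file names standing in for the current files:

mothur > cluster.fit(fasta=merge2.fasta, count=merge2.count_table, column=merge2.dist, reflist=merge1.opti_mcc.list, refcount=merge1.count_table, refcolumn=merge1.dist)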
mothur > merge.files(input=columnMatrixfromMerge1-columnMatrixfromMerge2, output=merge12.column) - combine the column files; use merge12.column as refcolumn in the next cluster.fit
mothur > merge.count(count=countFileFromMerge1-countFileFromMerge2, output=merge12.count_table) - combine the count files; use merge12.count_table as refcount in the next cluster.fit
mothur > rename.file(list=current, new=merge12.list) - rename new reflist file for use in next cluster.fit
Repeat for all remaining sets of datasets, always filtering with hard=merge1.filter and always fitting against the newest merged list, count, and column files; a hypothetical third round is sketched below.
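For example, a hypothetical third batch (merge3) would repeat the processing steps above and then fit against the merged references from the first two rounds:

mothur > filter.seqs(fasta=current, hard=merge1.filter) - the same hard filter as every round
mothur > dist.seqs(fasta=current, cutoff=0.03)
mothur > cluster.fit(fasta=current, count=current, column=current, reflist=merge12.list, refcount=merge12.count_table, refcolumn=merge12.column)
mothur > merge.files(input=merge12.column-columnMatrixfromMerge3, output=merge123.column) - new refcolumn
mothur > merge.count(count=merge12.count_table-countFileFromMerge3, output=merge123.count_table) - new refcount
mothur > rename.file(list=current, new=merge123.list) - new reflist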