Filtered alignment, 0

Hello,

So I posted about this previously, but I did not respond in time and the thread closed.

I am running the current version of mothur on 16S V4 sequence data. When I executed filter.seqs, I got the following:

Length of filtered alignment: 0
Number of columns removed: 13425
Length of the original alignment: 13425
Number of sequences used to construct filter: 3058818

Output File Names: 
/work/LAS/eswanner-lab/Micah/epa19_16s/epa19.filter
/work/LAS/eswanner-lab/Micah/epa19_16s/epa19.trim.contigs.good.unique.filter.fasta

In the previous discussion, Pat asked to see the summary.seqs command, so I have provided that here:

summary.seqs(fasta=epa19.trim.contigs.good.unique.align) 
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is /work/LAS/eswanner-lab/Micah/epa19_16s/epa19.trim.contigs.good.names which seems to match /work/LAS/eswanner-lab/Micah/epa19_16s/epa19.trim.contigs.good.unique.align.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        0       0       0       0       1       1
2.5%-tile:      1968    11550   252     0       3       76471
25%-tile:       1968    11550   253     0       4       764705
Median:         1968    11550   253     0       4       1529410
75%-tile:       1968    11550   253     0       5       2294114
97.5%-tile:     1968    11550   254     0       6       2982348
Maximum:        13425   13425   275     0       114     3058818
Mean:   2013    11539   251     0       4
# of Seqs:      3058818

It took 159 secs to summarize 3058818 sequences.

Output File Names:
/work/LAS/eswanner-lab/Micah/epa19_16s/epa19.trim.contigs.good.unique.summary

I just reread this and realized I ran summary.seqs without the count table parameter. I will rerun it and post the output.

Okay, here are the summary.seqs and filter.seqs outputs that I got:

summary.seqs(fasta=epa19.trim.contigs.good.unique.align, count=epa19.trim.contigs.good.count_table)

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        0       0       0       0       1       1
2.5%-tile:      1968    11550   253     0       4       305688
25%-tile:       1968    11550   253     0       4       3056876
Median:         1968    11550   253     0       4       6113751
75%-tile:       1968    11550   253     0       5       9170626
97.5%-tile:     1968    11550   254     0       6       11921813
Maximum:        13425   13425   275     0       114     12227500
Mean:   1991    11545   252     0       4
# of unique seqs:       3058818
total # of seqs:        12227500

It took 213 secs to summarize 12227500 sequences.

Output File Names:
epa19.trim.contigs.good.unique.summary

filter.seqs(fasta=epa19.trim.contigs.good.unique.align, vertical=T, trump=.)

Length of filtered alignment: 0
Number of columns removed: 13425
Length of the original alignment: 13425
Number of sequences used to construct filter: 3058818

Output File Names: 
epa19.filter
epa19.trim.contigs.good.unique.filter.fasta

Edit: I read through another post regarding this and, based on the suggestions there, added ‘minlength=50’ to the screen.seqs command. I reran this and the subsequent commands, but I am still getting the same output for summary.seqs and filter.seqs as shown above.

screen.seqs(fasta=epa19.trim.contigs.fasta, group=epa19.contigs.groups, maxambig=0, minlength=50, maxlength=275)


It took 52 secs to screen 13881595 sequences, removed 1654095.

/******************************************/
Running command: remove.seqs(accnos=epa19.trim.contigs.bad.accnos.temp, group=epa19.contigs.groups)
Removed 1654095 sequences from your group file.

Output File Names: 
epa19.contigs.pick.groups

/******************************************/

Output File Names:
epa19.trim.contigs.good.fasta
epa19.trim.contigs.bad.accnos
epa19.contigs.good.groups


It took 87 secs to screen 13881595 sequences.

The problem is that some of your sequences did not align to the correct region. With trump=. in filter.seqs, any column where even one sequence has a '.' is removed, so those few sequences cost you all of the columns. This is why, in the SOP, we suggest running screen.seqs with start and end positions first, and then running filter.seqs with vertical=T and trump=. on the screened file.
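
To see why a handful of misaligned reads is enough to empty the alignment, here is a toy Python model of the column filter. It is only a sketch, assuming that trump=. drops a column when any sequence has a '.' there and that vertical=T drops columns that are gaps in every sequence. It is not mothur's actual code, and `filter_columns` and the example reads are made up for illustration.

```python
def filter_columns(seqs, trump=".", vertical=True):
    """Return the aligned sequences with filtered columns removed.

    Simplified model (an assumption, not mothur's implementation):
    a column is dropped if any sequence has the trump character there,
    or, when vertical is true, if every sequence has a gap there.
    """
    keep = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        if trump in column:
            continue  # one '.' anywhere in the column removes it
        if vertical and all(c in "-." for c in column):
            continue  # column is all gaps
        keep.append(i)
    return ["".join(s[i] for i in keep) for s in seqs]


# Hypothetical 9-column alignment; '.' marks positions outside the read.
good = ["..ACG-T..", "..AC-GT.."]
stray = ["AC......."]  # a read that aligned to the wrong region

print(len(filter_columns(good)[0]))          # length with only good reads
print(len(filter_columns(good + stray)[0]))  # length once the stray read is included
```

In this toy case the two good reads keep their five shared columns, but adding the single stray read puts a '.' in every remaining column, so nothing survives.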

You should do the following…

mothur > screen.seqs(fasta=epa19.trim.contigs.good.unique.align, count=epa19.trim.contigs.good.count_table, start=1968, end=11550, maxhomop=8)
mothur > filter.seqs(fasta=current, vertical=T, trump=.)
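
The start and end criteria can be pictured with a small Python sketch. It assumes screen.seqs drops any sequence that starts after `start` or ends before `end`. The `passes_screen` helper and the coordinate pairs are hypothetical (loosely modeled on the summary.seqs rows above), not mothur internals or real reads.

```python
def passes_screen(seq_start, seq_end, start=1968, end=11550):
    """Keep a sequence only if it covers the whole start..end window."""
    return seq_start <= start and seq_end >= end


# Hypothetical (start, end) pairs; not real reads from this dataset.
reads = {
    "typical": (1968, 11550),
    "unaligned": (0, 0),
    "wrong_region": (13425, 13425),
    "long_overhang": (1, 13393),
}

kept = [name for name, (s, e) in reads.items() if passes_screen(s, e)]
print(kept)
```

Only reads that span the full window remain, which is what lets trump=. keep the columns between the two positions.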

Pat

Hi Pat,

Okay, I had (embarrassingly) overlooked the second screen.seqs command.

Thank you.

Hi Pat,

So I tried running this with the screen.seqs and filter.seqs, but I still got the following:

Length of filtered alignment: 0
Number of columns removed: 13425
Length of the original alignment: 13425
Number of sequences used to construct filter: 3058818

Output File Names:
epa19.filter
epa19.trim.contigs.good.unique.filter.fasta

Thanks, Micah.

Can you post the output from running…

summary.seqs(fasta=epa19.trim.contigs.good.unique.align, count=epa19.trim.contigs.good.count_table)
summary.seqs(fasta=epa19.trim.contigs.good.unique.good.align, count=epa19.trim.contigs.good.good.count_table)

Make sure you ran screen.seqs as I had it in the earlier post.

Hi Pat,

Here are the outputs you requested after re-running screen.seqs and the subsequent commands. Thanks, Micah.

summary.seqs(fasta=epa19.trim.contigs.good.unique.align, count=epa19.trim.contigs.good.count_table)


		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	1968	11550	253	0	4	304760
25%-tile:	1968	11550	253	0	4	3047594
Median: 	1968	11550	253	0	4	6095187
75%-tile:	1968	11550	253	0	5	9142780
97.5%-tile:	1968	11550	254	0	6	11885614
Maximum:	13425	13425	260	0	103	12190373
Mean:	1972	11548	252	0	4
# of unique seqs:	3038729
total # of seqs:	12190373

It took 855 secs to summarize 12190373 sequences.

Output File Names:
epa19.trim.contigs.good.unique.summary

summary.seqs(fasta=epa19.trim.contigs.good.unique.good.align, count=epa19.trim.contigs.good.good.count_table)

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	11550	245	0	3	1
2.5%-tile:	1968	11550	253	0	4	303473
25%-tile:	1968	11550	253	0	4	3034726
Median: 	1968	11550	253	0	4	6069452
75%-tile:	1968	11550	253	0	5	9104178
97.5%-tile:	1968	11550	254	0	6	11835431
Maximum:	1968	13393	260	0	8	12138903
Mean:	1967	11550	253	0	4
# of unique seqs:	3014261
total # of seqs:	12138903

It took 788 secs to summarize 12138903 sequences.

Output File Names:
epa19.trim.contigs.good.unique.good.summary

The problem was that you ran filter.seqs on the alignment from before screen.seqs, not on the screened file that screen.seqs wrote out. You want to run this instead…

filter.seqs(fasta=epa19.trim.contigs.good.unique.good.align, vertical=T, trump=.)
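
Since the mix-up here was between file names, a tiny Python sketch of the naming pattern may help keep them straight. It assumes, based only on the file names that appear in this thread, that each screen.seqs run inserts ".good" before the final extension. `screened_name` is a hypothetical helper, not a mothur function.

```python
def screened_name(filename):
    """Insert '.good' before the final extension (pattern inferred from this thread)."""
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.good{dot}{ext}"


# File names taken from the commands earlier in the thread.
print(screened_name("epa19.trim.contigs.good.unique.align"))
print(screened_name("epa19.trim.contigs.good.count_table"))
```

The name with the second ".good" is the one to pass to filter.seqs after screening the alignment.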

Hi Pat!

Is there a way that I could send my log file? I think I am having trouble keeping things straight when I have been posting my information. From what I can tell, I have done things in the correct order (but maybe I am missing something or got things switched around).

Regards,

Micah

Sure - or you could post it here
