Screen.seqs removed all the sequences

Here is the summary of my data after making contigs.

And here is the summary of the contigs report:

mothur > screen.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table, maxambig=0, maxlength=275, maxhomop=8)

It took 28 secs to screen 1896836 sequences, removed 1896836.

/******************************************/
Running command: remove.seqs(accnos=/users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.trim.contigs.bad.accnos.temp, count=/users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.contigs.count_table)
Removed 1896836 sequences from /users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.contigs.count_table.
[WARNING]: /users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.contigs.count_table contains only sequences from the .accnos file.

I need help understanding the screen.seqs command. How should I set the start, end, maxlength, minlength, maxambig and maxhomop parameters for this data?
I have read similar questions on the forum, but I still don't understand the concept. I hope someone can help.

Thanks
Hira

Hi - it looks like you used the 2x300 chemistry and a region where your reads do not fully overlap. If I were you, I’d use…

screen.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table, maxambig=0, maxhomop=8, maxlength=500)

The right maxlength (500 here) depends on your region and on the length range of good sequences, which you typically get from a set of database sequences covering that region. The start and end parameters only make sense once your sequences have been aligned.
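To see why the original command removed every sequence, here is a rough Python sketch of the per-sequence checks behind maxambig, maxhomop and minlength/maxlength. It is an illustrative re-implementation, not mothur's source code, and the function names are hypothetical; in particular, counting every non-ACGT character as ambiguous is an assumption. A V3-V4 contig assembled from 2x300 reads is well over 275 bases long, so maxlength=275 rejects all of them.

```python
def count_ambiguous(seq: str) -> int:
    """Count bases other than A, C, G and T (e.g. N or other IUPAC codes).

    Assumption: every non-ACGT character counts as ambiguous.
    """
    return sum(1 for base in seq.upper() if base not in "ACGT")


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def passes_screen(seq, maxambig=0, maxhomop=8, minlength=None, maxlength=None):
    """True if seq survives the ambiguity, homopolymer and length checks."""
    if count_ambiguous(seq) > maxambig:
        return False  # too many ambiguous bases
    if longest_homopolymer(seq) > maxhomop:
        return False  # homopolymer run longer than allowed
    if minlength is not None and len(seq) < minlength:
        return False  # too short
    if maxlength is not None and len(seq) > maxlength:
        return False  # too long
    return True


if __name__ == "__main__":
    contig = "ACGT" * 115  # made-up 460-nt contig, roughly V3-V4 sized
    print(passes_screen(contig, maxlength=275))  # False: longer than 275
    print(passes_screen(contig, maxlength=500))  # True
```

The test contig is invented; only its length matters for the maxlength comparison.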

You can get reference sequences for your region here:

You need to read this given that you don’t have fully overlapping sequences:
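The start and end parameters work on alignment coordinates, which is why they only make sense after alignment. The Python sketch below assumes that '.' and '-' are the gap characters in the aligned fasta and roughly mirrors the behaviour described in the mothur documentation: a sequence is kept only if it starts at or before start and ends at or after end. The function names are hypothetical; in practice you would pick the positions from summary.seqs output on your aligned sequences.

```python
GAP_CHARS = set(".-")  # assumed gap symbols in an aligned fasta


def aligned_start_end(aligned_seq: str) -> tuple[int, int]:
    """1-based alignment columns of the first and last non-gap characters."""
    columns = [i for i, c in enumerate(aligned_seq, start=1) if c not in GAP_CHARS]
    if not columns:
        raise ValueError("sequence contains only gaps")
    return columns[0], columns[-1]


def passes_position_screen(aligned_seq: str, start=None, end=None) -> bool:
    """Keep a sequence only if it starts by `start` and ends at or after `end`."""
    seq_start, seq_end = aligned_start_end(aligned_seq)
    if start is not None and seq_start > start:
        return False  # begins after the required start column
    if end is not None and seq_end < end:
        return False  # stops before the required end column
    return True


if __name__ == "__main__":
    aligned = "..--ACGT-A--..."  # toy 15-column alignment row
    print(aligned_start_end(aligned))                        # (5, 10)
    print(passes_position_screen(aligned, start=5, end=10))  # True
    print(passes_position_screen(aligned, start=4))          # False: starts too late
```

Because both checks refer to alignment columns, the same numbers have no meaning for unaligned contigs.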


Thank you for replying. I customized the reference alignment following “Customize your reference alignment for your favorite region”, since my sequences are from the V3-V4 region of the 16S rRNA gene. I understand all of this clearly now 🙂
I am curious about one thing.
After making contigs, I used maxlength=485 in screen.seqs. What kind of variation does the length chosen at this step produce in the final results, i.e. the OTU table or the taxonomy file? Does it also affect the number of unclassified sequences returned by the classify.seqs command? Or, more generally, which results of the analysis can it affect?

Regards
Hira

Setting maxlength removes contigs that are (in your case) around 600 nt long because the two reads overlapped only trivially. Choose the value based on how long your region is in high-quality sequences taken from a database.

Pat
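Both points in the last reply can be put in numbers with a small Python sketch. contig_length() is plain arithmetic: an assembled contig is roughly the two read lengths minus their overlap, so 2x300 reads overlapping by only about 10 bases give a contig near 600 nt. length_bounds() picks a candidate minlength/maxlength window from reference sequence lengths using nearest-rank 2.5th and 97.5th percentiles. The function names, the percentile method and the example lengths are assumptions made for illustration, and mothur's summary.seqs may compute its percentiles slightly differently.

```python
import math


def contig_length(r1_len: int, r2_len: int, overlap: int) -> int:
    """Approximate contig length: both reads minus the bases they share."""
    return r1_len + r2_len - overlap


def percentile(sorted_lengths: list[int], q: float) -> int:
    """Nearest-rank percentile (an assumed method; mothur's may differ)."""
    rank = max(1, math.ceil(q * len(sorted_lengths)))
    return sorted_lengths[rank - 1]


def length_bounds(ref_lengths, low_q=0.025, high_q=0.975):
    """Return (minlength, maxlength) candidates from reference sequence lengths."""
    if not ref_lengths:
        raise ValueError("need at least one reference length")
    ordered = sorted(ref_lengths)
    return percentile(ordered, low_q), percentile(ordered, high_q)


if __name__ == "__main__":
    print(contig_length(300, 300, 140))  # 460: reads overlap well
    print(contig_length(300, 300, 10))   # 590: only trivial overlap
    # Hypothetical reference lengths, invented for this example.
    refs = [455, 458, 460, 460, 461, 462, 463, 465, 470, 480]
    print(length_bounds(refs))  # (455, 480)
```

With real data, the reference lengths would come from database sequences trimmed to your region, and contigs far above the resulting maxlength are the poorly overlapping ones that screen.seqs should drop.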

