Identifying intron and exon regions in genes is a complex task that
requires recognizing specific nucleotide patterns in the gene
sequence. It can be done manually or with software, which most often
relies on genetic alignment techniques that are not very effective
for this purpose. Taking this opportunity for collaboration between
biology and computer science through machine learning techniques,
the objective of this work was to predict the intron and exon
regions in filamentous fungi genes and to translate the identified
regions into protein codons. In this paper,
the problem was modeled as a supervised learning problem, with the
model trained on a set of genes obtained from GenBank whose intron
and exon regions are already annotated. The machine learning
model used in this work was the Conditional Random Field (CRF).
The values of the evaluation metrics applied to the model show that
good precision can be achieved in identifying the intron and exon
regions as well as the protein codons. Thus, although a greater
diversity of characteristics in the database is needed to support
the effectiveness of identifying the splicing sites, this paper
provides evidence that these splicing sites can be predicted with
good accuracy.
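
As an illustration of the supervised formulation described above, the following is a minimal sketch and not the implementation used in this work. It assumes a per-nucleotide labeling scheme ('E' for exon, 'I' for intron), 0-based half-open exon intervals, and a simple neighbor-window feature set. The toy sequence and the names label_positions, position_features, and spliced_codons are hypothetical. The feature dictionaries are the kind of input that CRF toolkits typically accept, and the last function shows how labeled exon regions can be joined and split into codons.

```python
from typing import Dict, List, Tuple


def label_positions(length: int, exons: List[Tuple[int, int]]) -> List[str]:
    """Return one label per nucleotide: 'E' inside an exon, 'I' elsewhere.

    Exons are 0-based, half-open (start, end) intervals (assumed convention).
    """
    labels = ["I"] * length
    for start, end in exons:
        for i in range(start, end):
            labels[i] = "E"
    return labels


def position_features(seq: str, i: int, window: int = 2) -> Dict[str, str]:
    """Describe position i by its nucleotide and its neighbors (assumed features)."""
    feats = {"nt": seq[i]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        # Positions outside the sequence are marked with a padding symbol.
        feats[f"nt[{off:+d}]"] = seq[j] if 0 <= j < len(seq) else "PAD"
    return feats


def spliced_codons(seq: str, labels: List[str]) -> List[str]:
    """Join all exon-labeled nucleotides and split them into complete codons."""
    coding = "".join(nt for nt, lab in zip(seq, labels) if lab == "E")
    usable = len(coding) - len(coding) % 3  # drop a trailing partial codon
    return [coding[k:k + 3] for k in range(0, usable, 3)]


# Toy gene (hypothetical): two exons separated by one intron (GT...AG).
seq = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
exons = [(0, 6), (18, 24)]

labels = label_positions(len(seq), exons)                   # training targets
features = [position_features(seq, i) for i in range(len(seq))]  # model inputs
codons = spliced_codons(seq, labels)
```

In a trained setting, the labels passed to spliced_codons would be the sequence predicted by the CRF rather than the GenBank annotation used here.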
Computer on the Beach is a technical-scientific event that aims to bring together professionals, researchers, and academics in the field of Computing to discuss research and market trends across the many areas of computing.