• Abstract

    Machine Learning for Identifying Coding Regions in DNA Sequences of Filamentous Fungi

    Publication date: 13 July 2022

    Identifying intron and exon regions in genes is a complex task
    that requires recognizing specific nucleotide patterns in the gene
    sequence. It can be done manually or with software that most
    often relies on genetic alignment techniques, which are not very
    effective for this purpose. In this opportunity for collaboration
    between biology and computer science using machine learning
    techniques, the objective was to predict the intron and exon
    regions in filamentous fungi genes, as well as to translate the
    identified regions into protein codons. In this paper, the problem
    was modeled as a supervised learning problem, based on training
    with a set of genes obtained from GenBank whose intron and exon
    regions are already identified. The machine learning model used
    in this work was Conditional Random Fields (CRF). The values of
    the metrics applied to the model show that it is possible to
    achieve good precision in identifying the intron and exon regions
    as well as the protein codons. Thus, although a greater diversity
    of database characteristics is needed to support the effectiveness
    of identifying the splicing sites, this paper gives evidence that
    it is possible to predict these splicing sites with good accuracy.
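
The supervised formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0-based half-open exon intervals, the "E"/"I" label set, and the window features are all assumptions made for illustration. It turns an annotated gene into one label and one feature dictionary per nucleotide, which is the shape of input that CRF toolkits commonly accept.

```python
from typing import Dict, List, Tuple


def label_nucleotides(sequence: str, exons: List[Tuple[int, int]]) -> List[str]:
    """Return one label per nucleotide: 'E' inside an exon, 'I' elsewhere.

    Assumption: exon intervals are 0-based and half-open (start, end), and
    the sequence spans only the gene, so every non-exon position is intronic.
    """
    labels = ["I"] * len(sequence)
    for start, end in exons:
        for i in range(start, end):
            labels[i] = "E"
    return labels


def window_features(sequence: str, i: int, width: int = 2) -> Dict[str, str]:
    """Describe position i by its nucleotide and its neighbours (hypothetical feature set)."""
    feats = {"bias": "1", "nt": sequence[i]}
    for off in range(-width, width + 1):
        if off == 0:
            continue
        j = i + off
        # Positions that fall outside the sequence are marked with a padding token.
        feats[f"nt[{off:+d}]"] = sequence[j] if 0 <= j < len(sequence) else "PAD"
    # Dinucleotide starting at i; introns typically begin with GT and end with AG.
    feats["dinuc"] = sequence[i:i + 2] if i + 2 <= len(sequence) else "PAD"
    return feats


def sequence_to_features(sequence: str, width: int = 2) -> List[Dict[str, str]]:
    """Build the per-position feature list for a whole gene sequence."""
    return [window_features(sequence, i, width) for i in range(len(sequence))]


if __name__ == "__main__":
    # Toy gene (made up): two exons separated by an intron that starts with GT and ends with AG.
    gene = "ATGGTAAGTCAGGCC"
    exons = [(0, 3), (12, 15)]
    print("".join(label_nucleotides(gene, exons)))
    print(window_features(gene, 3))
```

A CRF would then be trained on many such (features, labels) pairs, and the predicted label sequence for a new gene marks its exon and intron boundaries.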

Anais do Computer on the Beach

Computer on the Beach is a technical and scientific event that brings together professionals, researchers, and academics in the field of Computing to discuss research and market trends across its many areas.
