• Summary

    In-Domain and Cross-Domain Evaluation in Punctuation Restoration Using Natural Language Processing

    Publication date: 27/05/2025

    ABSTRACT
    Punctuation plays a fundamental role in conveying the correct meaning
    of written text. Punctuation errors can therefore significantly
    impair how a message is interpreted, whether in
    formal or informal contexts. Machine learning,
    combined with recent natural language processing techniques, has
    been widely applied to the task of punctuation restoration in languages
    such as English. However, although the task has been widely studied
    in other languages, work on Portuguese is still quite limited. In this
    work, we propose to adapt a punctuation restoration model to
    formal texts in the Portuguese language and, in addition,
    to evaluate the model’s behavior on informal texts. The Portuguese
    Legal Sentences v3 dataset was used both to train the model and
    for the in-domain evaluation. For the cross-domain
    evaluation, the IWSLT (International Workshop on Spoken Language
    Translation) dataset was used, consisting of transcripts of
    TED Talks. The results indicate that the model with the
    largest amount of training data, and with all question marks mapped
    to full stops, performed satisfactorily in the formal context, suggesting
    that the adopted methodology was adequate for the proposed
    task. Furthermore, the scarcity of question marks was found to
    negatively impact the model’s performance. In the informal
    context, the evaluation metrics were unsatisfactory, suggesting
    that formal and informal sentences have distinct structures and that
    the model was unable to generalize adequately to the informal
    context.

Anais do Computer on the Beach (Proceedings of Computer on the Beach)

Computer on the Beach is a technical and scientific event that brings together professionals, researchers, and academics in the field of Computing to discuss research and market trends across its many areas.
