• Summary

    Scalability Analysis of Storage and Processing of Audio Files Using Transformers

    Publication date: 27/05/2025

    ABSTRACT
    In a continental-sized country like Brazil, collecting feedback on
    government services such as education, healthcare, and security
    is challenging and, except through sampling techniques, impractical
    to do manually. With advances in machine learning,
    particularly models based on transformers, it is now possible
    to automate this process on a large scale, enabling, for instance,
    the dissemination of health campaign information or the collection
    of citizen opinions on recently used services. This paper focuses
    on speech-to-text transcription, a crucial step for enabling large-scale
    voice-based responses. We explored scalability challenges and
    evaluated combinations of transcription models and audio formats
    (WAV, FLAC, and MP3), aiming to balance the computational cost
    and transcription quality. Our results showed that MP3 files sampled
    at 14 kHz provide transcription quality comparable to WAV
    files sampled at 16 kHz while requiring only 11% of the storage
    size. Furthermore, we demonstrated that smaller models, such as
    Wav2Vec2-XLSR-53 with 3.17 × 10⁸ parameters, can achieve results
    similar to larger models, such as Seamless M4T, which has
    approximately an order of magnitude more parameters.
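    To give a sense of scale for the storage figure, the sketch below estimates the size of uncompressed speech audio and what 11% of it would amount to. It is a back-of-the-envelope illustration, not the authors' method: it assumes the WAV reference is 16-bit mono PCM at 16 kHz with the header ignored, and it reads "11% of the storage" as an average size ratio. The function names are hypothetical.

```python
"""Back-of-the-envelope storage estimate for speech audio.

Assumptions (illustrative, not taken from the paper): the WAV reference is
16-bit mono PCM at 16 kHz, the WAV header is ignored, and "11% of the
storage" is read as an average size ratio between MP3 and WAV files.
"""


def pcm_bytes_per_second(sample_rate_hz: int = 16_000,
                         bits_per_sample: int = 16,
                         channels: int = 1) -> int:
    """Bytes per second of uncompressed PCM audio (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels // 8


def storage_gb(hours: float, bytes_per_second: float) -> float:
    """Storage in gigabytes (10**9 bytes) for `hours` of audio."""
    return hours * 3600 * bytes_per_second / 1e9


wav_bps = pcm_bytes_per_second()   # assumed: 16 kHz, 16-bit, mono
mp3_bps = wav_bps * 0.11           # assumed: MP3 is 11% of the WAV size

print(f"WAV bitrate: {wav_bps * 8 / 1000:.0f} kbit/s")
print(f"Implied average MP3 bitrate: {mp3_bps * 8 / 1000:.1f} kbit/s")
print(f"1,000 h as WAV: {storage_gb(1000, wav_bps):.1f} GB")
print(f"1,000 h as MP3: {storage_gb(1000, mp3_bps):.1f} GB")
```

    Under these assumptions the script reports 256 kbit/s for the WAV reference, an implied MP3 average of about 28 kbit/s, and roughly 115 GB versus 13 GB per 1,000 hours of recordings. Different bit depths or channel counts change the absolute numbers but not the ratio.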

Proceedings of Computer on the Beach

Computer on the Beach is a technical-scientific event that brings together computing professionals, researchers, and academics to discuss research and industry trends across the many areas of computing.
