• Abstract

    LexiCube: Generation and Multidimensional Analysis of News with OLAP

    Publication date: 09/06/2026

    The rapid growth of digital news in Portuguese makes it hard to extract strategic knowledge from large unstructured text corpora. This paper introduces LexiCube, a framework with a hybrid architecture that combines the structural robustness of multidimensional Data Warehouses with the semantic capabilities of Large Language Models (LLMs). Evolving from the Newsminer concept, LexiCube implements an ETL+ (Extract, Transform, Load, and Enrich) pipeline that processes and normalizes journalistic texts and ends with an LLM-based semantic augmentation stage for few-shot thematic classification, named entity recognition, and automatic summarization. The enriched data is persisted in a multidimensional corpus optimized for OLAP operations, organized in a star schema with time, category, source, and term dimensions. Experimental validation on a Brazilian news corpus of 17,000 articles yields 98% accuracy in thematic classification based on the IPTC ontology and supports complex analyses such as temporal trend extraction, knowledge graph construction, and entity correlation. LexiCube goes beyond traditional text mining systems by turning unstructured journalistic text into a multidimensional, explorable knowledge asset, contributing to media intelligence and computational linguistics applied to the Portuguese language.
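
The star schema mentioned in the abstract can be illustrated with a minimal sketch. The abstract names only the four dimensions (time, category, source, term), so the table names, column names, the `mentions` measure, and the sample rows below are all hypothetical and are not taken from LexiCube itself. The sketch uses Python's built-in `sqlite3` module and shows one roll-up, a typical OLAP operation.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by the four
# dimensions named in the abstract. All identifiers are assumptions.
SCHEMA = """
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_source   (source_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_term     (term_id INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE fact_term_mention (
    time_id     INTEGER REFERENCES dim_time(time_id),
    category_id INTEGER REFERENCES dim_category(category_id),
    source_id   INTEGER REFERENCES dim_source(source_id),
    term_id     INTEGER REFERENCES dim_term(term_id),
    mentions    INTEGER NOT NULL  -- assumed measure: term occurrences
);
"""


def build_cube(conn: sqlite3.Connection) -> None:
    """Create the dimension and fact tables."""
    conn.executescript(SCHEMA)


def load_sample(conn: sqlite3.Connection) -> None:
    """Insert a few made-up rows, purely for illustration."""
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [
        (1, "2024-01-15", "2024-01"),
        (2, "2024-01-20", "2024-01"),
        (3, "2024-02-03", "2024-02"),
    ])
    conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                     [(1, "politics"), (2, "sport")])
    conn.executemany("INSERT INTO dim_source VALUES (?, ?)",
                     [(1, "source_a"), (2, "source_b")])
    conn.executemany("INSERT INTO dim_term VALUES (?, ?)",
                     [(1, "eleição"), (2, "futebol")])
    conn.executemany("INSERT INTO fact_term_mention VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 3),
        (2, 1, 2, 1, 2),
        (2, 2, 1, 2, 4),
        (3, 2, 2, 2, 1),
    ])


def rollup_by_month_and_category(conn: sqlite3.Connection) -> list:
    """Roll up from day to month, aggregating mentions per category."""
    return conn.execute("""
        SELECT t.month, c.name, SUM(f.mentions)
        FROM fact_term_mention AS f
        JOIN dim_time AS t USING (time_id)
        JOIN dim_category AS c USING (category_id)
        GROUP BY t.month, c.name
        ORDER BY t.month, c.name
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_cube(conn)
    load_sample(conn)
    for row in rollup_by_month_and_category(conn):
        print(row)
```

Slicing by source or drilling down to individual days would follow the same pattern: join the fact table to the relevant dimension and change the `GROUP BY` or `WHERE` clause.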

Proceedings of Computer on the Beach (Anais do Computer on the Beach)

Computer on the Beach is a technical and scientific event that brings together professionals, researchers, and academics in Computing to discuss research and market trends across the field's many areas.
