ABSTRACT
In recent years, the large volume of text documents being produced
has made searching and analyzing their content increasingly
difficult, creating a need for techniques that extract useful
information. In the legal domain, where most information is contained
in legal texts, information extraction has become crucial for
discovering knowledge in unstructured data. Driven by advances in
deep learning models, named entity recognition (NER) stands out as
the main technique for this task. This project explored whether
natural language processing techniques, applied to Special Accounting
proceedings, could expand the set of explanatory variables beyond
those available on the institutional website of the Tribunal de Contas
da União (TCU). The approach comprised web scraping for data
collection, preprocessing of the case documents (pieces), entity
annotation, fine-tuning of pre-trained models for the legal domain
and the named entity recognition task, and entity extraction. From
the selected texts, 388,201 records (tokens and/or phrases) were
extracted: 286,781 from the Instruction pieces and 101,420 from the
Judgment pieces. These results confirm the research hypothesis and
demonstrate that natural language processing can feasibly expand the
set of available variables.
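The abstract does not describe how token-level model outputs become the extracted records (tokens and/or phrases). Below is a minimal sketch of one common way to do this. It assumes that the fine-tuned NER models emit one tag per token in the widely used BIO scheme, which the abstract does not state. The function name `bio_to_spans` and the entity labels (`ORG`, `VALUE`, `PERSON`) are hypothetical and are used only for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_type, text) spans.

    Hypothetical helper: assumes tags such as "B-ORG", "I-ORG" and "O".
    An "I-" tag that does not continue a span of the same type is treated
    leniently as the start of a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span (if any) and record it as one extracted record.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Token outside any entity: end the current span.
            flush()
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or ent_type != current_type:
            # A new entity begins (explicit B- or a type change).
            flush()
            current_type = ent_type
            current_tokens = [token]
        else:
            # "I-" tag continuing the current entity of the same type.
            current_tokens.append(token)

    flush()
    return spans


if __name__ == "__main__":
    # Hypothetical sentence and tags, for illustration only.
    tokens = ["O", "Ministério", "da", "Saúde", "pagou", "R$", "10.000", "a", "João", "Silva"]
    tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "B-VALUE", "I-VALUE", "O", "B-PERSON", "I-PERSON"]
    print(bio_to_spans(tokens, tags))
    # [('ORG', 'Ministério da Saúde'), ('VALUE', 'R$ 10.000'), ('PERSON', 'João Silva')]
```

A single-token entity yields a one-token record, and a multi-token entity yields a phrase. This mirrors the "tokens and/or phrases" wording above. Counting the spans returned for each document type is one plausible way to obtain per-piece totals like those reported.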
Computer on the Beach is a technical-scientific event that aims to bring together professionals, researchers, and academics in the field of Computing to discuss research and market trends across the many areas of computing.