• Summary

    Dataset for Poisoning Self-Training-Based Models in Data Streams

    Publication date: 27/05/2025

    Abstract
    In problems with a large volume of unlabeled data, semi-supervised
    learning techniques such as self-training are attractive because
    they exploit all of the available data without requiring extensive
    labeling, which is an expensive process. However, indiscriminately
    training a model on pseudo-labels can shift its decision boundary
    in undesired ways. This can happen unintentionally or intentionally,
    as in malware classification, where attackers want malicious
    software to be classified as benign. In this paper, we propose a
    dataset for poisoning self-training-based models that simulates a
    data stream, with the goal of evaluating how robust these models
    are to intentional or unintentional poisoning by unlabeled
    instances. Our experiments use models from the MOA-SS framework
    and show that models that train incrementally and use prediction
    confidence as the criterion for accepting an unlabeled instance
    into training are more susceptible to poisoning.
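
The abstract's central mechanism, incremental self-training that accepts an unlabeled instance whenever prediction confidence clears a threshold, can be illustrated with a toy example. The sketch below is hypothetical and is not the MOA-SS API or the paper's dataset. It assumes a two-class, one-dimensional nearest-centroid model (0 = benign, 1 = malicious), made-up seed centroids, a made-up confidence measure, and a made-up poison stream. It shows how unlabeled points that the model confidently pseudo-labels as "benign" can drag the benign centroid until a malicious probe flips class.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalCentroid:
    """Hypothetical two-class, 1-D nearest-centroid model (0 = benign, 1 = malicious)."""
    centroids: list = field(default_factory=lambda: [0.0, 10.0])
    counts: list = field(default_factory=lambda: [10, 10])  # pretend 10 labeled seeds per class

    def predict_with_confidence(self, x: float) -> tuple[int, float]:
        d0 = abs(x - self.centroids[0])
        d1 = abs(x - self.centroids[1])
        label = 0 if d0 <= d1 else 1
        total = d0 + d1
        # Assumed confidence measure in [0.5, 1]: relative distance to the *other* centroid.
        conf = 0.5 if total == 0 else (d1 if label == 0 else d0) / total
        return label, conf

    def learn_one(self, x: float, y: int) -> None:
        # Incremental (running-mean) update: the model is never retrained from scratch.
        self.counts[y] += 1
        self.centroids[y] += (x - self.centroids[y]) / self.counts[y]


def self_train(model: IncrementalCentroid, stream, threshold: float) -> int:
    """Train on an unlabeled instance only if the model's confidence clears the threshold."""
    accepted = 0
    for x in stream:
        label, conf = model.predict_with_confidence(x)
        if conf >= threshold:
            model.learn_one(x, label)  # trust the pseudo-label
            accepted += 1
    return accepted


def run_attack(threshold: float, n_poison: int = 50, probe: float = 6.0):
    """Return (probe label before, probe label after, number of poison points accepted)."""
    model = IncrementalCentroid()
    before, _ = model.predict_with_confidence(probe)
    # Hypothetical poison: unlabeled points confidently "benign" yet close to the boundary.
    accepted = self_train(model, [4.0] * n_poison, threshold)
    after, _ = model.predict_with_confidence(probe)
    return before, after, accepted


if __name__ == "__main__":
    for tau in (0.55, 0.8):
        before, after, accepted = run_attack(tau)
        print(f"threshold={tau}: accepted={accepted}, probe label {before} -> {after}")
```

In this toy setting, a threshold of 0.55 accepts all 50 poison points and the probe at 6.0 flips from malicious (1) to benign (0). A threshold of 0.8 rejects every poison point, so the probe keeps its original label. The threshold values, probe, and poison stream are illustrative assumptions only and do not reproduce the paper's experiments.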

Proceedings of Computer on the Beach

Computer on the Beach is a technical and scientific event that brings together computing professionals, researchers, and academics to discuss research and industry trends across the many areas of computing.
