Abstract
Macros are functions written in Visual Basic for automation within
MS Office documents. On the one hand, macros offer many conveniences
to home users and organizations. On the other hand, they have
also fostered the rise of macro viruses (malware that exploits
macros to infect users who open compromised documents, spreadsheets,
and presentations). These viruses may delete data and steal
information, and have caused losses of billions of dollars in
global attacks. Although more prevalent in the 1990s and 2000s,
macro viruses have resurged in the last decade and continue to threaten
current MS Windows/Office users. In this article, we present a natural
language processing-based pipeline to detect macros in MS
Office documents and classify them as malicious or benign. Using
byte2vec as the document representation, we outperform the state of the
art in macro detection, reaching over 99% Precision-Recall
Area Under the Curve (PRAUC) for four of the seven evaluated
classifiers (and over 98% PRAUC for the remaining three).
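The abstract reports results as PRAUC but does not say which estimator was used. As a minimal sketch, assuming the common step-wise "average precision" estimator (the same idea behind scikit-learn's `average_precision_score`), the metric can be computed from classifier scores as below. The function name `average_precision` and the label encoding (1 = malicious, 0 = benign) are hypothetical choices for illustration, not the authors' code; trapezoidal interpolation of the PR curve would give slightly different values.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Assumed encoding (hypothetical): labels are 1 for malicious and 0 for
    benign; a higher score means "more likely malicious".
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("PRAUC is undefined without positive samples")

    # Sort by descending score so each distinct score acts as a threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        # Consume all samples that share the same score as one threshold step.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Add the precision at this threshold, weighted by the recall gained.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy example: three malicious and two benign samples.
    print(average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2]))
```

For the toy example the result is 1/3 + 2/9 + 1/4 = 29/36 (about 0.806); a classifier that ranks every malicious sample above every benign one scores exactly 1.0.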
Computer on the Beach is a technical-scientific event that aims to bring together professionals, researchers, and academics in the field of Computing to discuss research and market trends across its many areas.