Abstract
This study investigates the application of supervised learning algorithms
to predict the incidence of diabetes, leveraging easily
accessible data such as demographic information and health indicators
to identify the most effective approaches for early disease
detection. The primary goal is to compare the performance of different
classification models in this task.
The experiments considered three classification methods:
K-Nearest Neighbors (KNN), Logistic Regression, and
Support Vector Machine (SVM). Due to the imbalanced distribution
of the classes (presence or absence of diabetes), all methods
obtained decent accuracy values, but their recall was negatively affected.
Future work will include: i) adding more
classification methods to the experiment, such as neural networks
and ensemble-based methods; ii) comparing the obtained results with
the literature; and iii) considering the impact of data pre-processing
steps to mitigate the class imbalance.
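
The gap between accuracy and recall under class imbalance can be illustrated with a minimal sketch. The labels below are hypothetical toy values chosen only to show the effect; they are not the data or results of this study, and the helper name accuracy_and_recall is introduced here for illustration.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for two equally long lists of binary labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    # Accuracy: share of all predictions that match the true label.
    correct = sum(t == p for t, p in pairs)
    # Recall: share of actual positives that the model detected.
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical imbalanced sample: 90 negatives (0) and 10 positives (1).
y_true = [0] * 90 + [1] * 10
# Hypothetical classifier output: one false alarm among the negatives,
# and only 2 of the 10 positive cases detected.
y_pred = [0] * 89 + [1] + [1] * 2 + [0] * 8

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

In this toy case most predictions are correct because negatives dominate, yet most positive cases are missed, which is why recall is reported alongside accuracy when classes are imbalanced.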
Computer on the Beach is a technical-scientific event that aims to bring together professionals, researchers, and academics in the field of Computing to discuss research and market trends in computing across its many areas.