ABSTRACT
This paper addresses the challenge of typeface recognition within
the broader scope of optical character recognition of historical
German-Brazilian periodicals. It presents a dataset of words annotated
with font types and transcriptions, intended for training neural networks
for typeface and text recognition. By enabling word-level
typeface and text recognition, the authors plan to later develop
techniques for high-precision OCR of historical prints typeset in
heterogeneous font styles. The value of this dataset is demonstrated by the
excellent results obtained by artificial neural networks trained on it.
The authors also note that these results could be improved further
by exploring new ways of organizing the dataset prior to training
and by modifying the architecture of the networks used.
Computer on the Beach is a technical-scientific event that aims to bring together professionals, researchers, and academics in the field of Computing to discuss research and market trends in computing across its many areas.