Zeig Dich: A Dataset for Typeface Recognition in Historical German-Brazilian Newspapers
This paper addresses the challenge of typeface recognition within
the broader scope of optical character recognition of historical
German-Brazilian periodicals. It presents a dataset of words annotated
with font types and transcriptions, intended for training neural
networks for typeface and text recognition. Building on word-level
typeface and text recognition, the authors plan to later develop
techniques for high-precision OCR of historical prints typeset in
heterogeneous font styles. The value of this dataset is supported by the
strong results obtained by artificial neural networks trained on it.
The authors also note that further improvements may be obtained
by exploring new ways of organizing the dataset prior to training
and by modifying the architecture of the networks used.