HisClima: Two Centuries of Climate Data

The objective of the project is to apply information extraction technologies, starting from probabilistic indexes, to a large collection of handwritten logs containing climatological and geographical information.

The probabilistic indexes can be understood as a probabilistic representation of the textual content of the page images. This representation will be used to classify the logs according to their content and layout. Importantly, we do not intend to classify the documents from layout information alone; instead, we will consider both sources of information, layout and semantic content, jointly. Once this classification has been performed, the extraction of semantic information can be approached more effectively.
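
As an illustration only, the sketch below shows one way content and layout evidence from a probabilistic index could be combined into a single feature set for page classification. It is not the project's actual pipeline. It assumes each index entry is a word hypothesis with a normalised position and a relevance probability; the names `Spot`, `page_features`, `vocabulary` and `grid` are hypothetical and introduced here for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """Hypothetical index entry: a word hypothesis, its position, its probability."""
    word: str
    x: float     # horizontal centre of the bounding box, normalised to [0, 1]
    y: float     # vertical centre of the bounding box, normalised to [0, 1]
    prob: float  # relevance probability assigned to this hypothesis


def page_features(spots, vocabulary, grid=2):
    """Build a feature dictionary mixing content and layout evidence.

    Assumption: summing the probabilities of all hypotheses of a word
    approximates the expected number of times it occurs on the page.
    """
    features = defaultdict(float)
    for s in spots:
        if s.word not in vocabulary:
            continue  # ignore words that are not useful for classification
        # Content feature: expected count of the word on the whole page.
        features[("word", s.word)] += s.prob
        # Layout feature: the same probability mass, binned by page region.
        col = min(int(s.x * grid), grid - 1)
        row = min(int(s.y * grid), grid - 1)
        features[("cell", s.word, row, col)] += s.prob
    return dict(features)


# Toy page: two competing hypotheses for "barometer" in the top-left corner,
# one for "latitude" in the top-right corner, and one out-of-vocabulary word.
page = [
    Spot("barometer", 0.20, 0.10, 0.50),
    Spot("barometer", 0.22, 0.10, 0.25),
    Spot("latitude", 0.80, 0.10, 0.70),
    Spot("remarks", 0.50, 0.90, 0.60),
]
feats = page_features(page, {"barometer", "latitude"})
```

Feature dictionaries of this kind could then be passed to any standard classifier. Keeping the per-word and per-region entries side by side is what lets a single model weigh content and layout together rather than layout alone.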

We intend to collaborate with the Meteorological Office (Met Office) in the United Kingdom, making use of the data processed in the OldWeather project, in which a million climatological records were annotated.