Papyrus introduces the concept of a Cross-Discipline Digital Library Engine. The project aims to build a dynamic digital library that understands user queries in the context of a specific discipline, searches for content in a domain alien to that discipline, and returns the results in a form that is useful and comprehensible to the user. To achieve this, the source content has to be ‘understood’, that is, analysed and modelled according to a domain ontology. The user query also has to be ‘understood’ and analysed according to a model of the user's own, different discipline. Correspondences then have to be found between the model of the source content and the realm of the user's knowledge. Finally, the results have to be presented to users in a useful and comprehensible manner, according to their own ‘model of understanding’.
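The four steps described above (modelling the source content, modelling the query, finding correspondences, and presenting results) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term tables, the concept identifiers, the correspondence table and the function names are hypothetical, and simple substring matching stands in for real content analysis.

```python
# Hypothetical toy models: each maps a surface term to a concept identifier.
NEWS_TERMS = {          # model of the source domain (news)
    "strike": "news:Strike",
    "walkout": "news:Strike",
    "ballot": "news:Election",
}
HISTORY_TERMS = {       # model of the user's discipline (history)
    "industrial action": "hist:LabourDispute",
    "suffrage": "hist:Franchise",
}
# Hypothetical correspondences between the two models.
CORRESPONDENCES = {
    "hist:LabourDispute": {"news:Strike"},
    "hist:Franchise": {"news:Election"},
}


def annotate(text, term_map):
    """Return the concepts whose terms occur in the text (naive substring match)."""
    lowered = text.lower()
    return {concept for term, concept in term_map.items() if term in lowered}


def cross_discipline_search(query, archive):
    """Return (user_concept, item) pairs, expressed in the user's own vocabulary."""
    user_concepts = annotate(query, HISTORY_TERMS)          # understand the query
    results = []
    for item in archive:
        item_concepts = annotate(item, NEWS_TERMS)          # understand the content
        for user_concept in sorted(user_concepts):
            mapped = CORRESPONDENCES.get(user_concept, set())
            if mapped & item_concepts:                      # find correspondences
                results.append((user_concept, item))        # present in user terms
    return results


archive = [
    "Dock workers strike over pay",
    "Parliament sets date for ballot",
    "Heavy rain floods the coast",
]
print(cross_discipline_search("industrial action in the 1970s", archive))
```

The query never mentions the word "strike", yet the news item is retrieved because the two models are linked through the correspondence table, which is the behaviour the engine is meant to provide at a much larger scale.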
In particular, knowledge technologies will have to be used to bridge the semantic gap between cultural heritage collections and their historical attributes, as expressed in news archives. Ontologies have been advocated as a means of supporting semantic interoperability between distributed applications and services, because they provide formal conceptualisations of specific domains. Together with other tools developed in the context of the Semantic Web, such as the RDF and OWL semantic mark-up languages and other knowledge representation and inference rule techniques, ontologies are expected to play a major role in deriving an appropriate level of new knowledge from what already exists in news archives.
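To make the idea of deriving new knowledge from existing statements more concrete, the following minimal sketch applies two RDFS-style inference rules (transitivity of subclass relations and inheritance of types along them) to a handful of triples. It is written in plain Python rather than with an RDF library, and the `ex:` class and item names are hypothetical examples, not part of any Papyrus ontology.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples):
    """Forward-chain two RDFS-style rules until no new triple appears.

    Rule 1: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A)       and (A subClassOf B) => (x type B)
    """
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                if o != s2 or p2 != SUBCLASS_OF:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))
                elif p == TYPE:
                    new.add((s, TYPE, o2))
        if new <= known:          # fixed point reached
            return known
        known |= new


# Hypothetical facts: a small class hierarchy and one annotated news item.
facts = {
    ("ex:GeneralStrike", SUBCLASS_OF, "ex:Strike"),
    ("ex:Strike", SUBCLASS_OF, "ex:SocialConflict"),
    ("ex:item42", TYPE, "ex:GeneralStrike"),
}

for triple in sorted(infer(facts) - facts):
    print(triple)                 # only the newly derived statements
```

The news item was annotated only as a general strike, but after inference it can also be retrieved by a query about social conflict, a statement that was never stored explicitly in the archive.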
Significant work will also be needed in the area of information extraction, both from the textual transcriptions of news and from the accompanying multimedia material (image and video) that is attached to many news items. One particularity of news content residing in the archives of news agencies and public broadcasters is that some annotations have already been attached manually. This existing metadata could be exploited as a guide by special ‘targeted’ multimedia analysis methods to achieve realistic classification results.
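One possible reading of such ‘targeted’ analysis is that the manual annotations narrow the set of classes a visual classifier has to choose from. The sketch below illustrates this under stated assumptions: the keyword-to-class table is hypothetical, the classifier scores are supplied as a plain dictionary rather than produced by a real model, and the function name is invented for the example.

```python
# Hypothetical mapping from manual archive keywords to plausible visual classes.
KEYWORD_TO_CLASSES = {
    "football": {"stadium", "crowd", "player"},
    "parliament": {"politician", "crowd", "building"},
}


def targeted_label(scores, keywords):
    """Pick the best-scoring class among those the existing metadata makes plausible.

    scores:   class name -> score from some (assumed) generic classifier
    keywords: manual annotations already attached to the news item
    """
    candidates = set()
    for keyword in keywords:
        candidates |= KEYWORD_TO_CLASSES.get(keyword, set())

    # Restrict the decision to the candidate classes suggested by the metadata.
    pool = {label: s for label, s in scores.items() if label in candidates}
    if not pool:
        # No usable metadata: fall back to the unrestricted classifier output.
        pool = scores
    return max(pool, key=pool.get)


# Assumed classifier output for one image attached to a news item.
scores = {"building": 0.40, "stadium": 0.35, "player": 0.15, "politician": 0.10}

print(targeted_label(scores, ["football"]))   # metadata-guided decision
print(targeted_label(scores, []))             # unguided decision
```

With the "football" annotation the ambiguous image is resolved to "stadium", while without metadata the classifier's top score, "building", is returned.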