Past and ongoing work on the digital recapture and preservation of European cultural and scientific heritage has consumed significant effort and resources for the digitization, characterization, and classification of content. Huge archives of such digital content have been produced, and current practice has established the creation of new content directly in digital formats. Digital libraries have thus emerged, giving many user communities electronic access to the information available in their discipline. One can easily find a digital library of scientific publications for scientists in a given field, a digital library of cultural artefacts for the corresponding researchers, or a collection of digitized books for readers of a particular literature. What has never been targeted, however, is a digital library that draws content from one domain and makes it available to a community of users who belong to a totally different discipline. For example, a researcher of the history of technology would benefit greatly from access to a digital library of patents, if he could comprehend the structure and jargon of patent documents. In the same way, a market analyst would want online access to product design manuals, if she could understand the terminology of product specifications. Vast amounts of digital content are available and could be extremely useful to a multiplicity of user communities if they could be presented in a way those communities can comprehend.
Papyrus intends to deal with a specific pair of disciplines that illustrates an apparent need and may prove to be an immediate exploitation opportunity in its own right. The proposed use case is the recovery of history from digital news content. The rationale behind this selection is that vast amounts of digital news content exist in huge archives which, although of great value, are underused because they are not easily searchable and have little worth when seen as individual news items. News organizations have been recording history as it unfolded ever since their foundation, which for many news publishers and agencies dates back to previous centuries. Furthermore, the content found in these news archives addresses cultural and scientific heritage in its widest sense, covering disciplines such as the history of politics, science, and entertainment, and allowing events in all these domains to be cross-examined. The content of news archives could well lead to the dynamic composition of a complete historical library extending to the latest minute. However, this use case involves many challenges. Even though all historical events have been recorded as news, the format, style, and philosophy of a ‘news item’ differ significantly from those of a ‘historical reference’ to the same event. Both disciplines have the ‘event’ as a central point of reference, but while news mostly examines facets of an event such as what happened, when, how, and who was involved, history is mostly about why it happened, what the consequences were, and how it might have been avoided. Moreover, historical events are usually recorded in the news with multiple, different, and often conflicting interpretations, depending on the views of the reporting journalist, the overseeing media group, and possibly the local or national political interests of the time.
Interpreting news in the ‘language’ of history requires suitable modelling of both domains, semantic analysis of both the news content and the ‘historical’ user queries, proper modelling of the semantic correspondences between them, and presentation of the results in the context of the ‘history’ domain.
In particular, Papyrus intends to target the following Scientific & Technological objectives in order to reach the goal of the cross-discipline digital library engine outlined in the previous section:
- Advance the state of the art in semantic multimedia analysis by introducing knowledge-assisted methods that take advantage of existing metadata and content structure models to understand the source content
- Propose context-sensitive query processing methods for understanding the users' demands
- Implement tools that automate the process of knowledge mapping, establishing correspondences between concepts in the source content and concepts in the user queries
- Develop presentation techniques for delivering the results in a manner comprehensible to the targeted users
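
As a purely illustrative aid, the role of knowledge mapping between the two domains can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Papyrus design: the names `NewsItem`, `CONCEPT_MAP`, `ARCHIVE`, and `answer_history_query` are hypothetical, the concepts and news items are invented, and the mapping is a hand-written table standing in for the semantic correspondences that the project intends to derive with automated tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    """A news item annotated with news-domain concepts (assumed to come from semantic analysis)."""
    headline: str
    date: str            # ISO date string, so lexical order equals chronological order
    concepts: frozenset  # news-domain concepts attached to the item


# Hypothetical mapping: history-domain concept -> related news-domain concepts.
CONCEPT_MAP = {
    "causes of the strike": {"wage dispute", "union talks"},
    "consequences of the strike": {"factory closure"},
}

# Invented archive, for illustration only.
ARCHIVE = [
    NewsItem("Workers reject wage offer", "1984-03-12", frozenset({"wage dispute"})),
    NewsItem("Union talks break down", "1984-03-05", frozenset({"union talks"})),
    NewsItem("Plant to close after stoppage", "1984-06-01", frozenset({"factory closure"})),
]


def answer_history_query(history_concept, archive, concept_map):
    """Translate a history-domain concept into news concepts and return matching items by date."""
    news_concepts = concept_map.get(history_concept, set())
    # Keep every item that shares at least one concept with the translated query.
    hits = [item for item in archive if item.concepts & news_concepts]
    # Present the results chronologically, as a historian would expect.
    return sorted(hits, key=lambda item: item.date)


if __name__ == "__main__":
    for item in answer_history_query("causes of the strike", ARCHIVE, CONCEPT_MAP):
        print(item.date, item.headline)
```

In this sketch the query is phrased in the vocabulary of history, the lookup table translates it into the vocabulary of news, and the chronological ordering stands in for presentation in the context of the history domain.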
