Preservation was already possible with the functionality provided by Pilot I, where resources could be selected and manually preserved using the PoF Middleware. Pilot II extends this by enabling the PoF Middleware to perform Synergetic Preservation, relying on the user's Preservation Strategy and the Preservation Value (PV) explained in the previous sections.
This section describes the updated preservation workflow steps of Pilot II along the ForgetIT PoF workflow "Preservation Preparation Workflow" as defined in deliverable D8.4, with a focus on the Pilot's contribution to each step. The full technical details will be reported in the final deliverable of the PoF Framework, D8.6.
The workflow is depicted in the Figure below, together with its steps and the functional entities involved in them (these are explained in the Functional Model of the PoF Reference Model; see D8.5). The numbers in the Figure correspond to the numbers of the following subsections, which explain each step and the functional entities involved.
10.2.0 ForgetIT PoF Middleware (at DFKI)
Before explaining the workflow, let us have a look at the PoF Middleware. For this, we use the PoF Middleware installation on DFKI premises, which was deployed in order to preserve real resources from the DFKI PIMO. As the DFKI PIMO contains confidential material, we decided to install both the middleware and the content analysis services locally at DFKI. This installation therefore also shows that the PoF Middleware, its components and services, and the Digital Preservation System can be deployed on company premises.
DFKI would like to thank EURIX for their willingness and effort in helping us deploy the middleware, as well as our partners USFD (University of Sheffield) and CERTH (Centre for Research and Technology Hellas) for providing their services, which we were able to install locally at DFKI.
10.2.1 Content Value Assessment
Before the workflow starts, the functional entity Content Value Assessment (CVA) is responsible for assessing resources in the PoF Framework. As already pointed out in Section 2.4.1, this step provides the Preservation Value of a resource.
Regarding the role of the functional entity CVA in the PoF Framework, the Semantic Desktop is an example of an Active System that is capable of providing the PV for the preservation decision (as well as the MB for Managed Forgetting in the Active System itself); thus, the functional entity CVA is part of the Active System.
This design decision was made because both MB and PV are beneficial within the Semantic Desktop infrastructure itself. The rich semantic model of the PIMO and the usage statistics of the Semantic Desktop allow for a comprehensive view of the resources with respect to MB and PV. Furthermore, the PIM application scenario inherently involves frequent access to, usage of, and changes to resources and the PIMO, resulting in a lot of traffic as well as content assessment in the PIMO as a knowledge base. Therefore, both values are computed in the Semantic Desktop and stored directly in the PIMO, where they are easily accessible to its components and thus become an integral part of the PIMO.
To enable the PoF Middleware to base its decisions in the Select step on the PV, the SD/PoF Adapter periodically reports the values and their updates to the PoF Middleware (see ForgetIT architecture diagram).
Each update contains the resource's URI, its Preservation Value, and the last modification date of the resource. The last modification date allows the PoF Middleware to decide whether the resource needs to be sent to the archive again because it has changed since its last preservation.
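The content of such an update and the resulting re-preservation check can be sketched as follows. This is a minimal illustration, not the actual SD/PoF Adapter interface: the names `PVUpdate` and `needs_represervation` are hypothetical, a PV normalised to [0, 1] is assumed, and the re-preservation rule is assumed to be simply "modified after the last preservation".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PVUpdate:
    """One entry of a PV report (hypothetical structure; fields as named in the text)."""
    uri: str
    preservation_value: float  # assumed to lie in [0, 1]
    last_modified: datetime


def needs_represervation(update: PVUpdate, last_preserved: Optional[datetime]) -> bool:
    """Return True if the resource was never preserved or has changed since its last preservation."""
    if last_preserved is None:
        return True  # never archived before
    return update.last_modified > last_preserved


# Example: a resource edited after it was archived is flagged for another submission.
update = PVUpdate("http://example.org/pimo/thing/42", 0.83, datetime(2016, 9, 1))
print(needs_represervation(update, datetime(2016, 6, 1)))  # True
```

In the real middleware this check would be combined with the PV-based selection of the next step; a changed resource with a low PV would not necessarily be re-submitted.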
10.2.2 Select
The Select step uses the functional entity Managed Forgetting & Appraisal to make conscious decisions about the preservation of resources of the Active System. To this end, the results of the Content Value Assessment are used to decide on preservation actions.
The Forgettor component (see ForgetIT architecture diagram) selects the set of resources to be preserved based on the Preservation Value Categories selected in the user's Preservation Strategy. This information is part of the Preservation Broker Contract introduced in Section 10.1.9; it is set in the Preservation Service Contract, communicated to the PoF Middleware, and managed there for each user.
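The category-based selection performed by the Forgettor can be sketched as a simple filter. The category names and their PV thresholds below are hypothetical placeholders (the actual Preservation Value Categories are defined in the user's Preservation Strategy), and a PV in [0, 1] is assumed.

```python
from enum import Enum
from typing import Dict, List, Set


class PVCategory(Enum):
    """Hypothetical PV categories, each valued by its lower PV bound (assumption)."""
    HIGH = 0.7
    MEDIUM = 0.4
    LOW = 0.0


def categorize(pv: float) -> PVCategory:
    """Map a PV to the highest category whose lower bound it reaches."""
    for category in sorted(PVCategory, key=lambda c: c.value, reverse=True):
        if pv >= category.value:
            return category
    return PVCategory.LOW  # PVs below every bound still land in the lowest category


def select_for_preservation(pvs: Dict[str, float], selected: Set[PVCategory]) -> List[str]:
    """Return the (sorted) URIs whose PV category was selected in the user's Preservation Strategy."""
    return sorted(uri for uri, pv in pvs.items() if categorize(pv) in selected)


pvs = {"http://example.org/pimo/a": 0.91, "http://example.org/pimo/b": 0.55, "http://example.org/pimo/c": 0.12}
print(select_for_preservation(pvs, {PVCategory.HIGH, PVCategory.MEDIUM}))
# ['http://example.org/pimo/a', 'http://example.org/pimo/b']
```

Keeping the thresholds in one place mirrors the idea that the strategy is managed per user in the PoF Middleware: changing a user's selected categories changes the selection without touching the PV computation.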
10.2.3 Provide
The step Provide uses the functional entity De-Contextualization to extract a resource from its Active System context in preparation for packaging it for archiving.
Since Pilot I, the PoF Middleware has retrieved resources via the Collector using the CMIS interface embedded in the SD/PoF Adapter. For the PIMO, this means that a thing and its grounding occurrence (i.e., the semantic representation and the actual physical file) are separated: the CMIS interface hands over the resource to be preserved as a cmis:Item, and the PIMO's model information about its thing becomes part of the context information handed over in the forgetit:context attribute of the cmis:Item (for technical details of the interface, please refer to D8.4, Section CMIS Integration). This attribute is then available to the modules in the PoF Middleware, especially the Contextualizer in the next step.
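The separation of a thing from its grounding occurrence can be sketched as follows. `ItemSketch` is a hypothetical, simplified stand-in for a cmis:Item (not an object of any real CMIS client library), and reusing the thing URI as `cmis:objectId` is our assumption; only the `forgetit:context` attribute name comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ItemSketch:
    """Simplified stand-in for a cmis:Item; not an object of any real CMIS client library."""
    properties: Dict[str, str]
    content: Optional[bytes] = None  # grounding occurrence (physical file), if the thing has one


def decontextualize(thing_uri: str, label: str, context_turtle: str,
                    file_bytes: Optional[bytes] = None) -> ItemSketch:
    """Split a PIMO thing into its file content and its semantic context."""
    properties = {
        "cmis:objectTypeId": "cmis:item",    # CMIS 1.1 base type for items
        "cmis:objectId": thing_uri,          # assumption: the thing URI doubles as object id
        "cmis:name": label,
        "forgetit:context": context_turtle,  # PIMO graph excerpt (Turtle) for the Contextualizer
    }
    return ItemSketch(properties=properties, content=file_bytes)


item = decontextualize("http://example.org/pimo/report", "report.pdf",
                       "@prefix pimo: <http://example.org/pimo#> .", b"%PDF-1.4")
print(item.properties["cmis:name"], len(item.content or b""))
```

The physical file travels as content, while everything the PIMO knows about the thing travels as an attribute, so downstream modules can read the context without parsing the file.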
Technically, the exported context information is an excerpt from the PIMO semantic graph describing the resource in the PIMO and its connections to other things, such as the topics of a document or the persons attending an event. The exported excerpt is expressed in RDF/S using the PIMO Ontology RDF Schema, with Turtle (Terse RDF Triple Language), a compact textual syntax for representing RDF, as the exchange format.
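To illustrate what such a Turtle excerpt looks like, the following minimal sketch serializes one thing and its relations by hand. In practice an RDF library such as rdflib would be used instead; the PIMO namespace URI is assumed to be the one published by the NEPOMUK project, and the `ex:` namespace and the resource names are purely illustrative.

```python
from typing import Dict, List, Tuple

# Assumed namespaces: the PIMO URI is the one published by the NEPOMUK project
# (please verify against the PIMO Ontology); "ex:" is a purely illustrative namespace.
PREFIXES: Dict[str, str] = {
    "pimo": "http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#",
    "ex": "http://example.org/pimo/",
}


def to_turtle(subject: str, pairs: List[Tuple[str, str]]) -> str:
    """Serialize one subject and its (predicate, object) pairs as a Turtle snippet.

    Objects must already be prefixed names (e.g. "ex:Topic") or quoted literals.
    """
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    # Turtle's ";" repeats the subject, so all pairs share one statement.
    body = " ;\n    ".join(f"{predicate} {obj}" for predicate, obj in pairs)
    lines.append(f"{subject} {body} .")
    return "\n".join(lines)


print(to_turtle("ex:ForgetIT", [
    ("a", "pimo:Project"),
    ("pimo:hasTopic", "ex:DigitalPreservation"),
]))
```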
For Pilot II, the CMIS interface was extended to handle collections of resources (see D8.4) and to use the additional context delivered by the SD/PoF Adapter.
Furthermore, every concept in the PIMO can now be preserved separately, i.e., the handling was extended from the PIMO classes representing (file) resources, such as pimo:Media and pimo:LifeSituation in Pilot I, to all PIMO classes (see below for an example of the file). It is now also possible to preserve, e.g., a pimo:Project such as ForgetIT in the DFKI PIMO, even though it may not have a physical file attached.
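The difference between Pilot I and Pilot II in what can be preserved can be sketched as a simple eligibility check. The names `Thing` and `is_preservable` are hypothetical, and the two Pilot I classes named in the text are treated here as the complete list purely for illustration.

```python
from typing import NamedTuple, Optional


class Thing(NamedTuple):
    uri: str
    pimo_class: str           # e.g. "pimo:Project"
    file_path: Optional[str]  # grounding occurrence; None if no physical file is attached


# The two Pilot I classes named in the text, treated here as the complete list for illustration.
PILOT_I_CLASSES = {"pimo:Media", "pimo:LifeSituation"}


def is_preservable(thing: Thing, pilot: int) -> bool:
    """Pilot I: only file-backed resource classes; Pilot II: any PIMO class, with or without a file."""
    if pilot >= 2:
        return True
    return thing.pimo_class in PILOT_I_CLASSES


forgetit = Thing("http://example.org/pimo/ForgetIT", "pimo:Project", None)
print(is_preservable(forgetit, pilot=1), is_preservable(forgetit, pilot=2))  # False True
```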
10.2.4 Enrich
In the Enrich step, the functional entity Contextualization provides additional information for the content to be preserved in order to allow archived items to be fully and correctly interpreted at some future date (see D8.3). All resources in the submitted collection are handed over to the Contextualizer, which runs three different components:
First, the world knowledge contextualization, as described in deliverable D6.3, processes each textual resource in the submitted collection. This component creates a World Context by applying entity recognition to the text of a resource (e.g., a document or e-mail), using DBpedia as the source for disambiguating entities. Each entity found in the text is added to the World Context as a semantic annotation (i.e., as a URI). This World Context is then stored as additional context information in the metadata of the respective resource. An example of a World Context is shown in the image below.
Second, the visual concept detection, as described in deliverable D4.3, adds the visual concepts detected in images as additional context. An example of visual concepts used for contextualization is shown in the image below.
Third, the personal knowledge contextualization takes the context information provided by the Semantic Desktop in the forgetit:context attribute during the previous step and treats it as a separate context. In terms of the PoF Reference Model, this context information, generated from the personal knowledge represented in the PIMO, is stored in the so-called Local Context (see also D8.2). An example of a Local Context from the PIMO is shown in the image below.
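The way the Contextualizer runs its three components over a collection can be sketched as a small pipeline. Each component is a hypothetical stub: a toy gazetteer stands in for the DBpedia-based entity recognition of D6.3, and precomputed labels stand in for the D4.3 concept detector. The metadata keys `worldContext`, `visualConcepts`, and `localContext` are likewise our own names.

```python
from typing import Dict, List

Resource = Dict[str, object]  # hypothetical keys: "text", "image_labels", "forgetit:context"

# Toy gazetteer standing in for the DBpedia-based entity recognition of D6.3 (assumption).
GAZETTEER = {"Kaiserslautern": "http://dbpedia.org/resource/Kaiserslautern"}


def world_context(res: Resource) -> List[str]:
    """Return DBpedia URIs of entities whose names occur in the resource text."""
    text = str(res.get("text", ""))
    return [uri for name, uri in GAZETTEER.items() if name in text]


def visual_concepts(res: Resource) -> List[str]:
    """Placeholder for the D4.3 concept detector: just reads precomputed labels."""
    return [str(label) for label in res.get("image_labels", [])]


def local_context(res: Resource) -> str:
    """Personal knowledge context: the PIMO excerpt handed over in forgetit:context."""
    return str(res.get("forgetit:context", ""))


def contextualize(collection: List[Resource]) -> List[Resource]:
    """Run all three components on every resource and attach the results as metadata."""
    enriched = []
    for res in collection:
        meta = dict(res)
        meta["worldContext"] = world_context(res)
        meta["visualConcepts"] = visual_concepts(res)
        meta["localContext"] = local_context(res)
        enriched.append(meta)
    return enriched
```

Keeping the three contexts under separate keys reflects the text's point that the personal knowledge from the PIMO is stored as a separate Local Context rather than merged into the World Context.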
The context information delivered by the Semantic Desktop covers the context dimensions identified in deliverable D6.1:
- Time: The excerpt contains the thing's creation time and last modification time, and in addition any time information associated with certain things such as events (either a point in time or a time period).
- Location: Locations associated with a resource, usually for events or for the sub-class pimo:LifeSituation.
- Topic: Any kind of topics identified for the resource. For the PIMO, this includes manually annotated topics (e.g., with the property pimo:hasTopic), suggested ones with pimo:hasSuggestedTopic (e.g., suggested by the entity recognition using GATE in Firetag or SemanticFileExplorer, or by the ForgetIT image services), and inferred concepts with pimo:hasInferredTopic (e.g., because the photo collection as a whole has this topic, or the super task was annotated with it).
- Entity Space: The remaining relations a resource has in the PIMO that are not already covered above, i.e., various properties such as pimo:partOf or pimo:isFundedBy.
- Document Space: Other documents, or sub-classes such as web pages or e-mails, related to the resource.
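How the relations of an exported thing fall into these dimensions can be sketched with a lookup table. The three topic properties come from the list above; the `ex:` properties for Time, Location, and Document Space are hypothetical placeholders, and classifying Document Space by property rather than by the class of the related object is a simplification.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (predicate, object) of a triple whose subject is the preserved thing

# Topic properties are taken from the text; the Time, Location and Document Space
# properties below are hypothetical placeholders ("ex:" prefix).
DIMENSION_OF: Dict[str, str] = {
    "pimo:hasTopic": "Topic",
    "pimo:hasSuggestedTopic": "Topic",
    "pimo:hasInferredTopic": "Topic",
    "ex:created": "Time",
    "ex:lastModified": "Time",
    "ex:locatedAt": "Location",
    "ex:relatedDocument": "Document Space",
}


def group_by_dimension(pairs: List[Pair]) -> Dict[str, List[Pair]]:
    """Sort the relations of a thing into the D6.1 context dimensions."""
    groups: Dict[str, List[Pair]] = {}
    for predicate, obj in pairs:
        # Anything not covered by the other dimensions belongs to the Entity Space.
        dimension = DIMENSION_OF.get(predicate, "Entity Space")
        groups.setdefault(dimension, []).append((predicate, obj))
    return groups
```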
10.2.5 Package
In this step, the functional entity Archiver uses the content and metadata of the resource(s) collected from the Semantic Desktop to create a Submission Information Package (SIP). The SIP is then handed over to the Transfer step.
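Packaging can be sketched as bundling content and metadata into a single archive file. The ZIP layout below (`content/`, `metadata/`, `manifest.json`) is entirely hypothetical and is not the SIP format used by the PoF Middleware; it only illustrates that things without a physical file still contribute metadata to the package.

```python
import io
import json
import zipfile
from typing import Dict


def build_sip(items: Dict[str, Dict[str, object]]) -> bytes:
    """Pack resources into an in-memory ZIP acting as a toy SIP (hypothetical layout).

    `items` maps an identifier to {"content": bytes or None, "metadata": dict}.
    """
    buf = io.BytesIO()
    manifest = {}
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, (ident, item) in enumerate(sorted(items.items())):
            entry = {"id": ident, "content": None}
            content = item.get("content")
            if content is not None:  # things without a file carry metadata only
                path = f"content/{i}.bin"
                zf.writestr(path, content)
                entry["content"] = path
            meta_path = f"metadata/{i}.json"
            zf.writestr(meta_path, json.dumps(item.get("metadata", {}), sort_keys=True))
            entry["metadata"] = meta_path
            manifest[ident] = entry
        zf.writestr("manifest.json", json.dumps(manifest, sort_keys=True))
    return buf.getvalue()
```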
10.2.6 Transfer
The Transfer step then submits the SIP to the Digital Preservation System (DPS) (see ForgetIT architecture diagram), which stores it as an Archival Information Package (AIP). In the case of Pilot II on the ForgetIT testbed, the DPS is composed of DSpace and OpenStack Swift. In the deployment at DFKI, the DPS currently consists of DSpace only, as shown in the screenshots above.
10.2.7 Preservation finished
Once the preservation is finished, the PoF Middleware notifies the Active System of the outcome.
The user is notified of the outcome in two ways. First, once a collection has been preserved, a notification appears on the PIMO5 home screen, as shown in the Figure below.
Second, several places in the Semantic Desktop indicate whether a thing has been preserved, such as the thing view (see the Figures below) and the decoration of images, as well as the PIMOCloud, which shows a green PIMOCloud icon (see D9.3).
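The handling of the outcome on the Active System side can be sketched as follows. `ActiveSystemState` and `on_preservation_finished` are hypothetical names; the per-thing flag stands for what drives the "preserved" decoration, and the failure message is our assumption, since the text only speaks of notifying the outcome.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActiveSystemState:
    """Hypothetical client-side state of the Semantic Desktop."""
    preserved: Dict[str, bool] = field(default_factory=dict)  # thing URI -> preserved flag
    notifications: List[str] = field(default_factory=list)    # home-screen messages


def on_preservation_finished(state: ActiveSystemState, collection: str,
                             uris: List[str], success: bool) -> None:
    """Handle the outcome reported by the PoF Middleware for one collection."""
    if success:
        for uri in uris:
            state.preserved[uri] = True  # drives the 'preserved' decoration in the thing view
        state.notifications.append(
            f"Collection '{collection}' has been preserved ({len(uris)} items).")
    else:
        state.notifications.append(f"Preservation of collection '{collection}' failed.")
```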
10.2.8 Inspecting the results in the Archive
The following screenshots show examples from the DFKI PIMO which were preserved using the ForgetIT PoF Middleware as described above and stored in the DSpace installation at DFKI.
The archive now contains material that reached the PIMO user Heiko over the years and received a high preservation value. Another example is the early draft of the ForgetIT proposal (then called Preserve-or-Forget), where Claudia and Wolf asked us whether we were interested in joining this activity with our Semantic Desktop approach.