In the following, we point out the contribution of the Semantic Desktop approach to the goals of ForgetIT. This section is an excerpt from ForgetIT deliverable D9.1 (Section 2); please refer to it for full coverage and references.
1.2.1 Current State of Personal Preservation and Main Obstacles
When considering preservation for personal, non-professional, or home use, we have to contend with a vast increase in the ways digital artefacts can be created (computers, smartphones, tablets, digital cameras, ...) as well as an ever-increasing amount of storage for this digital material. These days, a user’s personal information space consists of a substantial number of information objects connected to the person’s life, such as wedding videos, travel pictures, or graduation keepsakes. Organising all this data and keeping it accessible as time passes requires serious dedication and cognitive effort.
Moreover, these digital artefacts often capture past moments for which no physical memento exists, which makes them a valuable resource for the user and for future generations. If the material is lost or corrupted through improper conservation, it becomes useless. Yet most users still rely on backups as their main form of preservation.
In addition, many people simply keep everything, for five main reasons (see [Marshall, 2011]):
- It is difficult to assess value in advance,
- Keeping everything aligns well with current practice,
- Deletion is itself a cognitively demanding exercise,
- People are rarely methodical about culling their files, so why even try, and
- A full chronological and contextual record is essential for using one’s archives as a memory prosthesis.
This approach leaves all steps to the user: what to save, how to organise it, where to store it (hard disk and online storage), and when to migrate it. This is a lot of effort for users: various decisions need to be made, and maintaining and updating the archive requires discipline. Creating a structure for preservation is one of the major problems in particular. Everyone who adds material has to follow this structure every time further material is added, and people who want to search for files need to be aware of it. After a long period of time, someone else, such as a descendant, needs to be able to interpret the structure.
This cognitive up-front effort is one of the reasons why cloud storage such as Dropbox, Microsoft SkyDrive, or Google Drive is not a preservation system in itself, but only a tool within a larger preservation strategy. Originally designed as syncing, file-sharing, and backup solutions, these services offer organisation methods such as file folders or (keyword) tags, but do not comply with the OAIS standard. Other services, such as Amazon Cloud, comply with OAIS but do not support users before data is ingested into the store. Either way, users are left to their own devices for large parts of the preservation process.
Considering the current state of personal preservation, the main obstacles we see so far are:
- Users are not aware of personal preservation of digital content. There is a huge gap between current practices, such as backup by copying material to a different hard disk, and a proper preservation strategy.
- When starting with personal preservation, the user faces high up-front costs in terms of time, effort, and resources, and there are very few tools to help users prepare material for preservation and interact with an archiving service.
- There is no personal preservation service for the majority of end users which supports the whole preservation process. Cloud storage alone is not preservation.
- The vast increase in digital content with relevance to a person’s life poses challenges to personal information management as well as preservation.
- Designing and organising an archive so that its structure can still be understood a century from now is cognitively challenging for users.
1.2.2 How does ForgetIT address this?
Within the ForgetIT framework, we will address these challenges as follows.
- Synergetic Preservation: Most personal preservation should be automatic. After the user decides on a preservation policy and on the amount and cost of storage, the ForgetIT system determines what to preserve and how to represent it in a way that will be accessible to future generations. User intervention will be minimal, and therefore the burden on the user will be low.
- Managed Forgetting: As material accumulates over time, not everything can be preserved. As material becomes less relevant, it is gradually forgotten, using a process inspired by features of human forgetting.
- Contextualised Remembering: Context helps to find and access archived material; it is also crucial for interpreting the data that was archived.
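The gradual forgetting described under Managed Forgetting can be illustrated with a minimal sketch. It assumes a simple exponential-decay score per information object, in the spirit of the exponential curves often used to approximate human forgetting; ForgetIT's actual managed forgetting model is not described in this section, and the names `ResourceUsage`, `retention`, and `forgetting_candidates`, the 30-day half-life, and the 0.1 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceUsage:
    """Hypothetical usage record for one information object."""
    name: str
    access_days: List[float] = field(default_factory=list)  # days on which the object was accessed


def retention(usage: ResourceUsage, now: float, half_life: float = 30.0) -> float:
    """Sum of exponentially decaying contributions, one per past access.

    Each access loses half of its weight every `half_life` days (assumed value).
    """
    decay_rate = math.log(2) / half_life
    return sum(math.exp(-decay_rate * (now - day))
               for day in usage.access_days if day <= now)


def forgetting_candidates(items: List[ResourceUsage], now: float,
                          threshold: float = 0.1, half_life: float = 30.0) -> List[str]:
    """Names of objects whose retention score has dropped below `threshold` (assumed cut-off)."""
    return [u.name for u in items if retention(u, now, half_life) < threshold]


items = [
    ResourceUsage("timetable", [0.0, 1.0]),        # used briefly, never again
    ResourceUsage("photo", [0.0, 100.0, 170.0]),   # revisited repeatedly
]
print(forgetting_candidates(items, now=180.0))  # → ['timetable']
```

With these assumed parameters, a timetable last opened half a year ago drops below the threshold, while a photo that is still revisited stays well above it. Summing one contribution per access means that repeated use slows forgetting, which loosely mirrors how rehearsal strengthens human memories.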
1.2.3 Motivation for Personal Information Management using the Semantic Desktop Approach in ForgetIT
While some users are concerned about preservation, it is not part of most users’ regular practice. Preservation requires manual effort, and users need to think about it to actually do it; it therefore poses a cognitive burden.
Therefore, the approach envisioned in ForgetIT for Personal Preservation is to embed it in the user’s activities in the personal information space in order to collect material to be preserved, evidence for preservation values, and triggers for preservation while keeping user involvement minimal.
But how can this be achieved? By concentrating on users’ Personal Information Management (PIM), we can cover various life events together with the associated digital material, the usage of that material, and evidence for preservation values. For example, we can detect whether a file is relevant only for a certain time frame (such as timetables) or has emotional relevance (such as a picture showing the user’s daughter). Furthermore, PIM offers a chance to derive the user’s mental model of the material’s contents and thus a means to describe the preserved material from the user’s point of view with less effort.
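How such evidence might be derived can be sketched with two hand-written rules that mirror the examples above: an object with a known end of validity (such as a timetable) is marked as time-bounded or expired, and an object annotated with a close family member is marked as emotionally relevant. The `InfoObject` type, the `FAMILY` concept set, and the evidence labels are hypothetical; a real system would presumably weigh many more signals.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class InfoObject:
    """Hypothetical metadata for a file, photo, or e-mail."""
    title: str
    valid_until: Optional[date] = None                     # e.g. last day a timetable applies
    annotated_with: Set[str] = field(default_factory=set)  # linked PIMO concepts


# Hypothetical concepts the user has identified as close family.
FAMILY = {"person:daughter", "person:partner"}


def preservation_evidence(obj: InfoObject, today: date) -> Set[str]:
    """Collect rule-based hints about an object's preservation value."""
    evidence: Set[str] = set()
    if obj.valid_until is not None:
        # Assumed rule: relevance is tied to a time frame that may already have ended.
        evidence.add("expired" if obj.valid_until < today else "time-bounded")
    if obj.annotated_with & FAMILY:
        # Assumed rule: depicting or mentioning close family suggests emotional relevance.
        evidence.add("emotional")
    return evidence


today = date(2015, 3, 1)
print(preservation_evidence(InfoObject("Bus timetable 2014", valid_until=date(2014, 12, 31)), today))  # → {'expired'}
print(preservation_evidence(InfoObject("Beach holiday", annotated_with={"person:daughter"}), today))   # → {'emotional'}
```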
By providing an ecosystem for PIM, we can show that collecting material and deriving evidence for preservation is possible. Building on research in the Semantic Desktop field, the Semantic Desktop paradigm allows us in ForgetIT to
- use the Personal Information Model (PIMO) to represent a user’s mental model over time. The PIMO is a result of the EU IP Nepomuk (GA 027705) and provides a basic ontology of concepts that a person uses for their desktop and PIM. The PIMO is modelled as a semantic graph of interconnected concepts and information objects. Extensions adapt the ontology to specific domains or tasks.
- provide an ecosystem of applications and plug-ins which access the PIMO for vocabulary and knowledge representation. The ecosystem outlined in [Maus et al., 2012] will be the Semantic Desktop implementation in ForgetIT.
- provide means to continuously update a user’s PIMO and adapt to new situations. We have shown that the PIMO can be used over time. The oldest PIMO still in use at the DFKI has been evolving steadily for more than 9 years.
- provide context for information objects such as files, webpages, or emails by using the PIMO. The PIMO provides the knowledge representation layer for both users and semantic services.
- provide a means to understand the user’s context, together with observations of user actions and of the access, creation, and deletion of information objects, and to offer services such as context-aware task management.
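To make the idea of a semantic graph of interconnected concepts and information objects concrete, the following minimal sketch stores typed nodes and labelled, optionally bidirectional links, and answers the question "what is this thing connected to?". It is a hypothetical toy model: the class, its methods, and predicate names such as `shows` and `addressedTo` are illustrative assumptions and are not taken from the Nepomuk PIMO ontology or from the DFKI implementation.

```python
from typing import Dict, List, Optional, Set, Tuple


class PersonalInformationModel:
    """Toy PIMO-like semantic graph (hypothetical API, not the Nepomuk ontology)."""

    def __init__(self) -> None:
        self.kinds: Dict[str, str] = {}                   # node URI -> node kind
        self.links: Dict[str, Set[Tuple[str, str]]] = {}  # node URI -> {(predicate, target)}

    def add_thing(self, uri: str, kind: str) -> None:
        """Register a concept (e.g. a person) or an information object (e.g. a file)."""
        self.kinds[uri] = kind
        self.links.setdefault(uri, set())

    def relate(self, subject: str, predicate: str, target: str,
               inverse: Optional[str] = None) -> None:
        """Add a labelled link, plus its inverse so the graph can be walked both ways."""
        self.links.setdefault(subject, set()).add((predicate, target))
        if inverse is not None:
            self.links.setdefault(target, set()).add((inverse, subject))

    def context(self, uri: str) -> List[Tuple[str, str, str]]:
        """Return (predicate, neighbour, neighbour kind) triples, sorted for stable output."""
        return sorted((p, t, self.kinds.get(t, "unknown"))
                      for p, t in self.links.get(uri, set()))


pimo = PersonalInformationModel()
pimo.add_thing("person:Alice", "Person")
pimo.add_thing("file:beach.jpg", "InformationObject")
pimo.add_thing("mail:1234", "InformationObject")
# The same contact concept is reused to annotate a photo and to address an e-mail.
pimo.relate("file:beach.jpg", "shows", "person:Alice", inverse="shownIn")
pimo.relate("mail:1234", "addressedTo", "person:Alice", inverse="receivedMail")
for link in pimo.context("person:Alice"):
    print(link)
```

Because the contact `person:Alice` is reused for both annotations, querying her node returns the photo and the e-mail as her context, which is the kind of tight linking between resources and concepts that the PIMO is meant to provide.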
1.2.4 How will the Semantic Desktop approach contribute to ForgetIT?
By using a Semantic Desktop approach in ForgetIT, we can support Preservation, Forgetting, and Remembering as follows:
Preservation: The Semantic Desktop ecosystem (applications, plug-ins, mobile apps) allows us to connect the PIMO to the user’s information objects through annotating photos and web pages, organizing documents and emails, and managing tasks and reminders. Information objects are connected by reusing concepts such as contacts, which are part of the PIMO, for annotating pictures and writing emails. The resulting personal information space tightly links resources and concepts. Evidence for preservation values and context for preserving an information object can be derived from this information and formalised using the PIMO knowledge representation. Importantly, the continuously evolving PIMO not only covers information objects in current use but also objects which have already been stored in the archive for later use and are therefore no longer directly accessible for the user.
Forgetting: The data about information objects in the Semantic Desktop ecosystem that is held together by the PIMO provides evidence for preservation value as well as for topical and long-term relevance. Relevance can be observed for concepts in the PIMO just as it can for files on the computer, and the two need not coincide. For example, while topics of previous projects might still be relevant to the user, most of the associated resources, such as meetings, notes, and presentations, might no longer be of interest. The PIMO and its ecosystem therefore provide crucial input for the managed forgetting system.
Remembering: Just like the human brain, the PIMO is still capable of retrieving things that seem to have been forgotten. Similar to humans, who can remember things or situations by starting with a cue and then following associations, the PIMO can provide paths through the semantic graph that start from a particular node. For example, starting from a project (ForgetIT), we can follow a path to an associated event (the Kick-off Meeting in Hanover), then to a photo (the group in front of the town hall), and then to a person (the professor from UEDIN). At each node along the path, the links from that node to other concepts provide the context required to remember. Thus, the PIMO contributes to contextualised remembering.
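The cue-and-association walk can be sketched as a breadth-first search over the semantic graph, which returns the shortest chain of links from a cue to a target. The graph below is a hypothetical, hand-built excerpt that encodes only the example from the text, and `association_path` is an illustrative helper, not a PIMO API.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical, undirected excerpt of a PIMO graph (adjacency lists) for the example above.
GRAPH: Dict[str, List[str]] = {
    "project:ForgetIT": ["event:Kick-off Meeting Hanover", "topic:Digital Preservation"],
    "event:Kick-off Meeting Hanover": ["project:ForgetIT", "photo:Group at Town Hall", "city:Hanover"],
    "photo:Group at Town Hall": ["event:Kick-off Meeting Hanover", "person:Professor (UEDIN)"],
    "person:Professor (UEDIN)": ["photo:Group at Town Hall", "org:UEDIN"],
    "topic:Digital Preservation": ["project:ForgetIT"],
    "city:Hanover": ["event:Kick-off Meeting Hanover"],
    "org:UEDIN": ["person:Professor (UEDIN)"],
}


def association_path(graph: Dict[str, List[str]], cue: str, target: str) -> Optional[List[str]]:
    """Breadth-first search: shortest chain of associations from `cue` to `target`."""
    if cue not in graph:
        return None
    previous: Dict[str, Optional[str]] = {cue: None}  # node -> node it was reached from
    queue = deque([cue])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the cue, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # target is not reachable from the cue


for step in association_path(GRAPH, "project:ForgetIT", "person:Professor (UEDIN)") or []:
    print(step, "->", GRAPH[step])  # each node with the neighbours that give it context
```

Looking up `GRAPH[node]` for each node on the returned path yields the neighbouring concepts that the text describes as the context required to remember. Breadth-first search is chosen here only for simplicity; a real system might rank paths by link type or relevance instead.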
1.2.5 Semantic Desktop as Active System in the Preserve-or-Forget (PoF) Framework
The Personal Preservation Pilot is an implementation of the Preserve-or-Forget (PoF) Framework (see Figure below; the technical details are explained in deliverable D8.6) with the Semantic Desktop as Active System, and it is built in accordance with the PoF Reference Model as explained in deliverable D8.5.
The pilot was made possible by the close cooperation of WP9 with all ForgetIT work packages. This resulted in the successful deployment and operation of the pilot, connected to ForgetIT’s Preserve-or-Forget (PoF) Middleware components and the Digital Preservation System, which resembles the architecture depicted above. Furthermore, several components from other ForgetIT work packages are used in the pilot. This allows content on the computer to be preserved in connection with the Semantic Desktop infrastructure and to be restored from the archive.
In Section 10.4, "Overcoming Obstacles of Personal Preservation", we present how the Semantic Desktop approach overcomes the obstacles to Personal Preservation.