David De Roure, Nick Gibbins, Danius Michaelides, Jeremy Frey
University of Southampton

Tom Rodden, Chris Greenhalgh
University of Nottingham
The aim of the project is to investigate and innovate at the intersection of the Semantic Grid and the physical world, by focusing on the capture, distribution and use of semantic annotation in the context of pervasive devices. It addresses important computer science challenges that have arisen in e-Science projects by focusing on the future forms of scientific record that may emerge from the use of a pervasive e-Science infrastructure. The formation of this new form of scientific record raises research challenges in three distinct areas:
Challenge 1: Recording the record requires a disparate set of information to be captured and related to each other. We need to reduce significantly the cost of capturing this additional information and develop appropriate representations for codifying this information. This will require:
Challenge 2: Sharing the record requires a careful re-examination of the relationship between the underlying architecture and the semantic annotations used to form the record of activities. Sharing this record across a distributed community requires:
Challenge 3: Replaying the record requires us to consider how best to represent the record to scientists and how they might best understand it. This will require:
The Semantic Media project has been funded under the second round of the EPSRC Computer Science Challenges to Emerge from e-Science programme. It commences in July 2005 and runs for 24 months with one Research Assistant at each site.
The emergence of initiatives such as the UK e-Science programme and, internationally, the Global Grid Forum (GGF) has been driven by the initial suggestion of 'the Grid' as a distributed computing infrastructure for advanced science. We have already seen considerable progress on the construction of such an infrastructure, with software facilities such as the Globus Toolkit becoming freely available based on the premise that high-bandwidth communication allows storage and computational resources to be shared by a range of scientists accessing these services from their labs. As an infrastructure for large-scale collaborative science, the Grid also supports collaboration and virtual organisations. Initiatives such as OGSA [1] and OGSA-DAI, and work on the Semantic Grid [2,3], have further outlined a clear position where powerful computational services and very large amounts of data are readily available to a distributed community of scholars. This movement is establishing new forms of shared scientific record and new ways of working. As a broader range of disciplines, such as social sciences, arts and humanities, begin to draw benefit from this infrastructure, the notion of e-Science has broadened to e-Research.
Several projects in the e-Science programme have explored the application of Semantic Web technologies to achieve the ambitions of e-Research. The Semantic Grid research agenda was first defined by De Roure, Jennings and Shadbolt in 2001 [4] and emphasises the role of explicit knowledge to enable machine-processing, such as service and data descriptions within the grid middleware and semantic annotation of content. Semantic Grid research has attracted increasing attention through the e-Science programme, particularly in the myGrid, Geodise and CombeChem pilots and the IRC projects CoAKTinG (Collaborative Advanced Knowledge Technologies in the Grid) and MIAKT (Grid enabled knowledge services: collaborative problem solving environments in medical informatics).
The trend towards an increasingly powerful grid with a growing set of semantic services has been complemented by extending early visions of the grid to embrace mobile devices and distributed sets of sensors. These low-cost sensors allow real-time scientific data and contextual information to be placed on the network for subsequent analysis by scientists. However, a real challenge exists in giving these sensors a place on the Grid. They often lack the power required to support the web service styles required by many infrastructures, and they often provide continuous data, which does not sit easily with the service-based paradigm inherent in many grid infrastructures. Bridging between the grid and these sensors raises fundamental issues of how low-cost, low-power devices might be represented within the Grid and how they might most readily handle sources of real-time continuous data.
We are now at a point of convergence where the work on pervasive computing meets the e-Science work on the Grid. The growing importance of sensor-based systems is reflected in the growing importance of pervasive computing techniques, where sensor-based systems are used to capture the nature of the real world, including the effects of human activities. Initial explorations in this area are reflected in projects such as the CombeChem e-Science pilot, and current plans for a UK workshop on Ubiquitous Computing for e-Research reflect growing interest within the programme.
CombeChem is about new forms of shared scientific record. It aims to enhance the correlation and prediction of chemical structures and properties by increasing the amount of knowledge about materials via synthesis and analysis of large compound libraries. Automation of measurement and analysis is required in order to do this efficiently and reliably. The project takes this further with its objective to achieve a complete end-to-end connection between the laboratory bench and the intellectual chemical knowledge that is published as a result of the investigation - this is described as publication at source [5]. This starts in the smart laboratory and Grid-enabled instrumentation. By studying chemists within the laboratory, handheld technology has been introduced to facilitate information capture at this earliest stage [6-8]. Additionally, pervasive computing devices are used to capture live metadata as it is created at the laboratory bench, relieving the chemist of the burden of metadata creation.
This data then feeds into the scientific data processing. All usage of the data through the chain of processing is effectively an annotation upon it, and the provenance is explicit. The creation of original data is accompanied by information about the experimental conditions in which it is created. There then follows a chain of processing such as aggregation of experimental data, selection of a particular data subset, statistical analysis, or modelling and simulation. The handling of this information may include explicit annotation of a diagram or editing of a digital image. All of this generates secondary data, accompanied by the information that describes the process that produced it. Through publication at source, all this data is made available for subsequent reuse in support of the scientific process, subject to appropriate access control. One role of Semantic Web technologies in CombeChem, such as the RDF triplestore [9], has been to establish this complete chain of interlinked digital information all the way from the experiment through to publication.
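The chain of processing described above can be pictured as a graph in which every derived item records the process that produced it and the items it was derived from. The following is a minimal sketch under that assumption only; the `Record` structure, the identifiers and the process names are hypothetical illustrations and are not taken from CombeChem.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """One item in the scientific record (hypothetical structure)."""
    record_id: str
    process: str                   # the step that produced this item
    inputs: Tuple[str, ...] = ()   # ids of the items it was derived from


def provenance_chain(records: Dict[str, Record], record_id: str) -> List[str]:
    """Walk from an item back to its original sources, depth first."""
    chain: List[str] = []
    seen = set()
    stack = [record_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        chain.append(current)
        # Push inputs in reverse so they are visited in declared order.
        stack.extend(reversed(records[current].inputs))
    return chain


# Hypothetical chain: experiment -> aggregation -> selection -> analysis.
records = {r.record_id: r for r in [
    Record("raw-1", "experiment"),
    Record("agg-1", "aggregation", ("raw-1",)),
    Record("sub-1", "subset selection", ("agg-1",)),
    Record("stat-1", "statistical analysis", ("sub-1",)),
]}

chain = provenance_chain(records, "stat-1")
```

Because each derived item carries its own inputs, the provenance of any published result can be recovered without a separate log.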
Hence CombeChem is very much about achieving a complete digital scientific record. Crucially, this record is enriched and interlinked by a variety of annotations, be they data from sensors, records of use, or explicit interaction. The annotations need to be machine processable, useful for their anticipated purpose, and interoperable to facilitate subsequent unanticipated reuse - hence we refer to them as Semantic Annotations, and they are candidates for the application of Semantic Web technologies such as RDF to represent metadata and OWL to represent the shared vocabularies (ontologies) that are used. Current work in this field is almost entirely focused on attaching metadata to fairly persistent data objects (e.g. Web sites, archives of digital photographs), but the e-Science requirements are considerably richer and more challenging, especially when working with live information streams as in CombeChem. We refer to this broader class of annotation-enriched content as Semantic Media.
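To make the idea of machine-processable annotation concrete, the sketch below models annotations as subject-predicate-object statements in the style of RDF, using plain Python tuples and no RDF library. The `ex:` terms, the sample data and the `match` helper are all made up for illustration and do not reflect any vocabulary used by the project.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical annotations; every "ex:" term here is invented.
ANNOTATIONS: List[Triple] = [
    ("ex:sample42", "ex:measuredBy", "ex:thermometer1"),
    ("ex:sample42", "ex:temperatureCelsius", "21.5"),
    ("ex:sample42", "ex:annotatedBy", "ex:chemistA"),
    ("ex:sample43", "ex:measuredBy", "ex:thermometer1"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Which samples were measured by a given sensor?
measured = match(ANNOTATIONS, predicate="ex:measuredBy", obj="ex:thermometer1")
```

Because sensor data, records of use and explicit interaction all take the same statement shape, a query written for one purpose can later be reused for an unanticipated one.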
This notion of Semantic Media has also arisen very clearly in the CoAKTinG project [10] which has applied Semantic Web technologies in novel ways to advance the state of the art in collaborative mediated spaces for distributed e-Science. It comprises four tools: instant messaging and presence notification (BuddySpace), graphical meeting and group memory capture (Compendium), intelligent 'to-do' lists (Process Panels) and meeting capture and replay. These are integrated into existing collaborative environments (such as the Access Grid), and through use of a shared ontology to exchange structure, promote enhanced process tracking and navigation of resources before, during, and after a meeting occurs. In this context, collaboration as an activity can be seen as a resource in itself, which with the right tools can be used to enhance and aid future collaboration and work. The full record of any collaboration (e.g. a video recording of a meeting) is rich in detail, but to be useful we must extract resources which are rich in structure: each of the CoAKTinG tools can be thought of as extracting structure from the collaboration process. In its latter phase CoAKTinG, under the management of Danius Michaelides, has conducted trials with NASA and with CombeChem, and this has emphasised the requirements of interworking annotations across projects. CoAKTinG built on the earlier EPSRC project HyStream [11] which applied semantic annotation to temporal media; the record and reuse aspects also relate to work on temporal linking conducted in Nottingham [12].
CoAKTinG has provided proof of concept, with successful trials, and it has also highlighted research challenges. Specifically, capture and retrieval of live metadata stresses the existing semantic web infrastructure (e.g. RDF triplestores are not engineered for real-time update and for querying over space and time), the CoAKTinG tools work with a small number of ontologies and domain-specific ontologies cannot be plugged in, and the problems of distributed collaborative annotation (such as privacy and conflicts) have not been fully addressed. These are challenging Computer Science problems which require expertise in the conceptual and technological underpinnings of the Semantic Web technologies; solutions to these problems will facilitate a broad range of e-Research applications with new, dynamic and interlinked forms of scientific record.
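One of the requirements named above is that live annotations can be added continuously and then queried by time. As a rough illustration of that requirement only, and not of how any RDF triplestore or CoAKTinG tool works, the sketch below keeps annotations sorted by timestamp so that interval queries use binary search. The class name, its methods and the sample notes are hypothetical.

```python
import bisect
from typing import List, Tuple


class TimedAnnotationLog:
    """Annotations kept in timestamp order (illustrative sketch)."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._notes: List[str] = []

    def add(self, timestamp: float, note: str) -> None:
        """Insert an annotation, keeping both lists sorted by time."""
        i = bisect.bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._notes.insert(i, note)

    def between(self, start: float, end: float) -> List[Tuple[float, str]]:
        """Return annotations with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._notes[lo:hi]))


log = TimedAnnotationLog()
log.add(10.0, "slide changed")
log.add(5.0, "meeting started")    # late arrival is still placed in order
log.add(20.0, "action item agreed")

window = log.between(5.0, 10.0)
```

A real deployment would also need spatial indexing, persistence and concurrent access, which this sketch leaves out.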
In Grid-Based Medical Devices for Everyday Health [13], patients who have left hospital are monitored using wearable computing technology. Since the patient is mobile, position and motion information is gathered (using GPS and accelerometers) to provide the necessary contextual information in which to interpret the physiological signals. The patients also perform self-reporting of blood sugar, explicitly annotating the physiological data with information about their state. This new form of medical record is processed on the Grid and medics are alerted - by pervasive computing - when the patients experience episodes that need attention. The additional contextual information provided by the wearables, and the explicit annotations in self-reporting, are essential to interpretation of the physiological signals. This project has addressed the challenges in combining a Grid infrastructure with a pervasive infrastructure: the devices and sensors that we are dealing with typically have limited computational power and storage, and they only have intermittent network connectivity. All three projects have helped establish a research agenda for pervasive computing and the Grid and for semantic annotation of information streams. They emphasise the need for pervasive semantic annotation: events which occur alongside an information stream - be they captured automatically or resulting from user interaction - are effectively annotations upon that stream, which need to be recorded, distributed and later re-used.
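The constraints mentioned above, limited storage and intermittent connectivity, are commonly handled with a store-and-forward buffer: readings are queued on the device and sent in order whenever the network is reachable. The sketch below illustrates that general pattern and is not the mechanism used in the project; the `StoreAndForward` class, the `send` callback and the reading fields are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict

Reading = Dict[str, object]


class StoreAndForward:
    """Bounded buffer that forwards readings when the network allows."""

    def __init__(self, send: Callable[[Reading], bool], capacity: int = 100) -> None:
        self._send = send  # returns True if the reading was delivered
        # With maxlen set, the oldest reading is dropped when the buffer is full.
        self._pending: Deque[Reading] = deque(maxlen=capacity)

    def record(self, reading: Reading) -> None:
        """Queue a reading locally; no network access is needed."""
        self._pending.append(reading)

    def flush(self) -> int:
        """Send queued readings in order; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)


# Hypothetical link that is down at first and comes up later.
delivered = []
link_up = {"value": False}


def fake_send(reading: Reading) -> bool:
    if link_up["value"]:
        delivered.append(reading)
    return link_up["value"]


buffer = StoreAndForward(fake_send, capacity=3)
buffer.record({"t": 1, "heart_rate": 72, "context": "walking"})
buffer.record({"t": 2, "heart_rate": 75, "context": "walking"})
sent_while_down = buffer.flush()
link_up["value"] = True
sent_when_up = buffer.flush()
```

Keeping the contextual fields alongside each reading means the annotation travels with the signal it describes, even when delivery is delayed.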
The convergence of Grid and pervasive computing suggested by these initiatives is also consistent with the vision of e-Science articulated within the UK programme.
The broadest vision of e-science is inclusive in nature and seeks to allow those in sophisticated research labs to work together with scientists in the field. For example, an environmental field officer monitoring plant growth in a remote jungle can be connected through wireless devices to scientist in the lab with access to sophisticated climate modelling software. Developing the technology to realise this vision represents an adventurous research agenda for IT. [14]
It also appears in a recent EU strategic report [15], drawing on the ambient intelligence vision to discuss the proactive PDA.
This project seeks to explore the convergence between Grid and pervasive computing and to lay the basic research foundations needed to ensure that this broadest vision of an e-Science infrastructure, utilising new forms of scientific record, might be realised. We seek to consider how a future e-Research infrastructure should allow low-cost real-time sensors to be annotated with additional semantic information in a scalable and reliable manner, represent human action and activities within the infrastructure and make these available across the grid, and reach beyond the desktop to provide more direct support for scientists working in their everyday activities through a range of mobile and pervasive computing devices.
In order to structure this research we wish to explore how new forms of more dynamic scientific record might emerge that combine information from existing e-Science services, data from real world sensors and representations of the scientists' activities. This project directly addresses a number of issues in the call. Fundamentally it is about new forms of shared scientific record, characterised here as a highly dynamic record making use of pervasive and semantic technologies. It addresses capture, provenance and replay through case studies with users, and aims to make support for shared contributions to annotation widely available. It moves from the current data focus to a semantic grid with facilities for the generation, support and traceability of knowledge, and it does this through development and integration of trusted ubiquitous systems. Furthermore the machine-processable semantic annotations facilitate autonomic operation, and annotations recorded during processing enable the subsequent rapid customised assembly of services.
Like other leading Semantic Grid projects we will be applying the appropriate, well-understood Semantic Web paradigm, but here we are embarking on an additional adventure by taking these ideas into the realm of live information streams and pervasive computing, which could potentially have a big impact on e-Research in 5-10 years. As regards areas that require deeper understanding across the e-Science programme as a whole, we are addressing Semantics/Knowledge for Grid infrastructure, systems-level Computer Science research to make the e-Science applications meet the requirements of users, and through distributed annotation we touch upon scalability and complexity issues for e-Research.
| [1] | I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, vol. 15, 2001. |
| [2] | D. De Roure, Y. Gil, and J. Hendler, "E-Science Special Issue," IEEE Intelligent Systems, vol. 19, 2004. |
| [3] | C. A. Goble, D. De Roure, N. R. Shadbolt, and A. A. A. Fernandes, "Enhancing Services and Applications with Knowledge and Semantics," in The Grid 2: Blueprint for a New Computing Infrastructure, I. Foster and C. Kesselman, Eds.: Morgan-Kaufmann, 2004, pp. 431-458. |
| [4] | D. De Roure, N. R. Jennings, and N. R. Shadbolt, "Research Agenda for the Semantic Grid: A Future e-Science Infrastructure," National e-Science Centre, Edinburgh, UK, UKeS-2002-02, December 2001. |
| [5] | J. G. Frey, D. De Roure, and L. A. Carr, "Publication At Source: Scientific Communication from a Publication Web to a Data Grid," presented at Euroweb 2002 Conference, The Web and the GRID: from e-science to e-business, Oxford, UK, 2002. |
| [6] | G. Hughes, H. Mills, D. De Roure, J. G. Frey, L. Moreau, m. schraefel, G. Smith, and E. Zaluska, "The Semantic Smart Laboratory: A system for supporting the chemical eScientist," Org. Biomol. Chem., 2004. |
| [7] | J. G. Frey, G. V. Hughes, H. R. Mills, m. c. schraefel, G. M. Smith, and D. De Roure, "Less is More: Lightweight Ontologies and User Interfaces for Smart Labs," presented at UK eScience All Hands Meeting, Nottingham, UK, 2004. |
| [8] | m. c. schraefel, G. Hughes, H. Mills, G. Smith, T. Payne, and J. Frey, "Breaking the Book: Translating the Chemistry Lab Book into a Pervasive Computing Lab Environment," presented at Conference on Human Factors in Computing Systems (CHI), Vienna, Austria, 2004. |
| [9] | S. Harris and N. Gibbins, "3store: Efficient Bulk RDF Storage," presented at 1st International Workshop on Practical and Scalable Semantic Systems (PSSS'03), Sanibel Island, Florida, 2003. |
| [10] | S. Buckingham Shum, D. De Roure, M. Eisenstadt, N. Shadbolt, and A. Tate, "CoAKTinG: Collaborative Advanced Knowledge Technologies in the Grid," presented at Second Workshop on Advanced Collaborative Environments, Edinburgh, 2002. |
| [11] | K. R. Page, D. Cruickshank, and D. De Roure, "Its about time: link streams as continuous metadata," presented at Twelfth ACM conference on Hypertext and Hypermedia, Århus, Denmark, 2001. |
| [12] | C. Greenhalgh, J. Purbrick, S. Benford, M. Craven, A. Drozd, and I. Taylor, "Temporal links: recording and replaying virtual environments," presented at Eighth ACM international conference on Multimedia, Marina del Rey, California, United States, 2000. |
| [13] | D. Cruickshank and D. De Roure, "A Portal for Interacting with Context-Aware Ubiquitous Systems," presented at First International Workshop on Advanced Context Modelling, Reasoning And Management, Nottingham, UK, 2004. |
| [14] | M. Atkinson, J. Crowcroft, C. Goble, J. Gurd, T. Rodden, N. Shadbolt, M. Sloman, I. Sommerville, and T. Storey, "Computer Science Challenges to Emerge from e-Science," 2002. |
| [15] | EU Expert Group, "Next Generation Grid(s) 2005 - 2010," European Commission, Brussels, June 2003. |
| [16] | G. Kress and T. van Leeuwen, Multimodal Discourse - The Modes and Media of Contemporary Communication. London: Hodder Arnold, 2001. |