Research Agenda for the Semantic Grid:

A Future e-Science Infrastructure

 

David De Roure, Nicholas Jennings and Nigel Shadbolt

 

Executive Summary

e-Science offers a promising vision of how computer and communication technology can support and enhance the scientific process. It does this by enabling scientists to generate, analyse, share and discuss their insights, experiments and results more effectively. The underlying computer infrastructure that provides these facilities is commonly referred to as the Grid. At present, a number of grid applications are being developed and a whole raft of computer technologies provide fragments of the necessary functionality. However, there is currently a major gap between these endeavours and the vision of e-Science, in which there is a high degree of easy-to-use and seamless automation and in which there are flexible collaborations and computations on a global scale. To bridge this practice–aspiration divide, this report presents a research agenda whose aim is to move from the current state of the art in e-Science infrastructure to the future infrastructure that is needed to support the full richness of the e-Science vision. Here the future e-Science research infrastructure is termed the Semantic Grid (the relationship of the Semantic Grid to the Grid is meant to parallel that of the Semantic Web to the Web).

In more detail, this document analyses the state of the art and the research challenges that are involved in developing the computing infrastructure needed for e-Science. In so doing, a conceptual architecture for the Semantic Grid is presented. This architecture adopts a service-oriented perspective in which distinct stakeholders in the scientific process provide services to one another in various forms of marketplace. The view presented in the report is holistic, considering the requirements of e-Science and the e-Scientist at the data/computation, information and knowledge layers. The data, computation and information aspects are discussed from a distributed systems viewpoint and in the particular context of the Web as an established large-scale infrastructure. A clear characterisation of the knowledge grid is also presented. This characterisation builds on the emerging metadata infrastructure with knowledge engineering techniques. These techniques are shown to be the key to working with heterogeneous information and also to working with experts and establishing communities of e-Scientists. The underlying fabric of the Grid, including the physical layer and associated technologies, is outside the scope of this document.

Having completed the analysis, the report then makes a number of recommendations that aim to ensure the full potential of e-Science is realised and that the maximum value is obtained from the endeavours associated with developing the Semantic Grid. These recommendations relate to the following aspects:

· The research issues associated with the technical and conceptual infrastructure of the Semantic Grid;

· The research issues associated with the content infrastructure of the Semantic Grid;

· The bootstrapping activities that are necessary to ensure the UK’s grid and e-Science infrastructure is widely disseminated and exemplified;

· The human resource issues that need to be considered in order to make a success of the UK e-Science and grid efforts;

· The issues associated with the intrinsic process of undertaking e-Science;

· The future strategic activities that need to be undertaken to maximise the value from the various Semantic Grid endeavours.

 

Contents

 

Status of this document

Acknowledgements

Document revision history

1. Introduction

1.1 Motivation

1.2 Motivating Scenario

1.3 Report Structure

2. A Service-Oriented View

2.1 Justification of a Service-Oriented View

2.2 Key Technical Challenges

2.2.1 Service Owners and Consumers as Autonomous Agents

2.2.2 Interacting Agents

2.2.3 Marketplace Structures

2.3 A Service-Oriented View of the Scenario

3. The Data-Computation Layer

3.1 Grid Computing as a Distributed System

3.1.1 Distributed Object Systems

3.1.2 The Web as an Infrastructure for Distributed Applications

3.1.3 Peer-to-Peer Computing

3.2 Data-Computational Layer Requirements

3.3 Technologies for the Data-Computational Layer

3.3.1 Globus

3.3.2 Legion

3.3.3 WebFlow

3.3.4 Nimrod/G Resource Broker and GRACE

3.3.5 Jini and RMI

3.3.6 The Common Component Architecture Forum

3.3.7 Batch/Resource Scheduling

3.3.8 Storage Resource Broker

3.4 Grid portals on the Web

3.4.1 The NPACI HotPage

3.4.2 The SDSC Grid Port Toolkit

3.4.3 Grid Portal Development Kit

3.5 Grid Deployments

3.5.1 Information Power Grid

3.5.2 Particle Physics Grids

3.5.3 EuroGrid and UNICORE

3.6 Research Issues

4. The Information Layer

4.1 Technologies for the Information Layer

4.1.2 Expressing Content and Metacontent

4.1.3 Semantic Web

4.1.4 Towards an Adaptive Information Grid

4.1.5 The Web as an e-Science Information Infrastructure

4.1.6 Information Requirements of the Infrastructure

4.1.7 Web Services

4.2 Live Information Systems

4.2.1 Collaboration

4.2.2 Access Grid

4.3 Information Layer Aspects of the Scenario

4.4 Research Issues

5. The Knowledge Layer

5.1 The Knowledge Lifecycle

5.2 Ontologies and the Knowledge Layer

5.3 Technologies for the Knowledge Layer

5.4 Knowledge Layer Aspects of the Scenario

5.5 Research Issues

6. Research Agenda

6.1 Technical and Conceptual Infrastructure

6.2 Content Infrastructure

6.3 Bootstrapping Activities

6.4 Human Resource Issues

6.5 e-Science Intrinsics

6.6 Future Proposed Directions

7. References