Research Agenda for the Semantic Grid De Roure, Jennings and Shadbolt December 2001


6. Research Agenda

The following general recommendations arise from our research and analysis of the state of the art and of the longer term vision for the Semantic Grid as outlined in this report. These recommendations also embody our sense of the most important issues that need to be addressed to provide an effective computational infrastructure for e-Science in general and grid-based computational support in particular. The recommendations are clustered into six themes and their ordering is not significant.

 

Technical and Conceptual Infrastructure

These recommendations relate to the languages and architectures at both a technical and a conceptual level.

 

1.      Grid Toolkits - The technology and methodology do not exist to build the Semantic Grid today. However, developments to this end are likely to be evolutionary and will need to include foundational elements found in widely used toolkits such as Globus. Within these toolkits there remain issues that still demand further research, although it is important to be aware of what the US is doing in these areas and for the UK to play to its strengths. Issues that still need to be resolved are listed in section 3.6 and include naming, resource discovery, synchronisation, security, fault tolerance, dependability, integration of heterogeneous resources, scalability and performance.

 

2.      Smart Laboratories - We believe that for e-Science to be successful and for the Grid to be effectively exploited, much more attention needs to be focused on how laboratories are instrumented and augmented. One example is infrastructure that allows a range of equipment to advertise its presence, be linked together, and annotate and mark up the content it is receiving or producing. This should also extend to the use of portable devices and should include support for next generation Access Grids.

 

3.      Service Oriented Architectures - Research the provision and implementation of e-Science and grid facilities in terms of service oriented architectures. Research is also needed into service description languages as a way of describing and integrating the problem solving elements of an e-Science grid. Here the emerging Web Services standards appear well suited to the e-Science infrastructure. Although these technologies have not yet fully emerged from the standards process, toolkits and test services exist and it is possible to build systems with them now.

 

4.      Agent Based Approaches - Research the use of agent based architectures and interaction languages to enable e-Science marketplaces to be developed, enacted and maintained. We believe that such approaches provide a level of abstraction and define capabilities essential to realising the full potential of the Semantic Grid.

 

5.      Network Philosophies – Research into the role of lightweight communication protocols for much of the threading of e-Science workflow. Investigate the likely evolution of various network and service distributions, for example the extent to which our computational networks will contain islands of high capacity grid clusters amounting to virtual private network grids. Investigate the merits of a range of fundamentally different configurations and architectures, for example peer-to-peer, WebFlow and JINI. Research the relative merits of synchronous versus asynchronous approaches in our e-Science and grid contexts.

 

6.      Trust and Provenance – Further research is needed to understand the processes, methods and techniques for establishing computational trust and determining the provenance and quality of content in e-Science and grid systems. This extends to the issue of digital rights management in making content available.
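
As an illustration of recommendation 3, the idea of describing services so that they can be discovered and integrated might be sketched as follows. This is a minimal sketch under stated assumptions, not a Web Services implementation: the `ServiceDescription` and `Registry` classes, the capability terms and the example.org endpoints are all hypothetical, and a real system would use a standard description language and a networked discovery service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical, minimal stand-in for a service description document."""
    name: str
    endpoint: str            # where the service would be invoked (illustrative)
    capabilities: frozenset  # shared vocabulary terms the service claims to offer


@dataclass
class Registry:
    """In-memory registry; an assumption standing in for a discovery service."""
    services: list = field(default_factory=list)

    def advertise(self, description: ServiceDescription) -> None:
        """Record a service so that later queries can find it."""
        self.services.append(description)

    def discover(self, *required: str) -> list:
        """Return the services whose capabilities cover every required term."""
        wanted = set(required)
        return [s for s in self.services if wanted <= s.capabilities]


registry = Registry()
registry.advertise(ServiceDescription(
    "spectrum-analyser", "http://example.org/services/spectrum",
    frozenset({"analysis", "spectroscopy"})))
registry.advertise(ServiceDescription(
    "archive", "http://example.org/services/archive",
    frozenset({"storage"})))

# Find every advertised service that offers both capabilities.
matches = [s.name for s in registry.discover("analysis", "spectroscopy")]
print(matches)
```

Matching here is by exact term, which only works if providers and consumers agree on the vocabulary; that dependence is one reason the shared ontologies discussed under Content Infrastructure matter.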

 

Content Infrastructure

These recommendations relate to the technologies and methods that are relevant to the way in which content is hosted and transacted on the Grid in e-Science contexts.

 

7.      Metadata and Annotation – Whilst the basic metadata infrastructure already exists in the shape of RDF, metadata issues have not been fully addressed in current grid deployments. It is relatively straightforward to deploy some of the technology in this area, and this should be promoted. RDF, for example, is already used to encode metadata and annotations in terms of shared vocabularies or ontologies. However, there is still a need for extensive work in the area of tools and methods to support the design and deployment of e-Science ontologies. Annotation tools and methods need to be developed so that emerging metadata and ontologies can be applied to the large amount of content that will be present in the Grid and e-Science applications.

 

8.      Knowledge Technologies – In addition to the requirement for research in metadata and annotation above, there is a need for a range of other knowledge technologies to be developed and customised for use in e-Science contexts. These are described in detail in section 5.5 and include knowledge capture tools and methods, dynamic content linking, annotation based search, annotated reuse repositories, natural language processing methods (for content tagging, mark-up, generation and summarisation), data mining, machine learning and internet reasoning services. These technologies will need shared ontologies and service description languages if they are to be integrated into the e-Science workflow. They will also need to be incorporated into the pervasive devices and smart laboratory contexts that will emerge in e-Science.

 

9.      Integrated Media – Research into incorporating a wide range of media into the e-Science infrastructure. This will include video, audio, and a wide range of imaging methods. Research is also needed into the association of metadata and annotation with these various media forms.

 

10.  Content Presentation – Research is required into methods and techniques that allow content to be visualised in ways consistent with the e-Science collaborative effort. This will also involve customising content in ways that reflect localised context and should allow for personalisation and adaptation.
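
To make recommendation 7 concrete, the sketch below shows how a piece of content might be annotated with RDF statements that draw on a shared vocabulary. It is illustrative only: the dataset URI and the annotation values are hypothetical, the vocabulary used is the Dublin Core element set, and the statements are written out by hand in the line-based N-Triples syntax rather than through an RDF toolkit.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes.

    Other characters that need escaping (such as newlines) are not handled here.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement as an N-Triples line; obj is already serialised."""
    return f"<{subject}> <{predicate}> {obj} ."


# Hypothetical resource produced by an experiment.
dataset = "http://example.org/data/run-42"

annotations = [
    triple(dataset, DC + "creator", literal("A. Scientist")),
    triple(dataset, DC + "date", literal("2001-12-01")),
    triple(dataset, DC + "subject", literal("crystallography")),
]

print("\n".join(annotations))
```

Because each statement names its property by URI, tools that share the vocabulary can interpret the annotations without any prior agreement on file formats.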

 

Bootstrapping Activities

These recommendations relate to the processes and activities that are needed to get the UK’s Grid and e-Science infrastructure more widely disseminated and exemplified.

 

11.  Starter Kits - Currently the services provided by the grid infrastructure are somewhat rudimentary, their functionality is changing and the interfaces to these services are evolving. Grids will not be used to their full potential until developers have access to services with stable and standard interfaces, as well as richer functionality. Moreover, the take-up of the Grid will not happen until there are tools and utilities that facilitate the development of grid-based applications. Thus there is a clear need for more comprehensive starter kits that include many more tutorial examples of what can be achieved. To this end, we recommend continued research into the development and deployment of portal technology and capabilities to provide access to grid resources in an intuitive and straightforward fashion.

 

12.  Exemplar and Reference Sites – Related to recommendation 11 above, there is a whole raft of problems and issues associated with the take-up and use of grid concepts, infrastructure, and applications. Grids will not be used widely and successfully until there is a fully functional grid infrastructure (this includes middleware and tools to help developers of grid-based applications), and the grid infrastructure will not mature and stabilise until it has been fully tested by a whole range of varying kinds of grid applications. However, the establishment, documentation and dissemination of exemplar sites and applications should be undertaken as a matter of urgency.

 

13.  Use Cases - Grid software and experience to date primarily relate to contexts where there are relatively few nodes but these nodes have large internal complexity. We need to analyse current best practice and develop use cases to establish the strengths and weaknesses of these approaches. This should include identifying opportunities for additional services and a gap analysis to determine what is obviously missing. A careful understanding of the medium scale, heterogeneous Information Power Grid (IPG) at NASA would be a useful initial example.

 

Human Resource Issues

These recommendations relate specifically to potential bottlenecks in the human resources needed to make a success of the UK e-Science and grid effort.

 

14.  Community Building – There is a clear need for the UK e-Science and grid developers to establish strong links and watching briefs with two technical communities: Semantic Web and Web Services. There is also a need to develop a balanced relationship between the application scientists and computer scientists. Here multi- and interdisciplinarity are key requirements.

 

15.  System Support - Discussions and efforts related to the Grid and e-Science are usually centred around computer and application scientists. Comparatively little attention is paid to system administrators, who need to install, set up, and manage these new wide-area environments. It is important that tools and utilities to ease the burden on the maintainers of grid-based infrastructure and applications are produced in parallel with other grid software, and that this constituency is well represented in the community building efforts that arise out of recommendation 14.

 

16.  Training – It is important that dissemination to, and training of, the current and next generation of computer and application scientists and system administrators are built into the evolving UK e-Science effort.

 

e-Science Intrinsics

These recommendations are concerned with obtaining a better understanding of the actual processes and practice of e-Science.

 

17.  e-Science Workflow and Collaboration - Much more needs to be done to understand the workflow of current and future e-Science collaborations. Users should be able to form, maintain and disband communities of practice with restricted membership criteria and rules of operation. Currently most studies focus on the e-Science infrastructure behind the socket on the wall. However, this infrastructure will not be used unless it fits in with the working environment of the e-Scientists. This fit has not been studied explicitly, and there is a pressing need to collect real requirements and use cases from users and to engage in some evaluative and comparative work. There is also a need to understand the process of collaboration in e-Science in order to fully and accurately define requirements for next generation Access Grids.

 

18.  Pervasive e-Science - Currently most references and discussions about grids imply that their primary task is to enable global access to huge amounts of computational power. Generically, however, we believe grids should be thought of as the means of providing seamless and transparent access from and to a diverse set of networked resources. These resources can range from PDAs to supercomputers and from sensors and smart laboratories to satellite feeds.

 

Future Proposed Directions

These recommendations relate to strategic activities that need to be undertaken in order to maximise the leverage from the Semantic Grid endeavours.

 

19.  Core Computer Science Research – If the full potential of the Semantic Grid is to be realised then a research programme addressing the core computer science research issues identified in this report needs to be established. Without such research, the grid applications that are developed will be unable to support the e-Scientist to the degree that is necessary for e-Science technologies to be widely deployed and exploited.

 

20.  e-Anything – Many of the issues, technologies and solutions developed in the context of e-Science can be exploited in other domains where groups of diverse stakeholders need to come together electronically and interact in flexible ways. Thus it is important that relationships are established and exploitation routes are explored with domains such as e-Business, e-Commerce, e-Education, and e-Entertainment.