


This report was commissioned for EPSRC/DTI Core e-Science Programme by Professor A. G. Hey in June 2001. A working version of this document was released for comment with limited circulation to the UK Research Councils e-Science Programme Grid Technical Advisory Group in July 2001. The current document supersedes this earlier version.
The authors are based in the Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK and may be contacted via David De Roure at dder@ecs.soton.ac.uk. The authors accept no responsibility for loss or damage arising from the use of information contained in this report.
We are grateful to Mark Baker (co-chair of the IEEE Task Force in Cluster Computing) for his contributions to Section 3. Thanks also to Vijay Dialani for his input, and to the UK’s Grid Technical Advisory Group members and others who provided comments on the first draft of this report, and to Luc Moreau, Mark Greenwood, Nick Gibbins and Rob Allan for commenting on the second draft. We are grateful to Epistemics Ltd for management of the contract.
© The Authors 2001.
Research Agenda for the Semantic Grid:
A Future e-Science Infrastructure
David De Roure, Nicholas Jennings and Nigel Shadbolt
Executive Summary
e-Science offers a promising vision of how computer and communication technology can support and enhance the scientific process. It does this by enabling scientists to generate, analyse, share and discuss their insights, experiments and results in a more effective manner. The underlying computer infrastructure that provides these facilities is commonly referred to as the Grid. At this time, there are a number of grid applications being developed and there is a whole raft of computer technologies that provide fragments of the necessary functionality. However there is currently a major gap between these endeavours and the vision of e-Science in which there is a high degree of easy-to-use and seamless automation and in which there are flexible collaborations and computations on a global scale. To bridge this practice–aspiration divide, this report presents a research agenda whose aim is to move from the current state of the art in e-Science infrastructure, to the future infrastructure that is needed to support the full richness of the e-Science vision. Here the future e-Science research infrastructure is termed the Semantic Grid (Semantic Grid to Grid is meant to connote a similar relationship to the one that exists between the Semantic Web and the Web).
In more detail, this document analyses the state of the art and the research challenges that are involved in developing the computing infrastructure needed for e-Science. In so doing, a conceptual architecture for the Semantic Grid is presented. This architecture adopts a service-oriented perspective in which distinct stakeholders in the scientific process provide services to one another in various forms of marketplace. The view presented in the report is holistic, considering the requirements of e-Science and the e-Scientist at the data/computation, information and knowledge layers. The data, computation and information aspects are discussed from a distributed systems viewpoint and in the particular context of the Web as an established large scale infrastructure. A clear characterisation of the knowledge grid is also presented. This characterisation builds on the emerging metadata infrastructure with knowledge engineering techniques. These techniques are shown to be the key to working with heterogeneous information and also to working with experts and establishing communities of e-Scientists. The underlying fabric of the Grid, including the physical layer and associated technologies, is outside the scope of this document.
Having completed the analysis, the report then makes a number of recommendations that aim to ensure the full potential of e-Science is realised and that the maximum value is obtained from the endeavours associated with developing the Semantic Grid. These recommendations relate to the following aspects:
· The research issues associated with the technical and conceptual infrastructure of the Semantic Grid;
· The research issues associated with the content infrastructure of the Semantic Grid;
· The bootstrapping activities that are necessary to ensure the UK’s grid and e-Science infrastructure is widely disseminated and exemplified;
· The human resource issues that need to be considered in order to make a success of the UK e-Science and grid efforts;
· The issues associated with the intrinsic process of undertaking e-Science;
· The future strategic activities that need to be undertaken to maximise the value from the various Semantic Grid endeavours.
| Draft | Date | Changes |
|---|---|---|
| Draft 0.1 | 25-5-2001 | Section 2 |
| Draft 0.2 | 2-6-2001 | Sections 1 and 2, remaining sections in outline |
| Draft 0.3 | 7-6-2001 | Sections 1, 2 and 5, remaining sections in outline |
| Draft 0.4 | 18/28-6-2001 | Section 5 extended |
| Draft 0.5 | 29-6-2001 | Integrate various sections |
| Draft 0.6 | 2-7-2001 | Included draft of section 4 |
| Draft 0.7 | 7-7-2001 | Included draft of section 3 |
| Draft 0.8 | 9-7-2001 | Reworked references, topped and tailed sec 3 & 4 |
| Draft 0.9 | 10-7-2001 | Refined sections 3, 4 and 5 |
| | 11-7-2001 | Distributed to TAG |
| Draft 1.0 | 10-9-2001 | Added section 2.3 and refined section 2 |
| Draft 1.1 | 17-9-2001 | Revised scenario and updated section 3. Changed 5.4 in line with 2.3 and added 5.5 – included TAG comments relevant to section 5 |
| Draft 1.2 | 20-10-2001 | Consolidated 1.1 and updated section 4 |
| Draft 1.3 | 9-11-2001 | Minor revisions throughout, restructuring in section 4 |
| Draft 1.4 | 19-11-2001 | Minor revisions throughout |
| Draft 1.5 | 05-12-2001 | Added Recommendations |
| Draft 1.6 | 06-12-2001 | Minor revisions throughout based on comments |
| Draft 1.7 | 10-12-2001 | Minor revisions throughout |
| Draft 1.8 | 12-12-2001 | Release to Architecture Task Force |
| Draft 1.9 | 21-12-2001 | Release to TAG |
Contents
2.1 Justification of a Service-Oriented View
2.2.1 Service Owners and Consumers as Autonomous Agents
2.3 A Service-Oriented View of the Scenario
3.1 Grid Computing as a Distributed System
3.1.1 Distributed Object Systems
3.1.2 The Web as an Infrastructure for Distributed Applications
3.2 Data-Computational Layer Requirements
3.3 Technologies for the Data-Computational Layer
3.3.4 Nimrod/G Resource Broker and GRACE
3.3.6 The Common Component Architecture Forum
3.3.7 Batch/Resource Scheduling
3.4.2 The SDSC Grid Port Toolkit
3.4.3 Grid Portal Development Kit
4.1 Technologies for the Information Layer
4.1.2 Expressing Content and Metacontent
4.1.4 Towards an Adaptive Information Grid
4.1.5 The Web as an e-Science Information Infrastructure
4.1.6 Information Requirements of the Infrastructure
4.3 Information Layer Aspects of the Scenario
5.2 Ontologies and the Knowledge Layer
5.3 Technologies for the Knowledge Layer
5.4 Knowledge Layer Aspects of the Scenario
Technical and Conceptual Infrastructure
Scientific research and development has always involved large numbers of people, with different types and levels of expertise, working in a variety of roles, both separately and together, making use of and extending the body of knowledge. In recent years, however, there have been a number of important changes in the nature and the process of research. In particular, there is an increased emphasis on collaboration between large teams, an increased use of advanced information processing techniques, and an increased need to share results and observations between participants who are widely dispersed. When taken together, these trends mean that researchers are increasingly relying on computer and communication technologies as an intrinsic part of their everyday research activity. At present, the key communication technologies are predominantly email and the Web. Together these have shown a glimpse of what is possible; however, to support the e-Scientist more fully, the next generation of technology will need to be much richer, more flexible and much easier to use. Against this background, this report focuses on the requirements, the design and implementation issues, and the research challenges associated with developing a computing infrastructure to support e-Science.
The computing infrastructure for e-Science is commonly referred to as the Grid [Foster98] and this is, therefore, the term we will use here. This terminology is chosen to connote the idea of a ‘power grid’: namely that e-Scientists can plug into the e-Science computing infrastructure like plugging into a power grid. An important point to note however is that the term ‘grid’ is sometimes used synonymously with a networked, high performance computing infrastructure. While this aspect is certainly an important and exciting enabling technology for future e-Science, it is only a part of a much larger picture that also includes information handling and support for knowledge within the e-scientific process. It is this broader view of the e-Science infrastructure that we adopt in this document and we refer to this as the Semantic Grid. Our view is that as the Grid is to the Web, so the Semantic Grid is to the Semantic Web. Thus the Semantic Grid is characterised by an open system, with a high degree of automation, that supports flexible collaboration and computation on a global scale.
The grid metaphor intuitively gives rise to the view of the e-Science infrastructure as a set of services that are provided by particular individuals or institutions for consumption by others. Given this, and coupled with the fact that many research and standards activities are embracing a similar view [WebServices01], we adopt a service-oriented view of the Grid throughout this document (see section 2 for a more detailed justification of this choice). This view is based upon the notion of various entities providing services to one another under various forms of contract (or service level agreement).
Given the above view of the scope of e-Science, which includes information and knowledge, it has become popular to conceptualise the computing infrastructure as consisting of three conceptual layers[1]:
· Data/computation. This layer deals with the way that computational resources are allocated, scheduled and executed and the way in which data is shipped between the various processing resources. It is characterised as being able to deal with large volumes of data, providing fast networks and presenting diverse resources as a single metacomputer (i.e. a single virtual computer). In terms of technology, this layer shares much with the body of research that has been undertaken into distributed computing systems. In the context of this document, we will discuss a variety of frameworks that are, or could be, deployed in grid computing at this level. The data/computation layer builds on the physical ‘grid fabric’, i.e. the underlying network and computer infrastructure, which may also interconnect scientific equipment. Here data is understood as uninterpreted bits and bytes.
· Information. This layer deals with the way that information is represented, stored, accessed, shared and maintained. Given its key role in many scientific endeavours, the World Wide Web (WWW) is the obvious point of departure for this level. Thus in the context of this document, we will consider the extent to which the Web meets the e-Scientists’ information requirements. In particular, we will pay attention to the current developments in Web research and standards and identify gaps where the needs of e-Scientists are not being met. Here information is understood as data equipped with meaning.
· Knowledge. This layer is concerned with the way that knowledge is acquired, used, retrieved, published and maintained to assist e-Scientists to achieve their particular goals and objectives. In the context of this document, we review the state of the art in knowledge technologies for the Grid, and identify the major research issues that still need to be addressed. Here knowledge is understood as information applied to achieve a goal, solve a problem or enact a decision.
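The distinction between data, information and knowledge drawn above can be illustrated with a toy example. The following is only a sketch: the byte layout, the field names, the unit and the threshold are hypothetical and are not taken from any grid system.

```python
import struct
from dataclasses import dataclass

# Data: uninterpreted bytes, e.g. as shipped from an analyser.
# The layout (a 4-byte sample ID followed by a 4-byte float) is an assumption.
raw = struct.pack("<If", 42, 7.5)


@dataclass
class Measurement:
    """Information: the same bytes equipped with meaning."""
    sample_id: int
    reading: float
    unit: str


def interpret(payload: bytes) -> Measurement:
    # Metadata (field layout and unit) is what turns data into information.
    sample_id, reading = struct.unpack("<If", payload)
    return Measurement(sample_id, reading, unit="mg/l")


def needs_further_analysis(m: Measurement, threshold: float = 5.0) -> bool:
    """Knowledge: information applied to reach a decision."""
    return m.reading > threshold


measurement = interpret(raw)
decision = needs_further_analysis(measurement)
```

The same eight bytes are handled three times: once as an opaque payload, once as a typed record, and once as the input to a decision of the kind the scientist makes in the scenario.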
Figure 1.1: Three layered architecture viewed as services
We believe this structuring is compelling and we have, therefore, adopted it to structure this report. We also intend that this approach will provide a reader who is familiar with one layer with an introduction to the others, since the layers also tend to reflect distinct research communities and there is a significant need to bridge communities. However, there are a number of observations and remarks that need to be made. Firstly, all grids that have been or will be built have some element of all three layers in them. The degree to which the various layers are important and utilised in a given application will be domain dependent – thus in some cases, the processing of huge volumes of data will be the dominant concern, while in others the knowledge services that are available will be the overriding issue. Secondly, this layering is a conceptual view of the system that is useful in the analysis and design phases of development. However, the strict layering may not be carried forward to the implementation for reasons of efficiency. Thirdly, the service-oriented view applies at all the layers. Thus there are services, producers, consumers and contracts at the computational layer, at the information layer and at the knowledge layer (figure 1.1).
Fourthly, a (power) grid is useless without appliances to plug in. Confining the infrastructure discussion to remote services runs the risk of neglecting the interface, i.e. the computers, devices and apparatus with which the e-Scientist interacts. While virtual supercomputers certainly offer potential for scientific breakthrough, trends in computer supported cooperative work (CSCW), such as embedded devices and collaborative virtual environments, also have tremendous potential in facilitating the scientific process. Whereas the grid computing literature has picked up on visualisation [Foster99] and virtual reality (VR) [Leigh99], comparatively little attention has been paid to pervasive computing and augmented reality – what might be called the ‘smart laboratory’. This document addresses this omission.
With this context established, the next sub-section introduces a grid application scenario that we will use to motivate and explain the various tools and techniques that are available at the different grid levels.
At this time, the precise set of requirements on the e-Science infrastructure is not clear since comparatively few applications have been envisaged in detail (let alone actually developed). While this is certainly going to change over the coming years, it means the best way of grounding the subsequent discussion on models, tools and techniques is in terms of a scenario. To this end, we will use the following scenario to motivate the discussion in this document.
This scenario is derived from talking with e-Scientists across several domains including physical sciences. It is not intended to be domain-specific (since this would be too narrow) and at the same time it cannot be completely generic (since this would not be detailed enough to serve as a basis for grounding our discussion). Thus it falls somewhere in between. Nor is the scenario science fiction – these practices exist today, but on a restricted scale and with a limited degree of automation. The scenario itself (figure 1.2) fits with the description of grid applications as “coordinated resource sharing and problem solving among dynamic collections of individuals” [Foster01].
Scenario
The sample arrives for analysis with an ID number. The technician logs it into the database and the information about the sample appears (it had been entered remotely when the sample was taken). The appropriate settings are confirmed and the sample is placed with the others going to the analyser (a piece of laboratory equipment). The analyser runs automatically and the output of the analysis is stored together with a record of the parameters and laboratory conditions at the time of analysis.
The analysis is automatically brought to the attention of the company scientist who routinely inspects analysis results such as these. The scientist reviews the results from their remote office and decides the sample needs further investigation. They request a booking to use the High Resolution Analyser and the system presents configurations for previous runs on similar samples; the scientist selects the appropriate parameters. Prior to the booking, the sample is taken to the analyser and the equipment recognizes the sample identification. The sample is placed in the equipment which configures appropriately, the door is locked and the experiment is monitored by the technician by live video then left to run overnight; the video is also recorded, along with live data from the equipment. The scientist is sent a URL to the results.
Later the scientist looks at the results and, intrigued, decides to replay the analyser run, navigating the video and associated data. They then press the “query” button and the system summarises previous related analyses reported internally and externally, and recommends other scientists who have published work in this area. The scientist finds that their results appear to be unique.
The scientist requests an agenda item at the next research videoconference and publishes the experimental data for access by their colleagues (only) in preparation for the meeting. The meeting decides to make the analysis available for the wider community to look at, so the scientist then logs the analysis and associated metadata into an international database and provides some covering information. Its provenance is recorded. The availability of the new data prompts other automatic processing and a number of databases are updated; some processing of this new data occurs.
Various scientists who had expressed interest in samples or analyses fitting this description are notified automatically. One of them decides to run a simulation to see if they can model the sample, using remote resources and visualizing the result locally. The simulation involves the use of a problem solving environment (PSE) within which to assemble a range of components to explore the issues and questions that arise for the scientist. The parameters and results of the simulations are made available via the public database. Another scientist adds annotation to the published data.

Figure 1.2: Workflow in the scenario
This scenario draws out a number of underlying assumptions and raises a number of requirements that we believe are broadly applicable to a range of e-Science applications:
We believe that these requirements will be ubiquitous in e-Science applications conducted or undertaken in a grid context. The rest of this document explores the issues that arise in trying to satisfy these requirements.
The remainder of this document is organised in the following manner:
Section 2. Service-Oriented Architectures
Justification and presentation of the service-oriented view at the various levels of the infrastructure. This section feeds into sections 4 and 5 in particular.
Section 3. The Data/Computation Layer
Discussion of the distributed computing frameworks that are appropriate for grid computing, informed by existing grid computing activities.
Section 4. The Information Layer
Discussion of the Web, distributed information management research and collaborative information systems that can be exploited in e-Science applications.
Section 5. The Knowledge Layer
Description of the tools and techniques related to knowledge capture, modelling and publication that are pertinent to e-Science applications.
Section 6. Research agenda
Summary of the steps that need to be undertaken in order to realise the vision of the Semantic Grid as outlined in this report.
This section expands upon the view of the e-Science infrastructure as a service-oriented architecture in which entities provide services to one another under various forms of contract. Thus, as shown in figure 1.1, the e-Scientist’s environment is composed of data/computation services, information services, and knowledge services. However, before we deal with the specifics of each of these different types of service, respectively in sections 3, 4 and 5, it is important to highlight those aspects that are common since this provides the conceptual basis and rationale for what follows. To this end, section 2.1 provides the justification for a service-oriented view of the different layers of the e-Science infrastructure. Section 2.2 then addresses the technical ramifications of this choice and outlines the key technical challenges that need to be overcome to make service-oriented grids a reality. The section concludes (section 2.3) with the e-Science scenario of section 1.2 expressed in a service-oriented architecture.
Given the set of desiderata and requirements from section 1.2, a key question in designing and building grid applications is what is the most appropriate conceptual model for the system? The purpose of such a model is to identify the key constituent components (abstractions) and specify how they are related to one another. Such a model is necessary to identify generic grid technologies and to ensure that there can be re-use between different grid applications. Without a conceptual underpinning, grid endeavours will simply be a series of handcrafted and ad hoc implementations that represent point solutions.
To this end, an increasingly common way of viewing many large systems (from governments, to businesses, to computer systems) is in terms of the services that they provide. Here a service can simply be viewed as an abstract characterization and encapsulation of some content or processing capabilities. For example, potential services in our exemplar scenario could be: the equipment automatically recognising the sample and configuring itself appropriately, the logging of data about a sample in the international database, the setting up of a video to monitor the experiment, the locating of appropriate computational resources to support a run of the High Resolution Analyser, the finding of all scientists who have published work on experiments similar to those uncovered by our e-Scientist, and the analyser raising an alert whenever a particular pattern of results occurs (see section 2.3 for more details). Thus, services can be related to the domain of the Grid, the infrastructure of the computing facility, or the users of the Grid – i.e., at the data/computation layer, at the information layer, or at the knowledge layer. In all of these cases, however, it is assumed that there may be multiple versions of broadly the same service present in the system.
Services do not exist in a vacuum, rather they exist in a particular institutional context. Thus all services have an owner (or set of owners). The owner is the body (individual or institution) that is responsible for offering the service for consumption by others. The owner sets the terms and conditions under which the service can be accessed. Thus, for example, the owner may decide to make the service universally available and free to all on a first-come, first-served basis. Alternatively, the owner may decide to limit access to particular classes of users, to charge a fee for access and to have priority-based access. All options between these two extremes are also possible. It is assumed that in a given system there will be multiple service owners (each representing a different stakeholder) and that a given service owner may offer multiple services. These services may correspond to genuinely different functionality or they may vary in the way that broadly the same functionality is delivered (e.g., there may be a quick and approximate version of the service and one that is more time consuming and accurate).
In offering a service for consumption by others, the owner is hoping that it will indeed attract consumers for the service. These consumers are the entities that decide to try to invoke the service. The purpose for which this invocation is required is not of concern here: it may be for their own private use, it may be to resell on to others, or it may be to combine with other services.
The relationship between service owner and service consumer is codified through a service contract. This contract specifies the terms and conditions under which the owner agrees to provide the service to the consumer. The precise structure of the contract will depend upon the nature of the service and the relationship between the owner and the consumer. However, examples of relevant attributes include the price for invoking the service, the information the consumer has to provide to the provider, the expected output from the service, an indication about when this output can be expected, and the penalty for failing to deliver according to the contract. Service contracts can be established by either an off-line or an on-line process depending on the prevailing context.
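The contract attributes listed above map naturally onto a simple record type. The following is a minimal sketch rather than a proposed contract format: the class and field names are hypothetical, and the rule that a late delivery reduces the amount payable by the penalty is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ServiceContract:
    owner: str
    consumer: str
    price: float             # price for invoking the service
    required_inputs: tuple   # information the consumer must provide
    expected_output: str     # description of the expected output
    due: datetime            # when the output can be expected
    penalty: float           # penalty for failing to deliver on time

    def amount_payable(self, delivered_at: datetime) -> float:
        """Price owed by the consumer, reduced by the penalty if late."""
        if delivered_at <= self.due:
            return self.price
        # Assumed rule: the penalty is deducted from the price, never below zero.
        return max(self.price - self.penalty, 0.0)


start = datetime(2001, 12, 21, 9, 0)
contract = ServiceContract(
    owner="analysis-lab",
    consumer="company-scientist",
    price=100.0,
    required_inputs=("sample_id",),
    expected_output="high-resolution analysis results",
    due=start + timedelta(hours=12),
    penalty=30.0,
)
on_time = contract.amount_payable(start + timedelta(hours=10))
late = contract.amount_payable(start + timedelta(hours=14))
```

Making the record immutable (`frozen=True`) reflects the idea that a contract, once established, changes only through renegotiation and not through in-place modification.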
The service owners and service consumers interact with one another in a particular environmental context. This environment may be common to all entities in the Grid (meaning that all entities offer their services in an entirely open marketplace). In other cases, however, the environment may be closed and entrance may be controlled (meaning that the entities form a private club).[2] In what follows, a particular environment will be called a marketplace and the entity that establishes and runs the marketplace will be termed the market owner. The rationale for allowing individual marketplaces to be defined is that they offer the opportunity to embed interactions in an environment that has its own set of rules (both for membership and ongoing operation) and they allow the entities to make stronger assumptions about the parties with which they interact (e.g., the entities may be more trustworthy or cooperative since they are part of the same club). Such marketplaces may be appropriate, for example, if the nature of the domain means that the services are particularly sensitive or valuable. In such cases, the closed nature of the marketplace will enable the entities to interact more freely because of the rules of membership.
To summarise, the key components of a service-oriented architecture are as follows (figure 2.1): service owners (rounded rectangles) that offer services (filled circles) to service consumers (filled triangles) under particular contracts (solid links between producers and consumers). Each owner-consumer interaction takes place in a given marketplace (denoted by ovals) whose rules are set by the market owner (filled cross). The market owner may be one of the entities in the marketplace (either a producer or a consumer) or it may be a neutral third party.
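The components summarised above can be sketched as a small object model. This is an illustrative sketch only: the class names, the membership set and the `discover` method are hypothetical, and a real marketplace would need far richer registration and policy machinery.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    owner: str


@dataclass
class Marketplace:
    market_owner: str
    is_open: bool = True                        # open market vs private club
    members: set = field(default_factory=set)   # only consulted when closed
    services: list = field(default_factory=list)

    def admits(self, entity: str) -> bool:
        return self.is_open or entity in self.members

    def register(self, service: Service) -> None:
        if not self.admits(service.owner):
            raise PermissionError(f"{service.owner} is not a member")
        self.services.append(service)

    def discover(self, consumer: str, name: str) -> list:
        """Return every registered version of the named service."""
        if not self.admits(consumer):
            return []
        return [s for s in self.services if s.name == name]


club = Marketplace(market_owner="consortium", is_open=False,
                   members={"lab-a", "lab-b", "scientist-1"})
club.register(Service("sample-analysis", owner="lab-a"))
club.register(Service("sample-analysis", owner="lab-b"))

found = club.discover("scientist-1", "sample-analysis")
hidden = club.discover("outsider", "sample-analysis")
```

Note that two owners register broadly the same service, matching the assumption that multiple versions of a service may coexist, and that a non-member sees nothing in a closed marketplace.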
Given the central role played by the notion of a service, it is natural to explain the operation of the system in terms of a service lifecycle (figure 2.2). The first step is for a service owner to define a service they wish to make available to others. The reasons for wanting to make a service available may be many and varied – ranging from altruism, through necessity, to commercial benefit. It is envisaged that in a given grid application all three motivations (and many others besides) are likely to be present, although perhaps to varying degrees that are dictated by the nature of the domain. Service creation should be seen as an ongoing activity. Thus new services may come into the environment at any time and existing ones may be removed (service decommissioning) at any time. This means the system is in a state of continual flux and never reaches a steady state. Creation is also an activity that can be automated to a greater or lesser extent. Thus, in some cases, all services may be put together in an entirely manual fashion. In other cases, however, there may be a significant automated component. For example, it may be decided that a number of services should be combined, either to offer a new service (if the services are complementary in nature) or to alter the ownership structure (if the services are similar). In such cases, it may be appropriate to automate the processes of finding appropriate service providers and of getting them to agree to new terms of operation. This dynamic service composition activity is akin to creating a new virtual organisation: a number of initially distinct entities can come together, under a set of operating conditions, to form a new entity that offers a new service. This grouping will then stay in place until it is no longer appropriate to remain in this form, whereupon it will disband.
The service creation process covers three broad types of activity. The first is specifying how the service is to be realized by the service owner using an appropriate service description language. These details are not available externally to the service consumer (i.e., they are encapsulated by the service owner). The second is specifying the meta-information associated with the service. This meta-information indicates who can access the service and the likely contract options for procuring it (see section 4 for more details). The third is making the service available in the appropriate marketplace. This requires appropriate service advertising and registration facilities to be available in the marketplace (see section 4 for more details).
The service procurement phase is situated in a particular marketplace and involves a service owner and a service consumer establishing a contract for the enactment of the service according to a particular set of terms and conditions. There are a number of points to note about this process. Firstly, it may fail. That is, for whatever reason, a service owner may be unable or unwilling to provide the service to the consumer. Secondly, in most cases, the service owner and the service consumer will represent different and autonomous stakeholders. Thus the process by which contracts are established will be some form of negotiation, since the entities involved need to come to a mutually acceptable agreement on the matter. If the negotiation is successful (i.e., both parties come to an agreement) then the outcome of the procurement is a contract between the service owner and the service consumer. Thirdly, this negotiation may be carried out off-line by the respective service owners or it may be carried out at run-time. In the latter case, the negotiation may be automated to a greater or lesser extent – varying from the system merely automatically flagging the fact that a new service contract needs to be established to automating the entire negotiation process[3].
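To make the idea of automated negotiation concrete, the sketch below shows one very simple scheme in which each party concedes a fixed amount per round until the offers cross or both private limits are reached. This is an assumption chosen for brevity, not a protocol proposed in this report; the function name, parameters and step size are all hypothetical.

```python
from typing import Optional


def negotiate(ask: float, owner_floor: float,
              bid: float, consumer_ceiling: float,
              step: float = 5.0, max_rounds: int = 20) -> Optional[float]:
    """Return an agreed price, or None if procurement fails."""
    for _ in range(max_rounds):
        if bid >= ask:
            # Offers have crossed: settle midway between them.
            return (ask + bid) / 2
        # Each party concedes a little, but never past its private limit.
        ask = max(ask - step, owner_floor)
        bid = min(bid + step, consumer_ceiling)
        if ask == owner_floor and bid == consumer_ceiling and bid < ask:
            return None  # limits do not overlap, so no deal is possible
    return None


# The owner will go as low as 60 and the consumer as high as 80: a deal exists.
agreed = negotiate(ask=100.0, owner_floor=60.0, bid=40.0, consumer_ceiling=80.0)

# The owner will not go below 90 but the consumer will not exceed 70: no deal.
failed = negotiate(ask=100.0, owner_floor=90.0, bid=40.0, consumer_ceiling=70.0)
```

The `None` result corresponds to the first point above, that procurement may fail; a successful result would become the price attribute of the resulting contract.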
The final stage of the service lifecycle is service enactment. Thus, after having established a service contract, the service owner has to undertake the necessary actions in order to fulfil its obligations as specified in the contract. After these actions have been performed, the owner needs to fulfil its reporting obligations to the consumer with respect to the service. This may range from a simple inform message indicating that the service has been completed, to reporting back complex content which represents the results of performing the service. The above assumes that the service owner is always able to honour the contracts that it establishes. However, in some cases the owner may not be able to stick to the terms specified in the contract. In such cases, it may have to renegotiate the terms and conditions of the contract, paying any penalties that are due. This enforcement activity is undertaken by the market owner and will be covered by the terms and conditions that the service providers and consumers sign up to when they enter into the marketplace.
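The lifecycle described in this section (creation, procurement, enactment and decommissioning) can be read as a small state machine. The sketch below is one possible reading and not a definitive model: the stage names and the set of permitted transitions are assumptions, in particular the choice to model renegotiation as a return to the procurement stage.

```python
from enum import Enum, auto


class Stage(Enum):
    CREATED = auto()
    PROCURED = auto()        # a contract has been established
    ENACTED = auto()         # obligations fulfilled and results reported
    DECOMMISSIONED = auto()


# Allowed transitions, as read from the lifecycle description (an assumption).
TRANSITIONS = {
    Stage.CREATED: {Stage.PROCURED, Stage.DECOMMISSIONED},
    Stage.PROCURED: {Stage.ENACTED, Stage.PROCURED},   # PROCURED again = renegotiation
    Stage.ENACTED: {Stage.PROCURED, Stage.DECOMMISSIONED},
    Stage.DECOMMISSIONED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, rejecting transitions the model does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


stage = Stage.CREATED
stage = advance(stage, Stage.PROCURED)   # contract agreed
stage = advance(stage, Stage.PROCURED)   # terms renegotiated
stage = advance(stage, Stage.ENACTED)    # service delivered and reported
```

A failed procurement simply leaves the service in the `CREATED` stage, available for a later attempt with the same or a different consumer.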
Having described the key components of the service-oriented approach, we return to the key system-oriented desiderata noted in section 1.2. From the above discussion, it can be seen that a service-oriented architecture is well suited to grid applications:
The previous section outlined the service-oriented view of grid architectures. Building upon this, this section identifies the key technical challenges that need to be overcome to make such architectures a reality. To this end, table 2.1 represents the key functionality of the various components of the service-oriented architecture, each of which is then described in more detail in the remainder of this section.
| Service Owner | Service Consumer | Marketplace |
|---|---|---|
| Service creation | Service discovery | Owner and consumer registration |
| Service advertisement | | Service registration |
| Service contract creation | Service contract creation | Policy specification |
| Service delivery | Service result reception | Policy monitoring and enforcement |
Table 2.1: Key functions of the service-oriented architecture components
A natural way to conceptualise the service owners and the service consumers is as autonomous agents. Although there is still some debate about exactly what constitutes agenthood, an increasing number of researchers find the following characterisation useful [Wooldridge97]:
an agent is an encapsulated computer system that is situated in some environment and that is capable of flexible, autonomous action in that environment in order to meet its design objectives
There are a number of points about this definition that require further explanation. Agents are [Jennings00]: (i) clearly identifiable problem-solving entities with well-defined boundaries and interfaces; (ii) situated (embedded) in a particular environment: they receive inputs related to the state of their environment through sensors and they act on the environment through effectors; (iii) designed to fulfil a specific purpose: they have particular objectives (goals) to achieve; (iv) autonomous: they have control both over their internal state and over their own behaviour[4]; (v) capable of exhibiting flexible problem-solving behaviour in pursuit of their design objectives: they need to be both reactive (able to respond in a timely fashion to changes that occur in their environment) and proactive (able to act in anticipation of future goals).
Thus, each service owner will have one or more agents acting on its behalf. These agents will manage access to the services for which they are responsible and will ensure that the agreed contracts are fulfilled. This latter activity involves the scheduling of local activities according to the available resources and ensuring that the appropriate results from the service are delivered according to the contract in place. Agents will also act on behalf of the service consumers. Depending on the desired degree of automation, this may involve locating appropriate services, agreeing contracts for their provision, and receiving and presenting any received results.
Grid applications involve multiple stakeholders interacting with one another in order to procure and deliver services. Underpinning the agents’ interactions is the notion that they need to be able to interoperate in a meaningful way. Such semantic interoperation is difficult to obtain in grids (and all other open systems) because the different agents will typically have their own individual information models. Moreover, the agents may use different communication languages for conveying their own individual terms. Thus, meaningful interaction requires mechanisms by which this basic interoperation can be effected (see sections 4 and 5 for more details).
Once semantic inter-operation has been achieved, the agents can engage in various forms of interaction. These interactions can vary from simple information interchanges, to requests for particular actions to be performed and on to cooperation, coordination and negotiation in order to arrange interdependent activities. In all of these cases, however, there are two points that qualitatively differentiate agent interactions from those that occur in other computational models. Firstly, agent-oriented interactions are conceptualised as taking place at the knowledge level [Newell82]. That is, they are conceived in terms of which goals should be followed, at what time, and by whom. Secondly, as agents are flexible problem solvers, operating in an environment over which they have only partial control and observability, interactions need to be handled in a similarly flexible manner. Thus, agents need the computational apparatus to make run-time decisions about the nature and scope of their interactions and to initiate (and respond to) interactions that were not foreseen at design time (cf. the hard-wired engineering of such interactions in extant approaches).
The subsequent discussion details what would be involved if all these interactions were to be automated and performed at run-time. This is clearly the most technically challenging scenario and there are a number of points that need to be made. Firstly, while such automation is technically feasible, in a limited form, using today’s technology, this is an area that requires more research to reach the desired degree of sophistication and maturity. Secondly, in some cases, the service owners and consumers may not wish to automate all of these activities since they may wish to retain a degree of human control over these decisions. Thirdly, some contracts and relationships may be set up at design time rather than being established at run-time. This can occur when there are well-known links and dependencies between particular services, owners and consumers.
The interactions between the agents can be broadly divided into two main types. The first comprises those associated with making service contracts. This will typically be achieved through some form of automated negotiation since the agents are autonomous [Jennings01]. When designing these negotiations, three main issues need to be considered:
· The Negotiation Protocol: the set of rules that govern the interaction. This covers the permissible types of participants (e.g. the negotiators and any relevant third parties), the negotiation states (e.g. accepting bids, negotiation closed), the events that cause negotiation states to change (e.g. no more bidders, bid accepted) and the valid actions of the participants in particular states (e.g. which messages can be sent by whom, to whom, at what stage).
· The Negotiation Object: the range of issues over which agreement must be reached. At one extreme, the object may contain a single issue (such as price), while at the other it may cover hundreds of issues (related to price, quality, timings, penalties, terms and conditions, etc.). Orthogonal to the agreement structure, and determined by the negotiation protocol, is the issue of the types of operation that can be performed on agreements. In the simplest case, the structure and the contents of the agreement are fixed and participants can either accept or reject it (i.e. a take-it-or-leave-it offer). At the next level, participants have the flexibility to change the values of the issues in the negotiation object (i.e. they can make counter-proposals to ensure the agreement better fits their negotiation objectives). Finally, participants might be allowed to dynamically alter the structure of the negotiation object by adding or removing issues (e.g. a car salesman may offer one year’s free insurance in order to clinch the deal).
· The Agent’s Decision Making Models: the decision-making apparatus the participants employ to act in line with the negotiation protocol in order to achieve their objectives. The sophistication of the model, as well as the range of decisions that have to be made, are influenced by the protocol in place, by the nature of the negotiation object, and by the range of operations that can be performed on it. It can vary from the very simple, to the very complex.
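The first two design issues above can be made concrete with a small Python sketch. Everything here is hypothetical and illustrative: the `Flexibility` levels mirror the three kinds of operation on the negotiation object described in the text, the convention that a value of `None` removes an issue is an assumption of this sketch, and the `TRANSITIONS` table is an invented mapping that merely reuses the example states and events mentioned for the protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Flexibility(Enum):
    FIXED = 1      # take it or leave it: accept or reject only
    VALUES = 2     # counter-proposals may change issue values
    STRUCTURE = 3  # issues may also be added or removed


@dataclass
class NegotiationObject:
    issues: Dict[str, object]
    flexibility: Flexibility = Flexibility.FIXED

    def counter(self, changes: Dict[str, object]) -> "NegotiationObject":
        """Return a counter-proposal, or raise if the protocol forbids it."""
        if self.flexibility is Flexibility.FIXED:
            raise PermissionError("take-it-or-leave-it offer")
        # Assumed convention: a value of None means "remove this issue".
        alters_structure = any(
            name not in self.issues or value is None
            for name, value in changes.items()
        )
        if alters_structure and self.flexibility is not Flexibility.STRUCTURE:
            raise PermissionError("issues cannot be added or removed")
        merged = {**self.issues, **changes}
        merged = {k: v for k, v in merged.items() if v is not None}
        return NegotiationObject(merged, self.flexibility)


# Illustrative protocol fragment: (state, event) -> next state.
TRANSITIONS = {
    ("accepting bids", "bid accepted"): "negotiation closed",
    ("accepting bids", "no more bidders"): "negotiation closed",
}


def next_state(state: str, event: str) -> str:
    """Apply an event; events not valid in the current state are rejected."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
```

For example, the car salesman's offer corresponds to calling `counter({"free_insurance_years": 1})` on an object whose flexibility is `STRUCTURE`; the same call on a `VALUES` object is rejected.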
In designing any automated negotiation system the first thing that needs to be established is the protocol to be used. In this context, this will be determined by the market owner. Here the main consideration is the nature of the negotiation. If it is a one-to-many negotiation (i.e., one buyer and many sellers or one seller and many buyers) then the protocol will typically be a form of auction. Although there are thousands of different permutations of auction, four main ones are typically used. These are: English, Dutch, Vickrey, and first-price sealed bid. In an English auction, the auctioneer begins with the lowest acceptable price and bidders are free to raise their bids successively until there are no more offers to raise the bid. The winning bidder is the one with the highest bid. The Dutch auction is the converse of the English one; the auctioneer calls for an initial high price, which is then lowered progressively until there is an offer from a bidder to claim the item. In a first-price sealed bid auction, each bidder submits their offer for the item independently without any knowledge of the other bids. The highest bidder gets the item and pays a price equal to their bid amount. Finally, a Vickrey auction is similar to a first-price sealed bid auction, but the item is awarded to the highest bidder at a price equal to the second highest bid. More complex forms of auctions exist to deal with the cases in which there are multiple buyers and sellers that wish to trade (these are called double auctions) and with cases in which agents wish to purchase multiple interrelated goods at the same time (these are called combinatorial auctions). If it is a one-to-one negotiation (one buyer and one seller) then a form of heuristic model is needed (e.g. [Faratin99; Kraus01]). These models vary depending upon the nature of the negotiation protocol and, in general, are less well developed than those for auctions.
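The four auction protocols differ only in how the winner and the price are determined, which a short Python sketch can make explicit. The function names and parameters are hypothetical. For the English and Dutch auctions the sketch assumes, purely for illustration, that each bidder has a private valuation and bids (or claims the item) whenever the price does not exceed it; real bidders may behave more strategically.

```python
from typing import Dict, Optional, Tuple

Bids = Dict[str, float]  # bidder name -> bid (or private valuation)


def first_price_sealed_bid(bids: Bids) -> Tuple[str, float]:
    """Highest bidder wins and pays their own bid."""
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


def vickrey(bids: Bids) -> Tuple[str, float]:
    """Highest bidder wins but pays the second highest bid."""
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        raise ValueError("a Vickrey auction needs at least two bids")
    return ranked[0][0], ranked[1][1]


def english(valuations: Bids, start_price: float,
            increment: float) -> Optional[Tuple[str, float]]:
    """Ascending auction: bids are raised until nobody will go higher."""
    price, leader = start_price, None
    raised = True
    while raised:
        raised = False
        for bidder, value in valuations.items():
            if bidder == leader:
                continue
            # The opening bid is at the start price; later bids raise it.
            asking = price if leader is None else price + increment
            if value >= asking:
                leader, price, raised = bidder, asking, True
    return (leader, price) if leader is not None else None


def dutch(valuations: Bids, start_price: float, decrement: float,
          floor: float = 0.0) -> Optional[Tuple[str, float]]:
    """Descending auction: the first bidder to claim the item wins."""
    price = start_price
    while price >= floor:
        for bidder, value in valuations.items():
            if value >= price:
                return bidder, price
        price -= decrement
    return None
```

With valuations of 10, 14 and 12 for three bidders, the sealed-bid variants both select the bidder who values the item at 14, but the first-price winner pays 14 whereas the Vickrey winner pays 12.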
Having determined the protocol, the next step is to determine the nature of the contract that needs to be established. This will typically vary from application to application and again it is something that is set by the market owner. Given these two, the final step is to determine the agent’s reasoning model. This can vary from the very simple (bidding truthfully) to the very complex (involving reasoning about the likely number and nature of the other bidders).
The second main type of interaction is when a number of agents decide to come together to form a new virtual organisation. This involves determining the participants of the coalition and determining their various roles and responsibilities in this new organisational structure. Again this is typically an activity that will involve negotiation between the participants since they need to come to a mutually acceptable agreement about the division of labour and responsibilities. Here there are a number of techniques and algorithms that can be employed to address the coalition formation process [Sandholm00; Shehory98] although this area requires more research to deal with the envisaged scale of grid applications.
It should be possible for any agent(s) in the system (including a service owner, a service consumer or a neutral third party) to establish a marketplace. The entity which establishes the marketplace is here termed the market owner. The owner is responsible for setting up, advertising, controlling and disbanding the marketplace. In order to establish a marketplace, the owner needs a representation scheme for describing the various entities that are allowed to participate in the marketplace (terms of entry), a means of describing how the various allowable entities are allowed to interact with one another in the context of the marketplace, and a specification of what monitoring mechanisms (if any) are to be put in place to ensure the marketplace’s rules are adhered to.
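As a minimal sketch of these responsibilities, the Python class below models a marketplace with terms of entry, agent and service registration, and service discovery. The class, its methods and the role names `"owner"` and `"consumer"` are all hypothetical; the terms of entry are reduced to a single predicate, and interaction rules and monitoring are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Marketplace:
    market_owner: str
    # Terms of entry: (agent name, role) -> is the agent admitted?
    terms_of_entry: Callable[[str, str], bool]
    members: Dict[str, Set[str]] = field(default_factory=dict)
    services: Dict[str, str] = field(default_factory=dict)  # service -> owner

    def register_agent(self, name: str, role: str) -> None:
        """Admit an agent in a given role if the terms of entry allow it."""
        if not self.terms_of_entry(name, role):
            raise PermissionError(f"{name} not admitted as {role}")
        self.members.setdefault(name, set()).add(role)

    def register_service(self, service: str, owner: str) -> None:
        """Only agents registered as service owners may advertise services."""
        if "owner" not in self.members.get(owner, set()):
            raise PermissionError(f"{owner} is not a registered service owner")
        self.services[service] = owner

    def discover(self, consumer: str, service: str) -> Optional[str]:
        """Return the owner of a service, for registered consumers only."""
        if "consumer" not in self.members.get(consumer, set()):
            raise PermissionError(f"{consumer} is not a registered consumer")
        return self.services.get(service)
```

An agent may hold both roles in the same marketplace, which matches the situation in which one agent offers some services while consuming others.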
Having reviewed the service-oriented approach we will now apply this analysis and framework to the scenario described in section 1.2.
The first marketplace is that connected with the scientist’s own lab. This marketplace has agents to represent the humans involved in the experiment; thus there is a scientist agent (SA) and a technician agent (TA). These are responsible for interacting with the scientist and the technician, respectively, and then for enacting their instructions in the Grid. These agents can be viewed as the computational proxies of the humans they represent, endowed with personalised information about their owners’ preferences and objectives. These personal agents need to interact with other (artificial) agents in the marketplace in order to achieve their objectives. These other agents include an analyser agent (AA) (that is responsible for managing access to the analyser itself), the analyser database agent (ADA) (that is responsible for managing access to the database containing information about the analyser), and the high resolution analyser agent (HRAA) (that is responsible for managing access to the high resolution analyser). There is also an interest notification agent (INA) (that is responsible for recording which scientists in the lab are interested in which types of results and for notifying them when appropriate results are generated) and an experimental results agent (ERA) (that can discover similar analyses of data or when similar experimental configurations have been used in the past). The services provided by these agents are summarised in table 2.2.
| Agent | Services Offered | Services Consumed By |
|---|---|---|
| Scientist Agent (SA) | resultAlert, reportAlert | Scientist, Scientist |
| Technician Agent (TA) | MonitorAnalysis | Technician |
| Analyser Agent (AA) | configureParameters, runSample | ADA, ADA |
| Analyser Database Agent (ADA) | logSample, setAnalysisConfiguration, bookSlot, recordAnalysis | Technician, Technician, TA, AA |
|
High Resolution |