Research Agenda for the Semantic Grid De Roure, Jennings and Shadbolt December 2001


3. The Data-Computation Layer

As soon as computers are interconnected and communicating we have a distributed system, and the issues in designing, building and deploying distributed computer systems have now been explored over many years. This section considers e-Science infrastructure from a distributed systems perspective. First it positions the Grid within the bigger picture of distributed computing, asking whether it is subsumed by current solutions. Then we look in more detail at the requirements and currently deployed technologies in order to identify issues for the next generation of the infrastructure. Since much of the grid computing development has addressed the data-computation layer, this section particularly draws upon the work of that community.

 

Our discussion of the data/computation layer makes certain assumptions of the underlying network fabric, which are not addressed in detail here:

 

 

 

 

Furthermore we do not study storage devices nor particular processing solutions. With these caveats, the remainder of this section is organised as follows. Section 3.1 deals with grid computing as a distributed system, section 3.2 with the requirements at the data-computational layer, section 3.3 with technologies that are being used at this layer, section 3.4 with grid portals, section 3.5 with grid deployments, and section 3.6 with the open research issues.

 

3.1 Grid Computing as a Distributed System

Fundamentally, distribution introduces notions of concurrency and locality. The traditional view of a computation involves the data, program and computational state (memory, processor registers) coinciding at one processing node. There are many techniques for achieving this in a distributed system and the following are illustrative examples:

 

·        Remote Procedure Calls. The code is installed in advance on the ‘server’. The client sends parameters, a computation is invoked, and a result is returned. The model is essentially synchronous.  This approach is exemplified by the Distributed Computing Environment (DCE) from the Open Software Foundation (now Open Group).

·        Services[1]. Again the code is server-side and the computation runs continuously (a UNIX daemon or Windows service), clients connect and there is a bi-directional interaction according to a high level communications protocol.

·        Distributed object systems. Within the object-oriented paradigm, remote method invocation replaces remote procedure calls, usually with support for object request brokering.

·        Database queries. The database is running on the server, a client sends a small program (i.e. the query) and this is executed on the server, data is exchanged.

·        Client-side scripting. The client downloads the script from the server and executes it locally in a constrained environment, such as a Web browser.

·        Message passing. This type of paradigm encapsulates distributed ‘parallel’ programming such as MPICH or PVM (note additionally that process invocation may occur through techniques such as a remote shell). It also includes message queuing systems.

·        Peer-to-peer. Those devices that traditionally act as clients (the office workstation, the home computer) can also offer services, including computational services; i.e. computation at the ‘edge’ of the network. In principle such systems can be highly decentralised, promoting scalability and avoiding single points of failure.

·        Web-based computing. This involves the use of the Web infrastructure as a fabric for distributed applications, and closely resembles Unix services and remote procedure calls.

 

Each of the above models is in widespread use and when taken together they illustrate the current practice in distributed systems deployment. It seems likely that grid computing will not be based on just one of these technologies: they are the legacy with which the infrastructure must interoperate, and although proponents of any one of the above might argue that theirs is a universal solution, no dominant solution is apparent in current practice. Of course they are closely inter-related and any system could be expressed in multiple ways – different instances of a conceptual model.

 

So the key question to ask in this context is will this change as grid computing evolves? In some respects the same distributed systems issues continue to apply. For example, although wide area networks now operate at LAN speeds, the scale of the wide area has grown too. Availability of very high speed networking might de-emphasise the need for locality, but the global geographical scale of the Grid re-emphasises it; e.g. local memory access must still be more efficient as the latency over the wider area is fundamentally constrained by the speed of light. Data locality continues to be key to application efficiency and performance. So although local computational resources may use a variety of architectures (e.g. tightly coupled multiprocessors), over the wide area the grid architecture is a classical distributed system.

 

What will certainly change is scale, and with it heterogeneity. These are both perennial challenges in distributed systems, and both are critical in the e-Science context. In fact, in many ways the “e-” prefix denotes precisely these things, as characterised by phrases such as ‘distributed global collaborations’, ‘very large data collections’, and ‘tera-scale computing resources’. With regards to scaling up the number of nodes, difficulties are experienced with the computational and intellectual complexity of the programs, with configuration of the systems, and with overcoming increased likelihood of failure. Similarly, heterogeneity is inherent – as any cluster manager knows, their only truly homogeneous cluster is their first one!  Furthermore, increasing scale also involves crossing an increasing number of organisational boundaries, which magnifies the need to address authentication and trust issues.

 

There is a strong sense of automation in distributed systems; for example, when humans cannot deal with the scale but delegate to processes on the distributed system to do so (e.g. through scripting), which leads to autonomy within the systems. This implies a need for coordination, which, in turn, needs to be specified programmatically at various levels – including process descriptions. Similarly the increased likelihood of failure implies a need for automatic recovery from failure: configuration and repair cannot remain manual tasks.

 

The inherent heterogeneity must also be handled automatically. Systems use varying standards and system APIs, resulting in the need to port services and applications to the plethora of computer systems used in a grid environment. Agreed interchange formats help in general, because n converters are needed to enable n components to interoperate via one standard, as opposed to converters for them to interoperate with each other.

 

To introduce the matching of e-Science requirements to distributed systems solutions, the next three subsections describe the three most prevalent distributed systems solutions with broad claims and significant deployment, considering whether each might meet the needs of the e-Science infrastructure. These systems are: distributed object systems, web-based computing and peer-to-peer solutions. Having positioned grid computing in the general context of distributed systems, we then turn to the specific requirements, technologies, and exemplars of the data-computational layer.

 

3.1.1 Distributed Object Systems

The Common Object Request Broker Architecture (CORBA) is an open distributed object-computing infrastructure being standardised by the Object Management Group [OMG]. CORBA automates many common network programming tasks such as object registration, location, and activation; request demultiplexing; framing and error-handling; parameter marshalling and demarshalling; and operation dispatching. Although CORBA provides a rich set of services, it does not contain the grid level allocation and scheduling services found in Globus (see section 3.3.1), however, it is possible to integrate CORBA with the Grid.

 

The OMG has been quick to demonstrate the role of CORBA in the grid infrastructure; e.g. through the ‘Software Services Grid Workshop’ held in 2001. Apart from providing a well-established set of technologies that can be applied to e-Science, CORBA is also a candidate for a higher level conceptual model. It is language-neutral and targeted to provide benefits on the enterprise scale, and is closely associated with the Unified Modelling Language (UML).

 

One of the concerns about CORBA is reflected by the evidence of intranet rather than internet deployment, indicating difficulty crossing organisational boundaries; e.g. operation through firewalls.  Furthermore, realtime and multimedia support was not part of the original design.

 

While CORBA provides a higher layer model and standards to deal with heterogeneity, Java provides a single implementation framework for realising distributed object systems. To a certain extent the Java Virtual Machine (JVM) with Java-based applications and services are overcoming the problems associated with heterogeneous systems, providing portable programs and a distributed object model through remote method invocation (RMI). Where legacy code needs to be integrated it can be ‘wrapped’ by Java code.

 

However, the use of Java in itself has its drawbacks, the main one being computational speed. This and other problems associated with Java (e.g. numerics and concurrency) are being addressed by the likes of the Java Grande Forum (a ‘Grande Application’ is ‘any application, scientific or industrial, that requires a large number of computing resources, such as those found on the Internet, to solve one or more problems’) [JavaGrande]. Java has also been chosen for UNICORE (see Section 3.5.3).  Thus what is lost in computational speed might be gained in terms of software development and maintenance times when taking a broader view of the engineering of grid applications.

 

3.1.2 The Web as an Infrastructure for Distributed Applications

Considerable imagination has been exercised in a variety of systems to overcome the Web’s architectural limitations in order to deliver applications and services. The motivation for this is often pragmatic, seeking ways to piggyback off such a pervasive infrastructure with so much software and infrastructure support. However the web infrastructure is not the ideal infrastructure in every case. There is a danger that e-Science infrastructure could fall into this trap, hence it is important to revisit the architecture.

 

The major architectural limitation is that information flow is always initiated by the client. If an application is ever to receive information spontaneously from an HTTP server then a role reversal is required, with the application accepting incoming HTTP requests. For this reason, more and more applications now also include HTTP server functionality (cf peer-to-peer, see below).  Furthermore, HTTP is designed around relatively short transactions rather than the longer-running computations. Finally we note that typical web transactions involve a far smaller number of machines that grid computations.

 

When the Web is used as an infrastructure for distributed applications, information is exchanged between programs, rather than being presented for a human reader. This has been facilitated by XML (and the XML family or recommendations from W3C) as the standard intermediate format for information exchanged in process-to-process communication. XML is not intended for human readership – that it is human-readable text is merely a practical convenience for developers.

 

Given its ubiquitous nature as an information exchange medium, there is further discussion of the web infrastructure in Section 4.

 

3.1.3 Peer-to-Peer Computing

One very plausible approach to address the concerns of scalability can be described as decentralisation, though this is not a simple solution. The traditional client-server model can be a performance bottleneck and a single point of failure, but is still prevalent because decentralisation brings its own challenges.

 

Peer-to-Peer (P2P) computing [Clark01a] (as implemented, for example, by Napster [Napster], Gnutella [Gnutella], Freenet [Clarke01b] and JXTA [JXTA]) and Internet computing (as implemented, for example, by the SETI@home [SETI], Parabon [Parabon], and Entropia systems [Entropia]) are examples of the more general computational structures that are taking advantage of globally distributed resources. In P2P computing, machines share data and resources, such as spare computing cycles and storage capacity, via the Internet or private networks. Machines can also communicate directly and manage computing tasks without using central servers. This permits P2P computing to scale more effectively than traditional client-server systems that must expand a server’s infrastructure to expand, and this ‘clients as servers’ decentralisation is attractive with respect to scalability and fault tolerance for the reasons discussed above.

 

However, there are some obstacles to P2P computing:

 

·        PCs and workstations used in complex P2P applications will require more computing power to carry the communications and security overhead that servers would otherwise handle.

·        Security is an issue as P2P applications give computers access to other machines’ resources (memory, hard drives, etc). Downloading files from other computers makes the systems vulnerable to viruses. For example, Gnutella users were exposed to the VBS_GNUTELWORM virus. Another issue is the ability to authenticate the identity of the machines with their peers.

·        P2P systems have to cope with heterogeneous resources, such as computers using a variety of operating systems and networking components. Technologies such as Java and XML will help overcome some of these interoperability problems.

·        One of the biggest challenges with P2P computing is enabling devices to find one another in a computing paradigm that lacks a central point of control. P2P computing needs the ability to find resources and services from a potentially huge number of globally based decentralised peers.

 

A number of peer-to-peer storage systems are being developed within the research community. In the grid context these raise interesting issues of security and anonymity:

 

·        The Federated, Available, and Reliable Storage for an Incompletely Trusted Environment (FARSITE) serverless distributed file system [Bolosky00].

·        OceanStore, a global persistent data store which aims to provide a ‘consistent, highly-available, and durable storage utility atop an infrastructure comprised of untrusted servers’ and scale to very large numbers of users. [Kubiatowicz00; Zhuang01]

·        The Self-Certifying File System (SFS), a network file system that aims to provide strong security over untrusted networks without significant performance costs [Fu00].

·        PAST is a ‘large-scale peer-to-peer storage utility’ that aims to provide high availability, scalability, and anonymity. Documents are immutable and the identities of content owners, readers and storage providers are protected. [Druschel01]

 

3.2 Data-Computational Layer Requirements

The emphasis of the early efforts in grid computing was driven by the need to link a number of US national supercomputing centres. The I-WAY project, a forerunner of Globus [Foster97a], successfully achieved this goal. Today the grid infrastructure is capable of binding together more than just a few specialised supercomputing centres. A number of key enablers have helped make the Grid more ubiquitous, including the take up of high bandwidth network technologies and adoption of standards, allowing the Grid to be viewed as a viable distributed infrastructure on a global scale that can potentially support diverse applications.

 

The previous section has shown that there are three main issues that characterise computational and data grids:

 

 

At this layer, the main concern is ensuring that all the globally distributed resources are accessible under the terms and conditions specified by their respective service owners. This layer can consist of all manner of networked resources ranging from computers and mass storage devices to databases and special scientific instruments. These need to be organised so that they provide resource-independent and application-independent services. Examples of these services are an information service that provides uniform access to information about the structure and state of grid resources or a security service with mechanisms for authentication and authorization that can establish a user’s identity and create various user credentials.

 

Against this background, the following are the main design features required at this level of the Grid:

 

·        Administrative Hierarchy – An administrative hierarchy is the way that each grid environment divides itself up to cope with a potentially global extent. The administrative hierarchy, for example, determines how administrative information flows through the Grid.

·        Communication Services – The communication needs of applications using a grid environment are diverse, ranging from reliable point-to-point to unreliable multicast. The communications infrastructure needs to support protocols that are used for bulk-data transport, streaming data, group communications, and those used by distributed objects. The network services used also provide the Grid with important QoS parameters such as latency, bandwidth, reliability, fault-tolerance, and jitter control.

·        Information Services – A grid is a dynamic environment where the location and type of services available are constantly changing. A major goal is to make all resources accessible to any process in the system, without regard to the relative location of the resource user. It is necessary to provide mechanisms to enable a rich environment in which information about the Grid is reliably and easily obtained by those services requesting the information. The grid information (registration and directory) services provide the mechanisms for registering and obtaining information about the structure, resources, services, and status and nature of the environment.

·        Naming Services – In a grid, like in any other distributed system, names are used to refer to a wide variety of objects such as computers, services, or data. The naming service provides a uniform name space across the complete distributed environment. Typical naming services are provided by the international X.500 naming scheme or DNS (the Internet's scheme).

·        Distributed File Systems and Caching – Distributed applications, more often than not, require access to files distributed among many servers. A distributed file system is therefore a key component in a distributed system. From an applications point of view it is important that a distributed file system can provide a uniform global namespace, support a range of file I/O protocols, require little or no program modification, and provide means that enable performance optimisations to be implemented (such as the usage of caches).

·        Security and Authorisation – Any distributed system involves all four aspects of security: confidentiality, integrity, authentication and accountability. Security within a grid environment is a complex issue requiring diverse resources autonomously administered to interact in a manner that does not impact the usability of the resources and that does not introduce security holes/lapses in individual systems or the environments as a whole. A security infrastructure is key to the success or failure of a grid environment.

·        System Status and Fault Tolerance – To provide a reliable and robust environment it is important that a means of monitoring resources and applications is provided. To accomplish this, tools that monitor resources and applications need to be deployed.

·        Resource Management and Scheduling – The management of processor time, memory, network, storage, and other components in a grid is clearly important. The overall aim is to efficiently and effectively schedule the applications that need to utilize the available resources in the distributed environment. From a user's point of view, resource management and scheduling should be transparent and their interaction with it should be confined to application submission. It is important in a grid that a resource management and scheduling service can interact with those that may be installed locally.

·        User and Administrative GUI – The interfaces to the services and resources available should be intuitive and easy to use as well as being heterogeneous in nature. Typically user and administrative access to grid applications and services are web based interfaces.

 

3.3 Technologies for the Data-Computational Layer

There are growing numbers of Grid-related projects, dealing with areas such as infrastructure, key services, collaborations, specific applications and domain portals. Here we identify some of the most significant to date.

 

3.3.1 Globus

Globus [Globus] [Foster97b] provides a software infrastructure that enables applications to handle distributed, heterogeneous computing resources as a single virtual machine. The Globus project is a U.S. multi-institutional research effort that seeks to enable the construction of computational grids. A computational grid, in this context, is a hardware and software infrastructure that provides dependable, consistent, and pervasive access to high-end computational capabilities, despite the geographical distribution of both resources and users. A central element of the Globus system is the Globus Toolkit, which defines the basic services and capabilities required to construct a computational grid. The toolkit consists of a set of components that implement basic services, such as security, resource location, resource management, and communications.

 

It is necessary for computational grids to support a wide variety of applications and programming paradigms. Consequently, rather than providing a uniform programming model, such as the object-oriented model, the Globus Toolkit provides a bag of services which developers of specific tools or applications can use to meet their own particular needs. This methodology is only possible when the services are distinct and have well-defined interfaces (APIs) that can be incorporated into applications or tools in an incremental fashion.

 

Globus is constructed as a layered architecture in which high-level global services are built upon essential low-level core local services. The Globus Toolkit is modular, and an application can exploit Globus features, such as resource management or information infrastructure, without using the Globus communication libraries. The Globus Toolkit currently consists of the following (the precise set depends on Globus version):

 

·        An HTTP-based ‘Globus Toolkit Resource Allocation Manager’ (GRAM) protocol is used for allocation of computational resources and for monitoring and control of computation on those resources.

·        An extended version of the File Transfer Protocol, GridFTP, is used for data access; extensions include use of connectivity layer security protocols, partial file access, and management of parallelism for high-speed transfers.

·        Authentication and related security services (GSI)

·        Distributed access to structure and state information that is based on the Lightweight Directory Access Protocol (LDAP). This service is used to define a standard resource information protocol and associated information model.

·        Monitoring of health and status of system components (HBM)

·        Remote access to data via sequential and parallel interfaces (GASS) including an interface to GridFTP.

·        Construction, caching, and location of executables (GEM)

·        Advanced Resource Reservation and Allocation (GARA)

 

3.3.2 Legion

Legion [Legion] [Grimshaw97] is an object-based ‘metasystem’, developed at the University of Virginia. Legion provides the software infrastructure so that a system of heterogeneous, geographically distributed, high performance machines can interact seamlessly. Legion attempts to provide users, at their workstations, with a single coherent virtual machine.

 

Legion takes a different approach to Globus to providing to a grid environment: it encapsulates all of its components as objects. The methodology used has all the normal advantages of an object-oriented approach, such as data abstraction, encapsulation, inheritance, and polymorphism. It can be argued that many aspects of this object-oriented approach potentially make it ideal for designing and implementing a complex environment such as a metacomputer. However, using an object-oriented methodology does not come without a raft of problems, many of which are tied-up with the need for Legion to interact with legacy applications and services.

 

Legion defines the API to a set of core objects that support the basic services needed by the metasystem. The Legion system has the following set of core object types:

 

·        Classes and Metaclasses – Classes can be considered managers and policy makers. Metaclasses are classes of classes.

·        Host objects – Host objects are abstractions of processing resources; they may represent a single processor or multiple hosts and processors.

·        Vault objects – Vault objects represents persistent storage, but only for the purpose of maintaining the state of object persistent representation.

·        Implementation Objects and Caches – Implementation objects hide the storage details of object implementations and can be thought of as equivalent to an executable in UNIX.

·        Binding Agents – A binding agent maps object IDs to physical addressees.

·        Context objects and Context spaces – Context objects map context names to Legion object IDs, allowing users to name objects with arbitrary-length string names.

 

3.3.3 WebFlow

WebFlow [Akarsu98] [Haupt99] is a computational extension of the web model that can act as a framework for wide-area distributed computing and metacomputing. The main goal of the WebFlow design was to build a seamless framework for publishing and reusing computational modules on the Web, so that end users, via a web browser, can engage in composing distributed applications using WebFlow modules as visual components and editors as visual authoring tools. WebFlow has a three-tier Java-based architecture that can be considered a visual dataflow system. The front-end uses applets for authoring, visualization, and control of the environment. WebFlow uses a servlet-based middleware layer to manage and interact with backend modules such as legacy codes for databases or high performance simulations. WebFlow is analogous to the Web; web pages can be compared to WebFlow modules and hyperlinks that connect web pages to inter-modular dataflow channels. WebFlow content developers build and publish modules by attaching them to web servers. Application integrators use visual tools to link outputs of the source modules with inputs of the destination modules, thereby forming distributed computational graphs (or compute-webs) and publishing them as composite WebFlow modules. A user activates these compute-webs by clicking suitable hyperlinks, or customizing the computation either in terms of available parameters or by employing some high-level commodity tools for visual graph authoring. The high performance backend tier is implemented using the Globus toolkit:

 

·        The Metacomputing Directory Services (MDS) is used to map and identify resources.

·        The Globus Resource Allocation Manager (GRAM) is used to allocate resources.

·        The Global Access to Secondary Storage (GASS) is used for a high performance data transfer.

 

The current WebFlow system is based on a mesh of Java-enhanced web servers (Apache), running servlets that manage and coordinate distributed computation. This management infrastructure is implemented by three servlets: session manager, module manager, and connection manager. These servlets use URL addresses and can offer dynamic information about their services and current state. Each management servlet can communicate with others via sockets. The servlets are persistent and application-independent. Future implementations of WebFlow will use emerging standards for distributed objects and take advantage of commercial technologies, such as CORBA, as the base distributed object model.

 

WebFlow takes a different approach to both Globus and Legion. It is implemented in a hybrid manner using a three-tier architecture that encompasses both the Web and third party backend services. This approach has a number of advantages, including the ability to plug-in to a diverse set of backend services. For example, many of these services are currently supplied by the Globus toolkit, but they could be replaced with components from CORBA or Legion.

 

3.3.4 Nimrod/G Resource Broker and GRACE

Nimrod [Nimrod] is a tool designed to aid researchers undertaking parametric studies on a variety of computing platforms. In Nimrod, a typical parametric study is one that requires the same sequential application be executed numerous times with different input data sets, until some problem space has been fully explored.

 

Nimrod provides a declarative parametric modelling language for expressing an experiment. Domain experts can create a plan for a parametric computation (task farming) and use the Nimrod runtime system to submit, run, and collect the results from multiple computers. Nimrod has been used to run applications ranging from Bio-informatics and Operational Research, to the simulation of business processes.

 

A new system called Nimrod/G [Buyya00b] uses the Globus middleware services for dynamic resource discovery and dispatching jobs over Grids. The user need not worry about the way in which the complete experiment is set up, data or executable staging, or management. The user can also set the deadline by which the results are needed and the Nimrod/G broker tries to find the optimal resources available and use them so that the user deadline is met or the cost of computation is kept to a minimum.

 

The current focus of the Nimrod/G project team is on the use of economic theories (as discussed in section 2.1) in grid resource management and scheduling as part of a new framework called GRACE (Grid Architecture for Computational Economy) [Buyya00a]. The components that make up GRACE include global scheduler (broker), bid-manager, directory server, and bid-server working interacting with grid middleware. The GRACE infrastructure APIs that grid tools and applications programmers can use to develop software support the computational economy. Grid resource brokers, such as Nimrod/G, uses GRACE services to dynamically trade with resource owners to select those resources that offer optimal user cost or timelines criteria.

 

3.3.5 Jini and RMI

Jini [Jini] is designed to provide a software infrastructure that can form a distributed computing environment that offers network plug and play. A collection of Jini-enabled processes constitutes a Jini community – a collection of clients and services all communicating by the Jini protocols. In Jini, applications will normally be written in Java and communicate using the Java Remote Method Invocation (RMI) mechanism. Even though Jini is written in pure Java, neither Jini clients nor services are constrained to be pure Java. They may include native code, acting as wrappers around non-Java objects, or even be written in some other language altogether. This enables a Jini community to extend beyond the normal Java framework and link services and clients from a variety of sources.

 

More fundamentally, Jini is primarily concerned with communications between devices (not what devices do). The abstraction is the service and an interface that defines a service. The actual implementation of the service can be in hardware, software, or both. Services in a Jini community are mutually aware and the size of a community is generally considered that of a workgroup. A community’s lookup service (LUS) can be exported to other communities, thus providing interaction between two or more isolated communities.

 

In Jini a device or software service can be connected to a network and can announce its presence. Clients that wish to use such a service can then locate it and call it to perform tasks. Jini is built on RMI, which introduces some constraints. Furthermore, Jini is not a distributed operating system, as an operating system provides services such as file access, processor scheduling, and user logins.

 

The five key concepts of Jini are:

 

·        Lookup – search for a service and download the code needed to access it;

·        Discovery – spontaneously find a community and join;

·        Leasing – time-bounded access to a service;

·        Remote Events – service A notifies service B of A’s state change. Lookup can notify all services of a new service;

·        Transactions – used to ensure that a system’s distributed state stays consistent.

 

3.3.6 The Common Component Architecture Forum

The Common Component Architecture Forum [CCA] is attempting to define a minimal set of standard features that a high performance component framework would need to provide, or can expect, in order to be able to use components developed within different frameworks. Such a standard will promote interoperability between components developed by different teams across different institutions.

 

The idea of using component frameworks to deal with the complexity of developing interdisciplinary high performance computing applications is becoming increasingly popular. Such systems enable programmers to accelerate project development by introducing higher-level abstractions and allowing code reusability. They also provide clearly defined component interfaces, which facilitate the task of team interaction. These potential benefits have encouraged research groups within a number of laboratories and universities to develop, and experiment with prototype systems. This has led to some fragmentation and there is a need for interoperability standards.

 

3.3.7 Batch/Resource Scheduling

There are three freely available systems whose primary focus is batching and resource scheduling:

 

·        Condor [Condor] is a software package for executing batch jobs on a variety of UNIX platforms, in particular those that would otherwise be idle. The major features of Condor are automatic resource location and job allocation, check pointing and the migration of processes. These features are achieved without modification to the underlying UNIX kernel. However, it is necessary for a user to link their source code with Condor libraries. Condor monitors the activity on all the participating computing resources, those machines, which are determined to be available, are placed into a resource pool. Machines are then allocated from the pool for the execution of jobs. The pool is a dynamic entity – workstations enter when they become idle, and leave when they get busy.

 

·        The Portable Batch System (PBS) [PBS] is a batch queuing and workload management system (originally developed for NASA). It operates on a variety of UNIX platforms, from clusters to supercomputers. The PBS job scheduler allows sites to establish their own scheduling policies for running jobs in both time and space. PBS is adaptable to a wide variety of administrative policies, and provides an extensible authentication and security model. PBS provides a GUI for job submission, tracking, and administrative purposes.

 

·        The Sun Grid Engine (SGE) [SGE] is based on the software developed by Genias known as Codine/GRM. In the SGE, jobs wait in a holding area and queues located on servers provide the services for jobs. A user submits a job to the SGE, and declares a requirements profile for the job. As soon as a queue becomes available for execution of a new job, the SGE determines suitable jobs for the queue and will dispatch the job with the highest priority or longest waiting time; it will try to start new jobs in the least loaded and most suitable queue.

3.3.8 Storage Resource Broker

The Storage Resource Broker (SRB) [SRB] has been developed at San Diego Supercomputer Centre (SDSC) to provide “uniform access to distributed storage” across a range of storage devices, via a well-defined API. The SRB supports file replication, and this can occur either offline or on-the-fly. Interaction with the SRB is via a GUI. The SRB servers can be federated. The SRB is managed by an administrator, with authority to create user groups.

 

A key feature of the SRB is that it supports metadata associated with a distributed file system, such as location, size and creation date information. It also supports the notion of application-level (or domain-dependent) metadata, specific to the content and not generalisable across all data sets.

 

In contrast with traditional network file systems, SRB is attractive for grid applications in that it deals with large volumes of data, which can transcend individual storage devices, because it deals with metadata and takes advantage of file replication.

 

3.4 Grid portals on the Web

A web portal allows application scientists and researchers to access resources specific to a particular domain of interest via a web interface. Unlike typical web subject portals, a grid portal may also provide access to grid resources. For example a grid portal may authenticate users, permit them to access remote resources, help them make decisions about scheduling jobs, allowing users to access and manipulate resource information obtained and stored on a remote database. Grid portal access can also be personalised by the use of profiles which are created and stored for each portal user. These attributes, and others, make grid portals the appropriate means for e-Science users to access grid resources. 

 

3.4.1 The NPACI HotPage

The NPACI HotPage [Hotpage] is a user portal that has been designed to be a single point-of-access to computer-based resources. It attempts to simplify access to resources that are distributed across member organisations and allows them to be viewed as either an integrated grid system or as individual machines.

 

The two key services provided by the HotPage include information and resource access and management services. The information services are designed to increase the effectiveness of users. It provides links to:

 

·        User documentation and navigation;

·        News items of current interest;

·        Training and consulting information;

·        Data on platforms and software applications;

·        Information resources, such as user allocations and accounts.

 

Another key service offered by HotPage is that it provides status of resources and supports an easy mechanism for submitting jobs to resources. The status information includes:

 

·        CPU load/percent usage;

·        Processor node maps;

·        Queue usage summaries;

·        Current queue information for all participating platforms.

 

HotPage’s interactive web-based service offers secure transactions for accessing resources and allows the user to perform tasks such as command execution, compilation, and running programs.

 

3.4.2 The SDSC Grid Port Toolkit

The SDSC grid port toolkit [GridPort] is a reusable portal toolkit that uses HotPage infrastructure. The two key components of GridPort are the web portal services and the application APIs. The web portal module runs on a web server and provides a secure (authenticated) connectivity to the Grid. The application APIs provide a web interface that helps end-users develop customised portals (without having to know the underlying portal infrastructure). GridPort is designed to allow the execution of portal services and the client applications on separate web servers. The GridPortal toolkit modules have been used to develop science portals for applications areas such as pharmacokinetic modeling, molecular modelling, cardiac physiology, and tomography.

 

3.4.3 Grid Portal Development Kit

The Grid Portal Collaboration is an alliance between NCSA, SDSC and NASA IPG [NLANR]. The purpose of the Collaboration is to support a common set of components and utilities to make portal development easier and allow various portals to interoperate by using the same core infrastructure (namely the GSI and Globus).

 

Example portal capabilities include the following:

 

·        Running simulations either interactively or submitted to a batch queue.

·        File transfer including: file upload, file download, and third party file transfers (migrating files between various storage systems).

·        Querying databases for resource/job specific information.

·        Maintaining user profiles that contains information about past jobs submitted, resources used, results information and user preferences.

 

The portal architecture is based on a three-tier model, where a client browser securely communicates to a web server over a secure sockets (via https) connection. The web server is capable of accessing various grid services using the Globus infrastructure. The Globus toolkit provides mechanisms for securely submitting jobs to a Globus gatekeeper, querying for hardware/software information using LDAP, and a secure PKI infrastructure using GSI.

 

The portals discussion in this sub-section highlights the characteristics and capabilities that are required in grid environments.  Portals are also relevant at the information and knowledge layers and the associated technologies are discussed in sections 4 and 5.

3.5 Grid Deployments

Having reviewed some of the most significant projects that are developing grid technologies, in this section we now describe some of the major projects that are deploying grid technologies. In particular we identify their aims and scope, to indicate the direction of current activities.

 

3.5.1 Information Power Grid

The main objective of NASA’s Information Power Grid (IPG) [IPG] is to provide NASA’s scientific and engineering communities with a substantial increase in their ability to solve problems that depend on the use of large-scale and/or distributed resources. The project team is focused on creating an infrastructure and services to locate, combine, integrate, and manage resources from across NASA centres, based on the Globus Toolkit. An important goal is to produce a common view of these resources, and at the same time provide for distributed management and local control. It is useful to note the development goals:

 

·        Independent but consistent tools and services that support a range of programming environments used to build applications in widely distributed systems.

·        Tools, services, and infrastructure for managing and aggregating dynamic collections of resources: processors, data storage/information systems, communications systems, real-time data sources and instruments, as well as human collaborators.

·        Facilities for constructing collaborative, application-oriented workbenches and problem solving environments across NASA, based on the IPG infrastructure and applications.

·        A common resource management approach that addresses areas such as systems management, user identification, resource allocations, accounting, and security.

·        An operational grid environment that incorporates major computing and data resources at multiple NASA sites in order to provide an infrastructure capable of routinely addressing larger scale, more diverse, and more transient problems than is currently possible.

 

3.5.2 Particle Physics Grids

The EU Data Grid [DataGrid] has three principal goals: middleware for managing the grid infrastructure, a large scale testbed involving the upcoming Large Hadron Collider (LHC) facility, and demonstrators (production quality for high energy physics, also earth observation and biology). The DataGrid middleware is Globus-based.

 

The GriPhyN (Grid Physics Network) [GriPhyN] collaboration, led by the universities of Florida and Chicago, is composed of a team of experimental physicists and information technology researchers who plan to implement the first Petabyte-scale computational environments for data intensive science. The requirements arise from four physics experiments involved in the project. GriPhyN will deploy computational environments called Petascale Virtual Data Grids (PVDGs) that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe.

 

The Particle Physics Data Grid [PPG] is a consortium effort to deliver an infrastructure for very widely distributed analysis of particle physics data at multi-petabyte scales by hundreds to thousands of physicists. It also aims to accelerate the development of network and middleware infrastructure for data-intensive collaborative science.

 

3.5.3 EuroGrid and UNICORE

The EuroGrid [EuroGrid] is a project funded by the European Commission. It aims to demonstrate the use of grids in selected scientific and industrial communities, address the specific requirements of these communities, and highlight their use in the areas of biology, meteorology and computer-aided engineering.

 

The objectives of the EuroGrid project include the support of the EuroGrid software infrastructure, the development of software components, and demonstrations of distributed simulation codes from different application areas (biomolecular simulations, weather prediction, coupled CAE simulations, structural analysis, real-time data processing).

 

The EuroGrid software is UNICORE (UNiform Interface to COmputing REsources) [UNICORE] which has been developed for the German supercomputer centres. It is based on Java-2 and uses Java objects for communication, with the UNICORE Protocol Layer (UPL) handling authentication, SSL communication and transfer of data; Unicore pays particular attention to security.

 

3.6 Research Issues

The computation layer is probably the layer with the most software technology that is currently available and directly useable. Nevertheless, as this section has highlighted, distributed applications still offer many conceptual and technical challenges. The requirements in section 3.2 express some of these, and we highlight the following as key areas for further work:

  1. Resource discovery – given a resource’s unique name or characteristics there need to be mechanisms to locate the resource within the distributed system.  Services are resources. Some resources may persist, some may be transitory, some may be created on demand.
  2. Synchronisation and coordination – how to orchestrate a complex sequence of computations over a variety of resources, given the inherent properties of both loosely- and tightly-coupled distributed systems.  This may involve process description, and require an event-based infrastructure.  It involves scheduling at various levels, including metascheduling and workflow.
  3. Fault tolerance and dependability – environments need to cope with the failure of software and hardware components, as well as access issues – in general, accommodating the exception-handling that is necessary in such a dynamic, multi-user, multi-organisation system.
  4. Security – authentication, authorisation, assurance and accounting mechanisms need to be set in place, and these need to function in the context of increasing scale and automation.  For example, a user may delegate privileges to processes acting on their behalf, which may in turn need to propagate some privileges further.
  5. Concurrency and consistency – the need to maintain an appropriate level of data consistency in the concurrent, heterogeneous environment.  Weaker consistency may be sufficient for some applications.
  6. Performance – the need to be able to cope with non-local access to resources, through caching and duplication.  Moving the code (or service) to the data (perhaps with scripts or mobile agents) is attractive and brings a set of challenges.
  7. Heterogeneity – the need to work with a multitude of hardware, software and information resources, and to do so across multiple organisations with different administrative structures.
  8. Scalability – systems need to be able to scale up the number and size of services and applications, without scaling up the need for manual intervention.  This requires automation, and ideally self-organisation.

 

Although we believe the service-oriented approach will prove to be most useful at the information and knowledge layers, it is interesting to note that it can be applied to the various systems discussed in this section. In particular, it is compatible with the CORBA model, with the way Java is used as a distributed environment, and with Globus. We expect to see this trend continue in most of the significant grid technologies. We note that layered and service-based architectures are common in descriptions of the systems discussed in this section, notably Globus and IPG.   For example, IPG has a ‘grid common services layer’.  Determining the core grid services in a service-oriented architecture is also a research issue.

 

Java and CORBA address several of the above issues and provide a higher level conceptual model, in both cases based on distributed object systems. However most currently deployed systems use Globus, with the Java-based UNICORE system an interesting alternative. A key question in all of this work, however, is what happens as scale increases? None of the systems considered in this section have demonstrated very large scale deployments, though some are set to do so. Pragmatically, some solutions will scale reasonably well, and the Web has demonstrated that scalability can be achieved; e.g. consider search engines. However we have significant concern about the need to cross organisational boundaries and the obstacles this will impose, relating to system management and security.

 

Many of these research issues are the subjects of working groups in the Global Grid Forum [GGF].

 



[1] This notion of services is different from that described in section 2, although it can be regarded as a particular technical implementation of these ideas.