Mario Cannataro
University `Magna Græcia' of Catanzaro, 88100 Catanzaro, Italy
cannataro@unicz.it
The main issues to be faced by next-generation Grids are the management and exploitation of the overwhelming amount of data produced by applications and by Grid operation, and the intelligent use of Grid resources and services. To achieve these ambitious goals, next-generation Grids should include knowledge discovery and knowledge management functionalities, for both applications and system management. How data and information available at different levels of the Grid can be effectively acquired, represented, exchanged, integrated, and converted into useful knowledge is the subject of an emerging research field known as `Grid Intelligence'. Ontologies and metadata are the basic elements through which such Grid Intelligence can be deployed. Moreover, Grids should offer semantic modeling of users' tasks and needs, available services, and data sources to support high-level services and dynamic service discovery and composition. This document describes some of these emerging services and a first implementation in the KNOWLEDGE GRID, an environment for the design and execution of geographically distributed high-performance knowledge discovery applications.
The main issues to be faced by next-generation Grids are the management and exploitation of the overwhelming amount of data produced by applications and by Grid operation, and the intelligent use of Grid resources and services. To achieve these ambitious goals, next-generation Grids should include knowledge discovery and knowledge management functionalities, for both applications and system management. How data and information available at different levels of the Grid can be effectively acquired, represented, exchanged, integrated, and converted into useful knowledge is the subject of an emerging research field known as `Grid Intelligence'.
The solutions that will be developed will certainly be driven by these needs and requirements, but they will also leverage, and probably integrate, some key technologies and methodologies emerging in computer science fields that are apparently far from, and unaware of, Grids, such as peer-to-peer [8] and ubiquitous computing [9], ontology-based reasoning, and knowledge management. In particular, ontologies and metadata are the basic elements through which Grid Intelligence services can be deployed. Using ontologies, Grids may offer semantic modeling of users' tasks and needs, available services, and data sources to support high-level services and dynamic service discovery and composition [4]. Moreover, data mining and knowledge management techniques could enable high-level services based on the semantics of stored data. Such services could be employed both at the operation layer, where Grid management could benefit from information hidden in data, and at the application layer, where users could exploit distributed data repositories, using the Grid not only for high-performance access, movement, and processing of data, but also to apply key analysis tools and instruments.
In this scenario, where resource ontologies and metadata allow intelligent searching and browsing, and knowledge discovery and management techniques allow high-level services, peer-to-peer and ubiquitous computing will be the orthogonal key technologies through which to realize basic services such as presence management, resource discovery and sharing, collaboration, and self-configuration.
Recent Grid developments [5] aim to simplify and structure the systematic building of Grid applications through the composition and reuse of software components and the development of knowledge-based services and tools. Following the trend that emerged in the Web community, the Open Grid Services Architecture (OGSA) introduced the service-oriented model [7, 10]. The Semantic Grid focuses on the systematic adoption of metadata and ontologies to describe resources, services, and data sources over the Grid, in order to enhance, and possibly automate, processes such as service discovery and negotiation, application composition, information extraction, and knowledge discovery [6]. Finally, Knowledge Grids [1] offer high-level tools and techniques for the distributed mining and extraction of knowledge from data repositories available on the Grid, leveraging semantic descriptions of components and data, as provided by the Semantic Grid, and possibly offering knowledge discovery services as Grid Services.
The next section discusses some high-level services useful for building the emerging next-generation Grid. The KNOWLEDGE GRID, an environment for the design and execution of distributed knowledge discovery applications, is then briefly presented [2]. Finally, we describe a case study where semantics and knowledge are used to enhance Grid functionalities and create new services: ontology-based semantic modelling is used to enhance component-based programming on the Grid.
To face the growing complexity of Grids and the overwhelming amount of data to be managed, the main requirements of future Grids will be:
In particular, to fulfill some of the requirements listed above, we envision that next-generation Grids should first provide the following three main classes of services and the related architectural framework:
Such services can be built incrementally by leveraging current Grid efforts and projects. Figure 1 shows how the recent research initiatives in the Grid community (OGSA, Semantic Grid, and Knowledge Grids) could be composed to provide a coherent architecture of services. Although these initiatives overlap to some extent, they complement each other. Some enabling technologies, such as ontologies and reasoning, knowledge management, and knowledge discovery, are currently offered by the depicted layers, but their main impact will become evident when they are used internally to enhance Grid management and operation. On the other hand, peer-to-peer and ubiquitous computing techniques have only recently started to be used. In our opinion, peer-to-peer will be the orthogonal technology on which main tasks such as presence management, resource discovery and sharing, collaboration, and self-configuration will be based [11].

Figure 1: Building Knowledge Discovery and Ontology-based services
Next-generation Grids must be able to produce, use and deploy knowledge as a basic element of advanced applications. In this scenario we designed the KNOWLEDGE GRID system as a joint research project of ICAR-CNR, University of Calabria, and University of Catanzaro, Italy, aiming at the development of an environment for geographically distributed high-performance knowledge discovery applications [2]. The KNOWLEDGE GRID is a high-level system for providing Grid-based knowledge discovery services. These services allow professionals and scientists to create and manage complex knowledge discovery applications composed as workflows that integrate data sets, mining tools, and computing and storage resources provided as distributed services on a Grid.
KNOWLEDGE GRID facilities allow users to compose, store, share, and execute these knowledge discovery workflows as well as publish them as new components and services on the Grid. The KNOWLEDGE GRID can be used to perform data mining on very large data sets available over Grids, to make scientific discoveries, improve industrial processes and organization models, and uncover valuable business information. Other examples of Knowledge Grids are briefly described in [12].
The KNOWLEDGE GRID provides a higher level of abstraction and a set of services based on the use of Grid resources to support all phases of the knowledge discovery process. It therefore allows end users to concentrate on the knowledge discovery process they must develop without worrying about Grid infrastructure details.
The KNOWLEDGE GRID architecture is composed of a set of services divided into two layers:
In component-based Grid programming the user designs an application by composing available software components. However, choosing components (i.e., using domain knowledge) and enforcing constraints (i.e., using programming knowledge) are often left to the designer. In this case study we show how ontologies can help users in designing and programming knowledge discovery applications on the KNOWLEDGE GRID. After a brief description of the domain ontology we developed, we show how a designer can exploit that knowledge base to formulate his/her application and choose the software components.
Domain Ontology: a view over the Grid knowledge base. DAMON (DAta Mining ONtology) is an ontology for the Data Mining domain that explicitly manages the knowledge about the domain of interest (Data Mining) and the related software tools, offering users a reference model for the different classes of data mining tasks, methodologies, and software components available to solve a given problem [4]. The choice of how to structure an ontology determines what a system can know and reason about. We have built our ontology through a classification of data mining software that allows the most appropriate software to be selected for solving a KDD (Knowledge Discovery in Databases) problem. The ontology represents the features of the available data mining software by classifying their main components and highlighting the relationships and constraints among them. The data mining software has been categorized on the basis of the following classification parameters:
The Data Mining knowledge base used to support knowledge discovery programming has two conceptual layers: at the top layer the DAMON ontology gives general information about the Data Mining domain, whereas specific information about installed software components and data sources is maintained where the resources reside. From an architectural point of view the ontology is a central resource, whereas the specific metadata are distributed. As an example, DAMON stores the fact that the C5.0 Software implements the C5 Algorithm, which uses the Decision Tree method, which in turn is a Classification method. The C5.0 Software node of the ontology contains the URLs of the metadata files describing details about all the installed instances of that software.
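The two-layer organisation can be illustrated with a minimal Python sketch. It is not the DAMON implementation: the `Ontology` class, the relation names, and the metadata URL (an `example.org` placeholder) are assumptions introduced only to show how the central ontology could hold general facts while pointing to metadata kept where the resources reside.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    # URLs of remote metadata files; in this sketch only software nodes use it.
    metadata_urls: list = field(default_factory=list)


class Ontology:
    """Central layer: general concepts plus typed relations between them."""

    def __init__(self):
        self.concepts = {}
        self.relations = {}  # subject name -> (relation, object name)

    def add(self, name, metadata_urls=()):
        self.concepts[name] = Concept(name, list(metadata_urls))

    def relate(self, subject, relation, obj):
        self.relations[subject] = (relation, obj)

    def explain(self, name):
        """Follow relations from a concept up to the most general one."""
        steps = []
        while name in self.relations:
            relation, obj = self.relations[name]
            steps.append((name, relation, obj))
            name = obj
        return steps


damon = Ontology()
damon.add("Classification")
damon.add("Decision Tree")
damon.add("C5 Algorithm")
# Hypothetical URL: the distributed metadata stay on the node hosting the software.
damon.add("C5.0 Software", metadata_urls=["http://kg4.example.org/metadata/c5.xml"])
damon.relate("C5.0 Software", "implements", "C5 Algorithm")
damon.relate("C5 Algorithm", "uses", "Decision Tree")
damon.relate("Decision Tree", "is a", "Classification")

for subject, relation, obj in damon.explain("C5.0 Software"):
    print(f"{subject} {relation} {obj}")
print(damon.concepts["C5.0 Software"].metadata_urls)
```

In this sketch only the software node carries metadata URLs, mirroring the split between the central ontology and the distributed metadata.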
Ontology-based programming: accessing the Grid knowledge base through ontology. DAMON is used both as an ontology-based assistant that suggests to the KNOWLEDGE GRID application designer what to do and what to use on the basis of his/her needs, and as a tool that enables semantic search of data mining software. In other words, it can be used to enhance application formulation and design, helping the user to select and configure the most suitable Data Mining solution for a specific KDD process.
Information about Data Mining tasks and methodologies, and about specific software components implementing data mining algorithms, can be obtained by browsing or searching the ontology. In particular, semantic (concept-based) search of data mining software and other data mining resources has been implemented. The search and selection of the resources (data sources and software components, as well as the type of data mining tasks, methodologies, and algorithms) to be used in a knowledge discovery application are accomplished in the following steps:
By using DAMON, the user first searches for clustering algorithms by browsing or querying the ontology on the basis of some requirements (the computational complexity of the algorithm, its suitability for solving the given problem, or the method used to perform the data mining task), then searches for the clustering software implementing those algorithms and working on the data set DBX, and finally locates the metadata URLs referring to the nodes KG1, KG2, and KG3, which offer the clustering software K-Means, Intelligent Miner, and AutoClass, respectively. Moreover, the user also finds the node KG4, which offers the C5.0 classifier. At this point the user can obtain specific information about the software by accessing the metadata on each identified node.
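The search just described can be sketched in Python as a filter over a small catalogue. The catalogue layout, the field names, and the `find_software` helper are assumptions made for illustration, as is the choice to list DBX as a data set supported by C5.0; the node and software names come from the example above.

```python
# Hypothetical flattened view of what the ontology and the metadata expose.
CATALOGUE = [
    {"software": "K-Means", "task": "clustering", "datasets": {"DBX"}, "node": "KG1"},
    {"software": "Intelligent Miner", "task": "clustering", "datasets": {"DBX"}, "node": "KG2"},
    {"software": "AutoClass", "task": "clustering", "datasets": {"DBX"}, "node": "KG3"},
    # Assumption: the source does not say which data sets C5.0 works on.
    {"software": "C5.0", "task": "classification", "datasets": {"DBX"}, "node": "KG4"},
]


def find_software(task, dataset, catalogue=CATALOGUE):
    """Return (node, software) pairs matching a task and able to work on a data set."""
    return [
        (entry["node"], entry["software"])
        for entry in catalogue
        if entry["task"] == task and dataset in entry["datasets"]
    ]


clustering_hits = find_software("clustering", "DBX")
classification_hits = find_software("classification", "DBX")
print(clustering_hits)
print(classification_hits)
```

A concept-based search in a real ontology would also follow relations such as "implements" and "uses"; the flat filter above only conveys the selection logic.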
The information obtained is then used for the visual composition of those software components and data sources through a graphical interface. The abstract description of this component-based application is then translated into the Grid submission language (RSL in the Globus case). After that, the implemented application can be incorporated into the DAMON ontology. In this way the knowledge base can be enriched and extended with new complex data mining tasks.
The programming example discussed here is just one example of how ontology-based services can be used for high-level programming of complex applications on Grids. This approach allows the reuse of previously developed applications and software components, which can be integrated into new Grid applications.
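As a rough illustration of the translation step, the following Python sketch renders one abstract task as a simple Globus RSL string. The `to_rsl` function, the executable path, and the argument are hypothetical; the sketch uses only the basic `executable`, `arguments`, `stdout`, and `count` attributes and does not reproduce the translation actually performed by the KNOWLEDGE GRID.

```python
def to_rsl(executable, arguments=(), stdout=None, count=1):
    """Render one task of an abstract application as a simple RSL request."""
    parts = [f"(executable={executable})"]
    if arguments:
        # Each argument is double-quoted and separated by a space.
        quoted = " ".join(f'"{arg}"' for arg in arguments)
        parts.append(f"(arguments={quoted})")
    if stdout is not None:
        parts.append(f"(stdout={stdout})")
    parts.append(f"(count={count})")
    # A leading '&' marks a conjunction of attribute relations.
    return "&" + "".join(parts)


# Hypothetical task: run a clustering tool on the data set DBX.
rsl = to_rsl("/usr/local/bin/kmeans", arguments=["DBX"], stdout="kmeans.out")
print(rsl)
```

A complete application would yield one such request per component, together with the data transfers between nodes, which this sketch leaves out.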
This work has been partially supported by Project `FIRB GRID.IT' funded by MIUR.