Towards the Grid of People

Summary of Keynote Address at UK e‑Science All Hands Meeting 2006

David De Roure, University of Southampton

The UK is realising the vision of an advanced e‑Infrastructure to enhance the activities of researchers and learners.

Introduction

Dan Atkins, the Director of the NSF Office of Cyberinfrastructure, uses the picture of three symmetric interlocking rings (‘Borromean rings’) to illustrate the alignment of endeavours necessary to create, provision, and apply cyberinfrastructure to enhance the activities of knowledge communities. Removing any one of the three symmetric rings destroys the synergy:

The first of these is addressed by e‑Science, and the second is very much the role of JISC in providing the UK’s infrastructure (call it cyberinfrastructure or e‑infrastructure) for research and learning, across all disciplines.

In this talk I will suggest that through JISC and the e‑Science programme we are uniquely placed to realise this synergistic advanced cyberinfrastructure articulated by Dan Atkins, and it is the third ring – the socio-technical enhancement – that is the main subject of my talk. In particular I will show how the R experience from e‑Science projects moves through into the D of JISC and its many users, largely through activities funded by the JISC Support of Research (JSR) committee. JSR supports the requirements of the research community, and significantly it includes representatives nominated by all the UK research councils – emphasising the research engagement and also the essential (and unique) cross-disciplinary nature of JISC.

In order to illustrate how e‑Science feeds through into JISC we will take a look at two sets of projects, chosen to illustrate my key points. Inevitably there are a great many projects that I won’t be mentioning and you should take a look at the JISC Web site to see the full range of activities (www.jisc.ac.uk). I’m also going to provide some motivation for people in the e‑Science community to take a look at the Virtual Research Environments and e‑Infrastructure calls – and to engage with JISC in using, building and sustaining our advanced and synergistic cyberinfrastructure.

The Scholarly Knowledge Cycle

I start my story with one of the e‑Science pilot projects, the CombeChem project (www.combechem.org), which is broadly characteristic of many e‑Science projects focusing on using grid techniques to cope with the data deluge from new experimental practices – in this case combinatorial chemistry, but equally it could be data from DNA microarrays. The CombeChem pilot focused on gathering data in laboratories and from instruments on the grid, and enabling researchers to use it (for example, by performing compute-intensive computations) to generate results and papers – the scholarly research output. It introduced the notion of “publication@source”, a term coined by Jeremy Frey to describe the need to capture data and its context from the outset, maintaining the provenance information in order to facilitate subsequent interpretation and reuse of the data.

But this is only part of a much bigger picture of the scholarly knowledge cycle. The data and publication outputs of the scientific process feed into repositories, archives and digital libraries. They are used by researchers and also by learners, facilitated by support for discovery and use of resources. This more holistic perspective – which transcends any one research project and is very much the broader scope of JISC – is captured very effectively by Liz Lyon’s diagram which appeared in Ariadne in 2003 (Issue 36, July 2003, see www.ariadne.ac.uk).

This is also an example of the crucially content-centric perspective on the research and learning infrastructure – users don't go to a computer to use software, they go to use information. And increasingly we find ourselves discussing the issue of ‘freeing the data’, by which I mean making it available but also making it usable – for both anticipated and unanticipated reuse by others. For me the cyberinfrastructure is as much about content as software.

Feeding through

Let’s look at the feed-through of the CombeChem e‑Science experience into JISC, starting early in that cycle. The JISC Repository for the Laboratory (R4L) project is developing digital data and document repositories for laboratory-based science (r4l.eprints.org). Funded under the Digital Repositories programme, R4L addresses the interactions between repositories of primary research data, the laboratory environment in which they operate, and repositories of research publications they feed into.

This interlinking of research data and research publication is the subject of the JISC eBank project, which provides open access crystallography data interlinked with its derived research publications – it’s possible to chase back to see exactly where results have come from, or even to find research publications arising from data. In line with the digital library context for this work, OAI (Open Archives Initiative) metadata is harvested from institutional data repositories. The project also supports learning – learners can trace from research publication back to research data which they can then interpret and reuse.
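To make the harvesting step concrete, here is a minimal sketch of how an aggregator might request and read Dublin Core records over the OAI protocol for metadata harvesting (OAI-PMH). It is an illustration under stated assumptions, not the eBank implementation: the repository address, the record identifier, the titles and the use of dc:relation to point from a dataset to its publication are all hypothetical, and the response is an embedded sample rather than a live request.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard XML namespaces for OAI-PMH responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier, titles and relations from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "relations": [e.text for e in rec.iterfind(".//dc:relation", NS)],
        })
    return records


# Hypothetical sample response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:42</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure dataset</dc:title>
          <dc:relation>https://journal.example.org/paper/7</dc:relation>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["titles"], record["relations"])
```

In this sketch the link from data to publication travels in the harvested metadata itself, which is what lets an aggregator or portal offer the "chase back" from a paper to its data without holding either.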

Through projects like eBank we can explore the intersection of the publishing and data worlds, be they open or closed. For example, the ecrystals interface provides a web page complete with a 3D visualisation of a molecule, data collection parameters and links back to the files of data which led to this output. Behind this simple interface there is a complex picture including a diverse set of stakeholders – the federation model involves data collection, deposit in R4L, data curation and preservation in databases and databanks, institutional data repositories, aggregator services, portals and publishers. Significantly, JISC has relationships with this multitude of players.

Emerging from this work we also have an educational application. e‑Malaria, a computer-aided drug discovery system for chemistry teaching, is one of three JISC projects taking e‑Science into schools. From a Web browser which displays a 3D visualisation, students take a suitable enzyme target in the malaria parasite, design a small molecule as a possible drug, ‘dock’ it into the enzyme target to find an improved binding and modify it to yield a drug-like molecule. This is an example of chemistry in context, an authentic activity in which students use real data and real software – there is a chance that drug candidates could go on to in vitro and in vivo tests.

Grid of People

JISC is very much about connecting people to resources – what I am calling the Grid of People. However, we are seeing an interesting development which is set to take this notion deeper. The many users of shared research outputs, and for example the many users of e‑Malaria, are all participants in the Grid – not just consumers of information but generators of new content of value to others. This collective intelligence perspective, a characteristic of Web 2.0 (think flickr), illustrates the value that people bring and the ‘network effects’ that can occur. With the appropriate tools we are beginning to see a deeper sense of people contributing to the cyberinfrastructure through being participants rather than just consumers.

Another e‑Science pilot project has embraced this participation perspective. Led by Carole Goble, myGrid is building myExperiment, a collaborative platform for life scientists to share experimental information – in this case workflows. Think mySpace for scientists. Additional to the direct benefits to the users, collecting and sharing information about what people are actually doing with the cyberinfrastructure provides a basis for research into enhancing the research environment. For example, by collecting a large number of workflows from multiple scientists, and studying how they are reused, we can begin to identify design patterns in workflows and to optimise our systems to meet these requirements.

Collaborative Tools

Which brings us on to Virtual Research Environments. Here the goal is to help researchers in all disciplines manage the increasingly complex range of tasks involved in carrying out research – for example, bringing together scientists in the field, e‑Scientists, resources and experiments. Phase 1 of the JISC VRE programme was exploratory, in particular looking at the tightly-coupled world of Virtual Learning Environments versus the loosely coupled world of e‑Science through a diverse set of studies.

The scholarly knowledge cycle is about sharing and collaboration of an asynchronous nature (to use the CSCW term) – publishing things at each other. The VRE programme has also addressed synchronous collaboration, as occurs in the meetings that pervade the life of almost all researchers, increasingly taking the form of telephone and videoconferences amongst geographically dispersed colleagues. The e‑Science CoAKTinG project, which was part of the Advanced Knowledge Technologies Interdisciplinary Research Collaboration (www.aktors.org/coakting/), investigated the use of knowledge technologies to enhance meetings, and this has fed through directly into the JISC Memetic VRE project (www.memetic-vre.net) led by Mike Daw.

By providing tools for mapping and recording Access Grid meetings, Memetic makes meetings persistent and replayable – it turns a meeting into an artifact for retrospective use. In fact it blurs the synchronous-asynchronous distinction by effectively turning meetings into documents, which in the future could be stitched in with the experimental data and results to provide a completely interlinked digital record.

While CoAKTinG conducted trials with two groups of end-users (chemists and, with NASA, astronauts on Mars!), Memetic has deployed these technologies for use by a variety of users in a variety of settings. These include distance learning, social science and education seminars, interviews, recording mathematical work, and meetings between system developers in the UK and US. In particular it has been used for research, both for observing and annotating events of groups of students in learning scenarios and for observation and evaluation of performance art. In a series of three “performativity, place, space” workshops led by Angela Piccini and supported by the Arts and Humanities Research Council, Memetic and Access Grid have been used together as a telematic performance environment and dissemination tool. This broad participation, and the benefits and challenges that arise, demonstrates the benefit of JISC in addressing a spectrum of disciplines.

Record and Reuse

Behind the scenes, many of the projects discussed here are using ‘semantic’ technologies for metadata, notably the W3C Resource Description Framework; for example CombeChem stores provenance, and Memetic stores events, in an RDF store. eBank and e‑Malaria were funded under the JISC Semantic Grid and Autonomic Computing programme, which was a joint initiative with EPSRC. Another VRE project, Iugo, is looking at conferences, the Semantic Web and social software – again supporting collaboration and sharing using semantic technologies. This approach is being adopted in these projects because machine-processable descriptions (metadata) enable an increased degree of automation, for example in resource discovery and in sharing and reusing resources. This means that the machines can get on with what they’re good at and let the users focus on research and learning. For more on the Semantic Grid see www.semanticgrid.org.

There’s another way of looking at this which pervades all the examples given so far. In all cases we are making recordings – be it data from an X-Ray diffractometer or a video of a performance – and then reusing this archived digital record in ways that may have been anticipated at the time of capture or, significantly, may not have been. The semantic technologies facilitate the reuse. If we take this “record and reuse” perspective, we can view the metadata as annotation which adds value to the stored information, and all use leads to further annotation – if we then share annotations we begin to see the network effects discussed earlier. This accumulation of annotation is exactly what the Semantic Web technologies are designed for.
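The "record and reuse" idea can be sketched in a few lines. The example below is only an illustration, assuming a toy in-memory store of subject, predicate, object statements rather than a real RDF library; every identifier in it (the ex: names, the capture date, the two users) is hypothetical. It shows the property that matters here: annotations made independently by different people merge by simple set union, and the merged record can then be queried for uses nobody anticipated at capture time.

```python
from typing import List, Optional, Set, Tuple

# A statement is a (subject, predicate, object) triple, as in RDF.
Triple = Tuple[str, str, str]


def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return, in sorted order, the triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Metadata recorded at capture time (hypothetical values).
capture: Set[Triple] = {
    ("ex:dataset1", "ex:recordedBy", "ex:diffractometer"),
    ("ex:dataset1", "ex:capturedOn", "2006-01-01"),
}

# Annotations added later, independently, by two hypothetical users.
alice: Set[Triple] = {("ex:dataset1", "ex:usedIn", "ex:paper7")}
bob: Set[Triple] = {
    ("ex:dataset1", "ex:usedIn", "ex:lesson3"),
    ("ex:dataset1", "ex:usedIn", "ex:paper7"),  # duplicates collapse on merge
}

# Sharing annotations is a set union: nothing needs to be reconciled by hand.
shared = capture | alice | bob

if __name__ == "__main__":
    for triple in match(shared, s="ex:dataset1", p="ex:usedIn"):
        print(triple)
```

Because each statement stands alone, a new kind of annotation never requires changing the existing record, which is one reason this style of metadata suits unanticipated reuse.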

Infrastructure

The forthcoming VRE phase 2 continues to address collaboration and sees a move from a technology focus into user and research practice focus, developmental rather than experimental, with unified approaches and integrated solutions – significantly, projects will address both community engagement and e‑Research interoperability. Hence we see the emphasis on creating the sustainable cyberinfrastructure. Demonstrating this commitment, proposals to VRE phase 2, and also to e‑Infrastructure and to the AHRC e‑Science call, are expected to consider sustainability at proposal stage. Software sustainability is the concern of the JISC Open Source Advisory Service OSS-Watch (www.oss-watch.ac.uk) and also the Open Middleware Infrastructure Institute UK (www.omii.ac.uk), which has been set up by the e‑Science Programme to provide software and support to enable a sustained future for the UK e‑Science community and its international collaborators.

JISC’s ‘Provisioning’ ring – creation, deployment and operation of advanced cyberinfrastructure – features a number of services including the National Grid Service (www.grid-support.ac.uk). NGS has been set up to provide open standards based access to the full range of the UK’s computation and data based research facilities, together with a range of sophisticated services to support the sort of coordinated collaborative and cross-resource activities that we describe. The NGS infrastructure aims to enable researchers to create, process, preserve and publish digital information, easily navigate through the available resources, be confident in the quality of the services available and tie into international efforts.

The advancement of this cyberinfrastructure is being addressed through the £10 million e‑Infrastructure programme which is consolidating the initial five-year investment in the e‑Science programme. It has four thematic areas which reflect the activities needed to enhance the provisioning ring: Security, Grid Services and Tools, Knowledge Organisation and Semantic Services, and Community Engagement and Support. Note again the emphasis on the Grid of People.

Conclusions

Moving technologies from early adopters through to the majority is a significant challenge which takes time, and JISC is uniquely positioned to enable this. Does this mean that what is being delivered is not leading edge? In fact I believe it is absolutely leading edge, because all three rings are in place and the synergy is there. We are building the Grid of People.

The Next Generation Grid vision, presented this week at the European Grid Technology days in Brussels, majors on the notion of the “Service Oriented Knowledge Utility” (www.semanticgrid.org/NGG3). This is described as: building on existing industry practices and emerging technologies; supporting ecosystems that promote collaboration; moving towards increased agility, lower cost, and broader availability of services; empowering service providers, integrators and consumers of ICT; using concepts from Web, Grid and Knowledge technologies; and as safe, easy and ubiquitous as existing utilities like electricity or water. This description works remarkably well applied to the advanced cyberinfrastructure being realised by JISC.

In conclusion, Grid is about joining things together – so is e‑Science, and so is JISC. These are natural partners. It is about connecting people to resources – the Grid of People – and increasingly we will see people as participants as well as consumers. We are creating a world where datasets are part of the infrastructure and services are utilities. We see research being increasingly automated, as we explore the interaction between human-driven and machine-driven processes. And for researchers, moving technology out of the lab and into The Wild – actually using stuff – informs and enables new research.

And the bottom line JISC message – JISC is engaging with the research community and seeks participation of researchers in using, developing and sustaining the virtual research environment and e‑infrastructure, to deliver the vision of an advanced cyberinfrastructure which enhances the activities of researchers and learners.

Slide Credits

Dan Atkins, Jeremy Frey and Simon Coles, Liz Lyon & eBank project, Carole Goble, Reagan Moore, Mike Daw, Simon Buckingham Shum & Memetic project, Angela Piccini, Neil Geddes, Franco Accordino and the JISC staff.