Slide numbers are shown as [1] etc. after the corresponding text.
This talk is a signpost - a pointer to the road ahead, and a clear position to stimulate debate. [1]
At the outset of the UK e-Science programme, e-Science was defined by John Taylor, Director General of the Research Councils, to be “... about global collaboration in key areas of science, and the next generation of infrastructure that will enable it”. It's interesting to note the contemporary definition given in Wikipedia: “Due to the complexity of the software and the backend infrastructural requirements, e-Science projects usually involve large teams managed and developed by research laboratories, large universities or governments.” So, are we there yet? [2]
What is e-Science really, and how do we know when we have succeeded? Is it (1) when everyone is using the Grid, or (2) when there are routine scientific advances that would not have happened otherwise? I say (2), and more: not just advances happening sooner than they would have, but advances that would not have happened at all, such as those achieved when scientists can ask completely new questions. Scientific outcomes are not just accelerated but new. [3]
This talk asks how we can move from heroic scientists doing heroic science with heroic infrastructure to everyday scientists doing science they couldn't do before. And not just scientists doing science but researchers in all disciplines, be they humanists, archaeologists, geographers or musicologists - to name but a few. You could say it's about the democratisation of e-Science. This is not to say there's anything wrong with the ‘grand challenge’ e-Science, it's just that we're exploring another piece of the picture. [4]
To understand this we must start with the scientists, and take a holistic view of the process of science - the scholarly knowledge cycle. e-Science has tended to focus on one part of the cycle, but even if all we want to do is accelerate science we need to look at this lifecycle and see how we might reduce the time to discovery by reducing the time to experiment. [5]
Between 19th October and 3rd November 2007 I attended six international meetings related to e-Science. On the plane on the way back, reflecting on my trip, I wrote an email with the subject ‘The New e-Science’. This landed in people's mailboxes just as the UK was debating the next steps in e-Science and has served as a provocative position in that debate. The next few slides are the contents of that email... [6]
1. Everyday researchers doing everyday research
Increasingly we're seeing advanced ICT techniques in the hands of researchers on an everyday basis - the many specialists rather than the few. Chemists are blogging the lab (not just the chemists but their instruments!) to create lab books with reusable data and provenance records, so that the data can be interpreted and trusted. Scientists are exploiting the increasing power of the hardware on their desks, and working in an increasingly instrumented environment, interacting through everyday devices from laptop to PDA to phone. [7]
2. A data-centric perspective, like researchers
e-Science has long been motivated by the data deluge - data is increasingly large, rich, complex and real-time, and often generated locally. Furthermore there is tremendous new value in data, through new digital artefacts and through metadata, e.g. context, provenance and workflows. This isn't to say that computation is not important, rather that many scientists are benefiting from interaction designed around data (think scholarly knowledge cycle). [8]
3. Collaborative and participatory
The scientific process has always been collaborative and participatory - the process of publication, peer review, critique, reuse. Now we see the social process of science being revisited in the digital age, using collaborative tools such as blogs and wikis. Significantly, e-Science increasingly focuses on publishing as well as consuming - it's not just about warehousing data. [9]
4. Benefiting from the scale of digital science
As the process becomes increasingly digital, we achieve network effects not just through participation and contribution of content but through its very usage. This enables automatic recommendations based on previous activity and outcomes. This is a new and powerful effect. An example of organic growth within this context can be found in OpenWetWare. [10]
5. Increasingly open
Preprint servers, institutional repositories and open access journals are all examples of sharing content. Science Commons provides licensing approaches that facilitate this. The Open Archives Initiative's emerging Object Reuse & Exchange standard helps with the mechanics of sharing. [11]
6. Better not perfect
The technologies people are using are not perfect but they are better - the scientists choose them because there is an immediate benefit and often the promise of a longer term benefit too. But they must be easy/familiar to use, otherwise the pain isn't worth the gain. [12]
7. Empowering researchers
Many of the success stories come from researchers who have learned to use ICT and/or have domain ICT experts who are creating the solutions. Delivering infrastructure solutions to scientists is profoundly different to empowering scientists to create solutions - the latter delivers better solutions and it scales better! In fact I would state this more strongly: anything that takes autonomy away from researchers will be resisted. [13]
8. About pervasive computing
e-Science is about the intersection of the digital and physical worlds. On the one hand we have sensor networks delivering more data more often from more places; on the other we are interacting with the digital world not just through portals in web browsers but through handheld devices and new forms of display. [14]
These eight points define the New e-Science. They each reflect trends in ways of working in an increasingly digital world, and they are evident not just in Science but in all areas which are undergoing this transformation, from entertainment to digital health. They are really about society and its evolving relationship with technology - about empowerment and democratisation, through ease of use, sharing and the power of community. [15]
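To make point 4 a little more concrete, here is a minimal sketch in Python of recommendations that emerge purely from usage. It is an illustration under stated assumptions, not a description of OpenWetWare or any deployed system: the usage log, the workflow names and the recommend function are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(usage: Dict[str, Set[str]], user: str, top_n: int = 3) -> List[str]:
    """Suggest items used by people whose activity overlaps with `user`'s.

    `usage` maps each person to the set of items they have used.
    An item scores higher when it is used by people who share more
    items with `user`. (Hypothetical scoring rule, for illustration.)
    """
    mine = usage.get(user, set())
    scores: Counter = Counter()
    for other, items in usage.items():
        if other == user:
            continue
        overlap = len(mine & items)
        if overlap == 0:
            continue  # no shared activity, so no evidence to draw on
        for item in items - mine:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical usage log: nobody rated anything, they simply used things.
usage_log = {
    "alice": {"blast_search", "gene_annotation"},
    "bob": {"blast_search", "gene_annotation", "pathway_lookup"},
    "carol": {"blast_search", "protein_alignment"},
    "dave": {"image_stitching"},
}

print(recommend(usage_log, "alice"))
```

The point of the sketch is that nobody had to contribute a rating or a review: the suggestions fall out of ordinary use, and they improve as more people use the system.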
e-Science is now enabling researchers to do some completely new stuff! As the individual pieces become easy to use, researchers can bring them together in new ways and ask new questions. This picture shows a mashup using the Allen Brain Atlas (which uses Semantic Web technologies too). I like it because of the quote ‘Standing on the shoulders of giants’. This is how we move to ‘the next level’ - it's one of the distinguishing characteristics of the New e-Science. [16]
See www.w3.org/2007/Talks/www2007-AnsweringScientificQuestions-Ruttenberg.pdf
Talking of giants, this is where David takes a shot at the Grid. This sounds anti-grid but it isn't. I am simply observing why in some ways Grid is going against the flow of the New e-Science. I will then go on to explain how it all fits together. [17]
In the early days of Grid we heard about ‘The Grid Problem’. Now we have a new Grid Problem - some deployed Grid solutions, at least as perceived by the users, appear to oppose some of the patterns we have just presented:
This picture (for which I thank Prof Malcolm Atkinson) captures the provider mindset - what I call the ‘pipeline of provision’. It shows how the innovation of computer scientists feeds through to mass use by researchers after 15 years. It isn't wrong, but nor is it the whole picture. Not everything scientists are using now was in computer science labs 15 years ago! The problem with the picture is that the arrows need to go both ways. Some have also suggested it has a rather grandiose idea of the role of computer scientists. [19]
I prefer to draw the picture like this (a three-layer model - computer scientists understand those!). At the bottom we have the core infrastructure resources, at the top we have scientists interacting with the digital world (the witch is doing forecasting!). In between we have an ecosystem of stakeholders like scientists, software companies, subject ICT experts, software engineers, open source software developers, and technologies from workflows, mashups and Ruby on Rails to applications. [20]
I believe that for a flourishing ecosystem it is crucial that we achieve empowerment as well as provision - we need to tap the people power. Hence usability matters at all levels: simple/familiar interfaces for users and simple/familiar interfaces for developers. There should be no need for a summer school (as there is in Grid and Semantic Web). People learn to develop websites and mashups without that degree of training. Computer scientists are just one of the many players in the ecosystem, but (being one myself!) I would suggest they have a special role - as facilitators who can help the ecosystem flourish. [21]
You were expecting a Web 2 talk? In passing we've mentioned some of the things that might characterise it: wikis, mashups, REST APIs, Google Maps, technologies like AJAX, JSON and Ruby on Rails, and social networking. We could also talk about the Web as a distributed application platform, with services ‘in the cloud’ like Amazon S3 and EC2.
But actually... the entire talk so far has been Web 2.0: the eight defining points of the New e-Science correspond directly to the eight design patterns of Web 2.0! [22]
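In Web 2.0 the eight design patterns are called: The Long Tail; Data is the Next Intel Inside; Users Add Value; Network Effects by Default; Some Rights Reserved; The Perpetual Beta; Cooperate, Don't Control; and Software Above the Level of a Single Device.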
See www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html [23]
So let's be very clear what I'm saying about the relationship between Web 2.0 and the Grid.
In other words, I am proposing that the resources are coupled together using Web 2.0 - that there is not one universal Grid layer. Web 2.0 provides the loose coupling that is so often required in research applications. [24-26]
In passing we note that this vision is in fact consistent with the European vision for Next Generation Grids, which was put together by a panel of industrial and academic experts. The vision of the Next Generation Grids Experts Group is called the Service-Oriented Knowledge Utility, in which ‘A utility is a directly and immediately useable service with established functionality, performance and dependability, illustrating the emphasis on user needs and issues such as trust.’
See semanticgrid.org/NGG3 [27]
There are some standard objections to my suggestion of using Web 2.0 with grid:
Let's find out. This is the purpose of the experiment that we call myExperiment.
See myexperiment.org [29]
There are many workflow systems around - we found 75 in a quick count. They provide the machinery for coordinating the execution of scientific services and linking together scientific resources. They make repetitive and mundane work easier. They are one of the pieces of technology in the layer between resources and users. [30]
Here we see a story in which a two-year manual study had failed to find a result, which was instead achieved by reusing a workflow from another area, demonstrating the power of workflow reuse for new science. Workflows are not just pieces of data; they are reusable descriptions of protocols, of pieces of scientific activity. [31]
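The idea of a workflow as a reusable description, rather than a one-off script, can be sketched in a few lines of Python. This is only an illustration and not how Taverna or any other workflow system represents workflows: run_workflow, the step names and both data sets are hypothetical.

```python
from typing import Any, Callable, List, Tuple

# A workflow here is just an ordered list of named steps (an assumption
# made for this sketch; real workflow systems are far richer).
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Apply each step in order and record what every step produced."""
    provenance = []
    for name, func in steps:
        data = func(data)
        provenance.append((name, data))  # a simple provenance trail
    return data, provenance


# One description of a protocol, written once...
clean_and_count: List[Step] = [
    ("strip_blanks", lambda rows: [r.strip() for r in rows if r.strip()]),
    ("deduplicate", lambda rows: sorted(set(rows))),
    ("count", len),
]

# ...and reused on data from two unrelated areas (both made up).
gene_ids = [" BRCA1", "TP53 ", "", "BRCA1"]
station_ids = ["north-01", "north-01", "south-07", " ", "east-03"]

genes, gene_trail = run_workflow(clean_and_count, gene_ids)
stations, _ = run_workflow(clean_and_count, station_ids)
print(genes, stations)
print([name for name, _ in gene_trail])
```

Because the protocol is held as a description separate from the data, it can be shared, applied in another area, and run with a record of what each step produced.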
Taverna is one such scientific workflow system, and it's particularly interesting because - at 40 downloads a day - it's being used for everyday science in laboratories across the world.
See taverna.sourceforge.net [32]
Significantly Taverna is a "super-client" which can be downloaded and installed on your laptop without enterprise support. It provides easy access to services ‘in the cloud’ - independent third party world-wide service providers of applications, tools and data sets, as well as enterprise and laboratory applications. [33]
Taverna is not alone - other widely deployed systems include Kepler and Triana. We envisage a multiworkflow world where scientists can choose the workflows for the task at hand without knowing which system they are in or where they are running. [34]
To help realise this - and to support the flourishing ecosystem - we have created myExperiment.org, which has been designed throughout with the philosophy of the New e-Science. myExperiment is: a Facebook for Scientists; a community social network for sharing of workflows and other digital assets; a gateway to other publishing environments and a federated repository (to support the scholarly knowledge cycle); a platform for launching workflows (like executing movies but generating data in the process!); sensitive to the needs of scientific data (using self-describing Encapsulated myExperiment Objects for experiment snapshots and provenance). The project began in March 2007, went to closed beta by July and open beta in November. [35]
I recommend you try out myexperiment.org for yourself. These two screenshots show workflows (the results of searching for ‘disease’) and groups. In contrast to friendships (which are like those of Facebook), groups have administrators - they also have shared items. [36-37]
myExperiment reuses as much as possible, and provides simple APIs so it can be reused. The principle is to bring “myExperimentness” to the user through their existing interfaces, rather than obliging the user to come to it. For example, this picture shows a Google gadget which accesses myExperiment functionality. [38]
The distinctive thing about myExperiment is that it is designed with the needs of scientists in mind. For example, ownership and attribution of workflows have turned out to be absolutely key, and we have worked hard with our users to provide the simplest possible interface to a sophisticated system of ownership, sharing and permissions. [39]
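To give a feel for what a system of ownership, sharing and permissions involves, here is a minimal sketch in Python. It is an assumption-laden illustration and not myExperiment's actual data model or API: SharedItem, can, the action names and the group name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SharedItem:
    """A workflow or other asset with an owner and sharing settings (hypothetical)."""
    title: str
    owner: str                      # attribution: who the item belongs to
    public_view: bool = False       # may anyone view it?
    # Maps a group name to the actions its members may perform.
    group_permissions: Dict[str, Set[str]] = field(default_factory=dict)


def can(user: str, action: str, item: SharedItem,
        memberships: Dict[str, Set[str]]) -> bool:
    """Decide whether `user` may perform `action` on `item`."""
    if user == item.owner:
        return True  # owners keep full control of their own work
    if action == "view" and item.public_view:
        return True
    # Otherwise the user needs a group that grants this action.
    return any(
        action in item.group_permissions.get(group, set())
        for group in memberships.get(user, set())
    )


workflow = SharedItem(
    title="Disease gene lookup",
    owner="alice",
    group_permissions={"malaria_lab": {"view", "download"}},
)
memberships = {"bob": {"malaria_lab"}}

print(can("bob", "download", workflow, memberships))  # group member
print(can("bob", "edit", workflow, memberships))      # not granted to the group
print(can("carol", "view", workflow, memberships))    # no group, not public
```

Even this toy version shows why the interface matters: the rules multiply quickly, yet the owner should only have to answer simple questions about who can see and who can change their work.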
This picture shows where myExperiment fits. At first glance you might think it shows myExperiment with multiple front ends and multiple backends, but look carefully - myExperiment is just one of the services in the cloud. This is an example of cooperate, don't control. [40]
So how do we decide what is inside myExperiment and what we draw from elsewhere? We have decided that the core value proposition - the thing that makes us different from Facebook - comprises the ownership and sharing model, working with collections of heterogeneous distributed data, and workflow enactment. Hence ‘inside the box’ we have the social network and sharing and support for Encapsulated myExperiment Objects to hold all the experimental data together. Outside the box we can integrate with enactors from various workflow systems. [41]
Now we can revisit the scholarly lifecycle picture from the beginning of the talk, and add myExperiment - it is the piece that enables sharing of the new high-value forms of data such as workflows, metadata and provenance, and it achieves this in a Web 2 way - whilst reaching out to connect into the traditional scholarly lifecycle. [42]
There are six take-home messages from this talk:
For further information please contact:
Thanks to Geoffrey Fox, Savas Parastatidis, and the myExperiment and myGrid teams. [44]