e-Science is about Scientists too – The Evolution of the Grid and the Web

Summary of keynote address to 1st eResearch Australasia conference, Brisbane, June 2007

David De Roure, University of Southampton <dder@ecs.soton.ac.uk>

“e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it,” wrote John Taylor at the inception of the UK e-Science Programme.  At that time the Grid technologies on the table included Condor, Globus and SRB. I felt there was a missing link between these infrastructure pieces and the new scientific outcomes that e-Science sought. In a report in 2001 we wrote “there is currently a major gap between these endeavours and the vision of e-Science in which there is a high degree of easy-to-use and seamless automation and in which there are flexible collaborations and computations on a global scale”.

This gap is the subject of this talk. Despite good technical progress over several years, I feel it still exists, and that users expect the Grid community to fill it. My concern is that the Grid mindset, for all its good intentions, is simply not equipped to do so. So an alternative title for this talk might be “ending the accidental tyranny of the Grid mindset”.

Part of our proposal for filling the gap was the use of Web technologies. At that time I was listening to Tim Berners-Lee promoting his vision of a future Semantic Web (a vision he presented in his keynote at the Web conference in Brisbane in 1998). The Semantic Web is very compelling for science because it is about linking up data, just as the Web is about linking up documents. So it helps make data reusable, and it brings together the otherwise decoupled content of e-Science (after all, it’s the data that scientists want!). It is also compelling because it provides a solution for the machine-processable metadata that is essential to achieving automation, so we promoted the Semantic Web not only for scientific data but within the middleware too. We called this the Semantic Grid and created the Semantic Grid Research Group in the Global Grid Forum.

Many e-Science and Grid projects across the UK and Europe have since deployed these technologies successfully. For those of you interested in the Semantic Web experience, let me highlight some projects; others can skip this part! By way of introduction, a rather cheeky definition of the Semantic Grid can be obtained by substituting the word “grid” for “web” in the W3C Semantic Web Activity statement: “The Semantic Grid is an extension of the current Grid in which information and services are given well-defined meaning, better enabling computers and people to work in cooperation.”

But as the Web has evolved it hasn’t just gone Semantic, it’s gone Web 2.0. It has become a distributed application platform in its own right, exemplified by mashups (see programmableweb.com) and by the provision of storage services (Amazon’s S3) and even compute services (Amazon’s Elastic Compute Cloud). This raises an obvious question for e-Science: which infrastructure should we use?! We started exploring this in the Open Grid Forum in January this year, and the debate rages on. For example, aren’t scientific workflows a kind of mashup? Or isn’t a mashup just a workflow that can run in any browser?

Some might dismiss Web 2.0 as hype, but it turns out to be very instructive to look at the Web 2.0 Design Patterns and consider e-Science in that light. For example, e-Science projects have tended to focus on small numbers of specialist users rather than on the long tail of researchers doing everyday scientific work. e-Science is data-centric, but the infrastructures haven’t focused on giving scientists easy processing of the content they want. Users add value, but this requires support for creating and sharing that value, which again isn’t in the infrastructure. Users collaborate over artefacts, but we don’t realise the full value of this collaboration, for example by making recommendations based on usage: the Web 2.0 pattern of “network effects by default”.

The magic of Web 2.0 (or even of Web 1.0, for that matter) is that it’s about building value through the participation of people. e-Science needs this too, but the Grid hasn’t gone there yet. Quite rightly, the Grid mindset is about providing an advanced infrastructure to enable science. But a service-provision mindset takes a fixed view of users as consumers, which is contrary to the Web 2.0 view that “users add value”. I believe that the e- should stand for empower, not just enable. e-Science is about Scientists too.

So what we see in Grid is lots of standards work to build well-engineered, sophisticated solutions. In Web 2.0 we see an ecosystem of simple APIs. In Grid we see a separation of content provision from data processing, while in Web 2.0 we see content motivating adoption (think Google Maps). In Grid we see an assumption that users will come. In Web 2.0 they do. This is not to say Grid is wrong or broken, but that we need to look to Web 2.0 in that space between Grid and the scientists – and the Grid mindset doesn’t naturally go there.

However, the Grid mindset does deliver the robust, dependable services that are needed to underlie Web 2.0. Indeed, the European vision of the future of the Grid as “Service Oriented Knowledge Utilities” is consistent with this view. To make functionality mashups easy above that level, perhaps we should look at simple Web interfaces to Grid functionality. For example, why can’t the Grid be delivered through a RESTful architecture?

I believe that to achieve this shift in thinking in the gap, we should look at verticals as well as the seemingly inevitable horizontal layers (remember the OSI seven-layer model?). A good example of this is the JISC Virtual Research Environments programme in the UK, where every project reaches from users through developers to providers.

One of these projects is called myExperiment (see myexperiment.org), which provides a social space for scientists to share workflows and other digital artefacts of e-Science, or, as New Scientist put it, “MySpace for the dudes in labcoats”. myExperiment builds on everything we’ve learned and adopts a Web 2.0 approach, which we believe is the right one because it is familiar to the next generation of scientists. myExperiment manifests itself as a public site, but you will be able to download it and run your own myExperiments, independently or linked together. It can also execute workflows. Significantly, it is designed with simple programming interfaces, so that it’s easy for people to mix and mash myExperiment into new sites, or to add “myExperiment-ness” to existing ones. It’s all about bringing the new functionality to the user, not forcing the user to come to it: that crucial principle of “cooperate, don’t control”.

So I foresee a future e-Science infrastructure that is a mix of Grid, Semantic Web and Web 2.0, with all three sitting comfortably together and all three continuing to evolve. The Grid is about linking things up so that people can do new things, so now we need to empower people to create and share these functionality mashups. We’ll use Semantic Web technologies to assist the mashing up of data, to support hackability and remixability, and to work with live feeds rather than just batch processing. We will learn from Web 2.0 about how developers and users engage with the new capabilities: bringing new functionality to the users rather than expecting them to come to it, and creating an ecosystem of participation that understands the incentive models of scientists.

In summary, e- is for Empowering Scientists not just Enabling Science. We’re now approaching a phase of e-Science in which many key infrastructural capabilities exist and we can start operating at a higher level – bringing functionality together to do very exciting, very new things. So we need to empower scientists to rise above all these infrastructural pieces and use their creativity with ease to harness the new capabilities and to conduct that exciting new science – which, we must remember, is the reason we’re doing all this in the first place!

Acknowledgements. The Semantic Grid Research Group is led by myself, Carole Goble, Geoffrey Fox and Marlon Pierce, while myExperiment is led by Carole Goble and me, building on the work of both the myGrid and CombeChem teams. Thanks to everyone in the community who has been involved in these activities. For further information see semanticgrid.org and myexperiment.org. Contact dder@ecs.soton.ac.uk and Carole.Goble@manchester.ac.uk