The University of Liverpool - investing in knowledge
  Implementing the Kepler Workflow Interface into the Cheshire Digital Library Framework and the Sakai Virtual Research Environment  
 


Background/Context

The project seeks to develop and implement the Kepler workflow system as an interface to the Cheshire3 digital library framework and the Sakai virtual research environment. The aim is to enable researchers in both the humanities and scientific disciplines to use the Kepler, Cheshire3 and Sakai software to conduct analyses and perform distributed processing in several different software and hardware environments. It will assist in coordinating the export and import of data from one environment to another. We intend to use the Kepler, Cheshire3 and Sakai interface to provide researchers with capabilities ranging from discovering information to publishing and storing results, thus comprising a virtual research environment. In particular, we intend to work with the Arts and Humanities Data Service to develop a number of transactional services for the humanities.

Aims and Objectives

The overall aim of the project is to integrate established, automated workflow technologies into the Cheshire3 digital library framework. This will provide researchers with an easy-to-use yet powerful system for executing workflows. We wish to build on developments in e-science and apply them to the humanities domain as appropriate. We intend to use the Kepler, Cheshire3 and Sakai interface to provide researchers with capabilities ranging from discovering information to publishing results, and to enable users of the system to generate publishable results more easily from relevant text data.

The specific objectives are to:

  • Devise an interface for a workflow creation and execution process so that users may design, execute, monitor, and communicate analytical procedures repeatedly with minimal effort.
  • Implement this as part of the Cheshire3 digital library framework.
  • Implement a workflow method for extracting information from the Sakai VRE.
  • Incorporate this integration into data-grid systems through support for the Storage Resource Broker and Grid workflow patterns. In doing so, address issues of data and process provenance, user interaction, reporting and logging.
  • Test the implementation on large, complex and heterogeneous data sets, particularly those from the Arts and Humanities Data Service.
  • Evaluate the implementation and introduce improvements based on user feedback.

Project Methodology

There are several important issues which must be addressed. Interoperability between Kepler and Cheshire3 is essential to this project, and we must achieve it in the best manner possible. We must first determine whether the Kepler compiler can compile the Web Services Description Language (WSDL) definition for SRW (the Search/Retrieve Web service); if it cannot, we must find another strategy for building Kepler objects, a contingency that is addressed in the project plan. This will allow Kepler to interact with Cheshire3 as a 'black box'. The next stage of the project will be to enable Kepler actors to interact directly with Cheshire3 objects.
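As an illustration of the 'black box' interaction, the sketch below builds the kind of SOAP message an SRW client would send to a search service. It is a minimal sketch, not the project's implementation: the function name `build_search_request` and the example query are hypothetical, and it assumes SOAP 1.1 and the SRW 1.1 `searchRetrieveRequest` element names. No network call is made.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: SOAP 1.1 envelope and the SRW request schema.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SRW_NS = "http://www.loc.gov/zing/srw/"


def build_search_request(cql_query, maximum_records=10, version="1.1"):
    """Return a SOAP envelope (as text) wrapping an SRW searchRetrieveRequest.

    Hypothetical helper for illustration only.
    """
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # Each request parameter becomes a child element of the request.
    ET.SubElement(request, f"{{{SRW_NS}}}version").text = version
    ET.SubElement(request, f"{{{SRW_NS}}}query").text = cql_query
    ET.SubElement(request, f"{{{SRW_NS}}}maximumRecords").text = str(maximum_records)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_search_request('dc.title = "kepler"', maximum_records=5))
```

A workflow actor generated from the WSDL would hide this message construction from the user; the sketch only shows what crosses the boundary between the two systems.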

To allow optimal interaction between Kepler actors and Cheshire3 objects, we also need to address the issue of distributed processing; both Cheshire3 and Kepler are designed with this type of processing in mind. The expected solution to this issue will rely on TCP/IP sockets so that it can be distributed in the grid environment.
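The socket-based approach can be pictured with a small local round trip. This is a sketch under stated assumptions rather than the project's design: the newline-terminated message format and the names `serve_once`, `remote_call` and `demo` are hypothetical, and the "processing" step is simply upper-casing the request.

```python
import socket
import threading


def serve_once(server_sock, handler):
    """Accept one connection, apply handler to the request line, send the reply."""
    conn, _ = server_sock.accept()
    with conn:
        with conn.makefile("rwb") as stream:
            request = stream.readline().rstrip(b"\n")
            stream.write(handler(request) + b"\n")
            stream.flush()


def remote_call(host, port, payload):
    """Send one newline-terminated request and return the single-line reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        with sock.makefile("rwb") as stream:
            stream.write(payload + b"\n")
            stream.flush()
            return stream.readline().rstrip(b"\n")


def demo():
    # Bind to port 0 so the operating system picks a free local port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    # Stand-in for a remote processing object: it upper-cases the request.
    worker = threading.Thread(target=serve_once, args=(server, bytes.upper))
    worker.start()
    try:
        return remote_call("127.0.0.1", port, b"index record 42")
    finally:
        worker.join()
        server.close()


if __name__ == "__main__":
    print(demo())
```

Because both ends speak plain TCP, the same exchange would work between a Java process and a Python process on different hosts, which is the property the grid deployment depends on.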

Once we have developed the Cheshire3 framework so that it can use Kepler functionality, and developed methods whereby Cheshire3 tools can be used within Kepler workflows, we will investigate the integration of Kepler with Sakai. A Kepler actor will be designed to extract the data stored within the Sakai VRE. Once this data has been extracted, other Kepler workflow processes and functions, including those provided by Cheshire3, can be used to analyse it. To enable this extraction we must first investigate how the data is stored within Sakai: whether it is held in a database and, if so, of which type. Once this has been determined, we can investigate the accessibility of the data and how we can extract it for use with Kepler workflows.
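If the Sakai data does turn out to be held in a relational database, the extraction step could look roughly like the sketch below. Everything specific here is an assumption made for illustration: the `resource` table, its columns and the function names are hypothetical, and an in-memory SQLite database stands in for whatever storage Sakai actually uses.

```python
import sqlite3


def extract_records(connection, table, columns):
    """Read the named columns from a table and return one dict per row."""
    # Identifiers cannot be bound as query parameters, so accept only simple names.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT {', '.join(columns)} FROM {table} ORDER BY {columns[0]}"
    return [dict(zip(columns, row)) for row in connection.execute(sql)]


def demo():
    # In-memory stand-in database; the table and its columns are hypothetical.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE resource (id INTEGER, title TEXT)")
    connection.executemany(
        "INSERT INTO resource VALUES (?, ?)",
        [(1, "Survey notes"), (2, "Parish register")],
    )
    try:
        return extract_records(connection, "resource", ["id", "title"])
    finally:
        connection.close()


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Returning plain dictionaries keeps the extracted records independent of the database driver, so later workflow steps would not need to know where the data came from.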

The critical success factors of the project are:

  • Implementation of an SRW / Kepler actor
  • Implementation of Kepler actors with the ability to interact with C3 objects
  • Implementation of a C3-to-Kepler handler
  • Expansion of the C3 Framework to include Kepler functionality
  • Implementation of a Sakai / Kepler actor
  • Interoperability between all applications, usability, accessibility and user acceptance.

Important issues to be addressed:

  • Implementation of multiple protocols required for system interoperability
  • Support for cross-searching different DTDs
  • Interfacing issues across languages (Python/Java)
  • Benchmarking procedures
  • For Sakai, addressing requirements arising from the relative instability of its code base, which is at an early stage of development
  • Production of in-depth, valuable user testing

Implications/ Deliverables/ Stakeholders

The project will support an environment which will enable researchers in both the humanities and scientific disciplines to use the Kepler, Cheshire3 and Sakai software to conduct analyses and perform distributed processing in several different software and hardware environments and to coordinate the export and import of data from one environment to another.

Project Manager

Paul Watry
Special Collections & Archives
Sydney Jones Library
The University of Liverpool
PO Box 123
Liverpool
L69 3DA
Tel: 0151 794 2696
Fax: 0151 794 2681
Email: P.B.Watry@liverpool.ac.uk
Email: clare.llewellyn@liverpool.ac.uk

Project Team

Fabio Corubolo - Lead developer - Research Associate, Email: f.corubolo@liverpool.ac.uk
John Harrison - Research Associate, Email: john.harrison@liverpool.ac.uk

Technical Advisory Committee

Responsible for advising the project group on technical issues, particularly grid protocols, grid technology, digital library and digital curation issues. This group is formed from the members of the Cheshire Project team.

Project Partners

Sheila Anderson - Arts and Humanities Data Service, Email: sheila.anderson@ahds.ac.uk
Paul Ell - Centre for Data Digitisation and Analysis, Queen's University, Belfast, Email: p.ell@qub.ac.uk
Ilkay Altintas - Scientific Automation Technologies Lab, SDSC, Email: altinas@sdsc.edu



JISC The Joint Information Systems Committee