Original Proposal
This is the original text from the DataGrid Technical Annex, defining WP2.
Workpackage 2 - GRID Data Management
In an increasing number of scientific and commercial disciplines, large databases are emerging as important community resources. The goal of this work package is to specify, develop, integrate and test tools and middleware infrastructure to coherently manage and share Petabyte-scale information volumes in high-throughput, production-quality grid environments. The work package will develop a general-purpose information sharing solution with unprecedented automation, ease of use, scalability, uniformity, transparency and heterogeneity.
It will provide secure access to massive amounts of data in a universal global name space, move and replicate data at high speed from one geographical site to another, and manage the synchronisation of remote copies. Novel software for automated wide-area data caching and distribution will act according to dynamic usage patterns. Generic interfacing to heterogeneous mass storage management systems will enable seamless and efficient integration of distributed resources.
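A universal global name space of this kind is commonly realised by mapping location-independent logical file names to the location-dependent names of their physical copies, which is the mapping Task 2.3 describes. The Python sketch below illustrates that idea in memory only. The class name `ReplicaCatalogue`, its methods, the `lfn:`/`file://` naming scheme and the `example.org` host names are hypothetical illustrations, not part of any DataGrid interface; a production service would be distributed and persistent rather than a single dictionary.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Hypothetical in-memory replica catalogue (illustrative sketch only).

    Maps a location-independent logical file name (LFN) to the set of
    location-dependent physical file names (PFNs) of its replicas.
    """

    def __init__(self):
        # LFN -> set of PFNs; a set avoids registering the same copy twice.
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that a physical copy `pfn` of logical file `lfn` exists."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Forget one replica, e.g. after a site has purged its copy."""
        copies = self._replicas.get(lfn)
        if copies is None:
            return
        copies.discard(pfn)
        if not copies:
            # Drop LFNs that no longer have any physical copy.
            del self._replicas[lfn]

    def lookup(self, lfn):
        """Return all known replicas of `lfn`, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))


if __name__ == "__main__":
    cat = ReplicaCatalogue()
    # Two hypothetical sites hold copies of the same logical file.
    cat.register("lfn:/grid/run42.db", "file://site-a.example.org/pool1/run42.db")
    cat.register("lfn:/grid/run42.db", "file://site-b.example.org/disk/run42.db")
    print(cat.lookup("lfn:/grid/run42.db"))
    # Purging one copy leaves the other reachable under the same logical name.
    cat.unregister("lfn:/grid/run42.db", "file://site-a.example.org/pool1/run42.db")
    print(cat.lookup("lfn:/grid/run42.db"))
```

Because users only ever refer to the logical name, a site can add or purge a copy without any change on the user's side; choosing which replica to read is left to higher layers such as the query optimisation service.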
An important innovative aspect of WP2 is bringing Grid data management technology to a level of practical reliability and functionality that allows it to be deployed in a production-quality environment; this is a real challenge. The Globus team and current US projects (GriPhyN, PPDG) are attempting to solve similar data management problems. We will try as far as possible to avoid unnecessary duplication of major middleware features and approaches by keeping aware of their work and collaborating as fully as possible. Tasks 2.3 (Replication) and 2.6 (Query Optimisation) will be the main areas where novel techniques, such as the use of cooperating agents with a certain amount of autonomy, will be explored. We plan to apply this technology to permit dynamic optimisation of data distribution across the DataGrid as the data is accessed by a varying load of processing tasks.
Task 2.1 Requirements definition (months 1-3)
In this phase, strong interaction with the Architecture Task Force and the end users will be necessary. The results of this task will be collated by the project architect and issued as an internal project deliverable.
Task 2.2 Data access and migration (months 4-18)
This task handles uniform and fast transfer of files from one storage system to another. It may, for example, migrate a file from the local file system of node X over the grid into a Castor disk pool. An interface, the Data Accessor, encapsulates the details of mass storage systems and local file systems and provides access to the data they hold. It sits on top of an arbitrary storage system and makes that system accessible from the grid.
Task 2.3 Replication (months 4-24)
Copies of files and meta data need to be managed in a distributed and hierarchical cache so that a set of files (e.g. Objectivity databases) can be replicated to a set of remote sites and made available there.
To this end, location-independent identifiers are mapped to location-dependent identifiers, so that all replicas of a given file can be looked up. Plug-in mechanisms will be provided for custom-tailored registration and integration of data sets into Database Management Systems.
Task 2.4 Meta data management (months 4-24)
The glue between components takes the shape of a Meta Data Management Service, or simply Grid Information Service. It efficiently and consistently publishes and manages a distributed and hierarchical set of associations, i.e. {identifier → information object} pairs. The key challenge for this service is to integrate diversity, decentralisation and heterogeneity. Meta data from distributed autonomous sites becomes useful information only if straightforward mechanisms for using it are in place. The service therefore defines and builds upon a versatile and uniform protocol, such as LDAP. Multiple implementations of the protocol will be used as required, each focusing on a different trade-off in the space spanned by write, read, update and search performance and consistency. Research is required in the following areas:
Task 2.5 Security and transparent access (months 4-24)
This task provides global authentication (“who are you?”) and local authorisation (“what can you do?”) of users and of applications acting on behalf of users. Local sites retain full control over the use of their resources. Users are presented with a logical view of the system that hides physical implementation details such as the locations of data.
Task 2.6 Query optimisation support and access pattern management (months 4-24)
Given a query, this task produces a migration and replication execution plan that maximises throughput. Research is required to determine, for example, how long it would take to run the following execution plan: purge files {a,b,c}, replicate {d,e,f} from location A to location B, read files {d,e,f} from B, and read {h} from location C, in any order. The Meta Data Management Service will be used to keep track of which data sets are requested by users, so that this information can be made available to the query optimisation service.
Task 2.7 Testing, refinement and co-ordination (months 1-36)
This task, which continues to the end of the project, carries out the testing and refinement of each of the software components produced by Tasks T2.2, T2.3, T2.4, T2.5 and T2.6. It takes as its input the feedback received from the Integration Testbed work package and ensures that the lessons learned are applied and that software quality improvements and additional requirements are designed, implemented and further tested. In addition, the activities needed to co-ordinate all WP2 tasks will be carried out as part of this task.
Resources
The resources required to implement the work package are as follows:
Description sheet:
Feedback and questions concerning this site should be directed to EDG-WP2@cern.ch.
Last updated June 20, 2003