Grid data access optimization
Jobs submitted to the EU DataGrid require three types of
resources: computing, storage and network. The Grid must
make scheduling and replication decisions based on the
current state of these resources. Any file on the Grid may
have several replicas at different locations, and the aim
of the ROS is to optimize access to these files for all jobs
that run on the Grid. This is done in two ways. Firstly, the
ROS provides an estimate of the access cost for a job to run
at a particular site, i.e. the time it would take to access
all the files required for the job. A Resource Broker uses
this estimate to schedule the job to the optimal Grid site.
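
The access cost estimate described above can be illustrated with a
minimal sketch. It is not the ROS implementation: the function names,
the data layout and the cost model are assumptions made for this
example. The cost of a file is taken to be its size divided by the
available bandwidth from the cheapest replica, a local replica costs
nothing, and files are assumed to be fetched one after another so that
their times add up.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout, chosen for illustration only.
FileSizes = Dict[str, float]              # file name -> size in MB
Replicas = Dict[str, List[str]]           # file name -> sites holding a replica
Bandwidth = Dict[Tuple[str, str], float]  # (source, destination) -> MB/s


def transfer_time(size_mb: float, source: str, dest: str,
                  bandwidth: Bandwidth) -> float:
    """Estimated seconds to move one file from source to dest."""
    if source == dest:
        return 0.0  # assumption: a local replica has no transfer cost
    rate = bandwidth.get((source, dest), 0.0)
    return size_mb / rate if rate > 0 else float("inf")


def access_cost(site: str, files: List[str], sizes: FileSizes,
                replicas: Replicas, bandwidth: Bandwidth) -> float:
    """Estimated time to access every file the job needs from `site`."""
    total = 0.0
    for name in files:
        # Use the cheapest replica of each file; a file with no
        # reachable replica makes the whole estimate infinite.
        total += min(
            (transfer_time(sizes[name], src, site, bandwidth)
             for src in replicas.get(name, [])),
            default=float("inf"),
        )
    return total


def cheapest_site(candidates: List[str], files: List[str], sizes: FileSizes,
                  replicas: Replicas, bandwidth: Bandwidth) -> str:
    """What a broker could do with the estimates: pick the lowest cost."""
    return min(candidates,
               key=lambda s: access_cost(s, files, sizes, replicas, bandwidth))
```

A real broker would combine this estimate with the state of the
computing and storage resources at each site; the sketch covers only
the file access term.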
Secondly, the ROS is responsible for optimizing file access while
jobs are running. This is carried out on two time scales:
- Short-term optimization: When a job requests a file, the ROS
finds the best replica on the Grid, i.e. the one with the lowest cost of
transferring the file to the site where the job is running.
- Long-term optimization: Using long-term data access patterns,
the ROS can create and delete replicas anywhere on the Grid according
to its predictions of file usage across the Grid. We plan to use an
economy-based algorithm for these decisions.
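
The short-term step can be sketched as follows. This is an
illustration under stated assumptions rather than the ROS algorithm:
the function name `best_replica`, the bandwidth table and the cost
model (file size divided by bandwidth) are hypothetical. The long-term
step is not sketched, since the economy-based algorithm is not
specified here.

```python
from typing import Dict, List, Optional, Tuple


def best_replica(size_mb: float, local_site: str, replica_sites: List[str],
                 bandwidth: Dict[Tuple[str, str], float]) -> Optional[str]:
    """Return the site whose replica is cheapest to bring to local_site.

    Returns None when no replica is reachable.
    """
    best_site: Optional[str] = None
    best_cost = float("inf")
    for site in replica_sites:
        if site == local_site:
            return site  # assumption: a local replica is always best
        rate = bandwidth.get((site, local_site), 0.0)  # MB/s, 0 if unknown
        if rate <= 0:
            continue  # no usable network path from this site
        cost = size_mb / rate  # estimated transfer time in seconds
        if cost < best_cost:
            best_site, best_cost = site, cost
    return best_site
```

For example, with replicas at two remote sites the function returns
the one with the higher bandwidth to the local site, because the file
size is the same for every replica.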