A Distributed Disk Layer for Mass Storage at DESY

Paper: 410
Session: C (talk)
Speaker: Brand, Susanne, DESY, Hamburg
Keywords: data management, hierarchical storage management, large systems, mass storage

Susanne Brand
DESY Hamburg, Notkestr. 85, 22761 Hamburg, Germany

At DESY, about 34 TByte of data are currently kept online
in mass storage systems. Most of it consists of raw,
processed and Monte Carlo data of the four HERA
experiments and, to a lesser extent, user data.
Different portions of the data are used with varying
intensity and access profiles. To optimize access to
these files we are developing a disk layer that sits
between the Open Storage Manager and the user interface.

The central unit of the disk layer is a Migration Server,
which provides migration, staging and prefetching of data.
Migration policies and prefetching hints can be attached
to storage groups. This allows a flexible choice of
policies, which can be adapted to well-known access
profiles. Foreseeable data usage, such as raw data
processing, can be accommodated as well as actual
user demands. In this way the data streams from the tape
drives and the data streams to the clients are decoupled
and the request distribution is smoothed. This not
only reduces access times but also allows drives and
robots to be fully exploited, and thus decreases the
demand on these resources.
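
As an illustration only, the following minimal sketch shows
how a prefetching hint attached to a storage group could
drive staging decisions. The class names, the prefetch_next
hint and the group and file names are hypothetical and are
not taken from the DESY implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StorageGroup:
    name: str
    files: list             # file names in the order they are expected to be read
    prefetch_next: int = 0  # hypothetical hint: stage this many successors as well


@dataclass
class MigrationServer:
    groups: dict = field(default_factory=dict)
    on_disk: set = field(default_factory=set)

    def add_group(self, group):
        self.groups[group.name] = group

    def request(self, group_name, filename):
        """Return the files that must be staged from tape for this request."""
        group = self.groups[group_name]
        start = group.files.index(filename)
        # The requested file plus the successors named by the prefetch hint.
        wanted = group.files[start:start + 1 + group.prefetch_next]
        to_stage = [f for f in wanted if f not in self.on_disk]
        self.on_disk.update(to_stage)
        return to_stage


server = MigrationServer()
server.add_group(StorageGroup("raw", ["run1", "run2", "run3", "run4"], prefetch_next=2))
server.add_group(StorageGroup("user", ["a.dat", "b.dat"]))

first = server.request("raw", "run1")    # stages run1 and prefetches run2, run3
second = server.request("raw", "run2")   # run2, run3 are already on disk; only run4 is staged
third = server.request("user", "a.dat")  # no hint on this group: only the requested file
```

In this sketch a sequential reader of the "raw" group finds
its next files already on disk, while the "user" group, which
carries no hint, is staged purely on demand.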

To take full advantage of distributed disk resources in
the heterogeneous computing environment at DESY, the
Migration Server is able to manage local disks as well
as remote disk systems via remote file I/O. The selection
of a disk system for file migration can also be influenced
by setting appropriate policies. Thus access times as well
as network traffic are significantly reduced.
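
A disk selection policy of this kind might look like the
following sketch. It assumes a simple rule (prefer a disk
system on the client's own host, otherwise take the one with
the most free space); the rule, the DiskSystem fields and the
host names are illustrative assumptions, not the policies
actually used in the disk layer.

```python
from dataclasses import dataclass


@dataclass
class DiskSystem:
    name: str
    host: str
    free_bytes: int


def select_disk(disks, client_host, file_size, prefer_local=True):
    """Pick a disk system with enough space, optionally favouring the client's host."""
    candidates = [d for d in disks if d.free_bytes >= file_size]
    if not candidates:
        return None
    if prefer_local:
        local = [d for d in candidates if d.host == client_host]
        if local:
            candidates = local
    # Among the remaining candidates take the one with the most free space.
    return max(candidates, key=lambda d: d.free_bytes)


disks = [
    DiskSystem("pool-a", "hostA", 50),
    DiskSystem("pool-b", "hostB", 200),
]
local_choice = select_disk(disks, "hostA", 10)                    # fits on the local disk
spill_choice = select_disk(disks, "hostA", 100)                   # too large for pool-a
any_choice = select_disk(disks, "hostA", 10, prefer_local=False)  # locality ignored
```

Preferring a local disk keeps the transfer off the network
where possible; the fallback to a remote disk system
corresponds to access via remote file I/O.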

The disk layer is under development. The talk presents its
structure and its communication, data transfer and locking
mechanisms. The underlying migration and prefetching
algorithms will be explained with respect to the demands
of data access in high energy physics. In addition, an
evaluation of the gain in access times and in the
utilization of robot and drive resources will be given.