HEPiX Meeting
Abstracts of Talks
An evaluation of PCs for HEP under Windows NT and Linux
Speaker: Michael Ogg University of Texas
This will be a more technical version of a talk presented
at CHEP - http://www.ifh.de/CHEP97/paper/abstracts/g362.html
There is a lot of interest in using PCs in HEP. We have
done performance measurements in several environments: Linux,
NT, and "conventional" workstations (DEC Alpha, Sun SPARC,
etc.). Besides their low cost (a well-equipped Pentium Pro
costs around US$2,000), PCs are also able to take advantage
of state-of-the-art hardware. For instance, we have been using
Ultra-Wide SCSI disks and 100 Mbps Fast Ethernet. The real
test, though, is performance.
There is (predictably) no single answer. On many C and C++
benchmarks, the PCs (with either Linux or NT) perform almost
as well as the fastest workstations. Compiler differences
and 64-bit workstation architectures are also quite clear,
particularly on Fortran double-precision benchmarks. But the
real test is the target HEP application code. We present
several results showing that PCs are more than competitive
with workstations.
The RAL NT Farm
Speaker: John Gordon Rutherford Appleton Lab
As a feasibility study, Rutherford Appleton Lab developed
a small batch facility based on Intel hardware and Windows
NT software. The talk will describe how the essential
features of such a farm (remote user access, batch service,
remote data, tapes, etc.) were provided. It will describe our
experiences and difficulties, how we overcame some of them,
and our plans for expansion.
FNAL Domain Plan
Speaker: Jack Schmidt FNAL
Windows NT offers many different domain designs. The
Computing Division at Fermilab changed its initial NT
structure to one that it feels will scale with the
growing user community. This talk will briefly discuss
NT domain structures and why Fermilab chose the one
it did.
Experience with Linux on Production Systems
Speaker: Michael Ogg University of Texas
At the University of Texas at Austin, members of the Nile
project have been using Linux on "production" systems for
nearly 3 years. As well as a network of 10 systems (ranging
from a laptop to a 200 MHz PPro), we also support 10
X-terminals. We attempt to run the systems "professionally"
(i.e., they must be available, and not subject to reboot
at short notice). Linux has shown itself to be both stable
and manageable. The only weakness is that some device
drivers for the "latest" hardware sometimes have
imperfections. But we have to remember that this hardware
is often not available at all for other workstations.
The one myth (which I call the "Professional Sysadmin
Syndrome") that I try to dispel is that Linux is not a real
OS. We have found it every bit as reliable as most
commercial Unices, and as well or better supported. And
certainly the usual range of tools (compilers, debuggers,
Netscape, Acrobat Reader, etc.) is available.
Linux at CERN
Speaker: Alan Silverman CERN
Recently, CERN's central Information Technology Division
discussed options for Linux support in the context of its
other commitments. This talk presents the results of that
review.
Software for Linux - from Java to CORBA
Speaker: Michael Ogg University of Texas
One of the alleged weaknesses of Linux is the lack of
software. Here I comment on the tools available, and what
we have done to port them to Linux. Examples include JDK
(works out of the box), CORBA (we ported the Electra ORB),
g77 (if you must do Fortran), gcc (our nice twist is that
Linux on a PPro far exceeds a Sparc20 in performance, so
we cross-compile SunOS binaries under Linux).
LSF at CERN - A Status Report
Speaker: Tony Cass CERN
In November, CERN's FOCUS committee approved an LSF Pilot
Project to carry out large-scale testing of LSF,
Platform Computing's Load Sharing Facility, for use in the
CORE Unix Batch Environment and the CUTE Interactive Unix
Services. Although many tests have shown that LSF is a
good candidate to replace NQS as CERN's standard batch
scheduler, these have been small in scale and there has
been no attempt to implement the changes that were felt
necessary to make LSF more generally usable.
During the Pilot Project LSF will be used to manage the
batch queues on the SHIFT platforms for ATLAS, CMS, DELPHI
and OPAL and on the WGS clusters for ATLAS and CMS. With
these platforms the Pilot Project will be testing LSF in
environments that are representative of the different
demands on Computer Centre resources--notably for LEP
production and general user analysis, and for LHC work
based around Monte Carlo generation and analysis by a
smaller number of users.
Jefferson Lab UNIX Environment, an update
Speaker: I. Bird Jefferson Lab
By mid-1997, Jefferson Lab will have simultaneous
production in all three experimental halls. This talk will
give an overview of the software environment we are developing
for the experimental user -- specifically the Jefferson Lab
Off-Line Batch System (JOBS). JOBS interfaces to the OSM
HSM product to manage the file system for raw and analyzed
data files, as well as to the Load Sharing Facility (LSF),
which manages the central batch CPU farm (Solaris & AIX
systems). Using JOBS, a user will be able to automate the
submission of batch jobs to analyze experimental data,
retrieve and store files to the central mass storage silo,
and track the files associated with the data reduction
process.
Unix Clusters: Why and How
Speaker: R. Lauer Yale
Most sites provide services to Unix workstations via
servers, which raises the question of how to ensure the
reliability of those services if a server fails.
Most Unix vendors implement some type of "cluster" which
at minimum provides failover of services. This talk will
present some experience with Digital's TruCluster
implementation: why it is needed and whether it performs.
The Global Resource Director
Speaker: Erik Riedel GENIAS Software, Germany
GRD, the Global Resource Director, provides mission-driven
workload management through fine-grained policy management
and dynamic scheduling.
Enterprise goals such as on-time completion of critical
work or fair resource sharing are expressed as policies
that determine how computing resources are used and shared.
GRD allocates resources among jobs when a job is dispatched
and throughout its lifetime. This ensures that the most
important work at any instant receives its deserved system
share by allowing newly arrived, more important work to
take resources away from less important executing jobs.
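To make the scheduling idea concrete, the following is a
minimal Java sketch of share-based dynamic scheduling. It is a
conceptual illustration only, not GRD's actual algorithm; the
job names and share values are invented. Each job carries a
policy-assigned target share of the machine, and at every
scheduling decision the job whose accumulated usage lags
furthest behind its entitlement runs next, so an important new
arrival (with zero usage) immediately displaces less important
running work.

    import java.util.ArrayList;
    import java.util.List;

    public class ShareScheduler {
        static class Job {
            final String name;
            final double targetShare; // policy-assigned fraction of the machine
            double usage;             // CPU time consumed so far
            Job(String name, double targetShare) {
                this.name = name;
                this.targetShare = targetShare;
            }
        }

        // Pick the job with the largest entitlement deficit; a newly arrived
        // job with a large share starts with zero usage and so wins at once.
        static Job next(List<Job> jobs, double totalUsage) {
            Job best = null;
            for (Job j : jobs) {
                double deficit = j.targetShare * totalUsage - j.usage;
                if (best == null
                        || deficit > best.targetShare * totalUsage - best.usage) {
                    best = j;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            List<Job> jobs = new ArrayList<>();
            jobs.add(new Job("critical-production", 0.7)); // invented policy
            jobs.add(new Job("user-analysis", 0.3));       // invented policy
            double total = 0;
            for (int slice = 0; slice < 10; slice++) {
                Job j = next(jobs, total + 1); // entitlement after this slice
                j.usage += 1;                  // run it for one time slice
                total += 1;
                System.out.println("slice " + slice + " -> " + j.name);
            }
        }
    }

Over repeated time slices the allocation converges to the
policy shares (here roughly 7:3), while a high-share job added
mid-run would win the very next slice.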
Patrol - a tool for implementing system policies
Speaker: Chuck Boeheim SLAC
Patrol is a tool written at the Stanford Linear
Accelerator Center that periodically checks the health
and status of machines. It can detect looping and
otherwise misbehaving processes, and can help in
implementing site policy about where different types
of work can be done. It's especially useful in
encouraging users to do batch-like work on designated
batch systems.
SUE Overview
Speaker: Ignacio Reguero CERN
SUE stands for Shrink-wrapped or Standardized Unix
Environment. It is a set of software components,
configuration files and utility programs which together
form a ready-to-use, site-customized Unix system.
Workstation and Boot Management System WBOOM
Speaker: Thomas Finnern DESY
DESY has prepared a set of tools to manage the system
support for a large number of workstations, X-terminals,
personal computers and printers in a heterogeneous
environment. Major features of the current implementation
are --
- Fully Automatic Documentation
- Transparent Teamwork
- Idempotent Operation
- High Efficiency
- Common Handling for all Equipment Classes
- Easy Usage and Installation
- World Wide Naming Space and Distribution
Implementing SUE on Solaris
Speaker: Ignacio Reguero CERN
Systems lacking support for BOOTP or another conveniently
routed boot protocol are very difficult to manage in a
routed environment. Most Sun SPARC hardware supports only a
boot protocol based on RARP (Reverse Address Resolution
Protocol); since RARP requests are link-layer broadcasts,
they cannot cross routers.
This paper presents the system built by the authors to
enable a Solaris network installation service on the CERN
network.
Escrow - a method for secure sharing of passwords
Speaker: Chuck Boeheim SLAC
Escrow is a tool written at the Stanford Linear
Accelerator Center that uses PGP to store passwords and
other secrets so that members of the group can securely
retrieve them when needed. It has become an important
tool for the administration of services at SLAC.
Experience with Java
Speaker: M. Davis Jefferson Lab
The user interface to the Jefferson Lab Off-Line Batch
System (JOBS) has been developed using Java 1.1, including
the JDBC API and a commercial interface product, OpenLink,
to interface to the Ingres relational database system that
will track the jobs and files associated with processing an
experiment's data. This talk will present our experiences
with using Java as a development tool, performance
considerations, and some thoughts on further development
work we are considering to expand the current Java interface.
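As an illustration of the JDBC pattern such an interface is
built on, here is a minimal Java sketch. The driver class
name, connection URL, and table schema are invented for
illustration only; the real values would come from the
OpenLink and Ingres configuration used by JOBS.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class JobQuery {
        public static void main(String[] args) throws Exception {
            // Hypothetical driver class and URL; the real ones are
            // supplied by the JDBC driver vendor.
            Class.forName("openlink.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:openlink://dbhost/jobsdb", "user", "password");
            // Hypothetical schema: a table tracking batch jobs per experiment.
            PreparedStatement stmt = conn.prepareStatement(
                    "SELECT job_id, status FROM jobs WHERE experiment = ?");
            stmt.setString(1, args[0]);
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getInt("job_id")
                        + " " + rs.getString("status"));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }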
FNALU AFS Review and its outcome
Speaker: Lisa Giacchetti FNAL
In January of this year the Computing Division held a
formal review of AFS and its use at Fermilab. This talk
will give an overview of that review and the concerns
that prompted it. I will also discuss the decisions
made as a result of the review.