HEPiX Meeting
Abstracts of Talks
An evaluation of PCs for HEP under Windows NT and Linux
Speaker: Michael Ogg University of Texas
This will be a more technical version of a talk presented
at CHEP - http://www.ifh.de/CHEP97/paper/abstracts/g362.html
There is a lot of interest in using PCs in HEP. We have
done performance measurements in several environments: Linux,
NT, and "conventional" workstations (DEC Alpha, Sun SPARC,
etc.). Besides their low cost (a well-equipped Pentium Pro
costs around US$2,000), PCs are also able to take advantage
of state-of-the-art hardware. For instance, we have been using
Ultra-Wide SCSI disks and 100 Mbps Fast Ethernet. The real
test, though, is performance.
There is (predictably) no single answer. On many C and C++
benchmarks, the PCs (with either Linux or NT) perform almost
as well as the fastest workstations. Compiler differences
and 64-bit workstation architectures are also quite clear,
particularly on Fortran double-precision benchmarks. But the
real test is the target HEP application code. We present
several results showing that PCs are more than competitive
with workstations.
The RAL NT Farm
Speaker: John Gordon Rutherford Appleton Lab
As a feasibility study, Rutherford Appleton Lab developed
a small batch facility based on Intel hardware and Windows
NT software. The talk will describe how the essential
features of such a farm (remote user access, batch service,
remote data, tapes, etc.) were provided. It will describe our
experiences and difficulties, how we overcame some of them,
and our plans for expansion.
FNAL Domain Plan
Speaker: Jack Schmidt FNAL
Windows NT offers many different domain designs. The
Computing Division at Fermilab changed its initial NT
structure to one that it feels will scale with the
growing user community. This talk will briefly discuss
NT domain structures and why Fermilab chose the one
it did.
Experience with Linux on Production Systems
Speaker: Michael Ogg University of Texas
At the University of Texas at Austin, members of the Nile
project have been using Linux on "production" systems for
nearly 3 years. As well as a network of 10 systems (ranging
from a laptop to a 200 MHz PPro), we also support 10
X-terminals. We attempt to run the systems "professionally"
(i.e., they must be available, and not subject to reboot
at short notice). Linux has shown itself to be both stable
and manageable. The only weakness is that some device
drivers for the "latest" hardware sometimes have
imperfections. But we have to remember that this hardware
is often not available at all for other workstations.
The one myth (which I call the "Professional Sysadmin
Syndrome") that I try to dispel is that Linux is not a real
OS. We have found it every bit as reliable as most
commercial Unices, and as well or better supported. And
certainly the usual range of tools (compilers, debuggers,
Netscape, Acrobat Reader, etc.) is available.
Linux at CERN
Speaker: Alan Silverman CERN
Recently, CERN's central Information Technology Division
discussed options for Linux support in the context of its
other commitments. This talk presents the results of that
review.
Software for Linux - from Java to CORBA
Speaker: Michael Ogg University of Texas
One of the alleged weaknesses of Linux is the lack of
software. Here I comment on the tools available, and what
we have done to port them to Linux. Examples include JDK
(works out of the box), CORBA (we ported the Electra ORB),
g77 (if you must do Fortran), gcc (our nice twist is that
Linux on a PPro far exceeds a Sparc20 in performance, so
we cross-compile SunOS binaries under Linux).
LSF at CERN - A Status Report
Speaker: Tony Cass CERN
In November, CERN's FOCUS committee approved an LSF Pilot
Project to carry out large-scale testing of LSF,
Platform Computing's Load Sharing Facility, for use in the
CORE Unix Batch Environment and the CUTE Interactive Unix
Services. Although many tests have shown that LSF is a
good candidate to replace NQS as CERN's standard batch
scheduler, these have been small in scale and there has
been no attempt to implement the changes that were felt
necessary to make LSF more generally usable.
During the Pilot Project LSF will be used to manage the
batch queues on the SHIFT platforms for ATLAS, CMS, DELPHI
and OPAL and on the WGS clusters for ATLAS and CMS. With
these platforms the Pilot Project will be testing LSF in
environments that are representative of the different
demands on Computer Centre resources--notably for LEP
production and general user analysis, and for LHC work
based around Monte Carlo generation and analysis by a
smaller number of users.
Jefferson Lab UNIX Environment, an update
Speaker: I. Bird Jefferson Lab
By mid-1997, Jefferson Lab will have simultaneous
production in all three experimental halls. This talk will
give an overview of the software environment we are developing
for the experimental user -- specifically the Jefferson Lab
Off-Line Batch System (JOBS). JOBS interfaces to the OSM
HSM product to manage the file system for raw and analyzed
data files, as well as to the Load Sharing Facility (LSF),
which manages the central batch CPU farm (Solaris & AIX
systems). Using JOBS, a user will be able to automate the
submission of batch jobs to analyze experimental data,
retrieve and store files to the central mass storage silo,
and track the files associated with the data reduction
process.
Unix Clusters: Why and How
Speaker: R. Lauer Yale
Most sites provide services to Unix workstations via
servers, which raises the question of how to ensure the
reliability of those services if a server fails.
Most Unix vendors implement some type of "cluster" which
at minimum provides failover of services. This talk will
present some experience with Digital's TruCluster
implementation: why it is needed and whether it performs.
The Global Resource Director
Speaker: Erik Riedel GENIAS Software, Germany
GRD, the Global Resource Director, provides mission-driven
workload management through fine-grained policy management
and dynamic scheduling.
Enterprise goals such as on-time completion of critical
work or fair resource sharing are expressed as policies
that determine how computing resources are used and shared.
GRD allocates resources among jobs when a job is dispatched
and throughout its lifetime. This ensures that the most
important work at any instant receives its deserved system
share by allowing newly arrived, more important work to
take resources away from less important executing jobs.
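To make the scheduling idea concrete, the following is a
minimal Java sketch of share-based dynamic scheduling. It is a
conceptual illustration only, not GRD's actual algorithm; the
job names and share values are invented. Each job carries a
policy-assigned target share of the machine, and at every
scheduling decision the job whose accumulated usage lags
furthest behind its entitlement runs next, so an important new
arrival (with zero usage) immediately displaces less important
running work.

    import java.util.ArrayList;
    import java.util.List;

    public class ShareScheduler {
        static class Job {
            final String name;
            final double targetShare; // policy-assigned fraction of the machine
            double usage;             // CPU time consumed so far
            Job(String name, double targetShare) {
                this.name = name;
                this.targetShare = targetShare;
            }
        }

        // Pick the job with the largest entitlement deficit; a newly arrived
        // job with a large share starts with zero usage and so wins at once.
        static Job next(List<Job> jobs, double totalUsage) {
            Job best = null;
            for (Job j : jobs) {
                double deficit = j.targetShare * totalUsage - j.usage;
                if (best == null
                        || deficit > best.targetShare * totalUsage - best.usage) {
                    best = j;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            List<Job> jobs = new ArrayList<>();
            jobs.add(new Job("critical-production", 0.7)); // invented policy
            jobs.add(new Job("user-analysis", 0.3));       // invented policy
            double total = 0;
            for (int slice = 0; slice < 10; slice++) {
                Job j = next(jobs, total + 1); // entitlement after this slice
                j.usage += 1;                  // run it for one time slice
                total += 1;
                System.out.println("slice " + slice + " -> " + j.name);
            }
        }
    }

Over repeated time slices the allocation converges to the
policy shares (here roughly 7:3), while a high-share job added
mid-run would win the very next slice.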
Patrol - a tool for implementing system policies
Speaker: Chuck Boeheim SLAC
Patrol is a tool written at the Stanford Linear
Accelerator Center that periodically checks the health
and status of machines. It can detect looping and
otherwise misbehaving processes, and can help in
implementing site policy about where different types
of work can be done. It's especially useful in
encouraging users to do batch-like work on designated
batch systems.
SUE Overview
Speaker: Ignacio Reguero CERN
SUE stands for Shrink-wrapped or Standardized Unix
Environment. It is a set of software components,
configuration files and utility programs which together
form a ready-to-use, site-customized Unix system.
Workstation and Boot Management System WBOOM
Speaker: Thomas Finnern DESY
DESY has prepared a set of tools to manage the system
support for a large number of workstations, X-terminals,
personal computers and printers in a heterogeneous
environment. Major features of the current implementation
are --
- Fully Automatic Documentation
- Transparent Teamwork
- Idempotent Operation
- High Efficiency
- Common Handling for all Equipment Classes
- Easy Usage and Installation
- World Wide Naming Space and Distribution
Implementing SUE on Solaris
Speaker: Ignacio Reguero CERN
Systems lacking support for BOOTP or another conveniently
routed boot protocol are very difficult to manage in a
routed environment. Most Sun SPARC hardware supports only a
boot protocol based on RARP (Reverse Address Resolution
Protocol); since RARP requests are link-layer broadcasts,
they cannot cross routers.
This paper presents the system built by the authors to
enable a Solaris network installation service on the CERN
network.
Escrow - a method for secure sharing of passwords
Speaker: Chuck Boeheim SLAC
Escrow is a tool written at the Stanford Linear
Accelerator Center that uses PGP to store passwords and
other secrets so that members of the group can securely
retrieve them when needed. It has become an important
tool for the administration of services at SLAC.
Experience with Java
Speaker: M. Davis Jefferson Lab
The user interface to the Jefferson Lab Off-Line Batch
System (JOBS) has been developed using Java 1.1, including
the JDBC API and a commercial interface product, OpenLink,
to interface to the Ingres relational database system that
will track the jobs and files associated with processing an
experiment's data. This talk will present our experiences
with using Java as a development tool, performance
considerations, and some thoughts on further development
work we are considering to expand the current Java interface.
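As an illustration of the JDBC pattern such an interface is
built on, here is a minimal Java sketch. The driver class
name, connection URL, and table schema are invented for
illustration only; the real values would come from the
OpenLink and Ingres configuration used by JOBS.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class JobQuery {
        public static void main(String[] args) throws Exception {
            // Hypothetical driver class and URL; the real ones are
            // supplied by the JDBC driver vendor.
            Class.forName("openlink.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:openlink://dbhost/jobsdb", "user", "password");
            // Hypothetical schema: a table tracking batch jobs per experiment.
            PreparedStatement stmt = conn.prepareStatement(
                    "SELECT job_id, status FROM jobs WHERE experiment = ?");
            stmt.setString(1, args[0]);
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getInt("job_id")
                        + " " + rs.getString("status"));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }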
FNALU AFS Review and its outcome
Speaker: Lisa Giacchetti FNAL
In January of this year the Computing Division held a
formal review of AFS and its use at Fermilab. This talk
will give an overview of that review and the concerns
that prompted it. I will also discuss the decisions
made as a result of the review.