The LAFEX Risc Based Farm Solution for CPU Intensive Applications in HEP

Paper: 148
Session: F (poster)
Presenter: Alves, Gilvan, LAFEX/CBPF, Rio de Janeiro
Keywords: parallelization, simulation, world-wide collaboration, large systems, massive parallel systems


G. A. Alves, M. Mendes, M. Miranda, A. Santoro, C. Silva and M. Vaz

LAFEX/CBPF, R. Dr. Xavier Sigaud, 150, Rio de Janeiro, RJ, Brazil

High Energy Physics is well known for its event-oriented, CPU-intensive
applications. Event reconstruction for a large hadron collider detector, like
the D0 detector at Fermilab, usually takes around 1000 MIPS-seconds for a
single event. The situation is even worse for event simulation using the
Monte Carlo method, including the simulation of the detector response, which
in this kind of environment can require more than 10 times the CPU time of
reconstruction. It is clear from the current scenario that the CPU needs of
future collider experiments will be much greater than the present ones. On
the other hand, the event-oriented character of HEP leads naturally to the
parallel processing paradigm. Available commercial solutions for massively
parallel processing, like the Challenge and SP2 machines, are
currently not very attractive in terms of cost/performance, and their high
maintenance costs can only be afforded by big institutions, in effect
reinventing the mainframe. We have proposed, and built at our institute
(LAFEX/CBPF), a solution for the high CPU demand of the High Energy Physics
groups at LAFEX, using a farm architecture composed of standard off-the-shelf
components. The farm consists of 30 RISC-based slave nodes (IBM-7248 Model
43P), divided into 3 farmlets, each one attached to a disk server (IBM-7009
Model C20) via an Ethernet switch, which filters the packet
traffic so that the system can be configured for running as 3 independent
farms. The servers have a high disk storage capability (20 Gbytes/server),
plus magnetic tape storage, using high-density 8mm tape library systems
(IBM-7331 Model 205). All slave nodes are connected, via serial port, to a
console server, which performs functions such as monitoring errors and
system load, software installation, etc. Coupled to this system is a
Client/Server interface (submitted to this conference) for event-oriented
parallel
processing. This combination of hardware and software represents our
solution for the high demand of Monte Carlo event simulation of the
D0 Collaboration. Besides Monte Carlo production, part of the system can be
dynamically allocated to run user applications, with the production job
resuming afterwards. The whole system is very affordable for an ordinary
institution, and the cost/performance is a factor of 2 better than the
"closed" commercial solutions.
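
As a rough illustration of the event-oriented parallelism described above, the
following Python sketch splits 30 slave nodes into 3 farmlets, hands out
independent events round-robin, and estimates an ideal wall-clock time. It is
not the Client/Server interface mentioned in the abstract: the node and server
labels, the function names, and the per-node MIPS rating are all hypothetical,
and only the node count, the farmlet count, and the per-event CPU figures come
from the text.

```python
from dataclasses import dataclass

# Figures quoted in the abstract.
RECO_MIPS_SECONDS_PER_EVENT = 1000   # approximate reconstruction cost
SIMULATION_FACTOR = 10               # simulation costs more than this multiple
NUM_NODES = 30
NUM_FARMLETS = 3


@dataclass
class Farmlet:
    server: str   # hypothetical label for the farmlet's disk server
    nodes: list   # hypothetical labels for the farmlet's slave nodes


def build_farm(num_nodes=NUM_NODES, num_farmlets=NUM_FARMLETS):
    """Split the slave nodes evenly into farmlets, one disk server each."""
    per_farmlet = num_nodes // num_farmlets
    return [
        Farmlet(
            server=f"server{f}",
            nodes=[f"node{f * per_farmlet + i}" for i in range(per_farmlet)],
        )
        for f in range(num_farmlets)
    ]


def assign_events(event_ids, nodes):
    """Round-robin assignment; events are independent, so any node can take any event."""
    assignment = {node: [] for node in nodes}
    for k, event_id in enumerate(event_ids):
        assignment[nodes[k % len(nodes)]].append(event_id)
    return assignment


def wall_time_seconds(num_events, mips_seconds_per_event, node_mips, num_nodes):
    """Ideal wall-clock time, ignoring I/O, network, and scheduling overhead."""
    total_work = num_events * mips_seconds_per_event
    return total_work / (node_mips * num_nodes)
```

For example, assuming a placeholder rating of 100 MIPS per node (not a
measured figure for the machines above), simulating 3000 events at
RECO_MIPS_SECONDS_PER_EVENT * SIMULATION_FACTOR = 10000 MIPS-seconds each on
all 30 nodes gives wall_time_seconds(3000, 10000, 100, 30) = 10000 seconds in
the ideal case.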