Operating Systems for Supercomputers
Birds of a Feather Session
Supercomputing '89, Reno, NV

Daniel J. Kopetzky
John Riganati
Supercomputing Research Center
16100 Science Drive
Bowie, MD 20715

Dieter Fuss
Lawrence Livermore National Laboratory
P. O. Box 808, L-669
Livermore, CA 94551

ABSTRACT

This paper summarizes the discussions held in a birds-of-a-feather session on the topic of future operating system support for supercomputers. The group worked collectively to identify current problem areas and potential future ones.

1. Background

Advanced systems software development for supercomputers in the 90's may be viewed as falling primarily into two areas: distributed computing and parallel processing. The driving forces for distributed computing come from a desire to increase user productivity (for example, with a friendly workstation environment), maximize efficient use of resources, and enhance resource sharing; the driving forces for parallel processing come from a desire to increase performance, especially for studying physical phenomena and other large-scale calculations, or to decrease cost at constant performance.

Requirements for advanced systems software development for distributed computing and parallel processing may be structured into at least four areas: the operating system, the application run-time environment, tools and utilities, and the command language interface. A birds-of-a-feather session held at Supercomputing '89 in Reno, NV on 16 November 1989 concentrated on only the operating system.

2. Topics Presented

A set of four topic areas was presented. End users require support for program creation, debugging, and tuning. Communication Management has become an important topic in a heterogeneous computing environment populated with machines that range from supercomputers to workstations. Processor Management encompasses the problems of choosing from multiple computers that could run a task. Storage Management covers how main memory is used, file system capabilities, and data archiving. The group was asked to view these as a starting point which they could amplify or to which they could add their own concerns.

3. "...illities" That We Want

Desired capabilities:
    Ease of use
    Performance
    Manageability
    Expandability
    Flexibility
    Interoperability
    Maintainability
    Portability
    Standardization
    Reliability
    Recoverability
    Security

The list above indicates some of the capabilities that are desired in a computing system. Making systems easy to use is a quest for reducing the people cost associated with running and programming at a supercomputing center.

Jobs are assigned to supercomputers because of the machine's performance. Predicting and measuring the performance of tasks is necessary. System managers should be able to create different levels of service.

We must be able to plan the growth of a center. That may take the form of expanding an existing machine or adding new ones. A variety of machines must be supported.

Programs may have useful lifetimes that exceed that of a machine.
Mechanisms and guidelines to increase portability are needed.

Long running computations will encounter machine failures. System support for minimizing lost work is needed.

A security policy must be provided to insulate users from each other. These features need to be maintained while permitting global access to a computing center. Furthermore, some tasks may require a dynamic allocation of computing resources from a distributed computing environment.

4. User Support

Topics:
    Debugging
        Multi-process
        Network
        Heterogeneous
    Performance
        Prediction
        Measurement
        Production
    Seamless environment

Multiple-process tasks add a new dimension for bugs to inhabit. With the state of the computation spread across several processors, cleanly stopping the task and achieving repeatability are difficult.

Placing the component computations on processors separated by a network adds another layer of complexity and diminishes the amount of direct hardware control over a program.

Debugging in a heterogeneous system adds the further complication of trying to map different hardware capabilities to a common view.

The performance of a program should be planned from its initial design. Programmers need enough information to be able to predict the cost of the system services that they use.

The actual performance of a code needs to be measured and those results correlated with the predicted performance. To ensure that programs continue to meet their expected performance profile, measurement must continue throughout production code runs.

5. Communication Management

Topics:
    Virtual resources
        Naming
        Disk
    Intertask
        On machine
        Cross machine
        To workstations
    Gigabit links
        Protocols
        Global networks

Computing centers will have multiple supercomputers. High speed networks are needed to transfer information between those machines and from the machines to the users' front ends and workstations.

Peripheral devices can be shared across several machines. The NFS disk sharing protocol is one example of information sharing used in the mini-computer world. Special attention needs to be given to problems caused by interconnecting high performance systems. Naming conventions are needed to identify these distributed resources without requiring each program to maintain explicit knowledge of the location of the resources; a small sketch at the end of this section illustrates the idea.

Standards for building groups of communicating parallel processes will allow a single task to be spread across several processors. Different solutions to this problem may be needed for the special cases of groups of processes running in a shared memory machine and for communication between machines made by different manufacturers.

Network protocols designed for high speed networks may be needed to augment standard protocols that were developed primarily for long-haul, high-error-rate communication. Wide area regional network access to supercomputer centers is required by some user communities.
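As an illustration of the naming point above, the sketch below maps logical resource names to a host and path so that programs never embed machine locations. The directory contents, the names (cfd-restart, archive-1), and the resolve routine are invented for this example, and a compiled-in table stands in for what would really be an operating-system or network-wide name service.

    /*
     * Sketch of location-independent resource naming.  The table contents,
     * the resource names, and the lookup routine are illustrative only; a
     * real name service would be maintained by the operating system or a
     * network-wide directory, not compiled into each program.
     */
    #include <stdio.h>
    #include <string.h>

    struct binding {
        const char *name;   /* logical name used by programs        */
        const char *host;   /* machine that currently owns the data */
        const char *path;   /* location on that machine             */
    };

    /* A per-center directory; programs see only the logical names. */
    static const struct binding directory[] = {
        { "cfd-restart",  "cray2-a",   "/scratch/cfd/restart.dat" },
        { "seismic-1989", "archive-1", "/tape/seismic/1989"       },
    };

    /* Resolve a logical name; returns NULL if the name is unknown. */
    static const struct binding *resolve(const char *name)
    {
        size_t i;
        for (i = 0; i < sizeof directory / sizeof directory[0]; i++)
            if (strcmp(directory[i].name, name) == 0)
                return &directory[i];
        return NULL;
    }

    int main(void)
    {
        const struct binding *b = resolve("cfd-restart");
        if (b != NULL)
            printf("%s is on %s at %s\n", b->name, b->host, b->path);
        return 0;
    }

Because the program asks only for "cfd-restart", the data set can move between machines by updating the directory, with no change to the program itself.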
6. Processor Management

Topics:
    Authentication
        Accounting
        Access control
    Usage control
        Priority
        Researcher's interface
        Administrator's interface
    Load balancing
        CPU selection
        Distributed system calls
        Process migration
        Checkpoint/restart
    Scheduling
        Long running jobs
        Batch
        Interactive

The high cost of supercomputing usually means that machines must be shared across cost accounting boundaries. Access control mechanisms are used to constrain users to a subset of a center's resources. These mechanisms must be implemented such that they do not interfere with the goals of distributed computing.

Researchers need to be able to control their computing costs. Priority schemes allow a trade-off between cost and completion time. Whereas researchers probably want to conserve funds, administrators want to minimize idle (unpaid for) resources. This may lead to significant differences between resource control mechanisms for those two groups.

In a heterogeneous computing environment it may be the case that some tasks can execute (with differing performance metrics) on any of a variety of machines. It was mentioned that a system which permits distribution of system calls can facilitate load sharing. Process migration allows tasks to adapt to a changing pattern of resource availability.

The scheduling of work needs to be considered with a view of the entire computing center. The presence of high-speed long haul networks may even permit load shifting from one center to another. A balance is needed between machine-efficient batch jobs and interactive computation. Finally, tasks that take a large amount of wall time, exceeding weeks of supercomputer time, will require special scheduling consideration.

7. Memory Management

Topics:
    Main memory
        Coordination
        Swapping and paging
    I/O scaling
        Cache control
        Intelligent disk
    File system
        RAID disk
        Data compression
        Beyond gigabytes
    Archiving

Supercomputers have enormous main memory systems, and supercomputer applications seem to want all of that memory for themselves. Algorithms for allocating main memory, and system calls that treat memory as a negotiable resource, are needed to grow beyond the mini-computer environment, where a task can be moved to secondary storage in a few seconds.

Over time, processor performance improvements have outstripped those in the storage area. Complex I/O systems that grow with increased system capabilities may be needed to preserve a balance between computation and I/O.

Newer technology will force change in the ways that storage is used. Data compression techniques have been used to increase the recording capacity of disk drives; perhaps these same techniques can extend network and channel bandwidth by moving data in a compressed form. Individual data sets will grow as computing capacity increases. The retention of data for years will need standards that outlive the machines that originally generated the data.
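To make the bandwidth-for-cycles trade concrete, the sketch below uses a simple run-length encoding to shrink a zero-heavy record before it would be sent over a channel. The rle_encode routine and the sample record are invented for this illustration; production systems would use stronger compression methods, likely placed in the channel or network interface rather than in each program.

    /*
     * Sketch of trading processor cycles for channel bandwidth, assuming a
     * simple run-length encoding of repeated bytes.  Illustrative only.
     */
    #include <stdio.h>

    /* Encode src[0..n-1] as (count, byte) pairs; returns bytes written. */
    static size_t rle_encode(const unsigned char *src, size_t n,
                             unsigned char *dst)
    {
        size_t in = 0, out = 0;
        while (in < n) {
            unsigned char byte = src[in];
            size_t run = 1;
            while (in + run < n && src[in + run] == byte && run < 255)
                run++;
            dst[out++] = (unsigned char)run;
            dst[out++] = byte;
            in += run;
        }
        return out;
    }

    int main(void)
    {
        /* A zero-heavy record, such as a sparse mesh, compresses well. */
        unsigned char record[64] = { 0 };
        unsigned char packed[128];
        record[10] = 7;

        size_t packed_len = rle_encode(record, sizeof record, packed);
        printf("record: %zu bytes, packed: %zu bytes\n",
               (size_t)sizeof record, packed_len);
        return 0;
    }

In this example the 64-byte record is reduced to 6 bytes before transfer, at the cost of the encoding pass on the sending processor.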
8. Resource Control and Administration

Topics:
    Brokering
        CPU cycles
        Processor type
        Disk storage
        Disk bandwidth
    System updates
        Internal OS interfaces
        Customization guides
        Site configuration
    Documentation
        Vendor
        User generated

The identification of available excess resources and their assignment to interested parties can be accomplished through the actions of a software system called a resource broker. This lessens the need for individual processors to get involved in strategic resource assignment; a small sketch at the end of this section shows the kind of matching a broker performs.

Because supercomputing installations generally differ, portions of the system need to be tailored to each site's requirements. Sometimes this includes inserting extensions into the base system's service repertoire. Guidelines for this type of modification are needed to ease the burden of re-integration when the vendor updates system software.

The presence of high-quality online documentation helps educate the user community and should yield better utilization of existing software and hardware resources. Attention needs to be given to providing the same quality of documentation delivery system for use by local authors.
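The brokering idea above can be pictured as matching a resource request against a table of machines and their uncommitted capacity. The machine pool, the request fields, and the choose policy in the sketch below are all invented for illustration; a real broker would track dynamic load, accounting limits, and many more resource types.

    /*
     * Sketch of a resource broker matching a request to a machine with
     * spare capacity.  Illustrative only.
     */
    #include <stdio.h>

    struct machine {
        const char *name;
        double      free_cpu_hours;   /* uncommitted cycles this period */
        double      free_memory_mw;   /* free memory, millions of words */
    };

    struct request {
        double cpu_hours;
        double memory_mw;
    };

    /* Pick the machine with the most spare cycles that meets the request. */
    static const struct machine *choose(const struct machine *pool, int n,
                                        const struct request *req)
    {
        const struct machine *best = NULL;
        int i;
        for (i = 0; i < n; i++) {
            if (pool[i].free_cpu_hours >= req->cpu_hours &&
                pool[i].free_memory_mw >= req->memory_mw &&
                (best == NULL ||
                 pool[i].free_cpu_hours > best->free_cpu_hours))
                best = &pool[i];
        }
        return best;
    }

    int main(void)
    {
        struct machine pool[] = {
            { "cray-ymp", 40.0, 32.0 },
            { "cray-2",   12.0, 64.0 },
        };
        struct request req = { 20.0, 16.0 };

        const struct machine *m = choose(pool, 2, &req);
        printf("%s\n", m != NULL ? m->name : "no machine available");
        return 0;
    }

The requesting task never names a machine; the broker applies center-wide policy, which is what relieves individual processors of strategic resource assignment.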