Operating Systems for Supercomputers
Birds of a Feather Session
Supercomputing '89, Reno, NV

Daniel J. Kopetzky
John Riganati
Supercomputing Research Center
16100 Science Drive
Bowie, MD 20715

Dieter Fuss
Lawrence Livermore National Laboratory
P. O. Box 808, L-669
Livermore, CA 94551

ABSTRACT

This paper summarizes the discussions held in a birds-of-a-feather session on the topic of future operating system support for supercomputers. The group worked collectively to identify current problem areas and potential future ones.

1. Background

Advanced systems software development for supercomputers in the 90's may be viewed as falling primarily into two areas: distributed computing and parallel processing. The driving forces for distributed computing come from a desire to increase user productivity (for example, with a friendly workstation environment), maximize efficient use of resources, and enhance resource sharing; the driving forces for parallel processing come from a desire to increase performance, especially for studying physical phenomena and other large-scale calculations, or to decrease cost at constant performance.

Requirements for advanced systems software development for distributed computing and parallel processing may be structured into at least four areas: the operating system, the application run-time environment, tools and utilities, and the command language interface. A birds-of-a-feather session held at Supercomputing '89 in Reno, NV on 16 November 1989 concentrated on only the operating system.

2. Topics Presented

A set of four topic areas was presented. End users require support for program creation, debugging, and tuning. Communication Management has become an important topic in a heterogeneous computing environment populated with machines that range from supercomputers to workstations. Processor Management encompasses the problems of choosing from multiple computers that could run a task. Storage Management covers how main memory is used, file system capabilities, and data archiving. The group was asked to view these as a starting point which they could amplify or to which they could add their own concerns.

3. "...illities" That We Want

Desired capabilities:
    Ease of use
    Performance
    Manageability
    Expandability
    Flexibility
    Interoperability
    Maintainability
    Portability
    Standardization
    Reliability
    Recoverability
    Security

The list above indicates some of the capabilities that are desired in a computing system. Making systems easy to use is a quest for reducing the people cost associated with running and programming at a supercomputing center.

Jobs are assigned to supercomputers because of the machine's performance. Predicting and measuring the performance of tasks is necessary. System managers should be able to create different levels of service.

We must be able to plan the growth of a center. That may take the form of expanding an existing machine or adding new ones. A variety of machines must be supported.

Programs may have useful lifetimes that exceed that of a machine.
Mechanisms and guidelines to increase portability are needed.

Long running computations will encounter machine failures. System support for minimizing lost work is needed.

A security policy must be provided to insulate users from each other. These features need to be maintained while permitting global access to a computing center. Furthermore, some tasks may require a dynamic allocation of computing resources from a distributed computing environment.

4. User Support

Topics:
    Debugging
        Multi-process
        Network
        Heterogeneous
    Performance
        Prediction
        Measurement
        Production
    Seamless environment

Multiple-process tasks add a new dimension for bugs to inhabit. With the state of the computation spread across several processors, cleanly stopping the task and achieving repeatability are difficult.

Placing the component computations on processors separated by a network adds another layer of complexity and diminishes the amount of direct hardware control over a program.

Debugging in a heterogeneous system adds the further complication of trying to map different hardware capabilities to a common view.

The performance of a program should be planned from its initial design. Programmers need enough information to be able to predict the cost of the system services that they use.

The actual performance of a code needs to be measured and those results correlated with the predicted performance. To ensure that programs continue to meet their expected performance profile, measurement must continue throughout production code runs.

5. Communication Management

Topics:
    Virtual resources
        Naming
        Disk
    Intertask
        On machine
        Cross machine
        To workstations
    Gigabit links
        Protocols
        Global networks

Computing centers will have multiple supercomputers. High speed networks are needed to transfer information between those machines and from the machines to the users' front ends and workstations.

Peripheral devices can be shared across several machines. The NFS disk sharing protocol is one example of information sharing used in the mini-computer world. Special attention needs to be given to problems caused by interconnecting high performance systems. Naming conventions are needed to identify these distributed resources without requiring each program to maintain explicit knowledge of the location of the resources; a small sketch at the end of this section illustrates the idea.

Standards for building groups of communicating parallel processes will allow a single task to be spread across several processors. Different solutions to this problem may be needed for the special cases of groups of processes running in a shared memory machine and for communication between machines made by different manufacturers.

Network protocols designed for high speed networks may be needed to augment standard protocols that were developed primarily for long-haul, high-error-rate communication. Wide area regional network access to supercomputer centers is required by some user communities.
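As an illustration of the naming point above, the sketch below maps logical resource names to a host and path so that programs never embed machine locations. The directory contents, the names (cfd-restart, archive-1), and the resolve routine are invented for this example, and a compiled-in table stands in for what would really be an operating-system or network-wide name service.

    /*
     * Sketch of location-independent resource naming.  The table contents,
     * the resource names, and the lookup routine are illustrative only; a
     * real name service would be maintained by the operating system or a
     * network-wide directory, not compiled into each program.
     */
    #include <stdio.h>
    #include <string.h>

    struct binding {
        const char *name;   /* logical name used by programs        */
        const char *host;   /* machine that currently owns the data */
        const char *path;   /* location on that machine             */
    };

    /* A per-center directory; programs see only the logical names. */
    static const struct binding directory[] = {
        { "cfd-restart",  "cray2-a",   "/scratch/cfd/restart.dat" },
        { "seismic-1989", "archive-1", "/tape/seismic/1989"       },
    };

    /* Resolve a logical name; returns NULL if the name is unknown. */
    static const struct binding *resolve(const char *name)
    {
        size_t i;
        for (i = 0; i < sizeof directory / sizeof directory[0]; i++)
            if (strcmp(directory[i].name, name) == 0)
                return &directory[i];
        return NULL;
    }

    int main(void)
    {
        const struct binding *b = resolve("cfd-restart");
        if (b != NULL)
            printf("%s is on %s at %s\n", b->name, b->host, b->path);
        return 0;
    }

Because the program asks only for "cfd-restart", the data set can move between machines by updating the directory, with no change to the program itself.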
6. Processor Management

Topics:
    Authentication
        Accounting
        Access control
    Usage control
        Priority
        Researcher's interface
        Administrator's interface
    Load balancing
        CPU selection
        Distributed system calls
        Process migration
        Checkpoint/restart
    Scheduling
        Long running jobs
        Batch
        Interactive

The high cost of supercomputing usually means that machines must be shared across cost accounting boundaries. Access control mechanisms are used to constrain users to a subset of a center's resources. These mechanisms must be implemented such that they do not interfere with the goals of distributed computing.

Researchers need to be able to control their computing costs. Priority schemes allow a trade-off between cost and completion time. Whereas researchers probably want to conserve funds, administrators want to minimize idle (unpaid for) resources. This may lead to significant differences between resource control mechanisms for those two groups.

In a heterogeneous computing environment it may be the case that some tasks can execute (with differing performance metrics) on any of a variety of machines. It was mentioned that a system which permits distribution of system calls can facilitate load sharing. Process migration allows tasks to adapt to a changing pattern of resource availability.

The scheduling of work needs to be considered with a view of the entire computing center. The presence of high-speed long haul networks may even permit load shifting from one center to another. A balance is needed between machine-efficient batch jobs and interactive computation. Finally, tasks that take a large amount of wall time, exceeding weeks of supercomputer time, will require special scheduling consideration.

7. Memory Management

Topics:
    Main memory
        Coordination
        Swapping and paging
    I/O scaling
        Cache control
        Intelligent disk
    File system
        RAID disk
        Data compression
        Beyond gigabytes
    Archiving

Supercomputers have enormous main memory systems, and supercomputer applications seem to want all of that memory for themselves. Algorithms for allocating main memory, and system calls that treat memory as a negotiable resource, are needed to grow beyond the mini-computer environment, where a task can be moved to secondary storage in a few seconds.

Over time, processor performance improvements have outstripped those in the storage area. Complex I/O systems that grow with increased system capabilities may be needed to preserve a balance between computation and I/O.

Newer technology will force change in the ways that storage is used. Data compression techniques have been used to increase the recording capacity of disk drives; perhaps these same techniques can extend network and channel bandwidth by moving data in a compressed form. Individual data sets will grow as computing capacity increases. The retention of data for years will need standards that outlive the machines that originally generated the data.
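To make the bandwidth-for-cycles trade concrete, the sketch below uses a simple run-length encoding to shrink a zero-heavy record before it would be sent over a channel. The rle_encode routine and the sample record are invented for this illustration; production systems would use stronger compression methods, likely placed in the channel or network interface rather than in each program.

    /*
     * Sketch of trading processor cycles for channel bandwidth, assuming a
     * simple run-length encoding of repeated bytes.  Illustrative only.
     */
    #include <stdio.h>

    /* Encode src[0..n-1] as (count, byte) pairs; returns bytes written. */
    static size_t rle_encode(const unsigned char *src, size_t n,
                             unsigned char *dst)
    {
        size_t in = 0, out = 0;
        while (in < n) {
            unsigned char byte = src[in];
            size_t run = 1;
            while (in + run < n && src[in + run] == byte && run < 255)
                run++;
            dst[out++] = (unsigned char)run;
            dst[out++] = byte;
            in += run;
        }
        return out;
    }

    int main(void)
    {
        /* A zero-heavy record, such as a sparse mesh, compresses well. */
        unsigned char record[64] = { 0 };
        unsigned char packed[128];
        record[10] = 7;

        size_t packed_len = rle_encode(record, sizeof record, packed);
        printf("record: %zu bytes, packed: %zu bytes\n",
               (size_t)sizeof record, packed_len);
        return 0;
    }

In this example the 64-byte record is reduced to 6 bytes before transfer, at the cost of the encoding pass on the sending processor.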
8. Resource Control and Administration

Topics:
    Brokering
        CPU cycles
        Processor type
        Disk storage
        Disk bandwidth
    System updates
        Internal OS interfaces
        Customization guides
        Site configuration
    Documentation
        Vendor
        User generated

The identification of available excess resources and their assignment to interested parties can be accomplished through the actions of a software system called a resource broker. This lessens the need for individual processors to get involved in strategic resource assignment; a small sketch at the end of this section shows the kind of matching a broker performs.

Because supercomputing installations generally differ, portions of the system need to be tailored to each site's requirements. Sometimes this includes inserting extensions into the base system's service repertoire. Guidelines for this type of modification are needed to ease the burden of re-integration when the vendor updates system software.

The presence of high-quality online documentation helps educate the user community and should yield better utilization of existing software and hardware resources. Attention needs to be given to providing the same quality of documentation delivery system for use by local authors.
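The brokering idea above can be pictured as matching a resource request against a table of machines and their uncommitted capacity. The machine pool, the request fields, and the choose policy in the sketch below are all invented for illustration; a real broker would track dynamic load, accounting limits, and many more resource types.

    /*
     * Sketch of a resource broker matching a request to a machine with
     * spare capacity.  Illustrative only.
     */
    #include <stdio.h>

    struct machine {
        const char *name;
        double      free_cpu_hours;   /* uncommitted cycles this period */
        double      free_memory_mw;   /* free memory, millions of words */
    };

    struct request {
        double cpu_hours;
        double memory_mw;
    };

    /* Pick the machine with the most spare cycles that meets the request. */
    static const struct machine *choose(const struct machine *pool, int n,
                                        const struct request *req)
    {
        const struct machine *best = NULL;
        int i;
        for (i = 0; i < n; i++) {
            if (pool[i].free_cpu_hours >= req->cpu_hours &&
                pool[i].free_memory_mw >= req->memory_mw &&
                (best == NULL ||
                 pool[i].free_cpu_hours > best->free_cpu_hours))
                best = &pool[i];
        }
        return best;
    }

    int main(void)
    {
        struct machine pool[] = {
            { "cray-ymp", 40.0, 32.0 },
            { "cray-2",   12.0, 64.0 },
        };
        struct request req = { 20.0, 16.0 };

        const struct machine *m = choose(pool, 2, &req);
        printf("%s\n", m != NULL ? m->name : "no machine available");
        return 0;
    }

The requesting task never names a machine; the broker applies center-wide policy, which is what relieves individual processors of strategic resource assignment.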