Title: Net100: The Development of Network-Aware Operating Systems http://www.net100.org

 

Pittsburgh Supercomputing Center

PI: Gwendolyn Huntoon

Assistant Director, Pittsburgh Supercomputing Center

4400 Fifth Avenue, Pittsburgh, PA

412/268-6354, Email: huntoon@psc.edu

Lawrence Berkeley National Laboratory

PI: Brian Tierney

Lawrence Berkeley National Laboratory (LBNL)

1 Cyclotron Rd. MS: 50B-2239, Berkeley, CA 94720

tel: 510-486-7381 fax: 510-495-2998 efax: 240-332-4065

bltierney@lbl.gov http://www-didc.lbl.gov/~tierney

 

Oak Ridge National Laboratory

PI: Thomas Dunigan/Nagi Rao

Mailstop 6355, Bldg 6010, Computer Science and Mathematics Division

Oak Ridge National Laboratory, Oak Ridge, TN 37831-6355

phone: (865) 574-7517; fax: (865) 241-0381; email: raons@ornl.gov

Executive Summary

The well-known difficulties in obtaining good network performance for TCP-based applications without expert tuning or case-by-case application optimization can be overcome by building expertise into the operating system where it will benefit all users and applications.

The Net100 Collaboration (PSC, NCAR, UT, LBL, and ORNL) will develop a model for network-aware operating systems, using Web100 as the means for incorporating network information and its analysis into host operating systems to improve performance. To investigate how effective network-aware operating systems can be, we will use a three-phase approach. First, we will use the network-aware, Web100-based operating system that we develop to create a simple bulk-transport application and demonstrate its use over high-performance network links. We will then extend this model to support more advanced and complex applications, moving from point-to-point optimization to optimizations for fully distributed environments. Finally, as proof that a network-aware operating system can tune and optimize performance on behalf of applications, we will develop application-internal tools (based on NetLogger) to monitor the efficiency of application support, and provide an external monitoring methodology (based on the Network Weather Service) to gauge the impact this system has on the rest of the network.

In addition to serving the needs of high-performance computing and network users, this project will serve as an (open source) showcase for network-aware operating systems, beginning with a Web100-based O/S. The tools, sources, measured results, and methods will be showcased on a "Closing the Wizard Gap" web site and made available to the high performance networking community.

Major Goals and Technical Challenges:

The core component of the Net100 project is a network-aware operating system (NAOS). The term NAOS refers collectively to the kernel-level modifications and to the APIs, libraries, and daemons developed as part of the project to support the network-aware functions. In many cases, the actual NAOS components may depend on the base operating system and/or the underlying network technologies.

Net100 will use the above technologies and expertise as mechanisms for testing and refining Web100. However, there are some critical missing components which must be developed to allow a thorough analysis. These will also be provided by Net100, and include:

The results of this monitoring and analysis will then be fed back to the Web100 team for further refinement of the Web100 kernel.

The goal of Net100 is to eliminate what has been called the "wizard gap". Through the integration of end-to-end and application-level monitoring capabilities with the tuning and diagnostic capabilities provided by Web100, we will develop a unique and general-purpose system for optimizing and understanding end-to-end network and application performance. This will allow us to create a network-aware operating system that will be able to maximize network utilization for a wide variety of applications and, without help from the wizards, eliminate the wizard gap.
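As an illustration of the kind of manual tuning that the "wizard gap" refers to, the following is a minimal sketch, not Net100 or Web100 code, of sizing TCP socket buffers to the bandwidth-delay product of a path. The function names (bdp_bytes, tune_socket) and the example bandwidth and round-trip time are hypothetical and chosen only for illustration. A network-aware operating system would aim to make this kind of per-application adjustment unnecessary.

```python
import socket


def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Return the bandwidth-delay product in bytes.

    This is the amount of data that must be in flight to keep a path
    of the given bandwidth and round-trip time full.
    """
    return round(bandwidth_bits_per_s * rtt_s / 8)


def tune_socket(sock: socket.socket, bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Request send and receive buffers sized to the path's BDP.

    Hypothetical helper: the operating system may clamp or adjust the
    requested sizes, so the returned value is the size requested, not
    necessarily the size granted.
    """
    size = bdp_bytes(bandwidth_bits_per_s, rtt_s)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return size


if __name__ == "__main__":
    # Assumed example path: 1 Gbit/s with a 70 ms round-trip time.
    print(bdp_bytes(1_000_000_000, 0.07), "bytes needed in flight")
```

The point of the sketch is that the right buffer size depends on path properties (bandwidth and round-trip time) that an application author usually does not know in advance, which is why placing this knowledge in the operating system benefits all applications.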

Major Milestones and Activities:

Year 1:

Year 2:

Year 3:

Current Connections with Other SciDAC Projects:

This project will work closely with the LBNL "Self Configuring Network Monitoring" project, which plans to utilize the monitoring data archive that Net100 will deliver. We also expect to collaborate with the "Bandwidth Estimation: Measurement Methodologies and Application" project led by K. Claffy. We plan to test any tools provided by this project using the NTAF.