Project: Stability Modeling and Control of Transport Protocols for High-Speed Data Grids

PI: Nagi Rao

Oak Ridge National Laboratory

Senior Research Staff

Computer Science and Mathematics Division

Tel: 865-574-7517, Email: nrao@icesar.epm.ornl.gov

Executive Summary

DOE applications such as distributed computation on networked supercomputers and remote experimentation and instrument control require unprecedented end-to-end performance from the network. Precise prediction and control of the end-to-end dynamics of various network mechanisms are essential in these applications. Unpredictable delays can leave supercomputers idle while they wait for coordination messages, and uncontrolled delays and jitter can cause loss of controllability or destabilization of network control loops. The end-to-end dynamics of the underlying computer network arise from the cumulative effect of network traffic and of various router and host mechanisms. Consequently, the observed end-to-end behavior exhibits complicated dynamics, which must be explicitly accounted for in these applications.

We propose analytical methods to identify the various regions of the end-to-end dynamics of communication mechanisms over wide-area networks, with special emphasis on identifying highly unstable and chaotic regions. Detailed analysis will be performed to identify the stable regions in which the end-to-end dynamics can be predicted and controlled. Mechanisms will then be designed to maintain the dynamics in stable yet high-throughput regions through suitable traffic engineering at the hosts and the use of multiple diverse paths. The traffic engineering controls the host processes, and the multiple paths provide a diversity of communication streams that can ameliorate the destabilization often caused by single streams. Network instruments will be designed to implement the required traffic engineering methods and to realize multiple paths. The measurements collected by the instruments enable state estimation, which is then used in end-to-end control. This scheme will be implemented by a set of daemon processes that completely off-load the networking tasks from the applications.

 

Deliverables and milestones

Year 1.

  1. Dynamics and Chaos Analysis: A complete analysis of communication mechanisms, including TCP, will be carried out to identify the stable and chaotic regions using a generalization of our projection method.
  2. Measurement Instrument Design: Network instruments will be designed to collect end-to-end measurements at the hosts and to obtain information using available mechanisms such as tcpdump, traceroute and generalized ping.
  3. End-to-End Prediction: The analytical methods will be combined with the measurements to estimate the nature of the dynamic regions with a special focus on detecting high levels of instability and chaos.
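
As a rough illustration of how end-to-end measurements might feed a stability estimate, the sketch below summarizes a series of round-trip-time samples and flags apparent instability. It is only a crude proxy and not the projection method named above: the function name rtt_stability, the jitter definition (mean absolute difference between consecutive samples), and the threshold jitter_ratio_limit are all assumptions made for this example.

```python
from statistics import mean


def rtt_stability(rtts_ms, jitter_ratio_limit=0.25):
    """Summarize RTT samples (milliseconds) and flag apparent instability.

    jitter_ratio_limit is a hypothetical threshold chosen for illustration;
    it is not a value taken from the project.
    """
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    avg = mean(rtts_ms)
    # Jitter here is the mean absolute difference between consecutive samples.
    jitter = mean(abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:]))
    # Flag the path when jitter is large relative to the mean RTT.
    unstable = (jitter / avg) > jitter_ratio_limit
    return {"mean_ms": avg, "jitter_ms": jitter, "unstable": unstable}


if __name__ == "__main__":
    print(rtt_stability([10, 10, 10, 10]))  # steady samples
    print(rtt_stability([10, 30, 10, 30]))  # strongly oscillating samples
```

In practice the samples could come from tools such as ping or tcpdump, and a real instrument would likely need a more careful estimator than a single ratio.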

Year 2.

  1. End-to-End Control: Host-based traffic engineering control mechanisms, including graceful tuning of TCP, will be designed and implemented to drive the dynamics into stable and predictable yet high-throughput regimes.
  2. Multiple Path Instrumentation: Methods will be designed to implement multiple paths and to dynamically balance traffic across them to provide stability and predictability.
  3. ESnet and Internet Implementation: The measurement and control instruments will be implemented and tested in ESnet and Internet environments.
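
One simple way to picture dynamic balancing across multiple paths is to give each path a share of the traffic inversely proportional to its measured jitter, so that steadier paths carry more load. The sketch below assumes that policy purely for illustration; the function path_shares, the path names, and the floor_ms guard are hypothetical and do not describe the project's actual balancing method.

```python
def path_shares(jitter_ms, floor_ms=0.1):
    """Return the fraction of traffic to send on each path.

    jitter_ms maps a path name to its measured jitter in milliseconds.
    Paths with lower jitter receive a larger share. floor_ms is a
    hypothetical guard that avoids division by zero on a perfectly
    steady path.
    """
    # Weight each path by the inverse of its (floored) jitter.
    weights = {path: 1.0 / max(j, floor_ms) for path, j in jitter_ms.items()}
    total = sum(weights.values())
    # Normalize so that the shares sum to 1.
    return {path: w / total for path, w in weights.items()}


if __name__ == "__main__":
    print(path_shares({"path_a": 1.0, "path_b": 3.0}))
```

Recomputing the shares whenever new measurements arrive would make the split adaptive; a deployed mechanism would also need damping so that the balancing itself does not introduce oscillations.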

 

Year 3.

  1. Complete Integrated Design: An integrated design will couple the measurement and control modules by optimizing the information flows and interfaces between them.
  2. Implementation and Testing: The entire system will be implemented and tested as a single integrated unit using our ORNL networks.
  3. Internet and ESnet Implementation: The proposed system will be customized for ESnet and the Internet and will be provided to application developers.