Title: Self-Configuring Network Monitor
PI: Brian Tierney
1 Cyclotron Rd., 50B-2239
Lawrence Berkeley National Laboratory
Berkeley, CA 94720 USA
Tel: 510-486-6363, Email: BLTierney@LBL.GOV
Executive Summary:
Application developers currently have very few tools to aid in developing distributed applications that use the network effectively. The tools that do exist are generally accessible only to network engineers and do not provide information about the entire network path (local and wide area networks). Without information about a stream from intermediate hops within the network, the end-to-end system is often unable to identify and diagnose problems within the network. For a distributed application to fully utilize the network, it must first know the current network properties and what is happening to its data. This project addresses the need for a network monitoring infrastructure that supports passive network monitoring. The ultimate goal of this infrastructure is to provide accurate, comprehensive, on-demand, application-to-application monitoring capabilities throughout the interior of the interconnecting network domains. In this project we are designing and implementing a self-configuring monitoring system that uses special request packets to automatically activate monitoring along the network path between communicating endpoints. Archived monitoring data will help point the way beyond the handcrafted systems of network testbeds to a production environment that can routinely support high performance distributed applications. This passive monitoring system will integrate with active monitoring efforts and provide an essential component in a complete end-to-end network test and monitoring capability. It will complement the existing network operation efforts. A principal design goal of the system is to provide components that are secure, easy to install, and easy to maintain, so that the system does not add a burden to the network’s administration. This architecture will not require modifications to the application, network routing, or forwarding infrastructure, nor will it require human intervention once monitoring has been triggered.
Major Goals and Technical Challenges:
Comprehensive end-to-end and top-to-bottom monitoring is critical for developing and debugging high performance, distributed applications. However, this service is largely unavailable to the application developer except in testbed environments. Increasingly, these applications rely on "automatic" tuning of transport parameters such as TCP window size and the number of parallel streams. However, the results of the tuning must still be verified, and sometimes debugged, and both tasks rely on fine-grained network monitoring. In addition, end-to-end approaches are limited in their ability to diagnose problems in the intervening networks and to diagnose the impact of tuning on other traffic in the network. The information from the monitors will be directly available to applications to aid in debugging and tuning application data transmission.
Applications will be able to send "request" packets to automatically activate monitoring along the network path between communicating endpoints. The request packets pass through passive sensors that are deployed at the ingress and egress routers of the wide-area networks and at critical points in the end site networks. To activate monitoring, an endpoint of a data stream runs a program that sends request packets to the other endpoint. The purpose of these packets is to alert each monitor in the interior of the network that the corresponding application flow is requesting monitoring from the network. Once activated, the monitors open a connection to a remote agent and send it a stream of monitoring data extracted from the packet flow. We will deploy this system at critical ESnet ingress and egress sites and at a few prototype end sites. This passive monitoring system will provide an essential component in a complete end-to-end network test and monitoring capability and will complement the existing network operation efforts. Most critically, this monitoring system will provide a mechanism for applications to determine what is happening to their data in the network. It is expected to be critical in helping to bridge the gap between network engineers and application designers/users.
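The activation sequence described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the project's implementation: the names Sensor, RequestPacket, DataPacket, and send_along_path are hypothetical, the request packet is assumed to carry both the flow to be monitored and the name of the remote agent, and the "connection" to the agent is modeled as a plain list rather than a network socket.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical flow identifier: (src host, src port, dst host, dst port)
Flow = Tuple[str, int, str, int]
# Hypothetical monitoring record: (sensor name, flow, packet size in bytes)
Record = Tuple[str, Flow, int]


@dataclass(frozen=True)
class DataPacket:
    flow: Flow
    size: int


@dataclass(frozen=True)
class RequestPacket:
    flow: Flow   # flow the application wants monitored (assumed field)
    agent: str   # remote agent that should receive the data (assumed field)


@dataclass
class Sensor:
    """Passive sensor: it only observes packets and never alters them."""
    name: str
    active: Dict[Flow, str] = field(default_factory=dict)  # flow -> agent

    def observe(self, pkt: Union[DataPacket, RequestPacket],
                agents: Dict[str, List[Record]]) -> None:
        if isinstance(pkt, RequestPacket):
            # A request packet activates monitoring for the named flow.
            self.active[pkt.flow] = pkt.agent
        elif pkt.flow in self.active:
            # Monitoring is active: report a record to the flow's agent.
            agents[self.active[pkt.flow]].append((self.name, pkt.flow, pkt.size))
        # Packets of flows that never requested monitoring are ignored.


def send_along_path(path: List[Sensor],
                    pkt: Union[DataPacket, RequestPacket],
                    agents: Dict[str, List[Record]]) -> None:
    """Let every sensor on the path see the packet, in path order."""
    for sensor in path:
        sensor.observe(pkt, agents)


def demo() -> List[Record]:
    path = [Sensor("site-egress"), Sensor("wan-ingress"), Sensor("wan-egress")]
    agents: Dict[str, List[Record]] = {"agent-1": []}
    app_flow = ("hostA", 5000, "hostB", 6000)
    other_flow = ("hostC", 5001, "hostB", 6001)

    send_along_path(path, DataPacket(app_flow, 1500), agents)      # not yet monitored
    send_along_path(path, RequestPacket(app_flow, "agent-1"), agents)
    send_along_path(path, DataPacket(app_flow, 1500), agents)      # now monitored
    send_along_path(path, DataPacket(other_flow, 400), agents)     # never requested
    return agents["agent-1"]


if __name__ == "__main__":
    for record in demo():
        print(record)
```

In this sketch only the data packet sent after the request produces records, one per sensor on the path, which mirrors the idea that neither the application nor the forwarding path is modified and that no operator action is needed once the request has been seen.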
The goals of this project are:
Major Milestones and Activities:
Year 1:
Year 2:
Year 3:
Current Connections with Other SciDAC Projects:
This project will work closely with the net100 project, and in particular plans to utilize the monitoring data archive that net100 will deliver.