Monitoring

Implementing a performance monitoring program improves the network’s operational efficiency and cost-effectiveness by enabling faster problem detection and resolution.  Reducing downtime and supporting optimal use of applications improves the overall quality of service.  Empowering end-users and giving network administrators powerful problem detection tools increases user confidence both inside and outside the institution.  These benefits are described in greater detail below.

 

Monitoring Improved Quality of Service

The quality of service required varies among applications.  Some applications are engineered for speed; others do not tolerate packet loss.  Monitoring network performance on an ongoing basis improves the overall quality of networking services because, regardless of the performance a given application requires, engineers are better equipped to address problems as they appear.  Once a monitoring infrastructure is in place, the task of monitoring performance and addressing evident problems will resemble maintenance more than troubleshooting.  Network administrators will be able to resolve many networking problems proactively and, thanks to better, more informative communication with end-users, to troubleshoot unanticipated problems much more quickly.  This reduces overall downtime and maintains a higher level of performance throughout the network.

 

Monitoring Reduced Detection Lag

Using network performance monitoring tools reduces the amount of time a network administrator must spend tracking down problems along the path in question.  Certain tools allow a data path to be broken down so that the segment between each pair of adjacent machines on the path can be analyzed independently.  This quickly isolates the problem and enables the network administrator to contact the proper engineer to get it resolved.  Trial-and-error troubleshooting will be needed less and less as more efficient solutions are devised.
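
The segment-by-segment breakdown described above can be sketched as follows.  This is a minimal illustration and not the method of any particular tool: the hop names, the loss figures, the threshold, and the first_lossy_segment helper are all hypothetical, and the sketch assumes that a cumulative loss measurement from the source to each hop is already available.

```python
from typing import Optional, Sequence, Tuple


def first_lossy_segment(
    hops: Sequence[Tuple[str, float]],
    threshold: float = 0.01,
) -> Optional[Tuple[str, str]]:
    """Return the first (previous_hop, hop) pair where loss jumps by more than threshold.

    hops is an ordered list of (hop_name, cumulative_loss_fraction) pairs,
    each measured from the source to that hop along the path.
    Returns None when no segment exceeds the threshold.
    """
    prev_name, prev_loss = "source", 0.0
    for name, loss in hops:
        # The increase in cumulative loss is attributed to this segment.
        if loss - prev_loss > threshold:
            return (prev_name, name)
        prev_name, prev_loss = name, loss
    return None


# Hypothetical measurements along a four-hop path (assumed values).
path = [
    ("campus-edge", 0.000),
    ("regional-a", 0.001),
    ("regional-b", 0.042),
    ("remote-host", 0.043),
]
suspect = first_lossy_segment(path)
print(suspect)
```

With these assumed figures the loss first appears between regional-a and regional-b, which tells the administrator which operator to contact.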

 

Monitoring Empowered End-Users

End-users are the ultimate customers for all networking services.  Without monitoring beacons and performance monitoring tools it is very difficult for end-users to participate in resolving the problems they experience.  Many users simply accept that sometimes the network is “slow” and fail to report problems.  When problems are reported, the user can often provide only sketchy details, and the network administrator must devote a great deal of time to tracking down the cause.  Through educational campaigns and regular performance testing, end-users can become active participants in the resolution of network performance problems.  This not only helps users attain the quality of service their applications require but also generates positive attitudes among end-users and builds goodwill toward the information technology department.

 

Monitoring Network Transparency

Providing public, or semi-public, performance monitoring statistics creates an atmosphere of transparency around the network.  End-users and connecting networks alike will be able to monitor and evaluate the performance of the institution’s network.  The increased exposure creates an incentive to maintain high performance.  Because of our highly trained staff, Alpha University can expect to build a solid reputation as a reliable and consistent networking partner.

 

Specific Uses/Applications

 

Often the best way to illustrate the recommended uses of a given technology is through case studies.  The following descriptions cover specific performance monitoring applications known to be in production at other locations at the time of writing.  The tools employed and technical terms are explained more fully in the Glossary.  Application descriptions have been grouped into broad categories to help the reader understand their role in an overall IT infrastructure.

 

Real time point-to-point data transfer

Tuning a path in preparation for a real time point-to-point data transfer is accomplished by testing multiple points along the network path to determine its characteristics.  Ordinarily, conducting Iperf tests requires direct contact with the network or system administrators who control the hosts along the path.  These tests consume a great deal of bandwidth and administrators are, rightfully, wary of allowing others to perform them on their networks.  If servers along the path have BWCTL installed, testers can schedule Iperf tests remotely without contacting the administrator, because BWCTL wraps the Iperf test in scheduling and resource limits that the administrator controls, keeping the test from overwhelming the server.  As a result, the tester gets the performance results he or she needs, and the network administrator does not have to worry about the bandwidth requirements of the test or grant privileges to individuals he or she may not know.  In addition, should a problem be noted, the tester now has data to back up his or her claim of non-performance, and the administrator of the node in question has resources to support his or her troubleshooting.

 

High-Performance Applications

While developers and users of applications would like the network to run perfectly, with high speed and zero loss, network engineers recognize that the network will never run absolutely perfectly.  Until applications are developed to be more robust and able to withstand common network errors, network administrators and engineers will need to be able to tune networks to deliver near-zero packet loss.  Speed is not the only component of performance that can be of concern.  Packet loss can often be more devastating than slow traffic on the network.  Regularly consulting, and reporting to, network “weather maps” can help network administrators diagnose problems on their own campus as well as point to problems in other domains along the path of interest.
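
To illustrate why packet loss can matter more than raw link speed, the following sketch applies the widely cited approximation by Mathis et al. for steady-state TCP throughput, rate ≈ (MSS / RTT) × (C / √p), where p is the packet loss probability and C is approximately 1.22.  The segment size, round-trip time, and loss rates below are assumed values chosen for illustration; they are not measurements from any network described in this document, and the formula is only a rough upper bound for a single standard TCP flow.

```python
import math


def mathis_throughput_bps(
    mss_bytes: int,
    rtt_s: float,
    loss: float,
    c: float = math.sqrt(1.5),
) -> float:
    """Approximate steady-state TCP throughput in bits per second.

    Implements rate = (MSS / RTT) * (C / sqrt(p)) with C = sqrt(3/2) by default.
    loss must be a probability strictly greater than zero.
    """
    if loss <= 0.0 or rtt_s <= 0.0:
        raise ValueError("loss and rtt_s must be positive")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss))


# Assumed values: 1460-byte segments and a 50 ms round-trip time.
low_loss = mathis_throughput_bps(1460, 0.050, 0.0001)   # 0.01% loss
high_loss = mathis_throughput_bps(1460, 0.050, 0.01)    # 1% loss

print(f"0.01% loss: {low_loss / 1e6:.1f} Mbit/s")
print(f"1% loss:    {high_loss / 1e6:.1f} Mbit/s")
```

Because throughput in this model scales with 1/√p, a hundredfold increase in loss cuts the estimate by a factor of ten regardless of how fast the underlying links are.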

 

High volume, regular data transfers

Occasionally, a user will experience a sudden drop in network performance while performing routine data transfers.  When end-users are familiar with the normal performance of their systems and have a good understanding of their network topology, they are better prepared to address sudden changes in network performance.  Having a set of network tools available to users will help define and isolate the problem.  When such a problem occurs, immediately asking network staff about recent upgrades or modifications to the path in use will go a long way towards quickly resolving it.  Simple switch and router configuration changes can have unforeseen consequences, especially with regard to performance, and network administrators will not know of the direct impact unless end-users can inform them and back up their claims with data.
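
The idea of knowing normal performance well enough to recognize a sudden change can be sketched as a comparison against a baseline.  This is a minimal illustration under stated assumptions: the is_sudden_drop helper, the 50 percent threshold, and the throughput figures are hypothetical, and the sketch assumes the user has kept a short history of measured transfer rates.

```python
from statistics import median
from typing import Sequence


def is_sudden_drop(
    history_mbps: Sequence[float],
    current_mbps: float,
    drop_fraction: float = 0.5,
) -> bool:
    """Return True when the current rate falls well below the historical median.

    A drop is flagged when current_mbps is lower than
    (1 - drop_fraction) times the median of history_mbps.
    """
    if not history_mbps:
        raise ValueError("history_mbps must contain at least one measurement")
    baseline = median(history_mbps)
    return current_mbps < baseline * (1.0 - drop_fraction)


# Hypothetical history of routine transfer rates, in Mbit/s.
history = [92.0, 95.0, 90.0, 94.0, 93.0]

print(is_sudden_drop(history, 88.0))  # within the normal range
print(is_sudden_drop(history, 31.0))  # far below the baseline
```

A record like this gives the user concrete figures to bring to network staff rather than a general impression that the network is slow.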

 

Problem Isolation

Using cakeboxes (small, inexpensive PCs configured to register their presence on a network so that engineers can “find” them), network engineers can test H.323 video conferencing and other network application capabilities.  With the cakeboxes an engineer can locate where packet loss in a transmission may be occurring.  The tests can be run in a single direction at a time so that problems can be isolated to particular locations or areas of the network.  Sometimes these problems are as simple as a duplex mismatch, but without tools to help identify the location of the problem, resolution becomes much more difficult.  Other diagnostic tools, such as the Network Diagnostic Tool (NDT), allow users to perform limited diagnoses from their desktops.  These inexpensive, easy-to-install tools quickly eliminate specific paths as possible problems and can help point to the true culprit.
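
The directional testing described above can be sketched as a per-direction loss calculation.  This is an illustrative sketch only: the DirectionalResult type, the lossy_directions helper, the host labels, the packet counts, and the 1 percent threshold are all hypothetical, and the sketch assumes each test reports how many packets were sent and how many arrived in one direction.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DirectionalResult:
    """Packet counts for a test run in a single direction."""
    sent: int
    received: int

    @property
    def loss(self) -> float:
        """Fraction of packets lost; 0.0 when nothing was sent."""
        if self.sent == 0:
            return 0.0
        return (self.sent - self.received) / self.sent


def lossy_directions(
    results: Dict[str, DirectionalResult],
    threshold: float = 0.01,
) -> List[str]:
    """Return the labels of directions whose loss exceeds the threshold."""
    return [label for label, r in results.items() if r.loss > threshold]


# Hypothetical counts from tests between two test hosts, one per direction.
results = {
    "library -> datacenter": DirectionalResult(sent=10000, received=9998),
    "datacenter -> library": DirectionalResult(sent=10000, received=9400),
}

print(lossy_directions(results))
```

With these assumed counts only the datacenter-to-library direction exceeds the threshold, which narrows the search to equipment and settings that affect traffic flowing that way.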

 


Monitoring Impacts

A number of issues may affect our deployment of performance monitoring.

 

Time and effort are required to conduct campus-wide planning, review, and negotiation processes.  Educating the campus and stakeholders on the benefits and implications of network monitoring is necessary for a long-term, viable implementation.  Outcomes of this deployment include new administrative policies and processes that enable access to and use of data by various monitoring groups and public displays.  After implementation, this education and negotiation should continue in order to accommodate ongoing change in staff resources and institutional systems and processes.

 

The political implications of releasing and publishing network performance data must be explored.  Enduring challenges can arise when negotiating processes, data use, data ownership, and application of data with stakeholders on and off campus.  Who determines what reasonable thresholds for performance are, for example?  Who is responsible for performance and what happens as a result of poor performance?

 

The legal impact, or risk of litigation, must be assessed.  If the institution participates in peering agreements that specify minimum performance guarantees, a risk assessment should be completed to determine the ramifications for an uncooperative or unresponsive node operator who repeatedly fails to participate in the repair of a network or light path.

 

One-time costs will be incurred to establish monitoring beacons and install software.  Depending on the size and scope of the monitoring project, we will need to plan for a short-term increase in, or re-purposing of, staff time devoted to installing software, building the monitoring infrastructure, and watching performance closely to ensure that proper monitoring techniques are followed.  Additional resources and guidelines will be needed as more applications use this new infrastructure.  For example, policies will have to be codified that identify which individuals and research teams may or must use the monitoring infrastructure and what type of access they are given.  These are important considerations in ongoing maintenance.