Enterprise network management is the task of ensuring that networks and systems provide the required services, with the specified quality of service, to users and other systems. Most enterprise network management architectures use an agent-manager relationship, in which agents residing on managed network/system elements provide management information, such as alerts or performance measurements, to the manager. The manager reacts to these messages by executing one or more actions, such as operator notification, event logging, system shutdown, and automatic attempts at system repair. Management entities also poll end stations, automatically or upon user request, to check the values of certain variables. Agents hold information about the managed devices in which they reside and provide that information, proactively or reactively, to management entities within one or more enterprise management systems (EMSs) via a network management protocol. The term enterprise network management refers to the combined task of network and system management.
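The agent-manager interaction described above can be sketched as a minimal, protocol-agnostic Python illustration. The `Agent`, `Manager`, and `Alert` classes, the alert kind `link_down`, and the polled variable `cpu_load` are hypothetical names chosen for this example. In practice a network management protocol (SNMP, for instance, with its traps and get requests) carries these messages; the transport is omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    source: str   # managed element that raised the alert
    kind: str     # hypothetical alert type, e.g. "link_down"
    value: float


@dataclass
class Agent:
    """Hypothetical agent residing on a managed network/system element."""
    element: str
    variables: Dict[str, float] = field(default_factory=dict)

    def get(self, name: str) -> float:
        # Reactive path: answer a manager's poll for one variable.
        return self.variables[name]


class Manager:
    """Hypothetical manager that maps alert kinds to one or more actions."""

    def __init__(self) -> None:
        self.actions: Dict[str, List[Callable[[Alert], str]]] = {}
        self.log: List[str] = []

    def on(self, kind: str, action: Callable[[Alert], str]) -> None:
        self.actions.setdefault(kind, []).append(action)

    def receive(self, alert: Alert) -> None:
        # Proactive path: an agent pushed an alert; run every registered action.
        for action in self.actions.get(alert.kind, []):
            self.log.append(action(alert))

    def poll(self, agent: Agent, name: str) -> float:
        # Manager-initiated check of a variable on an end station.
        value = agent.get(name)
        self.log.append(f"poll {agent.element}.{name}={value}")
        return value


manager = Manager()
manager.on("link_down", lambda a: f"notify operator: {a.source} link down")
manager.on("link_down", lambda a: f"log event: {a.kind} from {a.source}")
router = Agent("router-1", {"cpu_load": 0.35})
manager.receive(Alert("router-1", "link_down", 0.0))
manager.poll(router, "cpu_load")
```

Registering several actions per alert kind mirrors the point that a single message can trigger operator notification, event logging, and other reactions.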
Network Management Functions
The functions of an enterprise manager, facilitated by an Energy Management System, include:
- Performance Management, which involves measuring various network/system performance metrics, analyzing the measurements to determine normal levels, and determining appropriate threshold values to ensure the required level of performance for each service. Examples of performance metrics include network/system throughput, user response times, and line utilization. Management entities continually monitor the values of the performance metrics, and an alert is generated and sent to the enterprise management system when a threshold is exceeded.
- Configuration Management, which involves maintaining an inventory of network and system configuration information. This information is used to assure interoperability and to detect problems. Examples of configuration information include device/system OS name and version, types and capacity of interfaces, types and versions of the protocol stacks, and type and version of the network/system management SW.
- Accounting Management, which tracks usage per account, supports billing, and ensures that resources are available according to the account requirements.
- Fault Management, which detects, fixes, logs, and reports network/system problems. Fault management involves determining symptoms through measurement and monitoring, and isolating the problem.
- Security Management, which controls access to network/system resources according to security guidelines. The security manager partitions network/system resources into authorized and unauthorized areas, and users are given access rights to one or more areas. Security managers identify sensitive network/system resources (including systems, files, and other entities) and determine which users may access which resources. The security manager also monitors access points to sensitive network/system resources and logs inappropriate access.
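The Performance Management steps listed above (derive a normal level from measurements, set a threshold, alert when it is exceeded) can be sketched in Python. The threshold rule used here, mean plus k population standard deviations with k = 3, is an assumption chosen for illustration and is not stated in this document; the function names and sample values are hypothetical.

```python
import statistics
from typing import List, Optional


def derive_threshold(history: List[float], k: float = 3.0) -> float:
    """Assumed rule: normal level = mean, threshold = mean + k * std dev."""
    normal_level = statistics.mean(history)
    spread = statistics.pstdev(history)
    return normal_level + k * spread


def check(metric: str, value: float, threshold: float) -> Optional[str]:
    # Return an alert message only when the threshold is exceeded.
    if value > threshold:
        return f"ALERT {metric}: {value:.1f} > threshold {threshold:.1f}"
    return None


# Hypothetical line-utilization samples (percent) from a baseline window.
baseline = [40.0, 42.0, 38.0, 41.0, 39.0]
limit = derive_threshold(baseline)

# New measurements arriving from continuous monitoring.
alerts = [a for a in (check("line_utilization", v, limit) for v in [43.0, 47.5]) if a]
for alert in alerts:
    print(alert)  # in a real EMS this message would be forwarded, not printed
```

With these samples the derived threshold is about 44.2%, so only the 47.5% reading produces an alert. A fixed, engineered threshold could replace `derive_threshold` without changing the monitoring loop.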
Typically, network management refers to the management of network resources such as routers, switches, hubs, customer premises equipment, and communication links. We extend the domain of network management to enterprise management, defined as the set of functions needed to manage the following resources:
- Network resources, as defined above,
- Systems – Computing resources such as substation automation systems, data concentrators, servers such as Market Interface Servers, applications such as data acquisition and control systems, and database management systems,
- Service and business functions such as the RTP customer pricing service, and security and operational policy servers,
- Power system devices such as IEDs and RTUs,
- Customer premises equipment such as digital meters and consumer portals, and
- Storage area networks.
Network Management Activities
Activity/Service Name [i] | Activities/Services Provided [ii] |
Object management – Defining resources and attributes | EnergyManagementSystem needs to be aware of resources: routers, hubs, computers, and their attributes. |
Defining, modifying and examining relationships | EnergyManagementSystem needs to be aware of the object relationships. |
Setting, modifying and examining attribute values | Object attributes need to have values, e.g., the number and types of ports per card. |
Inventory Management | Inventory management (IM) is the task of maintaining the types and configurations of resources. The inventory information is required for SW and HW maintenance, fault determination and recovery, and capacity planning. |
Network Discovery | Dynamically creates a representation of the network topology and of the device configurations. The data could be collected manually, which is tedious and often inaccurate for a large network, or through an EnergyManagementSystem. Instances of the managed devices and their internal components are created, and the connections between them are established. Components and information collected from the devices include network cards, ports, interfaces, power supplies, MAC addresses, SW version, OS type, CPU type, IP addresses, etc. |
Address Management | Address management includes allocating IP addresses to devices, determining subnets, keeping track of used and available IP addresses, and reusing unused addresses. This task reduces addressing complexity and wasted address space. |
Name Management | Naming establishes a connection between a name and a device, its location, its type, etc., and helps identify devices, IP address mappings, etc. Naming conventions for network devices, from the device name down to individual interfaces, should be planned and implemented as part of the configuration standard. A well-defined naming convention makes it possible to obtain accurate information when troubleshooting. The naming convention for devices can use geographical location, building name, floor, and so forth. The interface naming convention can include the segment to which a port is connected, the name of the connecting hub, and so forth. |
Routing management | Determine and configure routing tables. This includes configuring parameters for IP routing, Quality of Service (QoS), etc. |
SW distribution and upgrade | This includes detection of SW releases, distribution of new releases, and testing for interoperability. |
Setting & verifying user authorization | |
Scheduling, user/flow/packet prioritization | This allows specific treatment of users, flows, or packets, based on the features available on routers, switches, and computers, to meet QoS requirements or SLAs. |
Resource dimensioning and allocation | Engineering network elements for more efficient utilization and to assure that QoS requirements are met, for example, by sizing buffers. |
Configuring for redundancies to assure reliability requirements | Designing the network/systems to provide some tolerance to faults, for example, through alternative routing, redundant computing, etc. |
Initializing and terminating network operations, device reset. | This task initializes or shuts down the network and systems, and resets devices. |
Setting values for fault threshold, health check intervals, performance thresholds | This task requires an enterprise manager to set and configure threshold values for the purpose of alarm monitoring and performance monitoring. |
Polling for faults, health check, running watch-dog timers, processing traps | This task covers either receiving alarms (traps) or polling devices for alarms and health status. |
Log control | |
Diagnostic testing, testing capacity and special conditions | Testing either to proactively detect a failure of a device, application, or element, or to locate faults. |
Fault location | Determination of fault location through testing, alarm correlation, analysis, etc. |
Fault data summarization | |
Reconfigure, reroute, remove | Activities to recover from fault conditions |
Issue trouble ticket | Activity to document fault |
Dispatch technician | |
Determining the set of key performance indicators | The task of determining which performance metrics to measure. Examples are delay, response time, packet loss, buffer overflow, etc. |
Mapping SLA/user perf. objectives into network/system performance objectives | Mapping higher level service agreements such as response time, to network and system performance objectives such as processing times on each CPU, transport time, priority setting, etc. |
Continuous real-time performance monitoring, performance alarm generation | Alarms, statistics, history, and host/conversation groups are used to monitor and maintain network/system availability based on application-layer traffic. Performance metrics at the interface, device, and protocol levels are collected regularly to facilitate enterprise management, capacity planning, and rerouting functions. The EMSs typically collect, store, and present performance data from network devices and servers. Examples of performance metrics collected are response time, jitter (delay variance), packet loss, input/output queuing time, input/output buffer overflow, transaction time, and occupancy (utilization) of resources. |
Performance and statistical analysis of measured values, Performance data summarization | Post analysis of measured performance indicators for capacity planning, traffic engineering, reconfigurations, etc. |
Traffic management | Determine the traffic characteristics of each source and their resource requirements, and configure the network elements and systems to meet those requirements. User and application traffic profiling provides a detailed view of the traffic in the network. Some EMSs allow enterprise managers to analyze and troubleshoot networked applications such as Web traffic, NetWare, Notes, e-mail, database access, Network File System (NFS), etc. |
Capacity planning | Determine traffic growth and plan for it. Capacity planning for the network/system can be done after gathering traffic statistics such as traffic volume, source and destination IP addresses, input and output interface numbers, TCP/UDP source and destination ports, source and destination administrative groups, etc. |
Establishing, maintaining and monitoring Service Level Agreements (SLA) | A service level agreement (SLA) is established between a service provider and its customer on the expected performance level of network/system services. Examples of the performance metrics used in SLAs are guaranteed throughput, percentage of time with service availability, packet latency, percentage of packet delivery, outage reporting time, response time to denial-of-service attacks, service activation time, etc. Set parameters (routing, addressing, etc.) in devices to meet policy requirements. Monitor operations according to the policy. Identify policy violations. |
Authentication and Authorization | Authentication identifies users before they are allowed to access network/system resources. Authorization grants users various levels of authority. |
Accounting of Security Info | Collect and report security information used for billing, auditing, such as user identities, start and stop times, and executed commands. Accounting enables enterprise managers to track the services that users are accessing as well as the amount of network/system resources they are consuming. |
Establish Access Control List | To control access of unauthorized users to network/system resources. |
Policy Management, policy specification, translation and distribution. | This activity involves collecting the various network/system-related policies and incorporating them into the enterprise management activities. The policies include QoS, security, address allocation, and routing policies. A policy management tool can assist enterprise managers in obtaining high-level policies and translating them into low-level policies to be enforced by the network devices, or policy enforcement points. These tools use a policy repository, a database of the high- and low-level policies. |
Accounting Management | Accounting management is the process used to measure network/system utilization parameters so that individual or group users on the network/system can be regulated appropriately for accounting or billing purposes. A usage-based accounting and billing system is an essential part of any service level agreement (SLA): it provides both a practical way of defining obligations under an SLA and clear consequences for behavior outside the terms of the SLA. The data can be collected via NMSs. The probes that measure the statistics are placed on the edge or access routers, at the point of entry to the network/system, and measure traffic flow (number of bytes, number of packets) for each source-destination pair (based on IP addresses). This information can also be used to check for security violations. |
Specifying accounting information to be collected | |
Setting and modifying accounting limits | |
Defining accounting metrics | |
Implementing/activating metering functions | |
Controlling the storage of and access to accounting information | |
Monitoring usage | |
Regulating users and groups | |
Billing | |
Reporting | Report accounting information, configuration status, fault data, performance data, policy changes, and violations |
[i] The Service Name corresponds to a Use Case which is associated with the main domain template use case using the <<includes>> relationship.
[ii] The Service Description corresponds to the documentation attribute of a Use Case.
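The Address Management row in the table above (allocating addresses, determining subnets, tracking used and available addresses, reusing unused ones) can be illustrated with Python's standard `ipaddress` module. The `AddressPool` class, the 10.20.0.0/24 site block, the split into /26 subnets, and the device names are hypothetical; a production address management tool would also persist its state and detect conflicts.

```python
import ipaddress
from typing import Dict, List, Optional


class AddressPool:
    """Hypothetical IPv4 address manager for a single subnet."""

    def __init__(self, cidr: str) -> None:
        self.network = ipaddress.IPv4Network(cidr)
        # Usable host addresses (network and broadcast excluded), ascending.
        self.free: List[ipaddress.IPv4Address] = list(self.network.hosts())
        self.assigned: Dict[str, ipaddress.IPv4Address] = {}

    def allocate(self, device: str) -> Optional[ipaddress.IPv4Address]:
        if device in self.assigned:          # device already has an address
            return self.assigned[device]
        if not self.free:                    # subnet exhausted
            return None
        addr = self.free.pop(0)              # lowest free address
        self.assigned[device] = addr
        return addr

    def release(self, device: str) -> None:
        # Return the address to the free list so it can be reused.
        self.free.append(self.assigned.pop(device))
        self.free.sort()


# Determine subnets: split a site block into four /26 subnets (e.g. one per floor).
site = ipaddress.IPv4Network("10.20.0.0/24")
floors = list(site.subnets(new_prefix=26))

pool = AddressPool(str(floors[0]))
first = pool.allocate("rtu-01")
second = pool.allocate("ied-07")
pool.release("rtu-01")               # device decommissioned
reused = pool.allocate("meter-gw")   # receives the released address
```

Handing out the lowest free address keeps allocations compact, which makes reuse of released addresses straightforward to audit.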
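For the Capacity planning row in the table above, the growth-planning step can be sketched as a compound-growth projection. The growth model (a constant monthly rate estimated from the first and last samples), the 80% planning limit, and the traffic figures are assumptions for illustration only.

```python
import math
from typing import List


def monthly_growth_rate(peaks_mbps: List[float]) -> float:
    """Assumed model: constant compound growth between first and last sample."""
    periods = len(peaks_mbps) - 1
    return (peaks_mbps[-1] / peaks_mbps[0]) ** (1.0 / periods) - 1.0


def months_until_upgrade(current_mbps: float, capacity_mbps: float,
                         growth: float, planning_limit: float = 0.8) -> int:
    """Months until projected peak load reaches planning_limit * capacity."""
    target = planning_limit * capacity_mbps
    if current_mbps >= target:
        return 0                      # already at or above the planning limit
    if growth <= 0:
        raise ValueError("no positive growth: limit is never reached")
    # Smallest integer n with current * (1 + growth) ** n >= target.
    return math.ceil(math.log(target / current_mbps) / math.log(1.0 + growth))


# Hypothetical monthly peak loads on one link, in Mbps.
history = [100.0, 105.0, 110.25]
growth = monthly_growth_rate(history)          # roughly 5% per month
months = months_until_upgrade(history[-1], capacity_mbps=250.0, growth=growth)
```

Here a 250 Mbps link growing about 5% per month from 110.25 Mbps reaches the 200 Mbps planning limit in 13 months, which gives the lead time for an upgrade.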
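Several SLA metrics listed in the table above (percentage of time with service availability, percentage of packet delivery, latency, and jitter) can be computed from periodic probe results and compared with agreed targets. The `Probe` record, the target values, and the jitter formula (mean absolute difference between consecutive delay samples, one simple way to quantify delay variation) are assumptions for this sketch.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Probe:
    """Hypothetical result of one measurement interval."""
    service_up: bool        # did the service respond in this interval?
    sent: int               # test packets sent
    received: int           # test packets delivered
    delays_ms: List[float]  # sampled delays of delivered packets, in ms


def sla_report(probes: List[Probe]) -> Dict[str, float]:
    up = sum(p.service_up for p in probes)
    sent = sum(p.sent for p in probes)
    received = sum(p.received for p in probes)
    delays = [d for p in probes for d in p.delays_ms]
    # Jitter approximated as mean absolute change between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "availability_pct": 100.0 * up / len(probes),
        "delivery_pct": 100.0 * received / sent,
        "mean_latency_ms": statistics.mean(delays),
        "jitter_ms": statistics.mean(diffs) if diffs else 0.0,
    }


def violations(report: Dict[str, float], targets: Dict[str, float]) -> List[str]:
    # Percentages are minimums; times are maximums (assumed convention).
    broken = []
    for key, target in targets.items():
        value = report[key]
        ok = value >= target if key.endswith("_pct") else value <= target
        if not ok:
            broken.append(key)
    return broken


probes = [
    Probe(True, 100, 100, [20.0, 22.0]),
    Probe(True, 100, 98, [21.0, 30.0]),
    Probe(False, 100, 0, []),           # outage interval
    Probe(True, 100, 100, [20.0, 20.0]),
]
targets = {"availability_pct": 99.9, "delivery_pct": 99.0,
           "mean_latency_ms": 50.0, "jitter_ms": 5.0}
report = sla_report(probes)
broken = violations(report, targets)
```

The single outage interval drives availability to 75% and delivery to 74.5%, so both are reported as violations, while latency and jitter stay within their targets.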
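The Accounting Management row in the table above describes metering traffic volume (bytes and packets) per source-destination IP pair at the network edge, and the rows that follow mention accounting limits. A minimal Python sketch of that aggregation follows. The `PacketRecord` format and the per-pair byte limit are hypothetical; real edge routers typically export pre-aggregated flow records (for example, NetFlow or IPFIX) rather than individual packets.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source IP, destination IP)


@dataclass(frozen=True)
class PacketRecord:
    """Hypothetical per-packet observation made by an edge probe."""
    src: str
    dst: str
    size: int  # bytes


def meter(records: Iterable[PacketRecord]) -> Dict[Pair, Dict[str, int]]:
    """Aggregate byte and packet counts per (source, destination) pair."""
    usage: Dict[Pair, Dict[str, int]] = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for r in records:
        counters = usage[(r.src, r.dst)]
        counters["bytes"] += r.size
        counters["packets"] += 1
    return dict(usage)


def over_limit(usage: Dict[Pair, Dict[str, int]], byte_limit: int) -> List[Pair]:
    # Pairs whose byte count exceeds a hypothetical per-pair accounting limit.
    return sorted(pair for pair, c in usage.items() if c["bytes"] > byte_limit)


records = [
    PacketRecord("10.0.0.5", "10.1.0.9", 1500),
    PacketRecord("10.0.0.5", "10.1.0.9", 500),
    PacketRecord("10.0.0.7", "10.1.0.9", 200),
]
usage = meter(records)
flagged = over_limit(usage, byte_limit=1000)
```

The same per-pair counters can feed billing, usage reports, or a check for unexpected traffic between pairs that should not communicate.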