EcoStruxure IT may be unable to communicate with a device during discovery, or may lose communication with it some time later, for a number of reasons. The checks below help you verify proper SNMP communication.
EcoStruxure IT is vendor neutral. Configuration options may differ from device to device. For information about APC devices and Data Center Expert, see here.
Initial configuration for discovery
The devices you want to monitor must first have SNMP enabled. Only one SNMP version should be enabled on a device at a time, either SNMPv1 or SNMPv3.
Once enabled, specify the read community name to access information on the device. You may also need to specify the write community name.
If you specify an IP address or an address range, only those systems will be allowed to use the community names. For troubleshooting purposes, it is best to leave the IP address at all zeroes (0.0.0.0) so that any system can use the community name. After communication is established, you can change these options.
Security for SNMPv3 includes User Name, Authentication Passphrase, Privacy Passphrase, Authentication Protocol, and Privacy Protocol. These must be noted and are required to communicate with EcoStruxure IT using SNMPv3. Once configured, you can enable each profile individually and associate it with a specific IP address or address range.
EcoStruxure IT needs a Device Definition File (DDF) to communicate with a device and display device information. A DDF is an XML file that contains the SNMP OID information for specific sensors in the proper format.
EcoStruxure IT contains DDFs for APC devices, many other well-known manufacturers, and one for a generic UPS. If EcoStruxure IT does not have the DDF for a device you want to monitor, you can contact Device Support to request that a DDF be created.
Troubleshooting failed discoveries
The most common reason for a failed discovery is incorrect security parameters.
Verify that the SNMP read community name matches the device you are trying to discover. In SNMPv3, all parameters must match the settings on the device.
Be sure to verify there is no specific NMS IP address associated with the security profiles or community names. An incorrect IP or any type of Network Address Translation could cause this feature to block SNMP traffic. Device logs may indicate if a system tried to access the device with incorrect security parameters, including the IP address of the system that attempted this action.
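The NMS IP restriction described above can be reasoned about with a short Python sketch. It only illustrates the logic and is not how any particular device implements the check: the function name nms_allowed is hypothetical, and the CIDR notation for ranges is an assumption, since devices differ in how they express an address range.

```python
import ipaddress


def nms_allowed(source_ip: str, allowed: str) -> bool:
    """Return True if source_ip would pass an NMS IP restriction.

    `allowed` is a single address or a CIDR range (an assumption; devices
    vary). All zeroes (0.0.0.0) is treated as "any system".
    """
    if allowed == "0.0.0.0":
        return True
    # strict=False lets a plain host address act as a one-address range.
    network = ipaddress.ip_network(allowed, strict=False)
    return ipaddress.ip_address(source_ip) in network


# The gateway's own address is inside the allowed range...
print(nms_allowed("192.168.1.20", "192.168.1.0/24"))
# ...but after Network Address Translation the device sees another source.
print(nms_allowed("203.0.113.7", "192.168.1.0/24"))
```

The second call shows why NAT matters: the device compares the restriction against the translated source address it actually sees, not the address configured on the gateway.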
Make sure the port used during the discovery matches the port configured on the device. If the ports do not match, discovery will probably fail. The default port is 161. Device configuration may vary.
If all your security and port configurations match, and you have ruled out ports, profiles, and community names being the issue, try to simplify the settings. For example, if you are using a long, complex community name with special characters, try using something simple for testing purposes. For SNMPv3, try using a different set of security options, or no encryption. If SNMPv3 continues to fail, try testing with SNMPv1 first to rule out configuration issues.
If failure continues, check the network. A packet capture can be helpful in this case. You can use tools like Wireshark to capture network traffic and help determine whether the SNMP packets are actually getting to the device and whether the device is responding properly.
A system running a packet capture near the computer running EcoStruxure IT Gateway will show if EcoStruxure IT is sending and receiving; a packet capture closer to the device you are trying to monitor will show if the network is allowing the SNMP packets from EcoStruxure IT to get to the device.
Keep in mind that utilities like ping will show if basic network traffic can get from one system to another, but do not rule out the possibility of networks blocking specific ports or protocols.
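Because ping does not exercise the SNMP port, a more direct test is to send an SNMP request and see whether anything comes back. The following is a minimal Python sketch using only the standard library, not an EcoStruxure IT tool. It assumes SNMPv1, hand-builds a GET request for the standard sysDescr.0 object (1.3.6.1.2.1.1.1.0), and uses hypothetical helper names (tlv, build_get_request, snmp_port_answers). The community name "public" and the address in the usage note are placeholders.

```python
import socket

# BER-encoded object identifier 1.3.6.1.2.1.1.1.0 (sysDescr.0).
SYS_DESCR_OID = bytes([0x2B, 6, 1, 2, 1, 1, 1, 0])


def tlv(tag: int, value: bytes) -> bytes:
    """Encode one BER tag-length-value using the short length form only."""
    if len(value) > 127:
        raise ValueError("value too long for short-form BER length")
    return bytes([tag, len(value)]) + value


def build_get_request(community: str, request_id: int = 1) -> bytes:
    """Build an SNMPv1 GetRequest message asking for sysDescr.0."""
    if not 0 <= request_id <= 127:
        raise ValueError("this sketch only encodes one-byte request IDs")
    varbind = tlv(0x30, tlv(0x06, SYS_DESCR_OID) + tlv(0x05, b""))
    pdu = tlv(
        0xA0,  # GetRequest-PDU
        tlv(0x02, bytes([request_id]))  # request-id
        + tlv(0x02, b"\x00")            # error-status
        + tlv(0x02, b"\x00")            # error-index
        + tlv(0x30, varbind),           # variable bindings
    )
    version = tlv(0x02, b"\x00")        # 0 means SNMPv1
    return tlv(0x30, version + tlv(0x04, community.encode("ascii")) + pdu)


def snmp_port_answers(host: str, community: str = "public",
                      port: int = 161, timeout: float = 2.0) -> bool:
    """Send one GET over UDP and report whether any SNMP-like reply arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_get_request(community), (host, port))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable-port errors
            return False
    # A BER SEQUENCE tag (0x30) at the start suggests an SNMP message.
    return len(data) > 0 and data[0] == 0x30
```

For example, snmp_port_answers("192.0.2.10", community="public") returns False when no reply arrives within the timeout. A False result does not say why: the port may be blocked, the community name may be wrong (SNMPv1 devices typically drop such requests without replying), or an NMS IP restriction may be in effect. A packet capture helps tell these cases apart.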
Troubleshooting lost communication after discovery
If the IP address of an SNMP device has changed, the device will go into a lost communication state. When you rediscover a device using a different IP address, all historical data from the device will be lost. To maintain communication and keep historical data, do not change device IP addresses after discovery.
Changes in security settings also cause a device to lose communication. If the community names have been changed on the device after it was discovered, the systems will fail to communicate. You must rediscover the device using the correct community name. All historical data from the device will be lost. To maintain communication and keep historical data, do not change community names after discovery.
Lost communications issues can also be caused by a change in the network configuration. Firewalls can sometimes be reconfigured without notice to block ports or traffic from specific IP addresses. A packet capture can come in handy here.
Intermittent lost communications issues
Intermittent communication issues can usually be attributed to traffic issues, either network related or on the device itself. If there is too much network traffic, or too many systems are trying to poll a single device at the same time, the UDP packets used by SNMP may not be able to get through to the device or be answered by the device. You can increase timeouts and retries in the gateway's Device Polling menu option under SNMP to help the system regain communication lost because of such traffic.
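To see how timeouts and retries interact, the following Python sketch models a generic poll loop. It is an illustration under stated assumptions, not the gateway's actual polling logic: it assumes one initial attempt plus a fixed number of re-sends, each waiting the full timeout, and the names worst_case_wait, poll_with_retries, and send_once are hypothetical.

```python
from typing import Callable, Optional


def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Seconds spent before giving up: one initial attempt plus the retries."""
    return timeout_s * (retries + 1)


def poll_with_retries(send_once: Callable[[float], Optional[bytes]],
                      timeout_s: float, retries: int) -> Optional[bytes]:
    """Call send_once until it returns a reply or the attempts run out.

    send_once(timeout_s) is expected to return the reply bytes, or None
    when no reply arrived within the timeout.
    """
    for _ in range(retries + 1):
        reply = send_once(timeout_s)
        if reply is not None:
            return reply
    return None


# With a 2 second timeout and 3 retries, a silent device is declared
# unreachable only after four attempts.
print(worst_case_wait(2.0, 3))
```

Raising either value gives a busy device more chances to answer, at the cost of taking longer to report a device that is genuinely offline.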
If it is available on the device, you can also specify an NMS IP address. Setting a specific IP address or address range will stop other systems from being able to poll the device in question. This may or may not be a permanent fix, but can be useful for testing.