Aprisa Link Monitoring
Posted: November 9, 2010
I came in this morning to find an Aprisa link missing pings in the event log on The Dude. I called Paul, our radio tech, who looked into the issue. He logged into the radio and found that one of the fans had failed. This was causing a temperature alarm, and the unit was going into brief thermal shutdowns. The ping times were not significantly longer, but I could hear some clicking and occasional serialization on the 2-way radio.
Paul and Ben told me that the fans have failed and needed replacing several times on just about every Aprisa radio we have in service. However, the fans are inexpensive to replace and the radio's quirks are well known, so everyone in the shop can live with it. We have no intention of fork-lifting (wholesale replacing) them in the future.
So, we have a known issue with some fans in bad-order. How can I prevent this from causing future downtime? I know! Configure The Dude to monitor them…
Walking The OIDs
The first thing I needed to do was set up SNMP on the Aprisa equipment so it could be monitored from The Dude. This was easily accomplished using the web management interface that comes with the radios. After that, I enabled SNMP on the Aprisa devices in my network map. I was already monitoring them with a ping, which is the minimum I use on any network device to let me know when something is going wrong. I then ran an SNMP walk and searched for the string “temp” by clicking on the little binoculars on the left of the walk screen.
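The same “find everything mentioning temp” search can be sketched over a text dump of a walk. This minimal Python sketch assumes each line looks like `OID = TYPE: value` (similar to net-snmp `snmpwalk -On` output); the sample lines, including the `.9.` arc under 14817, are hypothetical placeholders rather than real Aprisa values.

```python
# Search a saved SNMP walk for a substring, similar to The Dude's
# binoculars search. Lines are assumed to look like "OID = TYPE: value".
# The sample OIDs under 14817 are hypothetical placeholders.


def parse_walk(text):
    """Return a dict mapping each numeric OID (no leading dot) to its value."""
    entries = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip blank or unrecognized lines
        oid, _, rest = line.partition(" = ")
        # Drop a leading "TYPE: " prefix if present, then optional quotes.
        _, sep, value = rest.partition(": ")
        if not sep:
            value = rest
        entries[oid.strip().lstrip(".")] = value.strip().strip('"')
    return entries


def search_walk(entries, needle):
    """Return OIDs whose value contains `needle`, ignoring case."""
    needle = needle.lower()
    return [oid for oid, value in entries.items() if needle in value.lower()]


SAMPLE_WALK = """\
.1.3.6.1.2.1.1.5.0 = STRING: "radio-site-a"
.1.3.6.1.4.1.14817.9.45.1.0 = STRING: "TX Temp Warning"
.1.3.6.1.4.1.14817.9.45.2.0 = INTEGER: 0
"""

if __name__ == "__main__":
    walk = parse_walk(SAMPLE_WALK)
    for oid in search_walk(walk, "temp"):
        print(oid, "=", walk[oid])
```

Searching values case-insensitively matters here, since the radio reports “Temp” while the search term was typed as “temp”.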
Now, the OID column hinted that I didn’t have the custom MIB for the Aprisa I wanted to monitor. Without getting into too much detail on how SNMP works, I think of an MIB as a simple DNS hosts file. Just as a hosts file maps a name to an IP, the MIB maps an object ID (OID) to its name and description. The Dude can load these MIBs to make the OIDs verbose and easy to read. I don’t need the entire MIB, though; I just need the OID to set up a probe that monitors the fans.
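The hosts-file analogy can be made concrete: a MIB is essentially a table from numeric OID prefixes to names, and translation is a longest-prefix match. The table in this sketch holds only a few standard nodes (`internet`, `mib-2`, `system`, `sysName`, `enterprises`); without the vendor MIB, everything below the enterprise number stays numeric, which is what the walk showed.

```python
# A MIB viewed as a "hosts file" for OIDs: numeric prefixes mapped to names.
# Only a few standard nodes are listed; a vendor MIB would add entries
# below 1.3.6.1.4.1.<vendor>.
MINI_MIB = {
    "1.3.6.1": "internet",
    "1.3.6.1.2.1": "mib-2",
    "1.3.6.1.2.1.1": "system",
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.4.1": "enterprises",
}


def translate(oid, mib=MINI_MIB):
    """Replace the longest known prefix of `oid` with its name."""
    parts = oid.strip(".").split(".")
    for cut in range(len(parts), 0, -1):
        prefix = ".".join(parts[:cut])
        if prefix in mib:
            rest = parts[cut:]
            return ".".join([mib[prefix]] + rest)
    return oid  # no known prefix: leave it numeric


if __name__ == "__main__":
    print(translate("1.3.6.1.2.1.1.5.0"))   # sysName.0
    print(translate("1.3.6.1.4.1.14817"))   # enterprises.14817
```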
The first thing I researched was the OID 1.3.6.1.4.1.14817. www.mibdepot.com is a good place to start. Unfortunately, MIB Depot didn’t have anything listed for this vendor ID, so I went to 4RF’s website and e-mailed a support request for the MIB.
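The number right after `1.3.6.1.4.1` (the `enterprises` node) is the vendor's IANA Private Enterprise Number, which is the “vendor ID” to search for on sites like MIB Depot. A small helper to pull it out of any OID might look like this; the function name is illustrative only.

```python
# 1.3.6.1.4.1 is iso.org.dod.internet.private.enterprises; the arc right
# after it is the vendor's IANA Private Enterprise Number.
ENTERPRISES = (1, 3, 6, 1, 4, 1)


def enterprise_number(oid):
    """Return the Private Enterprise Number inside `oid`, or None if absent."""
    arcs = tuple(int(arc) for arc in oid.strip(".").split("."))
    n = len(ENTERPRISES)
    if len(arcs) > n and arcs[:n] == ENTERPRISES:
        return arcs[n]
    return None


if __name__ == "__main__":
    print(enterprise_number("1.3.6.1.4.1.14817"))  # 14817
```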
Instead of waiting, I looked for patterns in the SNMP walk that might mean something. There is some logic to the way these things are laid out. It’s kind of like a SCADA system, where the indexes line up with related analog counters (if 1 = IA and 2 = IB, then what is 3? Could it be IC?). In this case, index 45.1.0 = “TX Temp Warning”, and the following index, 45.2.0, is set to “0”. If I were designing this, the next index would assert when the condition is true… sounds logical to me. To partially confirm my guess, I looked in the web management interface for a condition that is currently true and that I could match against the SNMP walk.
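The guessed layout (a description at `<n>.1.0` and its status at `<n>.2.0`) can be written down as a small pairing function. This is a sketch of the inference only: `BASE` is a hypothetical table prefix, the second alarm row is made-up sample data, and none of it is confirmed by the vendor MIB.

```python
# Encode the guessed layout: "<BASE>.<n>.1.0" holds a description and
# "<BASE>.<n>.2.0" holds its status. BASE and the second row are
# hypothetical; the vendor MIB is the real authority.
BASE = "1.3.6.1.4.1.14817.9"


def pair_alarms(entries, base=BASE):
    """Map each description at <base>.<n>.1.0 to the value at <base>.<n>.2.0."""
    prefix = base.rstrip(".") + "."
    pairs = {}
    for oid, description in entries.items():
        if not (oid.startswith(prefix) and oid.endswith(".1.0")):
            continue
        index = oid[len(prefix):-len(".1.0")]
        if not index:
            continue  # the OID was <base>.1.0 itself, not a table row
        # None means the guessed status OID was not in the walk.
        pairs[description] = entries.get(f"{prefix}{index}.2.0")
    return pairs


WALK = {
    BASE + ".45.1.0": "TX Temp Warning",
    BASE + ".45.2.0": "0",
    BASE + ".46.1.0": "Hypothetical Alarm",  # made-up sample row
    BASE + ".46.2.0": "1",
}

if __name__ == "__main__":
    for description, status in pair_alarms(WALK).items():
        print(f"{description}: {status}")
```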
Ah! Here’s one: an Alarm Output that is asserted. Let’s see whether they correspond. Interestingly, the radio has two alarm inputs and four alarm outputs. The OIDs under 1.3.6.1.4.1.14817.….2.11 are the ones I’m interested in: one index holds the description string “Alarm Input 1”, there is a matching entry for the second input, and four more mysterious indexes follow. Could these be the Alarm Output status points? To test this, I’ll make an insignificant change to the output mapping on the web interface and watch what happens to the SNMP OID values.
I changed the mapping for Alarm Output 1 to “Local Major”. The OID 1.3.6.1.4.1.14817.….0 changed from “0” to “1”. So, if I change it from “Local Major” to “Local Minor” (the second item in the web interface list), should I get a “2”? Yes! Since I can’t remove voltage from the outputs without climbing a mountain, I am going to assume that the second object holds the mapping parameter and the first object indicates whether the output is asserted. Following this basic pattern, I’ll infer that the “Temp Warning” is active when 1.3.6.1.4.1.14817.….2.0 = 1 (at least until I get the MIB from the manufacturer to verify it).
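The “make an insignificant change and watch what flips” test generalizes nicely: take one walk before the change and one after, then list the OIDs whose values differ. This sketch assumes each walk is already parsed into an `{oid: value}` dictionary; the OIDs in the example are placeholders.

```python
# Diff two SNMP walk snapshots taken before and after a harmless config
# change. Each snapshot is assumed to be an {oid: value} dict; the OIDs
# in the example are placeholders.


def diff_walks(before, after):
    """Return {oid: (old, new)} for every OID that changed, appeared, or vanished."""
    changed = {}
    # Sorted as plain strings, which is good enough for display purposes.
    for oid in sorted(set(before) | set(after)):
        old, new = before.get(oid), after.get(oid)
        if old != new:
            changed[oid] = (old, new)
    return changed


if __name__ == "__main__":
    before = {"x.1.0": "0", "x.2.0": "0", "y.1.0": "Alarm Input 1"}
    after = {"x.1.0": "0", "x.2.0": "1", "y.1.0": "Alarm Input 1"}
    for oid, (old, new) in diff_walks(before, after).items():
        print(f"{oid}: {old} -> {new}")  # x.2.0: 0 -> 1
```

Repeating the diff after a second change (for example “Local Major” to “Local Minor”) shows whether the same OID moves again, which is the confirmation step described above.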
Setting up the probe
Now that we have identified the OID we want status points from, we have to set up a probe. The Dude needs a probe defined before a device can be monitored with it on the network map. Setting this up is straightforward: just right-click on the OID you are interested in and select Create Probe.
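Conceptually, an SNMP probe polls one OID and compares the answer with an expected value. Here is a rough sketch of that logic; the `poll` callable stands in for a real SNMP GET (so the example runs without a network), and the comparison names are hypothetical, not The Dude's actual probe settings. For the temperature warning, “healthy” would be the guessed status OID equal to 0.

```python
import operator

# Hypothetical comparison names; The Dude's own probe dialog may differ.
COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def run_probe(poll, oid, comparison, expected):
    """Return "up" if poll(oid) compares true against `expected`, else "down".

    `poll` stands in for a real SNMP GET. Any error while polling or
    converting the reply to an integer counts as "down".
    """
    try:
        value = int(poll(oid))
    except Exception:
        return "down"
    return "up" if COMPARISONS[comparison](value, expected) else "down"


if __name__ == "__main__":
    # Fake device replies keyed by placeholder OIDs.
    replies = {"temp.2.0": "0"}
    print(run_probe(replies.__getitem__, "temp.2.0", "==", 0))  # up
    print(run_probe(replies.__getitem__, "gone.2.0", "==", 0))  # down
```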
Now I’ll apply the probe to the equipment I want to monitor; in this case, all of my Aprisa radios. This is the easy part: select the devices and add the probe on the services tab. I also set up the notifications tab to text message Ben and me, e-mail the radio techs, and write to the event log if the probe goes down.
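One design choice worth borrowing for any monitoring setup is to alert on state changes rather than on every failed poll, so a radio that stays down doesn't page everyone repeatedly. A minimal sketch, assuming each channel (text message, e-mail, event log) is just a callable that accepts a message string; these channel stand-ins are hypothetical, and this is not a description of how The Dude implements notifications internally.

```python
# Alert only on state transitions so a radio that stays down does not
# re-page everyone on every poll. Channels are hypothetical stand-ins.


def make_notifier(channels):
    """Return on_result(device, state), which alerts all channels on change.

    Devices are assumed "up" until the first result arrives, so a device
    that is down on its very first poll still triggers an alert.
    """
    last_state = {}

    def on_result(device, state):
        previous = last_state.get(device, "up")
        last_state[device] = state
        if state == previous:
            return False
        message = f"{device}: probe {previous} -> {state}"
        for send in channels:
            send(message)  # e.g. SMS gateway, e-mail, event log
        return True

    return on_result


if __name__ == "__main__":
    event_log = []
    notify = make_notifier([print, event_log.append])
    for state in ["up", "down", "down", "up"]:
        notify("aprisa-1", state)
```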
Now I am pretty happy. If this happens again, I should get a warning text on my personal cell phone, the techs should get an e-mail, and there will be an entry in the event log. The only follow-up left is to wait for Aprisa support to send me that MIB so I can verify that I’m really monitoring the right OID! So, I put a note on the probe that this needs to be confirmed in the future (document my work!).