Aprisa Link Monitoring Part 2
Posted: November 29, 2010
Another Fan Failure
Last weekend was short-lived. The probes that I set up on the Dude to monitor fan status worked all too well. We got the first stiff winds, with a snow/ice mix, over the weekend, and one of the fans kept going from good to bad over and over again. I called Paul and Jeremy, who told me that there must be something wrong with my alerts; they probably thought it was a false alarm because I had only set those alerts up the week before. After I told them that I had set up the same alert on all the radios and only one of them was complaining, their tone changed.
Another Link Flapping
On November 17th, another one of the links started flapping. This was causing the two-way radio network to key up because the voter was losing its tone from the remote end’s E&M module. Paul came in and asked me if I was doing anything with the radio configs.
“Nope. But I can see that the Aprisa is going up and down again. It looks like there are errors on the line.”
“These are Ethernet radios. The error on the near end is saying that there is an Ethernet problem. Did you disconnect the Ethernet cable from the radio at the far end?”
“Well, yes. About 4 weeks ago. Do you really think that has anything to do with it? ‘Ethernet’ could mean the RF link – which is what I believe we are having trouble with, because we can’t ping the remote end. The wireless link can be thought of simply as a sixty-mile extension cable for Ethernet. The sideband E&M cards are just VoIP devices in the radio…” Paul’s cell phone rang.
“OK, I’ll be right there…” He hung up his phone. “That was Jeremy. He logged into the camera that’s on the tower. There’s someone climbing it right now. I’m going to check it out.”
Revisiting ‘New’ Problems
My first post on Aprisa link monitoring involved setting up simple probes for the fans and doing some rudimentary checking for temperature alarms. Based on the last couple of issues we have experienced, I decided to revisit it. Aside from my getting too many text messages and Paul not trusting the alerts, the fan alarms worked very well. After talking to Paul, who thought I had caused a radio link error by pulling an Ethernet cable out of the far-end radio, I also decided to revisit monitoring the RF Ethernet link.
In my first post, I mentioned that I needed to follow up with the manufacturer to verify that the OIDs I picked were actually the right ones. Well, I got a reply:
From: Ian Fleming [mailto:email@example.com]
Sent: Thursday, 11 November 2010 3:40 a.m.
Subject: RE: Support request
Thanks for the MIB. We are having some trouble with fans failing and would like to monitor them. This helps. […]
We have revised the fan used in the XE a few years ago, the new part number is KD1206PTB1 (2).GN.I55, SUNON DC12V 1.8W. The newer ball bearing fans have proven to be more reliable than the older style fans. Replacement fans can be purchased from 4RF directly.
So, I got more information than I expected. The OEM fans have been replaced with a new model, and Paul put the new fans on order. After they are replaced, maybe my fan probes will never go into alarm again! The more interesting item was the MIB that came as an attachment. I don’t know why 4RF doesn’t post these on its website; it’s too bad you have to e-mail support to get this information. Hopefully this post will help someone out there with an Aprisa XE.
The intent of this post is to address what actually happened with the link flapping on the 17th of November. Paul and Jeremy confirmed that a tower climber was next to our dish with a hot feed line. Essentially, he was spraying RF all over the place as he climbed our tower, which caused errors on the Ethernet RF link. When the climber finally plugged the hot line into his antenna, the interference stopped. I now want to graph RSSI and errors over the RF link. This is how I did it, with help from the MIB e-mailed to me by Lief, the engineer at 4RF.
Graphing and displaying RSSI is a little different from using a probe to report whether something is operational. A probe only tests whether a condition is true and takes action based on the result. To graph in the Dude, I had to set up a function first; the function can then be set up as a probe. When a function is set up as a probe, I get more options than just testing a simple condition: I can display the information on the device and test for more than one condition.
Create the Function
The OIDs that I am interested in are:
- aprisaXECorrectableErrorCount.0: 188.8.131.52.4.1.148184.108.40.206.220.127.116.11
- aprisaXEUncorrectableErrorCount.0: 18.104.22.168.4.1.14822.214.171.124.126.96.36.199
- aprisaXETerminalRSSI.0: 188.8.131.52.4.1.148184.108.40.206.220.127.116.11
The first one that I mapped to a function was the received signal strength indicator (RSSI). Because this OID always returns a negative integer, I decided to make it positive by multiplying it by -1. This makes the chart “work”: when I tried to graph the negative values, the chart didn’t appear at all. Unfortunately, the Dude does not have an absolute value function, but since RSSI is always negative, multiplying by -1 works every time.
After creating the function, I had to test it, so I displayed it on the device. To do this, I right-clicked the device and selected the ‘Appearance’ option. Under the General tab, the Label field defines everything that shows up on the device; variables and functions are enclosed in square brackets. I had to play around with this a little to get the RSSI to show up the way I wanted. This only displays real-time data from the OID function I had just set up. Next, I had to set up a probe to monitor it, which gave me a nice chart of the RSSI history on the Aprisa radio.
Also notice my error counters. I made a function for the “link error” OIDs noted above as well, but these are incrementing counters. To alert on the errors, I used the following function, Aprisa_ErrorCount():
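To make the multiply-by-minus-one workaround concrete outside the Dude, here is a minimal Python sketch. It assumes, as described above, that the radio always reports RSSI in dBm as a negative integer; the function name rssi_for_chart is hypothetical. For any reading that is zero or negative, the result is the same as taking the absolute value, which is why the workaround is safe as long as that assumption holds.

```python
# Minimal sketch: flip a negative RSSI reading (dBm) to a positive number for charting.
# Assumption: the radio reports RSSI as a negative integer, as described in the post.


def rssi_for_chart(rssi_dbm: int) -> int:
    """Return the RSSI reading multiplied by -1 so the chart gets a positive value."""
    if rssi_dbm > 0:
        # Not expected for RSSI; flag it instead of silently graphing the wrong sign.
        raise ValueError(f"unexpected positive RSSI: {rssi_dbm}")
    # Same result as abs() whenever the reading is zero or negative.
    return rssi_dbm * -1


# Hypothetical readings taken from the radio.
samples = [-62, -65, -71]
print([rssi_for_chart(s) for s in samples])
```

Raising on a positive value is a design choice in this sketch: a blind multiply would quietly chart a nonsensical negative number if the assumption ever broke.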
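The Label field works by replacing each bracketed variable or function with its current value. The Python sketch below only illustrates that substitution idea; it is not the Dude's actual parser. The template text, the device name XE-Site-A and the function name Aprisa_RSSI() are hypothetical. The leading "-" in the template shows one way to display a positive chart value as a negative dBm reading.

```python
import re

# Hypothetical illustration of label templating: expand [expression] placeholders
# using a dictionary of current readings. Unknown placeholders are left as-is.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def render_label(template: str, values: dict) -> str:
    """Replace each [expression] with its value from `values`, if present."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template and reading (the function already returns a positive RSSI).
template = "XE-Site-A\nRSSI: -[Aprisa_RSSI()] dBm"
readings = {"Aprisa_RSSI()": 68}
print(render_label(template, readings))
```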
rate(oid("18.104.22.168.4.1.14822.214.171.124.126.96.36.199") + oid("188.8.131.52.4.1.148184.108.40.206.220.127.116.11"),10)
Because the OID counters are cumulative, the function graphs their rate over 10 seconds rather than the raw totals. The probe is set up to report an error if the function does not equal zero:
if(Aprisa_ErrorCount() = 0, "", concatenate("Warning: errors detected! Error rate is ", Aprisa_ErrorCount()))
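The reason for using rate() is that the raw counters only ever go up. What matters is how much they changed between polls. The Python sketch below shows that logic under stated assumptions: the correctable and uncorrectable counters are summed (as in Aprisa_ErrorCount()), samples are 10 seconds apart, and the result is expressed in errors per second. I have not confirmed how the Dude's rate() scales its result, and the counter-reset handling is a simplification of my own. error_rate and error_status are hypothetical names. error_status mirrors the probe's if(): an empty string means normal.

```python
# Minimal sketch of the error-count alerting. Assumptions: both "link error"
# OIDs are cumulative counters, they are summed, and samples arrive every 10 s.


def error_rate(prev_total: int, curr_total: int, interval_s: float = 10.0) -> float:
    """Errors per second between two samples of the summed counters."""
    if curr_total < prev_total:
        # Counter went backwards (e.g. radio reboot or wrap). This sketch simply
        # restarts from the new baseline and reports 0 for this interval.
        return 0.0
    return (curr_total - prev_total) / interval_s


def error_status(rate: float) -> str:
    """Mirror of the probe's if(): empty string means normal, anything else warns."""
    if rate == 0:
        return ""
    return f"Warning: errors detected! Error rate is {rate}"


# Hypothetical samples of (correctable, uncorrectable) counters, 10 s apart.
samples = [(1200, 4), (1200, 4), (1235, 6)]
totals = [c + u for c, u in samples]
for prev, curr in zip(totals, totals[1:]):
    print(repr(error_status(error_rate(prev, curr))))
```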
This probe is set up using the function you just created: add a new probe and select Function as its type. For the RSSI to be marked as available, the function has to return a number greater than 1. The Error field evaluates an if() statement to check whether the reading is normal; the first parameter of the statement must be true for the probe not to be in an error state. Because I am using this probe only to graph RSSI, I set the normal condition to -10000, an impossible value. This probe should never go into an error state as long as it is available.
The Value field is used for the chart, so I put my RSSI function there to have it graphed. Now I have a nice little chart to look at.
You are probably wondering about the fans. In my last post, I mapped the temperature alarm OID. That is useful; however, if the temperature alarm is going off, there is probably already some damaged equipment or a malfunctioning link. How about just monitoring the fans for alarm conditions? That’s just what I did after getting the MIB from New Zealand and reading it.
I found the fan alarm OIDs and mapped them to probes. The normal condition is zero; anything else is bad.
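A function probe has three separate fields. It can help to see them modeled side by side. This hypothetical Python sketch takes the "greater than 1" availability threshold from the text. The field names mirror Available, Error and Value, but the evaluation order (error only checked while available) is my assumption rather than documented Dude behavior. For the graph-only RSSI probe, the error expression simply never reports a problem, so the probe only feeds the chart.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model of a function-based probe with Available, Error and Value
# fields. The "> 1" availability threshold comes from the post; the evaluation
# order is an assumption, not the Dude's documented behavior.


@dataclass
class ProbeResult:
    available: bool
    error: str    # empty string means "no error"
    value: float  # the number that gets charted


def run_function_probe(read_value: Callable[[], float],
                       error_expr: Callable[[float], str]) -> ProbeResult:
    value = read_value()
    available = value > 1
    # Only evaluate the error expression while the probe is available.
    error = error_expr(value) if available else ""
    return ProbeResult(available=available, error=error, value=value)


# Graph-only RSSI probe: a -68 dBm reading flipped positive, and an error
# expression that never reports a problem.
rssi = run_function_probe(lambda: -68 * -1, lambda v: "")
print(rssi)
```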
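The fan rule is simple enough to state in a few lines: zero is normal, and any nonzero value is an alarm. The Python sketch below applies that rule to a set of readings. The object names are hypothetical placeholders. The real fan alarm objects come from the 4RF MIB and are not reproduced here.

```python
from typing import Dict, List

# Minimal sketch of the fan alarm rule: 0 is normal, any other value is an alarm.
# The object names below are hypothetical placeholders, not names from the 4RF MIB.


def fans_in_alarm(readings: Dict[str, int]) -> List[str]:
    """Return the names of fan alarm objects whose value is not zero, sorted."""
    return sorted(name for name, value in readings.items() if value != 0)


readings = {
    "hypotheticalFan1Alarm.0": 0,
    "hypotheticalFan2Alarm.0": 1,
}
print(fans_in_alarm(readings))
```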