
T1 Line Testing with an Adtran Netvanta

In my prior post, Configuring Dude’s Syslog, I mentioned a couple of strange or interesting cases where syslog helped troubleshoot an issue I was experiencing. The last log entry was a PCV (path code violation) 24 hour threshold alarm. I mentioned that it had something to do with cabling or a bad CSU and that the issue had not been resolved. After looking at the alarm for several months, it was time to take action. I decided to run another T1 alongside the existing one through our microwave network, essentially replacing the circuit and moving the production traffic to the new line. I did ask Paul and Jeremy to keep the existing path in place so that we could test it and find the cause of the PCV errors.

Before the circuit was replaced, I took a closer look at the issue. After watching the alarms in syslog for a while, I decided to set up a regex for ‘PCV’ to page my cell as soon as the event took place. I suspected that syslog wasn’t telling me the whole story. After receiving the first page, I quickly ssh’d into the router and looked at the troubled interface:

LAKESIDE_NEC_WAN>sho int t1 3/4
t1 3/4 is UP
 Description: ### T1 To Frisco ###
 Receiver has no alarms
 T1 coding is B8ZS, framing is ESF
 Clock source is internal, FDL type is ANSI
 Line build-out is 0dB
 No remote loopbacks, No network loopbacks
 Acceptance of remote loopback requests enabled
 Tx Alarm Enable: rai
 Last clearing of counters 2w 5d 00:39:24
 loss of frame  : 0
 loss of signal : 0
 AIS alarm      : 0
 Remote alarm   : 0

 DS0 Status: 123456789012345678901234
             NNNNNNNNNNNNNNNNNNNNNNNN
 Status Legend: '-' = DS0 is unallocated
                'N' = DS0 is dedicated (nailed)

 Line Status: -- No Alarms --

 5 minute input rate 592 bits/sec, 1 packets/sec
 5 minute output rate 6264 bits/sec, 7 packets/sec
 Current Performance Statistics:
 1 Errored Seconds, 0 Bursty Errored Seconds
 1 Severely Errored Seconds, 0 Severely Errored Frame Seconds
 0 Unavailable Seconds, 2048 Path Code Violations
 0 Line Code Violations, 0 Controlled Slip Seconds
 0 Line Errored Seconds, 0 Degraded Minutes

Syslog wasn’t giving me the entire picture. The PCV threshold alarm was only a clue. The PCVs arrived together with an errored second (ES) and a severely errored second (SES), and the ES/SES counters would clear within the next five minutes. These never reached syslog because a single ES didn’t violate any system threshold, whereas the PCV count did.
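Since the counters clear quickly, it can help to pull them out of the interface output programmatically rather than reading them by eye. Here is a minimal Python sketch that parses the "Current Performance Statistics" block from output shaped like the listing above. The function names and the `SAMPLE` text are hypothetical, made up for illustration, and the parser assumes the counter lines keep the `<number> <Name>` layout shown here. How you capture the output from the router (SSH, a script, copy and paste) is left out.

```python
import re

# Matches "<number> <Capitalized Counter Name>", e.g. "2048 Path Code Violations".
COUNTER_RE = re.compile(r"(\d+)\s+([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)")

# Hypothetical sample, copied from the statistics block shown above.
SAMPLE = """\
 Current Performance Statistics:
 1 Errored Seconds, 0 Bursty Errored Seconds
 1 Severely Errored Seconds, 0 Severely Errored Frame Seconds
 0 Unavailable Seconds, 2048 Path Code Violations
 0 Line Code Violations, 0 Controlled Slip Seconds
 0 Line Errored Seconds, 0 Degraded Minutes
"""


def parse_t1_counters(show_output: str) -> dict:
    """Return {counter name: value} for the performance statistics block."""
    counters = {}
    in_stats = False
    for line in show_output.splitlines():
        if "Performance Statistics" in line:
            in_stats = True  # counters start on the next line
            continue
        if in_stats:
            for value, name in COUNTER_RE.findall(line):
                counters[name] = int(value)
    return counters


def nonzero_counters(counters: dict) -> dict:
    """Keep only the counters that recorded at least one event."""
    return {name: value for name, value in counters.items() if value > 0}


if __name__ == "__main__":
    for name, value in nonzero_counters(parse_t1_counters(SAMPLE)).items():
        print(f"{name}: {value}")
```

Run against the sample, this lists only Errored Seconds, Severely Errored Seconds, and Path Code Violations, which are the three nonzero counters in the output above.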

This confirmed it. ES and SES mostly point to physical problems with cabling or a MUX. When the snow melted, Paul and Jeremy drove up to each mountain and patched in a new path. Both links followed the same route, from HQ through Porter Mountain, Greens, and South Mountain to Frisco. When they went to Frisco, I had them replace the single-port Adtran 1g 3205 with a 3g 3205 that has two T1 interfaces and the latest firmware. The changes to AOS were interesting. Because we use bridging on this network (I know – friends don’t let friends bridge networks; be gentle if you have comments about this configuration), I had to configure what is called a BVI (bridge virtual interface) to tie an IP address to the router. This took me a while to figure out. In Cisco land, I would use a loopback interface. This will not work for Adtran routers. If you want to bridge and have in-band access to your router, you have to configure the BVI.

I had them terminate both of the circuits into the new replacement router and configured the newly provisioned T1 path back to HQ.  Then I started to test the old link.

I had a sneaking suspicion that the router at the far end could be the culprit, even though I had replaced that router once before and still had issues with the link. Since we had just replaced the router again, I ran an in-place test on the old path with the new router, alongside the new production link. Sure enough, the old path gave PCV errors while the new path was solid. It wasn’t the router on the Frisco end.

Because we had five physical hops end to end, the fault could be in any of the patch cables at any site. I decided to test from both ends of the link simultaneously. I had the radio guys loop back both ends at Greens so that each router could send test patterns into its own half of the path. The topology after the loopbacks looked like this:

  • Router — HQ — Porter Mountain — Greens (loop)
  • Router — Frisco — South Mountain — Greens (loop)

After entering enable mode on the router, I chose the interface that I wanted to test and issued the following commands:

FRISCO_NEC_WAN#conf t
FRISCO_NEC_WAN(config)#int t1 1/2
FRISCO_NEC_WAN(config-t1 1/2)# test-pattern p215
FRISCO_NEC_WAN(config-t1 1/2)# do sh int t1 1/2

t1 1/2 is IN TEST
 Description: ### suspect T-1 to Lakeside ###
 Receiver has no alarms
 T1 coding is B8ZS, framing is ESF
 Clock source is through t1 1/1, FDL type is ANSI
 Line build-out is 0dB
 No remote loopbacks, No network loopbacks
 Acceptance of remote loopback requests enabled
 In Test: Sending 2^15-1 pattern
 Tx Alarm Enable: rai
 Last clearing of counters 01:54:29
 loss of frame  : 0
 loss of signal : 0
 AIS alarm      : 0
 Remote alarm   : 0

 DS0 Status: 123456789012345678901234
             ------------------------
 Status Legend: '-' = DS0 is unallocated
                'N' = DS0 is dedicated (nailed)

 Line Status: -- No Alarms --

 5 minute input rate 0 bits/sec, 0 packets/sec
 5 minute output rate 0 bits/sec, 0 packets/sec
 Current Performance Statistics:
 0 Errored Seconds, 0 Bursty Errored Seconds
 0 Severely Errored Seconds, 0 Severely Errored Frame Seconds
 0 Unavailable Seconds, 0 Path Code Violations
 0 Line Code Violations, 0 Controlled Slip Seconds
 0 Line Errored Seconds, 0 Degraded Minutes

Before doing anything with my production routers (or any remote router for that matter), I usually schedule a reload. Have you ever screwed up a running configuration and had to drive to a site to reload it? Well, this has happened to me more than once. Next time, schedule a reload and your downtime will be a maximum of 15 minutes should you screw something up. Just be sure not to save the running config while you are testing, or the reload will bring your mistake right back. I run the following commands whenever I’m working on a router remotely:

FRISCO_NEC_WAN#rel in 15
Save System Configuration?[y/n]y
Reload scheduled in 15 minutes
You are about to reboot the system. Continue?[y/n]y

2010.12.07 09:57:28 OPERATING_SYSTEM System reboot scheduled in 15 minutes!

After this, I set a stopwatch or alarm clock on my cell phone to go off after 12 minutes (T minus 3 minutes). This gives me a warning before the reload takes place. I can either extend the reload timer another 15 minutes if I need more time (by running the commands above again) or cancel the reload:

FRISCO_NEC_WAN#rel can

******RELOAD CANCELLED******

2010.12.07 09:57:34 OPERATING_SYSTEM Scheduled system reboot cancelled.
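The timing here is simple arithmetic, but it is easy to get wrong when you are busy. This small Python sketch works out when the warning alarm and the reload fall, given the time the reload was scheduled. The function name and its defaults are hypothetical. They mirror the 15 minute reload and 3 minute warning used above, and nothing here talks to the router.

```python
from datetime import datetime, timedelta


def reload_schedule(scheduled_at, reload_minutes=15, warn_minutes_before=3):
    """Return (warn_at, reload_at) for a reload scheduled at `scheduled_at`."""
    reload_at = scheduled_at + timedelta(minutes=reload_minutes)
    warn_at = reload_at - timedelta(minutes=warn_minutes_before)
    return warn_at, reload_at


if __name__ == "__main__":
    # Timestamp taken from the "System reboot scheduled" log line above.
    scheduled = datetime(2010, 12, 7, 9, 57, 28)
    warn_at, reload_at = reload_schedule(scheduled)
    print("Set the alarm for:", warn_at.time())
    print("Router reloads at:", reload_at.time())
```

If you extend the timer by scheduling the reload again, call the helper again with the new scheduling time, since the earlier deadline no longer applies.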

After running the test for 2 hours, I got my first error on the HQ-end router. Nothing showed up on the Frisco end. The logical next step was to ask the radio guys to move the loops from Greens to Porter. My reasoning was that if the error moved from the HQ side to the Frisco side, the problem was between Greens and Porter. If not, the problem was between HQ and Porter, or in the cable between the router and HQ. After taking the loop off of Greens and extending it to Porter, I got this interesting series of syslogs:

T1:t1 1/2 Tx Yellow, Red
INTERFACE_STATUS:t1 1/2 changed state to down
T1:t1 1/2 LIU eq bumped
T1:t1 1/2 No Alarms
INTERFACE_STATUS:t1 1/2 changed state to up

It makes sense that the link would bounce. The most interesting entry was the informational LIU eq bumped message. LIU stands for Line Interface Unit. The LIU is part of the T1 framer in the Adtran NIM (network interface module). The framer locates the frame and multiframe boundaries and monitors the data stream for alarms. The cool thing about this message (I think, because I couldn’t find it anywhere in the documentation) is that when the loop was extended to Porter, which added about 100 miles to the loop, the interface appears to have worked out on the fly that the distance on the link had changed and ‘bumped’ the frame boundaries. My guess is that this calculation uses the speed of light as a constant. I don’t know why, but I thought this event was really cool to see. It’s like watching theory in action.
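For a sense of scale, here is a rough Python calculation of what roughly 100 extra miles means to a T1. It assumes the signal travels at the vacuum speed of light (a reasonable approximation for a microwave path, ignoring equipment delay) and uses the standard T1 figures of 1.544 Mbit/s and 193 bits per frame. It says nothing about what the LIU actually does internally. It only shows how much the timing of the returning signal would shift.

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # metres per second, in vacuum
METERS_PER_MILE = 1609.344
T1_BIT_RATE = 1_544_000            # bits per second
BITS_PER_FRAME = 193               # 24 DS0s x 8 bits + 1 framing bit


def added_delay(extra_miles):
    """Return (seconds, bit_times, frames) of extra one-way propagation delay."""
    seconds = extra_miles * METERS_PER_MILE / SPEED_OF_LIGHT_M_S
    bit_times = seconds * T1_BIT_RATE
    frames = bit_times / BITS_PER_FRAME
    return seconds, bit_times, frames


if __name__ == "__main__":
    seconds, bit_times, frames = added_delay(100)  # "about 100 miles" from the post
    print(f"{seconds * 1e6:.0f} microseconds")
    print(f"{bit_times:.0f} bit times")
    print(f"{frames:.1f} frames")
```

With these assumptions, 100 miles works out to about 537 microseconds, or roughly 829 bit times, which is a little over four T1 frames. That is more than enough of a shift for the receiver to have to find the frame boundaries again.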

At this point, the most logical place to check next was Porter. Jeremy went up to check the cabling. He called me and said that he was going to shake the cables around and asked me to check for errors. Nothing. Jeremy looped up Porter, and I got an alarm during lunch.
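The loop-moving logic used in these tests can be sketched in a few lines of Python. Each loop test splits the path in two, and whichever side sees errors must contain the fault, so successive tests shrink the suspect span. The site list comes from the path described above, while the function name and the "near"/"far" labels are made up for illustration. The sketch assumes a single fault. With an intermittent error, a clean side during a short test window proves very little.

```python
# End-to-end path, HQ router first ("near" end), Frisco router last ("far" end).
PATH = [
    "HQ router", "HQ", "Porter Mountain", "Greens",
    "South Mountain", "Frisco", "Frisco router",
]


def narrow(path, observations):
    """Intersect the suspect spans from a series of loop tests.

    Each observation is (loop_site, errored_side), where errored_side is
    "near" if the first router in `path` saw the errors, "far" if the last did.
    """
    lo, hi = 0, len(path) - 1
    for loop_site, errored_side in observations:
        i = path.index(loop_site)
        if errored_side == "near":
            hi = min(hi, i)   # fault lies at or before the loop point
        else:
            lo = max(lo, i)   # fault lies at or after the loop point
    return path[lo:hi + 1]


if __name__ == "__main__":
    # Loop at Greens, errors on the HQ side only.
    print(narrow(PATH, [("Greens", "near")]))
    # Then loop at Porter: errors still on the HQ side.
    print(narrow(PATH, [("Greens", "near"), ("Porter Mountain", "near")]))
```

Once the suspect span is down to the router and its first hop, the remaining step is a hard loop right at the port, which takes every cable and radio out of the picture.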

At this point I decided to loop up the interface at the HQ router and move the test to another port. The PCV value was always 2048, which I found significant. If it were a cabling problem, I would expect the error counts to vary. So I put a hard loop on an RJ-45, cleared the counters, and kept the tests running. Sure enough, port t1 3/4 reported an error.

All this time it was a faulty t1 port in my HQ router!

We’ll see if Adtran is going to replace the network module.  This was one heck of a troubleshooting day!

One Comment on “T1 Line Testing with an Adtran Netvanta”

  1. itcoop says:

    Adtran called me today. After reading this blog they agreed to send me a replacement NIM. I was also told that my dual-port NIM for the Netvanta 3205 G3 had a bug:
    If you want to configure both T1s on the legacy interface card, you have to set it to clock source internal. The legacy dual T1 card is not capable of independent clocking. To determine if you have the legacy card, run ‘show modules’. If the software version is Unavailable, it is a legacy card.
    This sucks because if you have two ISPs, you won’t be able to clock independently on the same NIM. I don’t have this problem though because I operate my own network.

