Configuring Dude’s Syslog

Why Syslog?

Polling for information is like asking a little kid sitting in the back seat of your vehicle “Are you hungry?” every 5 seconds.  You process each response to your poll and react accordingly, much as The Dude reacts depending on whether a probe condition is true or false.
Unsolicited information is like a kid in the back seat of your vehicle saying (not asking), “Are we there yet?” every 5 seconds.  You process this information and act accordingly.  Most of the time, the most appropriate response is a non-response; you just log it.  But occasionally, the kid will say something interesting like, “I feel sick.  I think I’m gonna throw up.”  You might want to take action based on these unsolicited pieces of information.  Paying attention to them may be the difference between a quick pit stop and cleaning vomit from the back seat of your car.

By now you are using the Dude or some other network management system to poll your network devices for information using SNMP.  But what about your syslog facility?  Why is syslog important if you are able to poll for stats in real time?  Isn’t polling good enough?  Well, no, not really.  Like the kid in your back seat, there is a big difference between the two: polling solicits information, and syslog delivers it unsolicited.
You work in the electric utility industry, so it’s probably easier for you to think of it this way: you are using DNP3 to poll a substation.  The polling interval is set to 30 seconds.  Your SCADA display has a binary point that shows the current state of a substation breaker.  Since substation breakers operate in far less than 30 seconds, it is conceivable that you could miss events, right?  Depending on the curve your relay is using, an operation can occur in 1/60th of a second.  If you aren’t polling at that split-second interval, the event could be missed.  Events like these need to be known immediately.  They must preempt the standard polling interval and tell you that something just happened before the next poll comes around.  This is why unsolicited messaging is so important.
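To put rough numbers on the breaker example, here is a small Python sketch.  The 30-second poll interval and the 1/60-second operation come from the paragraph above; the 12-second start time and the function name are arbitrary choices of mine for illustration only.

```python
from fractions import Fraction

def polls_that_catch_event(poll_interval, event_start, event_duration, horizon):
    """Return the poll times (in seconds) that fall inside the event window."""
    event_end = event_start + event_duration
    poll_times = range(0, horizon + 1, poll_interval)
    return [t for t in poll_times if event_start <= t < event_end]

# Poll every 30 s for 10 minutes. The breaker operates for 1/60 s,
# starting at t = 12 s (an arbitrary, illustrative start time).
caught = polls_that_catch_event(30, 12, Fraction(1, 60), 600)
print(caught)  # [] : no poll lands inside the 1/60 s window
```

An event would have to last long enough to overlap a poll time (or happen to start exactly on one) before polling alone could see it.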

SNMP does offer a way to send unsolicited messages (traps).  Unfortunately, The Dude does not have a facility to receive them and make alerts out of them.  But it offers a really cool alternative: syslog can be set up on The Dude to notify you whenever something interesting is logged.

Setting up Syslog in Dude

Syslog is a facility that sends information across your network to The Dude.  The sending side is a service that you configure on each piece of your equipment, pointing it at the Dude’s IP address.  On the receiving side, the Syslog server is already enabled by default on your Dude installation.  Click on the Settings button and select the Syslog tab to see it.  By default, everything that is sent to the Dude’s IP will be accepted and logged in the Syslog section of the interface.  Now you will record everything.  Nice.

Syslog configure on The Dude
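If you want to check that the Dude is receiving messages before touching any real equipment, a test message can be sent from any machine with Python.  This is only a rough sketch: it assumes the Dude is listening on the standard syslog UDP port 514, it sends a stripped-down BSD-style message (priority, host name, text) without a timestamp, and the function names, host name, and IP address are placeholders of mine.

```python
import socket

def build_syslog_packet(message, facility=1, severity=6, hostname="test-host"):
    """Build a minimal BSD-style syslog datagram: <PRI>HOSTNAME MESSAGE."""
    pri = facility * 8 + severity  # priority = facility * 8 + severity
    return f"<{pri}>{hostname} {message}".encode("ascii", errors="replace")

def send_test_message(server_ip, message, port=514):
    """Send one UDP syslog datagram to server_ip and return the bytes sent."""
    packet = build_syslog_packet(message)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (server_ip, port))
    return packet

# Example (192.0.2.10 is a placeholder; use your Dude server's IP):
# send_test_message("192.0.2.10", "ENABLE OK test from Python")
```

If the message shows up in the Syslog section of the interface, the receiving side is working and any remaining problem is on the equipment you are configuring.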

What if you want to be notified when something interesting is logged?  There’s got to be a better way than reading through all of the daily logs, most of which you want to ignore just like that kid in your back seat saying, “Are we there yet?”  Well, there is a way to filter it.  The Dude lets you create regular expressions that check each message for certain words.  You can apply a regexp to a specific source IP address or device that you are monitoring.  I usually apply the regexp to all messages because most of my equipment is similar and I expect the same information to come in from all of it.

Using Regexp to Log and Notify

Regexp is the most useful method, in my opinion, for processing unsolicited events sent to syslog.  The scope of this post is how to log and notify on interesting syslog messages using regex and the Dude.  Explaining what regex is and how it works is outside my scope and expertise.  There is a good crash course on regular-expressions.info if you don’t know what it is.  In the Dude, we use it to search incoming messages for interesting text.  For example, if our regexp is “crash”, it will match every line that contains “crash”; if our regexp is “crash$”, it will match only lines that end with the word “crash”.
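The difference between those two patterns is easy to try outside the Dude.  The sketch below uses Python’s re module as a stand-in; the Dude’s own regex engine may differ in its details, and the sample messages are made up.

```python
import re

# Made-up sample messages for illustration
messages = [
    "router reported a crash",
    "crash detected on port 3",
    "no problems here",
]

def matching(pattern, lines):
    """Return the lines in which the pattern is found anywhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

print(matching("crash", messages))   # the first two messages
print(matching("crash$", messages))  # only "router reported a crash"
```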

Creating a new syslog rule

To set up a regexp rule, click on the red “+”.  Enter the regex that you want to be notified about when it appears in a message.  Some that I use are “CORE DUMP” for when a router takes a dump, “ENABLE OK” for when someone successfully enables a router, and “attempt FAILED” for when someone is trying to log in to a router using an incorrect password.

The check-box next to the Regexp: field is a ! (a.k.a. NOT).  It inverts the rule so that it matches every message that does NOT match the regexp.  Let me know if you find a use for this check box.  I’m really curious why anyone would want to take an action based on something NOT matching.  My guess is that it would be useful if you also filter by IP in addition to regex.
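Here is one way that guess could play out, sketched in Python.  This is a hypothetical model of a rule, not how the Dude implements it: the Rule class and its field names are mine, and the message text is borrowed from the switch log later in this post.  The idea is to list the routine messages from one switch and use NOT to flag everything else that switch sends.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    regexp: str
    negate: bool = False             # models the "!" check-box
    source_ip: Optional[str] = None  # None means "any source"

    def matches(self, source_ip: str, message: str) -> bool:
        # A rule tied to one source never matches messages from another
        if self.source_ip is not None and source_ip != self.source_ip:
            return False
        found = re.search(self.regexp, message) is not None
        # With negate set, the rule fires when the regexp is NOT found
        return found != self.negate

routine = "is now on-line|is now off-line|is Blocked by STP"
unusual_from_switch = Rule(regexp=routine, negate=True, source_ip="10.3.12.200")

print(unusual_from_switch.matches("10.3.12.200", "ports:port B18 is now on-line"))  # False
print(unusual_from_switch.matches(
    "10.3.12.200", "FFI:port B18-High collision or drop rate. See help."))          # True
print(unusual_from_switch.matches("10.3.12.201", "FFI:port B18-High collision"))    # False
```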

Conclusion

So, now you can setup your syslog and get notified when your equipment starts spewing interesting messages. Have fun!

There were several instances where we found a memory leak in a router located in a remote office, which was resulting in several seconds of downtime.  While the router was rebooting, the remote office would always complain about dropped calls or sessions.  It was very rare that a ping on our 30-second interval would fail during these outages, and even if it did, the router had to miss three pings to be considered down anyway.  It wasn’t until we configured syslog on each piece of network equipment to report unsolicited messages that we obtained enough information to troubleshoot the problem.  The ones outlined below are some of the more “interesting” cases.  The point of this is to prove to you that this type of information is important to set up and record.  It really helps in troubleshooting issues when they arise… and they will.

Router Firmware Problem

The following syslog was obtained from an Adtran router.  No one complained about this 18-second outage, which happened every week since the day the router was put into service around 2006.  The other downstream routers reported link issues at about the same time.  Notice the timestamps: the first is when The Dude received the message and the second is the device’s own date/time when it sent the message.  The two can differ because of network delay or, as in this case, because the clock on the router is not set properly.  This is good to know when troubleshooting issues with network equipment.  A functioning device may report a link failure while the broken device is silent.  This is why it is also important to keep your clocks synced up!

2010.04.28-05:22:14 <private.ip.address.here>: <3>Jan 02 10:14:13 Porter_NEC_WAN OPERATING_SYSTEM:CORE DUMP
17.01.02.00\source\PacketCore\AdUtil\AdAlloc.cpp#439: AdFatal(OUT OF MEMORY - Called from 0x001AE0C0  Stack
Trc:  002FF038 002FF838 002FFAB8 002FFF1C 002FFC38 001AE0C0 00A8CB8C 00A8D9F0 00195310 00197854 001979EC 00
197ADC 001A7340 00C101FC ....Regs: r0=0x002ff810 r1=0x045e3b54 r2=0x019a0b30 r3=0x00000026r4=0x045e3ad4 r5=
0x045e3aa1 r6=0x045e3aa6 r7=0x062bf964 r8=0x00000002 r9=0x00000000 r10=0x00000000 r11=0x045e3adc r12=0x0000
0000 r13=0x04008000 r14=0x00000004 r15=0x00000005 r16=0x00000006 r17=0x00000007 r18=0x00000008 r19=0x000000
09 r20=0x0000000a r21=0x0000000b r22=0x0000000c r23=0x0000000d r24=0x00000001 r25=0x00000001 r26=0x00000000
 r27=0x0ff01d90 r28=0x00000008 r29=0x001ae0c0 r30=0x0ff0a000 r31=0x00008000 LR=0x002ff810 CTR=0x0140d5c4.)

STP network loop abatement

While SNMP polling was successful the whole time, these logs show what the Spanning Tree Protocol process was doing inside my precious HP network switch that I purchased off of eBay.  If you loop a network cable and do not have STP turned on, a broadcast storm will take down your network.  The log shows when loops are being blocked on my network gear.  Messages like these are worth looking into because it is important to know how or why this is occurring.  For example: did someone install a hub on their desk and loop a cable?

2010.11.10-07:41:00 : <12> Nov 10 07:41:05 10.3.12.200 00564 ports:port B18 PD Invalid Signature indication.
2010.11.10-07:41:08 : <14> Nov 10 07:41:14 10.3.12.200 00435 ports:port B18 is Blocked by STP
2010.11.10-07:41:11 : <14> Nov 10 07:41:17 10.3.12.200 00076 ports:port B18 is now on-line
2010.11.10-07:41:28 : <12> Nov 10 07:41:33 10.3.12.200 00331 FFI:port B18-High collision or drop rate. See help.
2010.11.10-07:43:20 : <14> Nov 10 07:43:25 10.3.12.200 00077 ports:port B18 is now off-line
2010.11.10-07:43:25 : <14> Nov 10 07:43:30 10.3.12.200 00435 ports:port B18 is Blocked by STP
2010.11.10-07:43:27 : <14> Nov 10 07:43:33 10.3.12.200 00076 ports:port B18 is now on-line
2010.11.10-07:44:17 : <12> Nov 10 07:44:22 10.3.12.200 00331 FFI:port B18-High collision or drop rate. See help.
2010.11.10-07:46:51 : <14> Nov 10 07:46:56 10.3.12.200 00077 ports:port B18 is now off-line
2010.11.10-07:46:56 : <14> Nov 10 07:47:01 10.3.12.200 00435 ports:port B18 is Blocked by STP
2010.11.10-07:46:58 : <14> Nov 10 07:47:04 10.3.12.200 00076 ports:port B18 is now on-line
2010.11.10-08:12:46 : <14> Nov 10 08:12:51 10.3.12.200 00077 ports:port B18 is now off-line
2010.11.10-13:51:01 : <12> Nov 10 13:51:05 10.3.12.200 00564 ports:port B18 PD Invalid Signature indication.
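The two timestamps and the number in angle brackets can be pulled apart with a few lines of Python.  The number is the standard syslog priority value, which encodes facility * 8 + severity.  This is a rough sketch that assumes the line layout shown in the logs in this post; the regex and the parse_line name are mine, and because the device timestamp carries no year, the code assumes the year in which the Dude received the message.

```python
import re
from datetime import datetime

# Standard syslog severity names, indexed by severity number 0-7
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

LINE_RE = re.compile(
    r"^(?P<received>\d{4}\.\d{2}\.\d{2}-\d{2}:\d{2}:\d{2}) "  # Dude receive time
    r"[^:]*: <(?P<pri>\d+)>\s*"                               # optional source, priority
    r"(?P<device>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}) "     # device's own clock
    r"(?P<rest>.*)$"
)

def parse_line(line):
    """Split one logged line into times, facility, severity and message."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    received = datetime.strptime(m["received"], "%Y.%m.%d-%H:%M:%S")
    # Assumption: the device's year equals the year of receipt.
    # %b expects English month names under the default locale.
    device = datetime.strptime(f"{received.year} {m['device']}", "%Y %b %d %H:%M:%S")
    facility, severity = divmod(int(m["pri"]), 8)
    return {
        "received": received,
        "device": device,
        "skew_seconds": (device - received).total_seconds(),
        "facility": facility,
        "severity": SEVERITIES[severity],
        "message": m["rest"],
    }

sample = ("2010.11.10-07:41:00 : <12> Nov 10 07:41:05 10.3.12.200 "
          "00564 ports:port B18 PD Invalid Signature indication.")
info = parse_line(sample)
print(info["severity"], info["skew_seconds"])
```

For the sample line, the priority 12 decodes to facility 1 and severity 4 (warning), and the switch clock is 5 seconds ahead of the Dude.  A large skew, like the one on the Adtran router above, is a hint that a device clock needs attention.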

Random switch reboots

Stay away from D-Link switches.  They’re junk.  This manufacturer has made it onto my “crap” list.  Here are some syslogs that we get from them:

(blank – yes, you read right: “blank” as in nothing…)

Strange T-1 circuit problems

PCV stands for Path Code Violation and is common with cabling problems.  (We still haven’t figured this one out.)

2010.11.10-10:14:45 : <4>Nov 10 10:14:53 LAKESIDE_NEC_WAN T1:t1 3/4 PCV 15 min threshold exceeded

16 Comments on “Configuring Dude’s Syslog”

  1. […] my syslog is working great.  I setup a new log file called “Windows Events” in the Dude’s Syslog to filter out these interesting windows syslog events and separate the already configured network […]

  2. […] my prior post, Configuring Dude’s Syslog, I mentioned a couple of strange or interesting cases where syslog helped troubleshoot an issue I […]

  3. […] Configuring Dude’s Syslog was probably the most beneficial thing that I’ve done in my spare time.  Ben came into my office this morning and asked about the numerous attempts to log in to one of the branch office’s Internet routers.  I went to The Dude to investigate. Well, well… I see numerous problems here […]

  4. CypherBit says:

    I’ve read all your posts regarding The Dude and syslog and am impressed.

    I am however not able to set it up as described.

    Regexp to Log and Notify doesn’t work and I’m unable to have separate syslogs for different devices/IPs (less important since I can always filter, but nice to have).

    I’m currently testing with our Juniper SSG and everything gets written to the syslog, but the regexp rules don’t. Here’s my setup: http://i.imgur.com/fuTDg.png

    I never receive any popup for that VIP entry, same thing if I choose e-mail, flash…

    Any assistance would be appreciated.

    • itcoop says:

      Hello,

      It looks to me that there are problems with your regexp. Remove the quotation marks. To test your regexp, use the filter option on the top of the syslog viewer. When you enter your filter only matches should show up. Use this as your test. To move syslogs by host, create a new log for that host and then setup a regexp match per IP address with an action to log to a different log (such as event).
      Also, the order of your syslog rules is important. Make sure you don’t have a default “accept” action on the top of your list.
      Notifications must be configured separately. Make sure that they also test positive in The Dude before using them. E-mail to SMS, for instance, will not work for some wireless carriers unless you use their SMTP server directly (read: Verizon).

      Hope this helps.

      • CypherBit says:

        It definitely does help!

        I changed the order (although you mention in one of the blog posts how important that is…I ignored that 🙂 ) and regexp and it works great.

        Much appreciated.

  5. Edward says:

    Hello! I currently use The Dude to monitor a small network. I am trying to set up syslog regular expressions: I specify the source address of the router and a regular expression such as “admin” or “Port”, then choose accept and a notification method (a popup window). Then I log in to the device to test it, but unfortunately nothing comes up in The Dude. I have already tried different expressions and different addresses. Ordinary syslog messages arrive in the log every time. I just came across your article; could you tell me what the problem could be? I also tried writing the regular expression as # # or admin / admin /, to no avail.

    • itcoop says:

      Be sure to check the list order of your regexp and the action of each. If there is a default “accept all” at the top, none of the following regexp will be processed.

      • Edward says:

        A great big thank you for the help, now it works. I had 11 entries for the source addresses of 11 routers, with no regular expression set, and they simply caught all the entries in the log. Now I have dragged the new entry with the regular expression to the top, and a matching message triggers a popup window!

  6. Trevor Stuart says:

    I know this is an old thread, but hopefully someone will still see this and be able to help.
    I have my syslog passthrough everything, logging as it goes, then I want it to executeLocally a command. I have the execute Locally setup, and when I issue a test it comes through perfectly.
    But when in the syslogs rules I never get the packet. I’ve double checked the order of things and it SHOULD be working. I’m assuming the issue is with my Regexp somewhere. But I have another one, that accepts and goes to syslog without e-mailing and that one seems to work just fine.
    Even tried putting my “packetSend” notification as the first thing and it never issues the packet.

    • itcoop says:

      You mention that you “never get the packet”. Are you referring to never getting a regexp match? There are three Action options: accept, passthrough, and drop. Setting up a notification to “execute locally” is something different than matching. If you have your regexp setup to parse for packetSend, action to passthrough to the next rule, and a notification to execute something, my guess is that your notification isn’t functioning properly. Try changing the notification to something simple: like “beep”. If the syslog message comes through, is matched, and the Dude beeps, then there’s nothing wrong with syslog or your regexp. The issue is with the “execute locally” notification script. To test the execute locally notification script, try a simple batch script: echo triggered at %time% >> c:\test.txt

      • Trevor Stuart says:

        I was able to confirm that the issue is with Execute Locally script. I’ve now tried the script as a bat file. Tried running dude as a service. Tried running as administrator, already had the account logged in as an admin… All the same issue. It works fine when I click Test but when in the rules it does not.
        Even tried upgrading to latest beta, 4beta3 and still no luck.
        I’m on Windows Server 2008 R2, UAC is disabled.

        I’ve been posting to the forum too, as I wasn’t sure originally if this was still being watched… But I’d appreciate any ideas I can try.
        Using the simple batch script it does not work either. Unlike before, the test fails too though… If I open a cmd prompt and execute “echo triggered at %time% >> c:\test.txt” I get it to work. But that same line as Execute Locally doesn’t work. Perhaps there is something in Windows stopping Dude from running commands directly?

      • itcoop says:

        Is UAC turned off? Also, check if Dude is running as a service account with execute locally privileges (administrator)

      • Trevor Stuart says:

        Yes UAC is off.
        I have Dude running as a service, and using the local administrator account.

      • itcoop says:

        At this point, I’d turn on auditing in Windows and check the event logs for the reason your script is failing to execute. My guess is some obscure permission issue.

      • itcoop says:

        Another option I’d suggest trying is using schtasks.exe. What I do is create scheduled tasks on all my servers that I wish to run remote commands on and name each scheduled task uniquely. When a condition is triggered by Dude, it executes:
        schtasks /run /s \\[Device.FirstDnsName] /tn (taskname)
        Where taskname is the name of the scheduled task on the machine. It seems to work well with the windows systems as long as DNS resolves properly. I believe I have an example here on the blog regarding monitoring CPU runaway processes… or something like that.
