Archive for the ‘snmp’ tag
I apologize to my adoring fans (both of you) for the lack of posting. I’m in the middle of moving, buying a new house, selling my current house, getting a mortgage, etc. I’ve been up until 11:30 nearly every night filling out forms and going through red tape. Don’t get me started on getting money from a 401k! Anyway…
I got in this morning, and a coworker was telling me that the data center’s HVAC was crippled due to an oil leak, and it was 90F in there. D’oh! It wasn’t quite that high, but it was warm. Luckily, all of our network gear is on the end of the rows with AC, so we’re safe, but it got me thinking about monitoring the temperature of our 6500s via SNMP. I’ve done it via Cacti, but I never really looked at how to do it manually.
I read this article on Cisco’s site which made it clear as mud. Basically, you have to query the entPhysicalIndex OID branch to get an integer that represents the physical sensor, then you query entSensorValue with that integer to get the temperature. I made the lazy choice to grep the entPhysicalDescr OID to find the module I wanted, then use the index from that to query entSensorValue for the temp.
Let’s say I’m looking for the outlet temperature (the temperature of the air on the way out) on module 3. First, let’s find out what the entPhysicalIndex is by walking the entPhysicalDescr OID and grepping out “module 3”.
[myserver]$ snmpwalk -c <COMM> -v 2c <HOST> 1.3.6.1.2.1.47.1.1.1.1.2 | grep "module 3"
SNMPv2-SMI::mib-2.47.1.1.1.1.2.3001 = STRING: "module 3 power-output-fail Sensor"
SNMPv2-SMI::mib-2.47.1.1.1.1.2.3002 = STRING: "module 3 insufficient cooling Sensor"
SNMPv2-SMI::mib-2.47.1.1.1.1.2.3003 = STRING: "module 3 outlet temperature Sensor"
SNMPv2-SMI::mib-2.47.1.1.1.1.2.3004 = STRING: "module 3 inlet temperature Sensor"
SNMPv2-SMI::mib-2.47.1.1.1.1.2.3005 = STRING: "module 3 device-1 temperature Sensor"
SNMPv2-SMI::mib-2.47.1.1.1.1.2.3006 = STRING: "module 3 device-2 temperature Sensor"
Since we’re looking for the outlet temperature, we can see the index in question is 3003. Now, we use that to query the entSensorValue.
[myserver]$ snmpwalk -c <COMM> -v 2c <HOST> 1.3.6.1.4.1.9.9.91.1.1.1.1.4.3003
SNMPv2-SMI::enterprises.9.9.91.1.1.1.1.4.3003 = INTEGER: 19
Ah…a cool 19C.
This is not a very graceful way to do it, but it gets the job done in a pinch. I haven’t tested it, but I would imagine this technique would work on 7600s as well. I know it doesn’t work on 2950s and the like.
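If you'd rather not eyeball the grep output, the lookup step can be scripted. Here's a minimal Python sketch that only parses text you've already captured from snmpwalk; it doesn't talk to the switch. The function name and the sample lines are mine for illustration, and the regex assumes net-snmp's default output format shown above.

```python
import re

# Matches net-snmp lines such as:
#   SNMPv2-SMI::mib-2.47.1.1.1.1.2.3003 = STRING: "module 3 outlet temperature Sensor"
# Group 1 is the trailing index, group 2 is the description.
LINE_RE = re.compile(r'\.(\d+) = STRING: "(.*)"\s*$')


def find_sensor_indexes(walk_output, wanted):
    """Return {entPhysicalIndex: description} for descriptions containing `wanted`."""
    matches = {}
    for line in walk_output.splitlines():
        m = LINE_RE.search(line)
        if m and wanted in m.group(2):
            matches[int(m.group(1))] = m.group(2)
    return matches


# Illustrative sample, not captured from a real device.
sample = '''SNMPv2-SMI::mib-2.47.1.1.1.1.2.3003 = STRING: "module 3 outlet temperature Sensor"
SNMPv2-SMI::mib-2.47.1.1.1.1.2.3004 = STRING: "module 3 inlet temperature Sensor"'''

print(find_sensor_indexes(sample, "module 3 outlet"))
```

The index that comes back is the number you tack onto the end of the entSensorValue OID.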
Send any house keys questions my way.
I thought I’d throw an easy one out before taking off for the holiday. Merry Christmas, Hanukkah, Kwanzaa, Saturnalia, etc., to all.
A few years ago, I was looking through some Cacti graphs of gigabit trunks between 6500s and noticed an abrupt change in traffic. The graphs were nice and smooth at around 135Mbps until, seemingly randomly, they just started going wild. It seriously looked like a lie detector from the movies; I saw spikes up to 140Mbps in one sample and 2Mbps the next sample for days and days. I looked around to see if anything weird was going on somewhere on the network, but I didn’t find anything.
I manually went to the trunk ports and sampled the output of show interface over the course of a day or so. Nothing strange. Everything was moving up and down about 10% during the day, but there were no huge jumps and drops like the graphs told me. I asked our monitoring guy for a little help, and we sat down and found…nothing. On a whim, we started looking through the templates that we could use on an interface and saw the 64-bit counters, which triggered a little binary math in both of our heads.
2^32 = 4,294,967,296
100Mbps * 60 sec/min * 5 min/sample = 30,000,000,000 bits/sample = 3,750,000,000 octets/sample
That’s awful close, isn’t it? (The interface counters count octets, not bits, so divide by 8.) What if traffic goes up to 150Mbps?
150Mbps * 60 sec/min * 5 min/sample = 45,000,000,000 bits/sample = 5,625,000,000 octets/sample
That’s bigger than a 32-bit counter! If a trunk was pushing 150Mbps at any time, the counter would flip at least once between samples, and Cacti would have no way to tell how many times it had flipped!
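To put a number on where the trouble starts, here's a small Python sketch of the same arithmetic. It assumes the counters in question count octets (as the standard ifInOctets/ifOutOctets do) and a 5-minute polling interval; the function names are made up for this example.

```python
COUNTER32_SIZE = 2**32  # a 32-bit counter wraps to 0 after this many octets


def octets_per_sample(rate_bps, interval_s=300):
    """Octets transferred in one polling interval at a steady bit rate."""
    return rate_bps * interval_s // 8


def max_safe_rate_bps(interval_s=300):
    """Highest steady rate that cannot wrap a 32-bit octet counter in one interval."""
    return COUNTER32_SIZE * 8 / interval_s


print(octets_per_sample(100_000_000))        # 3750000000, just under 2^32
print(octets_per_sample(150_000_000))        # 5625000000, over 2^32
print(round(max_safe_rate_bps() / 1e6, 1))   # 114.5 (Mbps)
```

So with 5-minute polling, anything much over about 114Mbps sustained is enough to wrap a 32-bit octet counter between samples.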
When Cacti (or most any other SNMP tool) polls an interface, it gets the total number of octets that have been sent or received since the counters were reset (usually at boot). When it polls again in 5 minutes, it gets the new number and subtracts the old number, and, voila, the total transferred in the last 5 minutes. If the new number is smaller than the first, then Cacti assumes that the counter flipped and adds the second number to the difference between the first number and 2^32 to get the value. If, however, the interface is spewing out 150Mbps of data, the counter may flip around once and then still be higher than the original number. If that’s the case, Cacti only sees a small difference and shows you a bogus sample rate like 2Mbps. What if you’re pushing 300Mbps on the trunk? It may flip twice and still land higher than the first sample for a rate of 2Mbps. Ack!
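Here's that logic as a Python sketch. This is my own simplified model of a poller, not Cacti's actual code: it assumes at most one wrap between samples, which is exactly the assumption that breaks. The starting counter value is arbitrary.

```python
COUNTER32 = 2**32
COUNTER64 = 2**64


def simulate(old, octets_sent, modulus=COUNTER32):
    """Counter value after `octets_sent` more octets, wrapping at `modulus`."""
    return (old + octets_sent) % modulus


def poller_delta(old, new, modulus=COUNTER32):
    """What a simple poller infers, assuming the counter wrapped at most once."""
    if new >= old:
        return new - old            # looks like no wrap at all
    return (modulus - old) + new    # one wrap assumed


old = 1_000_000_000                 # arbitrary starting counter value
sent = 150_000_000 * 300 // 8       # 150Mbps for 5 minutes, in octets

# 32-bit counter: it wrapped once but landed above the old value.
inferred = poller_delta(old, simulate(old, sent))
print(inferred * 8 / 300 / 1e6)     # far below the real 150 (Mbps)

# 64-bit counter: no wrap, so the delta is exact.
exact = poller_delta(old, simulate(old, sent, COUNTER64), COUNTER64)
print(exact * 8 / 300 / 1e6)        # 150.0
```

The poller has no way to tell the first case apart from a quiet link, which is why the graphs look like a lie detector instead of throwing an error.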
The fix? Query the proper OID for 64-bit counters. It shows the same data, but the counter can hold much larger numbers. Calculating 2^64 gives you 18,446,744,073,709,551,616. That’s about 18.4 exabytes before the counter flips. Exabytes. Wow. I can’t even imagine that much traffic. I’m sure I’ll be dead and gone by the time networks reach those speeds in the wild.
All modern network gear has the capability to use 64-bit counters, so use them where you can. Since it’s just another OID, using 64-bit counters doesn’t add any more CPU load to the gear or the monitoring box. Some packages like Cacti come bundled with support for “the big boy” counters, but you may have to do a little research and find the right OID to query. Google is your friend. Let me know if you have problems, and I’ll try to help.
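For the standard interface traffic counters, the OIDs in question come from the IF-MIB. Here's a Python sketch that maps the names to OIDs and builds an snmpget command line. The host, community, and ifIndex are placeholders you'd fill in yourself, and the helper name is mine.

```python
# Standard IF-MIB objects. The 64-bit ("HC", high capacity) versions live in ifXTable.
COUNTER_OIDS = {
    "ifInOctets":    "1.3.6.1.2.1.2.2.1.10",     # 32-bit
    "ifOutOctets":   "1.3.6.1.2.1.2.2.1.16",     # 32-bit
    "ifHCInOctets":  "1.3.6.1.2.1.31.1.1.1.6",   # 64-bit
    "ifHCOutOctets": "1.3.6.1.2.1.31.1.1.1.10",  # 64-bit
}


def snmpget_args(host, community, name, if_index):
    """Build an snmpget argument list for one counter on one interface.

    Uses v2c because SNMP v1 has no Counter64 type and cannot return the HC counters.
    """
    oid = f"{COUNTER_OIDS[name]}.{if_index}"
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


# Hypothetical host, community, and ifIndex, for illustration only.
print(" ".join(snmpget_args("switch1", "public", "ifHCInOctets", 3)))
```

If a device answers the 32-bit OID but not the HC one, check that you're polling it with v2c or v3 before blaming the gear.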
Do it now, by the way; you don’t want to have to explain those flaky graphs to the boss. The concept of exabytes may be a little much for him to understand.
We all have limited budgets these days. Long gone are the days of unlimited resources and uncontrollable expansion of the network, so it’s important that any network dude or dudette pay attention to the open-source world. Below is a list of stuff I use at the office and at home to monitor, trend, and alert on the network. All this stuff is free and runs on Linux to save even more cash.
- Cacti – This is a system for trending pretty much anything. If it has an SNMP value, Cacti can trend it. It’s also really flexible, allowing multiple displays of data and even a mechanism to get values from scripts you write. At the office, we use it to monitor utilization of the circuit and Ethernet ports, CPU and memory of the gear, and the number of connections on the load-balancer. At home, I use it to watch utilization and track the number of connections to the wireless networks.
- Nagios – This is a monitoring and alerting system for all sorts of stuff. It watches hosts and applications for availability and response time, then alerts based on threshold. This is one of the most complicated apps to configure, but, once it’s up, it rocks. I use it at home to monitor all the network gear and systems for response times. I also use it to monitor the web servers and restart them if they’re down.
- Apache – You know what Apache is. You use it already. About 71% of webservers on the Internet are Apache.
- Squid – A caching proxy server. It can be configured for both inbound and outbound application acceleration. It’s great to put in front of a CMS like Drupal or Joomla. It has a mess of built-in functions that can look for bad requests, do redirects, or completely rewrite requests. At work, it fronts our application and CMS servers so users don’t have access directly to them. At home, it runs on the firewall to serve pages to the Internet. The real webserver actually sits on a box behind the firewall for security.
- Subversion – This is a version control system. Subversion lets you create repositories, check out the contents, edit them, and check them back in. This is good for keeping track of configuration files or scripts you write. We use it at work to track configuration files for Apache, NTP, yum, etc. At home, I use it to keep track of my scripts and Perl modules.
- Rancid – This is configuration management for Cisco (and other network) devices. It gets configs from devices and checks them for changes. It’s got built-in alerting and is easy to set up. I use it at home to keep track of the configs on the switches and access points.
- nfsen/nfdump – These are netflow tools. Nfdump is a suite for collecting the data, while nfsen is for displaying the information. Check out netflow if you’ve never worked with it…it’s pretty cool.
- Dynamips/dynagen – These apps let you run virtual Cisco routers on a machine. You can set up full network deployments for testing and configuration experimentation. It takes a good bit of resources, but it’s well worth it for the functionality. I use it all the time at work to test or tweak configs. I also use it to simulate certification labs.
I had an article a few weeks ago about the Cisco CSM, which is a load-balancer module for the 6500 series switches. This thing is a pretty good device, but monitoring the connections to each VIP and RIP is not very straightforward. If you have an SNMP monitoring system like Cacti or MRTG, you need to know the OID to monitor, but it doesn’t work like anything else in the world.
Let’s start with the OID for the vserver. First, there are the base OIDs that you can look up on CCO. This is just standard SNMP stuff that Cisco defined long ago. The slot number that the CSM is in is added to that base OID. You then have to add the length of the vserver name (don’t ask me…I don’t know). Next comes the really stupid part: you have to take the name of the vserver, get the ASCII value of every character in it, and add each one to the end to get the full OID. Yes, it’s that stupid.
<BASE VSERVER CONN OID>.<SLOT>.<LENGTH>.<VSERVER NAME>
The serverfarm OID is pretty close: it starts with the base OID and slot. You then add the length of the serverfarm name along with the ASCII values of the serverfarm name (quite like we did for the vserver). Finally, the only part that might make sense, you take the IP of the real server and add it to the end with the instance of “0”. Again, I have no clue why it’s like this or what Cisco was trying to do. It’s terrible.
<BASE SF CONN OID>.<SLOT>.<LENGTH>.<SF NAME>.<REAL IP>.0
How about an example? Let’s say you have a vserver called VSERVER1 that you want to monitor. This vserver is configured to use the serverfarm FARM1, which has two reals of 192.168.1.101 and 192.168.1.102 that you also want to monitor. You also know that the CSM is in slot 3. The base OID for vserver connections is .1.3.6.1.4.1.9.9.161.1.4.1.1.17; the base OID for serverfarm connections is .1.3.6.1.4.1.9.9.161.1.3.1.1.5. This all gives you this:
Did I mention it’s overly-complicated and terrible? It’s actually so bad that I just wrote a Perl script to do it for me because, for God’s sake, I’m not doing that by hand. Let me know if you need any help with it.
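This isn't my Perl script, but the encoding described above is short enough to sketch in Python. The function names are made up for this example, and the base OID is passed in by the caller; the "1.2.3" below is a dummy stand-in, not a real Cisco OID.

```python
def encode_name(name):
    """Length-prefixed ASCII encoding of a name, e.g. 'AB' -> '2.65.66'."""
    return ".".join([str(len(name))] + [str(ord(ch)) for ch in name])


def vserver_oid(base_oid, slot, vserver):
    """<BASE VSERVER CONN OID>.<SLOT>.<LENGTH>.<VSERVER NAME>"""
    return f"{base_oid}.{slot}.{encode_name(vserver)}"


def real_oid(base_oid, slot, serverfarm, real_ip):
    """<BASE SF CONN OID>.<SLOT>.<LENGTH>.<SF NAME>.<REAL IP>.0"""
    return f"{base_oid}.{slot}.{encode_name(serverfarm)}.{real_ip}.0"


DUMMY_BASE = "1.2.3"  # placeholder only; substitute the real base OID

print(vserver_oid(DUMMY_BASE, 3, "VSERVER1"))
print(real_oid(DUMMY_BASE, 3, "FARM1", "192.168.1.101"))
print(real_oid(DUMMY_BASE, 3, "FARM1", "192.168.1.102"))
```

For VSERVER1 in slot 3, the part after the base works out to 3.8.86.83.69.82.86.69.82.49: the slot, the length of 8, then one number per character.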
I finally got around to looking into SNMP v3 and was shocked at how easy it actually is. When I first looked up info on it so many moons ago, I saw table after table of views and privilege levels and thought I would have to put in a billion hours getting it customized. I settled down and went through some Google results and found a blog post by
SNMP v3 gives you a few things that you’ll like. First of all, the transactions can be encrypted, so you don’t have to worry about people sniffing your traffic on the evil Internet. You also get username and password combos for authentication. Older versions use the community, which serves as a password. My buddy has a story about using the default communities on his cable modem to find upstream hosts, then using the same on his ISP’s network gear. That’s pretty lackluster security that could be hardened with v3.
This version of SNMP is very complicated, and the key to starting off is to forget about the views. Views make SNMP v3 ultra-powerful, but you don’t need them in a simple setup. Obviously, we all evolve, and you’ll probably use them later, but there’s no need to worry yourself to start.
Let’s do this, then. First, you need to define a v3 group and user/pass. Just something simple will do. Let’s choose a group name of “snmp-group” and a user of “snmp-user” with the password “user-pass”. Now all we have to do is configure the thing. I’m assuming you’re setting it up on a Cisco IOS device of some kind, but SNMP v3 has been available for quite a while on a lot of platforms and OSes.
snmp-server group snmp-group v3 auth
snmp-server user snmp-user snmp-group v3 auth md5 user-pass priv des56 encryption-key
Note that we’re using an MD5 hash for the password right now. If you have the right code, you can do DES56 encryption, but every version of IOS that supports SNMP v3 also supports MD5.
That’s it, actually. By default, you have read-only access to the whole MIB tree. If you want to set up more granular access, you can look at views, but that’s beyond the scope here. I’m sure I’ll have an article about that later.
Let’s test our new setup with an snmpwalk. There are some new flags you need to pass to use v3, but it’s pretty straightforward. “a” is for the authentication protocol; “u” is for user; “A” is for the authentication password; “X” is for the encryption key; “l” is for the security level (another advanced SNMP v3 topic).
snmpwalk -v 3 -a MD5 -u snmp-user -A user-pass -X encryption-key -l authPriv hostname .1
You should see a whole list of stuff scrolling by. If you don’t, check the username and password and try again. Let me know if you need any help getting it running.
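If you're going to script this, here's a minimal Python sketch that builds the same net-snmp command line, with or without encryption. The function name is mine, and it assumes MD5 for authentication and DES for privacy to match the config above. It only builds the argument list; it doesn't run anything.

```python
def snmpwalk_v3_args(host, user, auth_pass, priv_pass=None, oid=".1"):
    """Build an snmpwalk argument list for SNMP v3.

    With no priv_pass, the walk uses authentication only (authNoPriv).
    With a priv_pass, it adds DES encryption (authPriv).
    """
    args = ["snmpwalk", "-v", "3", "-u", user, "-a", "MD5", "-A", auth_pass]
    if priv_pass is None:
        args += ["-l", "authNoPriv"]
    else:
        args += ["-x", "DES", "-X", priv_pass, "-l", "authPriv"]
    return args + [host, oid]


# Same example values as the walk above.
print(" ".join(snmpwalk_v3_args("hostname", "snmp-user", "user-pass", "encryption-key")))
```

You could hand the resulting list to something like subprocess.run to actually execute it, keeping in mind the passwords will be visible in the process list while it runs.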
Here’s my usual note. If you still have your “snmp-server community” line configured, then v1 or v2c is still available. If you’re converting completely to v3, then just remove the community line. This will disable the old school versions and let you enjoy your encrypted goodness.
Also note that I’m not sure if Cacti or Nagios actually support SNMP v3 encryption. You’re on your own with that one. If you decide to not use encryption, you can just take out the “priv” section of the user configuration to go with authentication only.