Leap Second

Posted on January 2nd, 2009 in Misc, Technical by Aaron Conaway

Did anyone notice (or care about) the leap second?  I did neither.  Here’s some cool output from Kevin Oberman on the NANOG list, though.

bash-2.05b# date
Thu Jan  1 00:59:58 CET 2009
bash-2.05b# date
Thu Jan  1 00:59:59 CET 2009
bash-2.05b# date
Thu Jan  1 00:59:60 CET 2009
bash-2.05b# date
Thu Jan  1 01:00:00 CET 2009
bash-2.05b# date
Thu Jan  1 01:00:01 CET 2009
bash-2.05b#

A Little Politics for the New Year

Posted on December 29th, 2008 in Misc, Networking, Security, Tools by Aaron Conaway

Stretch at Packetlife has a lively little write-up on the Australian government’s attempt to implement a nation-wide web filtering service.

From Packetlife.net:

Setting aside the myriad of technical barriers to implementing such a system, the most obvious question is, “who decides what gets blocked?” When a corporation implements a web filter, it does so in accordance with corporate policy — policy that is set by the owner of the network. But the Internet doesn’t belong to any one entity, be it governmental or commercial, so such an authority simply doesn’t exist at this scale. In a very Orwellian sense, this filtering initiative appears to want to create that authority out of thin air.

I don’t know enough about the specifics down under to weigh in very heavily, but I would never support any service that filters web content from my house.

Is That a Bandwidth Graph or a Polygraph?

Posted on December 23rd, 2008 in SNMP, Technical, Tools by Aaron Conaway

I thought I’d throw an easy one out before taking off for the holiday.  Merry Christmas, Hanukkah, Kwanzaa, Saturnalia, etc., to all.

A few years ago, I was looking through some Cacti graphs of gigabit trunks between 6500s and noticed an abrupt change in traffic.  The graphs were nice and smooth at around 135Mbps until, seemingly at random, they just started going wild.  It seriously looked like a lie detector from the movies; I saw spikes up to 140Mbps in one sample and 2Mbps the next, for days and days.  I looked around to see if anything weird was going on somewhere on the network, but I didn’t find anything.

I manually went to the trunk ports and sampled the output of show interface over the course of a day or so.  Nothing strange.  Everything was moving up and down about 10% during the day, but there were no huge jumps and drops like the graphs told me. I asked our monitoring guy for a little help, and we sat down and found…nothing.  On a whim, we started looking through the templates that we could use on an interface and saw the 64-bit counters, which triggered a little binary math in both of our heads.

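2^32 = 4,294,967,296
120Mbps * 60 sec/min * 5 min/sample / 8 bits/octet = 4,500,000,000 octets/sample

The interface counters count octets, not bits, hence the divide by 8.  That’s already past 2^32, isn’t it?  What if traffic goes up to 150Mbps?

150Mbps * 60 sec/min * 5 min/sample / 8 bits/octet = 5,625,000,000 octets/sample

That’s even further past what a 32-bit counter can hold!  Working backward, 2^32 octets * 8 bits/octet / 300 sec/sample is about 114.5Mbps, so any trunk averaging more than that over a 5-minute sample wraps the counter at least once between polls, and Cacti has no way to tell how many times it flipped!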

When Cacti (like most other SNMP tools) polls an interface, it gets the total number of octets that have been sent or received since the counters were reset (usually at boot).  When it polls again 5 minutes later, it gets the new number and subtracts the old one, and, voila, it has the number of octets transferred in the last 5 minutes (multiply by 8 for bits).  If the new number is smaller than the old one, Cacti assumes that the counter flipped and adds the new number to the difference between the old number and 2^32 to get the value.  If, however, the interface is spewing out 150Mbps of data, the counter may flip around once and still land higher than the original number.  If that’s the case, Cacti only sees a small difference and shows you a rate of something like 2Mbps.  What if you’re pushing 300Mbps on the trunk?  The counter may flip twice and still land higher than the first sample, for another bogus rate like 2Mbps.  Ack!
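
To see why the graphs lie, here’s a minimal Python sketch of the subtract-and-assume-one-wrap logic described above.  It assumes a 5-minute (300-second) poll, octet counters, and steady traffic; counter_delta() and simulate_poll() are hypothetical helpers for illustration, not Cacti’s actual code, and the starting counter value is arbitrary.

```python
# Sketch of the poller's "subtract, and assume one wrap if it went down"
# logic. Assumptions: 300-second polls, octet counters (like ifInOctets),
# steady traffic. These helpers are hypothetical, not Cacti's code.

POLL_SECONDS = 300  # 5-minute polling interval


def counter_delta(old: int, new: int, counter_bits: int = 32) -> int:
    """Octets between two polls, assuming the counter wrapped at most once."""
    if new >= old:
        return new - old
    # The value went down, so assume exactly one wrap past 2**counter_bits.
    return (2**counter_bits - old) + new


def rate_mbps(octets: int, seconds: int = POLL_SECONDS) -> float:
    """Convert octets over an interval to megabits per second."""
    return octets * 8 / seconds / 1e6


def simulate_poll(true_mbps: float, start: int = 1_000_000_000,
                  counter_bits: int = 32) -> float:
    """Rate a poller would report for steady traffic at true_mbps."""
    true_octets = int(true_mbps * 1e6 * POLL_SECONDS / 8)
    # The device's counter silently wraps modulo 2**counter_bits.
    new = (start + true_octets) % (2**counter_bits)
    return rate_mbps(counter_delta(start, new, counter_bits))


if __name__ == "__main__":
    for mbps in (100, 120, 150):
        print(f"{mbps}Mbps -> 32-bit reads {simulate_poll(mbps):.1f}Mbps, "
              f"64-bit reads {simulate_poll(mbps, counter_bits=64):.1f}Mbps")
```

With a starting value of 1,000,000,000, the 32-bit counter reports 100Mbps correctly but reads 120Mbps as roughly 5.5Mbps and 150Mbps as roughly 35.5Mbps, while the 64-bit counter gets all three right.  The bogus numbers shift with wherever the counter happened to start, which is why the graph jumps around.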

The fix?  Query the proper OIDs for the 64-bit counters.  They show the same data but can count much higher before wrapping.  Calculating 2^64 gives you 18,446,744,073,709,551,616 octets.  That’s about 18.4 exabytes, or roughly 147.6 exabits, before the counter flips.  Exabits.  Wow.  I can’t even imagine that much traffic in one polling interval.  I’m sure I’ll be dead and gone by the time networks push that much between samples.
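
A quick bit of arithmetic backs that up.  This sketch assumes a link running flat out at line rate and counters that count octets; the 1Gbps and 10Gbps speeds are just example values.

```python
# Back-of-the-envelope: how long a link running flat out at line rate takes
# to wrap an octet counter once. Link speeds below are example values.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_to_wrap(link_bps: float, counter_bits: int) -> float:
    """Seconds of continuous line-rate traffic needed to wrap the counter."""
    return (2**counter_bits) * 8 / link_bps


if __name__ == "__main__":
    for label, bps in (("1Gbps", 1e9), ("10Gbps", 10e9)):
        wrap32 = seconds_to_wrap(bps, 32)
        wrap64_years = seconds_to_wrap(bps, 64) / SECONDS_PER_YEAR
        print(f"{label}: 32-bit wraps every {wrap32:.1f} s, "
              f"64-bit every {wrap64_years:,.0f} years")
```

A saturated gigabit trunk wraps a 32-bit octet counter about every 34 seconds, while a 10Gbps link would need over 450 years to wrap a 64-bit one.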

Nearly all modern network gear can use 64-bit counters, so use them where you can.  In the standard IF-MIB they’re ifHCInOctets and ifHCOutOctets (“HC” for high capacity), and they need SNMPv2c or v3, since SNMPv1 has no 64-bit counter type.  Since it’s just another OID, using 64-bit counters shouldn’t add any real load to the gear or the monitoring box.  Some packages like Cacti come bundled with support for “the big boy” counters, but you may have to do a little research and find the right OID to query.  Google is your friend.  Let me know if you have problems, and I’ll try to help.

Do it now, by the way; you don’t want to have to explain those flaky graphs to the boss.  The concept of Exabits may be a little much for him to understand.

Configuring Dedicated Trunks for the CSM

Posted on November 24th, 2008 in CSM, Catalyst, Cisco, IOS, LAN, Switching, Trunking, VLANs by Aaron Conaway

Did you catch the article on setting up fault tolerance on the CSM?  In that article, I mentioned that Cisco recommends a dedicated trunk for the FT VLAN if you have two HA CSMs in two chassis.  Discuss amongst yourselves while I drone on.

Why should you set up a dedicated trunk for this stuff?  The most obvious reason is to be sure that normal traffic doesn’t step on the syncing traffic.  Since we’re syncing state information as well as configuration, the frames need to arrive in a timely manner.  Any errors could potentially disrupt the FT process, which is bad.  You surely don’t want the primary to fail only to find out that the standby doesn’t have the complete or current config.

Another reason is to keep the syncing traffic from stepping on normal traffic.  The CSM is a pretty robust box and can handle a pretty good chunk of data.  If you had a 100Mbps trunk between your chassis, there is the potential for the link to get flooded if the CSM ever starts sending some real data.  All things being equal, though, your trunks are probably sized properly for your network, and the addition of the syncing traffic probably won’t affect much.

Let’s review our configuration from the other article.

vlan 83
 name CSM-Sync
!
module csm 3
 ft group 1 vlan 83
  priority 100 alt 90
  preempt

This snippet creates VLAN 83 and tells the CSM to use it for syncing, but how do we dedicate a trunk for that VLAN?  We use the switchport trunk allowed vlan directive.  We’ll assume that G1/1 on your primary switch is connected to G1/1 on your standby.

interface GigabitEthernet1/1
 description CSM Syncing
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 83
 switchport mode trunk

This sets G1/1 up to allow only VLAN 83 across it.  If you do a show int G1/1 trunk, you’ll see that this VLAN is the only one allowed, the only one active, and the only one forwarding on that link.  Of course, you’ll need to do the same on the other side to keep traffic flow sane, but it’s fairly easy.
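
If you keep saved copies of your switch configs, a small script can double-check that only VLAN 83 rides the sync trunk.  This is a rough Python sketch under a few assumptions: it reads the interface section as plain text, understands the plain and add forms of switchport trunk allowed vlan, and ignores the except, remove, all, and none keywords.  expand_vlan_list() and allowed_vlans() are hypothetical helpers.

```python
import re

# Matches "switchport trunk allowed vlan 83" and the "add" form IOS uses
# when a long list spills onto continuation lines.
ALLOWED_RE = re.compile(r"^\s*switchport trunk allowed vlan (add )?([\d,-]+)\s*$")


def expand_vlan_list(spec: str) -> set:
    """Expand an IOS-style VLAN list such as '10,20-22,83' into a set."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            vlans.update(range(low, high + 1))
        elif part:
            vlans.add(int(part))
    return vlans


def allowed_vlans(interface_config: str) -> set:
    """VLANs allowed by the 'switchport trunk allowed vlan' lines in a config."""
    vlans = set()
    for line in interface_config.splitlines():
        match = ALLOWED_RE.match(line)
        if match:
            if match.group(1) is None:
                vlans = set()  # a plain statement replaces the whole list
            vlans |= expand_vlan_list(match.group(2))
    return vlans


if __name__ == "__main__":
    config = """interface GigabitEthernet1/1
 description CSM Syncing
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 83
 switchport mode trunk"""
    print(allowed_vlans(config) == {83})  # only the sync VLAN is allowed
```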

What if G1/1 goes down, though?  You’d lose sync, so you probably want to look at a solution for that little problem.  You could put in multiple links and let Spanning Tree do the work.  You could even turn those links into an EtherChannel for redundancy and throughput.  If you have more than two chassis, you could full mesh them with trunks dedicated to VLAN 83.  There are a number of ways around the problem.  Be creative.

Be sure to send turkey questions my way.

Using Probes on the CSM

Posted on November 6th, 2008 in CSM, Catalyst, IOS by Aaron Conaway

There are three different ways that a CSM checks for the health of the servers — active probes, inband health checking, and inband HTTP monitoring.  Let’s talk about active probes.

Active probes (or just probes) typically send traffic to one of the RIPs of a serverfarm, do some stuff, and give a pass or fail grade.  If the probe fails a certain number of times in a row, that server is considered sick and taken out of the pool.  The CSM keeps checking the unhealthy server until it passes a number of times in a row, at which point it is placed back in the pool.  Almost everything is configurable, of course, so let’s look at some of those settings.

These all have their defaults, so you don’t need to actually configure them, but it’s important to know they’re there to tweak later.  There are other parameters to the more specific types of probes as well.  You’ll have to venture forth on your own to figure those out, though I’ll be glad to help.

  • interval:  The time between probes when the server is healthy.
  • retries:  The number of consecutive failures before a healthy server is considered failed.
  • failed:  The time between probes when the server is failed.
  • recover:  The number of consecutive successes before a failed server is considered healthy.
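
Here’s how those four settings interact, as a small Python state machine.  It’s a model of the behavior described above, not the CSM’s code, and the numbers in it are illustrative values rather than anything taken from the CSM documentation.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Health of one real server, driven by the four probe settings.
    Illustrative values only, not taken from the CSM documentation."""
    interval: int = 5    # seconds between probes while healthy
    retries: int = 3     # consecutive failures before marking failed
    failed: int = 30     # seconds between probes while failed
    recover: int = 2     # consecutive passes before marking healthy again
    healthy: bool = True
    streak: int = 0      # consecutive results pointing toward a state change

    def record(self, passed: bool) -> None:
        """Feed one probe result into the state machine."""
        if self.healthy:
            # Count consecutive failures; any pass resets the count.
            self.streak = 0 if passed else self.streak + 1
            if self.streak >= self.retries:
                self.healthy, self.streak = False, 0
        else:
            # Count consecutive passes; any failure resets the count.
            self.streak = self.streak + 1 if passed else 0
            if self.streak >= self.recover:
                self.healthy, self.streak = True, 0

    def next_probe_in(self) -> int:
        """Seconds to wait before the next probe, based on current health."""
        return self.interval if self.healthy else self.failed


if __name__ == "__main__":
    state = ProbeState()
    for result in (False, False, True, False, False, False, True, True):
        state.record(result)
        print(result, "healthy" if state.healthy else "failed",
              state.next_probe_in())
```

Note how a single pass in the middle of a run of failures resets the count, so a flaky server has to fail retries times in a row before it gets pulled.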

I always said that the CSM only speaks HTTP, but it knows how to order a ham and cheese sandwich in a few other protocols, including doing some decent stuff like watching for SMTP banners, looking for ICMP reachability, or getting a response from a Tcl script.  Here’s a list of the probes with some boring commentary.  Depending on IOS versions, you may have more or less available to you.

  • tcp: Establishes a connection to a TCP port.  If the port is open, we pass.
  • udp: Same as TCP, but in UDP.  Duh.
  • icmp: Ping-a-ling.  Do I need to explain this one?
  • http: Requests a URL from a webserver and looks for HTTP return codes.
  • dns, smtp, telnet: These guys just attach to the service and look for a proper response header.  They don’t do any transactions or anything but simply make sure that DNS, SMTP, or telnet is running on the port.
  • script:  These probes run a Tcl script (that you write) and look for the return code of the script.  Very powerful!
  • kal-ap-tcp, kal-ap-udp:  Admittedly, I have no earthly idea what those are.  Most references I’ve seen to them involve the Cisco ACE, but I have no clue.  Can someone fill in for me here?

Shall we try one or two?  How about a TCP probe that makes sure your custom app is still running on TCP/8839?  Since it’s a custom app, we can use a TCP probe on that port to make sure something is listening.  (If you need something more, you’ll need to check out script probes.)

probe MYAPP tcp
 description My app on TCP/8839
 port 8839

Now we apply the probe to the serverfarm.

serverfarm MYFARM
 probe MYAPP
 real 1.1.1.1
  inservice
 real 1.1.1.2
  inservice
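
Conceptually, the tcp probe is just a connect test.  A rough Python equivalent you could run from a workstation to sanity-check the port before blaming the CSM might look like this; tcp_probe() is a hypothetical helper, and it assumes a completed TCP handshake is all that counts as a pass.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Pass if a TCP connection to host:port completes within the timeout.
    Like a bare connect check: no data is sent, and the socket is closed
    immediately afterward."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # The reals from MYFARM; assumes this host can reach them.
    for real in ("1.1.1.1", "1.1.1.2"):
        print(real, "pass" if tcp_probe(real, 8839) else "fail")
```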

Easy enough.  How about another?  Let’s configure an HTTP probe that requests the URL /test.php and looks for status code 200, then apply it to a serverfarm.

probe LOOKFOR200 http
 request url /test.php
 expect status 200

serverfarm YOURFARM
 probe LOOKFOR200
 real 2.2.2.1
  inservice
 real 2.2.2.2
  inservice

By the way, you don’t want to use this one in production as-is.  You’ll probably want to accept a range of status codes instead of a single value.  Google HTTP status codes and you’ll see why.
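
As a sketch of the range idea, here’s a Python HTTP check that accepts any status in a list of ranges.  Treating 2xx and 3xx as healthy is only an example policy, and http_probe() and status_ok() are hypothetical helpers.  It uses http.client because, unlike urllib, it doesn’t follow redirects on its own, so a 302 is graded as a 302.

```python
import http.client

# Example policy: treat any 2xx or 3xx response as healthy.
DEFAULT_RANGES = ((200, 299), (300, 399))


def status_ok(status: int, ranges=DEFAULT_RANGES) -> bool:
    """Pass if the status code falls inside any (low, high) range."""
    return any(low <= status <= high for low, high in ranges)


def http_probe(host: str, port: int = 80, path: str = "/test.php",
               timeout: float = 2.0, ranges=DEFAULT_RANGES) -> bool:
    """GET the path and grade the server on the response status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return status_ok(conn.getresponse().status, ranges)
    except (OSError, http.client.HTTPException):
        return False  # no connection or a garbled response counts as a fail
    finally:
        conn.close()


if __name__ == "__main__":
    for real in ("2.2.2.1", "2.2.2.2"):  # the reals from YOURFARM
        print(real, "pass" if http_probe(real) else "fail")
```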

I will mention again, though, that the script probe is very interesting.  If you know Tcl or have done any development, check these things out.  You can do whatever you want with them instead of using the canned probes that come with your CSM.  I suggest taking a look at Ivan Pepelnjak’s page for some insight into Tcl on Cisco devices.

I think you can take it from here.  As always, comments are welcome.

Using CDP To Track Down Physical Connections

Posted on October 31st, 2008 in CDP, Documentation, LAN, Networking, Switching by Aaron Conaway

We have a location that’s a few blocks down from the main office here, and we were reviewing the circuit to make sure it was sized properly.  Since not one person knows what’s going on and the trending graphs gave us conflicting details, one of our network dudes took me down to the site to do a physical survey.  Well, besides the fact that no one was there, we discovered a hodgepodge of routers and switches that were cross-connected to one another on multiple floors of the building (I really wish I could post pics to convey the effect).  It’s kind of hard to figure out what’s going on when you can’t see both ends of the cable, so we had to abandon all hope.

What are our options, then, to see how things are uplinked and connected?  In this case, CDP is the answer.  The Cisco Discovery Protocol (CDP), if you don’t know, tells you what other Cisco devices a particular Cisco device is plugged into.  So, if you have a 2811 plugged into a 2960, you can see what ports they’re connected on along with some other details.  Here’s an example.

Switch1#sh cdp neighbors
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater, P - Phone

Device ID        Local Intrfce     Holdtme    Capability  Platform  Port ID
Router1          Gig 0/10          122            R       3640      Fas 0/1
Switch2          Gig 0/9           141           S I      WS-C2950G-Fas 0/48

As you can see from the output, Switch1’s G0/10 is plugged into Router1’s F0/1 interface, and Switch1’s G0/9 is plugged into Switch2’s F0/48.  You can also see that Router1 is a 3640 and a router (the “R” under Capability).  Switch2 is a 2950G switch.

So, today, I’m going to start at the head end of my frazzled location and try to figure out where everything is connected.  I’ll get all the CDP neighbors for that device, document it, then repeat for the next hop until I’m all the way through.  When I’m done, I should have a nice physical map.
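
The walk I’m describing is a breadth-first search.  Here’s a Python sketch of it; get_neighbors() is a hypothetical callable standing in for logging into each device and parsing show cdp neighbors, which this sketch doesn’t do.

```python
from collections import deque


def walk_cdp(start, get_neighbors):
    """Breadth-first walk of CDP neighbors starting from one device.

    get_neighbors(device) is a hypothetical callable returning
    (local_port, neighbor, neighbor_port) tuples for that device.
    Returns each physical link once as a sorted pair of (device, port) ends.
    """
    links = set()
    seen = {start}
    queue = deque([start])
    while queue:
        device = queue.popleft()
        for local_port, neighbor, neighbor_port in get_neighbors(device):
            # Sort the two ends so a cable seen from both sides counts once.
            links.add(tuple(sorted([(device, local_port),
                                    (neighbor, neighbor_port)])))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(links)


if __name__ == "__main__":
    # Neighbor data shaped like the sample output above.
    cdp = {
        "Switch1": [("Gig 0/10", "Router1", "Fas 0/1"),
                    ("Gig 0/9", "Switch2", "Fas 0/48")],
        "Router1": [("Fas 0/1", "Switch1", "Gig 0/10")],
        "Switch2": [("Fas 0/48", "Switch1", "Gig 0/9")],
    }
    for end_a, end_b in walk_cdp("Switch1", lambda d: cdp.get(d, [])):
        print(f"{end_a[0]} {end_a[1]} <-> {end_b[0]} {end_b[1]}")
```

The seen set keeps the walk from bouncing back and forth between two switches that list each other, and sorting each link’s ends dedupes cables you discover from both sides.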

Beware, my friends, that the “C” stands for Cisco.  It doesn’t stand for Juniper or Nortel or anybody else.  The rule is that CDP only shows the Cisco devices that are directly connected to one another and won’t show any other devices in the path, but there are exceptions.  Since CDP frames are sent to a multicast MAC address (0100.0ccc.cccc), a lot of (maybe all?) non-Cisco switches just flood them along on layer 2 like any other multicast, so you may see CDP neighbor adjacencies between switches that aren’t actually connected to one another.  CDP will think they are, and I don’t know of any way to detect that, so be careful.

Send me money... er, Halloween candy... er, comments if you feel inclined.

Using MAC Access-lists

Posted on October 27th, 2008 in ACLs, Catalyst, Cisco, LAN, Switching by Aaron Conaway

We ran into this today, and, though I knew it existed, I never actually saw it in the wild.  I’m talking about MAC access-lists.

In the example setup, we have a DMZ off of a firewall that contains a whole mess of servers — email, web, ftp, etc.  These should all be in the DMZ for sure, but they shouldn’t talk to each other.  If a bad guy was able to own my FTP server, he would have a nice platform to use to attack my email server.  That’s not cool, so we’ve put in MAC access-lists to help out.

MAC access-lists do just what you think they do; they put access controls on what MAC addresses can talk to what on a  particular port.  You build a list of permit and deny lines that you want to use to control access and apply them to a port.  Sound familiar?  Yeah…it does sound a lot like IP ACLs, doesn’t it?  Here’s some technical detail for those that care.

  • MAC ACLs can be numbered or named.  The range for numbered ACLs is 700-799.  The naming conventions follow the same rules as an IP list.
  • A port must be in access mode before you can apply a MAC ACL.  You can’t apply them to trunks.
  • You can use the any and host directives instead of using MAC/mask combos.
  • If you’re feeling froggy, you can actually do a MAC/mask combo, but, since the MACs on your hosts probably aren’t sequential, I don’t see the point.

Let’s build one.  Suppose we have a host with the MAC 1111.2222.3333 plugged into a switch on port F0/1.  It is a web server and needs access to the mail server on the same network (4444.5555.6666) and the firewall (9999.8888.7777) as a gateway.  How do we set it up so that the web server can only speak to those guys?

mac access-list extended WEBSERVER
 permit host 1111.2222.3333 host 4444.5555.6666
 permit host 1111.2222.3333 host 9999.8888.7777

int f0/1
 mac access-group WEBSERVER in

Now that host can only speak to those two MACs on layer 2.  Pretty simple yet again.  There are some things to note, though.

First of all, this only keeps this host from talking to other MACs on the same network and does nothing to keep frames from other hosts from reaching our webserver.  Though the webserver won’t be able to respond, one could argue that it’s best practice to apply an outbound MAC ACL that mirrors the inbound one.

There’s also an issue of broadcast.  Say that the webserver is now trying to send a packet to the mail server to send an alert.  One of the first things that it will do is to send a broadcast (ffff.ffff.ffff) asking for an ARP reply.  Guess what?  That MAC isn’t in the ACL.  You could put static ARP entries on the boxes, but it may be easier just to allow the host to talk to the broadcast.
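
To make the evaluation order concrete, here’s a toy Python model of the WEBSERVER list.  It assumes the usual ACL semantics (top-down, first match wins, implicit deny at the end) and MACs written in lowercase dotted form; acl_permits() is hypothetical and only a model, not how the switch actually evaluates the list.

```python
BROADCAST = "ffff.ffff.ffff"

# The WEBSERVER list from above as (action, source, destination) entries.
# MACs are lowercase dotted notation; "any" matches every address.
WEBSERVER = [
    ("permit", "1111.2222.3333", "4444.5555.6666"),  # mail server
    ("permit", "1111.2222.3333", "9999.8888.7777"),  # firewall
]


def acl_permits(acl, src, dst):
    """Top-down, first match wins; no match means the implicit deny."""
    for action, acl_src, acl_dst in acl:
        if acl_src in ("any", src) and acl_dst in ("any", dst):
            return action == "permit"
    return False


if __name__ == "__main__":
    web = "1111.2222.3333"
    print(acl_permits(WEBSERVER, web, "4444.5555.6666"))  # to the mail server
    print(acl_permits(WEBSERVER, web, BROADCAST))         # the ARP request
    with_bcast = WEBSERVER + [("permit", web, BROADCAST)]
    print(acl_permits(with_bcast, web, BROADCAST))
```

Appending a ("permit", web, BROADCAST) entry is the toy equivalent of allowing the host to talk to the broadcast address.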

Don’t get layer-2 security mixed up with layer-3 security.  This restricts access on (usually) a single IP network where you don’t have routers or firewalls between hosts.  Use the old-fashioned IP ACL for traffic between networks and MAC ACLs for traffic between hosts on a network.

Questions?  Comments?  Bribes?  Free money?  Send them all my way.

Configuring Fault Tolerance on the CSM

Posted on October 10th, 2008 in CSM, Catalyst, IOS by Aaron Conaway

Like (nearly) everything in the Cisco world, you can set up your CSM to fail over to another module when the primary dies a horrible death.  You can have two in the same chassis or even have them in separate chassis — the process is the same no matter how you have it set up.  Either way, you have a primary and a secondary module in fault tolerance (FT) mode.

First, we’ll establish a VLAN that the CSM will use to do its configuration and state syncing over.  This is just an ordinary VLAN; there’s nothing special about it, really, but it should be dedicated for the CSM to use for syncing.  Let’s randomly choose VLAN 83.

vlan 83
 name CSM-Sync

You will, of course, have to do this on every switch that holds a CSM, so, if you’re using them in two different chassis, you’ll put the same VLAN on each making sure they can see each other through a trunk.  Cisco recommends that you dedicate a trunk between the two switches for the sync VLAN in order to remove the chance of other traffic stepping on the sync packets, but I’m not convinced that’s necessary.  Use your judgement on that one.

Back to it.  Next, you need to decide on an FT group ID.  This is similar to an HSRP group and lets you run multiple FT groups on the same VLAN.  The group ID needs to be in the range of 1 to 256, so, since this is the first one, let’s just use 1.  Get into config mode for the CSM that you want to be the primary and do this.

ft group 1 vlan 83

This takes you to the config-slb-ft prompt.  Just like HSRP, we need to set priorities for each device and whether or not it should preempt, so let’s configure.  Yes, we want to preempt, right?  Let’s set the priorities to 100 and 90, too.

priority 100 alt 90
preempt

This sets the primary CSM to priority 100 and the secondary to 90; both will preempt.
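
If priorities and preemption are new to you, here’s a generic HSRP-style election model in Python.  It’s an assumption that the CSM follows this in every detail (ties, heartbeats, and failover timers are ignored), so treat it as a mental model rather than the CSM’s actual algorithm; elect_active() is hypothetical.

```python
def elect_active(current_active, priorities, preempt=True):
    """Choose the active unit from {name: priority} for units that are up.

    Generic HSRP-style model: the highest priority wins, and without
    preemption an already-active unit keeps the role. Ties and timers are
    ignored.
    """
    if not priorities:
        return None
    if not preempt and current_active in priorities:
        return current_active
    return max(priorities, key=priorities.get)


if __name__ == "__main__":
    both = {"primary": 100, "secondary": 90}
    active = elect_active(None, both)                 # both come up
    active = elect_active(active, {"secondary": 90})  # primary dies
    print(active)
    print(elect_active(active, both))                 # primary returns, preempts
    print(elect_active(active, both, preempt=False))  # no preempt: stays put
```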

What about configuring the secondary for FT? That’s easy.  Go into CSM config mode on the secondary and enter the ft group 1 vlan 83 command.  That’s it. The two CSMs will do a little arguing and come back as the primary and secondary.  After that, all configuration is done on the primary, which is synced over to the secondary just like an ASA.  Pretty cool, eh?

When configuring things like IP addresses, though, you’ll need to make provisions for the secondary with the alt directive (remember that one from the priority command).  I won’t go into detail, but you’ll mostly need it when assigning IPs to VLANs.  Here’s an example of setting an IP address on client VLAN 100 for both the primary and secondary.

vlan 100 client
 ip address 192.168.0.11 255.255.255.0 alt 192.168.0.12 255.255.255.0

Alright…one more thing.  The configurations don’t sync automagically (at least not on my old version of code).  If you make a change to the primary CSM, you’ll see an out-of-sync message when you look at the FT status.

Switch#sh mod csm X ft
FT group 1, vlan 83
This box is active
Configuration is out-of-sync
priority 100, heartbeat 1, failover 3, preemption is on
alternate priority 90

If the primary goes down now and the secondary takes over, the changes you just made won’t be reflected on the secondary.  You fix this with the hw-module contentSwitchingModule X standby config-sync command (where X is the module slot in the chassis).  Alternatively, you can just type hw c X s c as a shortcut.  It’ll take a few minutes depending on your configuration, so check your logs for when it’s finished.  Note that the secondary does not save the new configuration to its startup-config; you’ll have to log in and save that manually (or automatically through CiscoWorks or something) to save changes there.

Let me know if you have any questions and check out my page on getting output from Cisco’s fine mid-tier load balancer.  :)

Running Multiple Data Centers on a Stick with the CSM

Posted on August 12th, 2008 in CSM by Aaron Conaway

That’s an awesome title, eh?  I’ve mentioned a router-on-a-stick before but not a data-center-on-a-stick (DCOAS).  This is one of those Cisco terms I ran across a while ago and is a group of servers sort of sticking out on their own behind a load balancer and/or firewall.  Connections to and from the server group go through a single spoke — kinda like stubby routing.  Here’s a pretty picture.

[Figure: Data Center on a Stick]

To configure this type of setup on the CSM, assuming you’re running in router mode, you just define a server and client VLAN pair along with an alias IP on the server VLAN.  The servers on that VLAN point to the alias as their gateway for return traffic, and the CSM handles the VIP/RIP conversion.

What if you have more than one server/client VLAN pair, though?

[Figure: Data Center on a Stick with two server/client VLAN pairs]

You set up your VLANs and your server VLAN alias for both sets and all is well, right?  Not really.  The CSM isn’t a router and doesn’t do a very good job at routing.  If you were to send traffic from “Other Stuff” to a VIP on VLAN101, everything works great.  If, however, one of those servers initiates a connection, the traffic will come out of the client VLAN with the lowest VLAN ID.  If you have VLANs 1 and 2 for the top pair, traffic from all servers will come out VLAN 1.  Very big problem if you have a firewall that’s expecting traffic on another interface.

How do you get around it, then?  You have to trick the CSM by generating new catch-all vservers on each server VLAN to “load balance” the gateway for each VLAN.  Let’s look at the configs.

Here are the serverfarms.

serverfarm VLAN1-SF
 no nat server
 no nat client
 real 1.1.1.1
  inservice

serverfarm VLAN101-SF
 no nat server
 no nat client
 real 1.1.101.1
  inservice

And the vservers.

vserver VLAN1-VS
 virtual 0.0.0.0 0.0.0.0 any
 vlan 2
 serverfarm VLAN1-SF
 inservice

vserver VLAN101-VS
 virtual 0.0.0.0 0.0.0.0 any
 vlan 102
 serverfarm VLAN101-SF
 inservice

When a server makes a new connection to a database or whatnot, it sends the traffic to the alias you already configured on the server VLAN.  When the CSM receives that traffic through the alias, this new vserver takes over since it’s set for any protocol and port on any IP.  This vserver uses a serverfarm that contains the gateway for the appropriate client VLAN — in our case, the IP of the firewall — so now every connection from the server will exit an appropriate VLAN instead of out the lowest ID.

Confusing?  Yes.  Should it be fixed?  Yes.  Is it already fixed?  I have no idea, but it isn’t as of 12.2 and some change.

Setting Up System Logging on an IOS Device

Posted on August 11th, 2008 in IOS, Tools by Aaron Conaway

I like logging on an IOS device.  I like to look at the buffer and tell you that your interface went down 30 seconds ago.  I like to look on the box and see that BGP with my Internet provider has been flapping since 02:13ET.  I like to look and see that one of the other guys has been making changes to the gear all morning.  I could go on and on.

There are lots of ways to monitor a Cisco box — SNMP polling, SNMP traps, show commands, etc. — but there’s nothing so handy as the log buffer.   A show logging can provide you all sorts of information on things you do and don’t care about, so it’s important to know the destinations and levels when setting up logging.

There are four logging destinations.

  • Console — logs to the console port of the device
  • Monitor — logs to any device or pseudo-device that’s in monitoring mode.  The most common application is when you do a terminal monitor to see output of debugs.
  • Buffer — logs to a memory buffer that lets you see the log messages on demand.  It has a finite size and scrolls old messages out after X bytes have been written.
  • Host or Trap — logs to an external syslog server.

What’s the most important destination?  There’s not one.  I personally think the syslog host is the most important since it allows you to log messages to disk on a server somewhere.  The buffer is also important since it lets anyone with access see what’s going on with the device.  Your mileage will vary depending on what you have set up.

There are eight logging levels as well.

  • Debug - level 7
  • Informational - level 6
  • Notifications - level 5
  • Warnings - level 4
  • Errors - level 3
  • Critical - level 2
  • Alerts - level 1
  • Emergencies - level 0

Wow.  There’s some numbers.  What does it mean?  Every logging message comes with a level.  If my CSM loses a RIP, it generates a level 6 message (%CSM_SLB-6-RSERVERSTATE) telling me.  If I configure the box, I get a level 5 message (%SYS-5-CONFIG_I) saying that I’ve done so.  If my router is on fire in the rack, I’ll get a level 0 message telling me.  Now, when you configure a destination, you have to give a logging level, and every message at that level or below will be logged.  If I set my logging buffer to 5, I’ll see the configuration message and the “Oh, the humanity!” message, but not the RIP failures.  If I set it to 0, I only see the emergencies.  If I set it to 7, I see everything.
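
The at-or-below rule is easy to express in code.  This Python sketch pulls the severity digit out of a %FACILITY-SEVERITY-MNEMONIC tag and compares it to a destination’s level.  The regex and helper names are assumptions for illustration; the level keywords are the ones IOS accepts (note that level 7 is spelled debugging there).

```python
import re

# IOS logging level keywords and their numbers (lower is more severe).
LEVELS = {
    "emergencies": 0, "alerts": 1, "critical": 2, "errors": 3,
    "warnings": 4, "notifications": 5, "informational": 6, "debugging": 7,
}

# Matches tags like %SYS-5-CONFIG_I or %CSM_SLB-6-RSERVERSTATE.
TAG_RE = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+)")


def severity(message: str) -> int:
    """Return the severity digit from a message's %FACILITY-SEVERITY-MNEMONIC tag."""
    match = TAG_RE.search(message)
    if match is None:
        raise ValueError("no %FACILITY-SEVERITY-MNEMONIC tag found")
    return int(match.group(2))


def is_logged(message: str, destination_level) -> bool:
    """A destination logs a message at its configured level or below."""
    if isinstance(destination_level, str):
        destination_level = LEVELS[destination_level]
    return severity(message) <= destination_level


if __name__ == "__main__":
    for msg in ("%SYS-5-CONFIG_I: ...", "%CSM_SLB-6-RSERVERSTATE: ..."):
        print(msg, "->", is_logged(msg, "notifications"))
```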

Let’s do the configuration, then.  After hours of research, you’ve decided to use a remote syslog server at 1.2.3.4 for warnings and the buffer for informational.  Here’s what you’d do.

logging host 1.2.3.4
logging trap warnings
logging buffered informational

It’s not that hard.  You can even use the number instead of the words for the logging level if you would like.  The same procedure holds true for the console and monitor mechanisms — logging <mechanism> <level>.  Easy.

If you care, here’s what I usually run.

  • Console — off.  I’ve seen a console rendered unusable because the console was being obliterated with syslog messages.  Not only is it an issue with being able to see what you’re typing when stuff is scrolling, but some older devices wind up using 100% CPU because they’re sending messages to the console.
  • Monitor — debug.  It doesn’t really log anything unless you do a term mon or something, and, in that case, I want to see my debugs.
  • Buffer — informational, but it depends on the device.  It lets me see all the messages except for debugs, which is probably just right for most routers and switches.  If your switch is in a closet somewhere with users plugged directly into it, you may be flooded with up/down messages, so keep an eye out for stuff like that.
  • Host or Trap — informational.  Debug’s a little too much for the corporate environment, but, depending on how much disk space you have, you may be able to handle it.

There’s a lot more to syslog and log messages, so see this nifty Cisco page.
