Configuring Dedicated Trunks for the CSM

Posted on November 24th, 2008 in CSM, Catalyst, Cisco, IOS, LAN, Switching, Trunking, VLANs by Aaron Conaway

Did you catch the article on setting up fault tolerance on the CSM?  In that article, I mentioned that Cisco recommends a dedicated trunk for the FT VLAN if you have two HA CSMs in two chassis.  Discuss amongst yourselves while I drone on.

Why should you set up a dedicated trunk for this stuff?  The most obvious reason is to be sure that normal traffic doesn’t step on the syncing traffic.  Since we’re syncing state information as well as configuration, the frames need to arrive in a timely manner.  Any errors could potentially disrupt the FT process, which is bad.  You surely don’t want the primary to fail only to find out that the standby doesn’t have the complete or current config.

Another reason is to keep the syncing traffic from stepping on normal traffic.  The CSM is a pretty robust box and can handle a pretty good chunk of data.  If you had a 100Mbps trunk between your chassis, there is the potential for the link to get flooded if the CSM ever starts sending some real data.  All things being equal, though, your trunks are probably sized properly for your network, and the addition of the syncing traffic probably won’t affect much.

Let’s review our configuration from the other article.

vlan 83
 name CSM-Sync
!
module csm 3
 ft group 1 vlan 83
  priority 100 alt 90
  preempt

This snippet creates VLAN 83 and tells the CSM to use it for syncing, but how do we dedicate a trunk for that VLAN?  We use the switchport trunk allowed vlan directive.  We’ll assume that G1/1 on your primary switch is connected to G1/1 on your standby.

interface GigabitEthernet1/1
 description CSM Syncing
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 83
 switchport mode trunk

This sets G1/1 up to only allow VLAN 83 across it.  If you do a show int G1/1 trunk, you’ll see that this VLAN is the only one allowed, the only one active, and the only one forwarding on that link.  Of course, you’ll need to do the same on the other side to keep traffic flow sane, but it’s fairly easy.

What if G1/1 goes down, though?  You’d lose sync, so you probably want to look at a solution for that little problem.  You could put in multiple links and let Spanning Tree do the work.  You could even turn those links into an EtherChannel for redundancy and throughput.  If you have more than two chassis, you could full mesh them with trunks dedicated to VLAN 83.  There are a number of ways around the problem.  Be creative.

Be sure to send turkey questions my way.

Using Probes on the CSM

Posted on November 6th, 2008 in CSM, Catalyst, IOS by Aaron Conaway

There are three different ways that a CSM checks for the health of the servers — active probes, inband health checking, and inband HTTP monitoring.  Let’s talk about active probes.

Active probes (or just probes) typically send traffic to one of the RIPs of a serverfarm, do some stuff, and give a pass or fail grade.  If the probe fails a certain number of times in a row, that server is considered sick and taken out of the pool for use.  The CSM keeps checking the unhealthy server until it passes a number of times in a row, at which point it is placed back in the pool for use.  Almost everything is configurable, of course, so let’s look at some of those settings.

These all have their defaults, so you don’t need to actually configure them, but it’s important to know they’re there to tweak later.  There are other parameters to the more specific types of probes as well.  You’ll have to venture forth on your own to figure those out, though I’ll be glad to help.

  • interval:  The time between probes when the server is healthy.
  • retries:  The number of consecutive failures before a healthy server is considered failed.
  • failed:  The time between probes when the server is failed.
  • recover:  The number of consecutive successes before a failed server is considered healthy.
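
To see how those four knobs play together, here’s a minimal Python sketch of the pass/fail bookkeeping described above.  It’s a toy simulation and not CSM code; the class name ProbeState and the numbers in it are made up for the example and are not the CSM’s real defaults.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Toy model of one real server's probe bookkeeping (hypothetical, not CSM code)."""
    interval: int = 10   # seconds between probes while healthy (example value)
    retries: int = 3     # consecutive failures before a healthy server is marked failed
    failed: int = 60     # seconds between probes while failed (example value)
    recover: int = 2     # consecutive successes before a failed server is healthy again
    healthy: bool = True
    streak: int = 0      # consecutive results that disagree with the current state

    def record(self, passed: bool) -> int:
        """Record one probe result and return the seconds to wait before the next probe."""
        if passed == self.healthy:
            # The result agrees with the current state, so any opposing streak is broken.
            self.streak = 0
        else:
            self.streak += 1
            limit = self.retries if self.healthy else self.recover
            if self.streak >= limit:
                self.healthy = not self.healthy
                self.streak = 0
        return self.interval if self.healthy else self.failed


if __name__ == "__main__":
    server = ProbeState()
    for result in [True, False, False, False, True, True]:
        wait = server.record(result)
        grade = "pass" if result else "fail"
        print(f"probe {grade}: healthy={server.healthy}, next probe in {wait}s")
```

With these example values, the third failure in a row takes the server out of the pool and stretches the probe spacing to the failed timer; two passes in a row put it back.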

I always said that the CSM only speaks HTTP, but it knows how to order a ham and cheese sandwich in a few other protocols, including doing some decent stuff like watching for SMTP banners, looking for ICMP reachability, or getting a response from a Tcl script.  Here’s a list of the probes with some boring commentary.  Depending on IOS versions, you may have more or less available to you.

  • tcp: Establishes a connection to a TCP port.  If the port is open, we pass.
  • udp: Same as TCP, but in UDP.  Duh.
  • icmp: Ping-a-ling.  Do I need to explain this one?
  • http: Requests a URL from a webserver and looks for HTTP return codes.
  • dns, smtp, telnet: These guys just attach to the service and look for a proper response header.  It doesn’t do any transactions or anything but simply makes sure that DNS, SMTP, or telnet is running on the port.
  • script:  These probes run a Tcl script (that you write) and look for the return code of the script.  Very powerful!
  • kal-ap-tcp, kal-ap-udp:  Admittedly, I have no earthly idea what those are.  Most references I’ve seen to it involve the Cisco ACE, but I have no clue.  Can someone fill in for me here?

Shall we try one or two?  How about a TCP probe that makes sure your custom app is still running on TCP/8839?  Since it’s a custom app, we can use a TCP probe on that port to make sure something is listening.  (If you need something more, you’ll need to check out script probes.)

probe MYAPP tcp
 description My app on TCP/8839
 port 8839

Now we apply the probe to the serverfarm.

serverfarm MYFARM
 probe MYAPP
 real 1.1.1.1
  inservice
 real 1.1.1.2
  inservice

Easy enough.  How about another?  I want to configure an HTTP probe that gets the URL /test.php and looks for the status code 200 and apply it to a serverfarm.

probe LOOKFOR200 http
 request url /test.php
 expect status 200

serverfarm YOURFARM
 probe LOOKFOR200
 real 2.2.2.1
  inservice
 real 2.2.2.2
  inservice

By the way, you don’t want to use this in production at all.  You’ll probably want to elect to look for ranges of status codes instead of a single value.  Google up HTTP return codes and you’ll see why.
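
To make the range idea concrete, here’s a tiny Python sketch of a pass/fail check against a range of status codes instead of a single value.  The function name probe_passes and the 200 to 299 default are my own assumptions for illustration; which codes count as healthy depends on your application.

```python
def probe_passes(status: int, low: int = 200, high: int = 299) -> bool:
    """Return True if an HTTP status code falls inside the accepted range (inclusive)."""
    return low <= status <= high


if __name__ == "__main__":
    for code in (200, 204, 302, 404, 500):
        print(code, probe_passes(code))
```

A server answering 204 is perfectly alive, but a probe that only accepts 200 (that is, probe_passes(204, 200, 200)) would pull it out of the pool.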

I will mention again, though, that the script probe is very interesting.  If you know Tcl or have done any development, check these things out.  You can do whatever you want with them instead of using the canned probes that come with your CSM.  I suggest taking a look at Ivan Pepelnjak’s page for some insight into Tcl on Cisco devices.

I think you can take it from here.  As always, comments are welcome.

Using MAC Access-lists

Posted on October 27th, 2008 in ACLs, Catalyst, Cisco, LAN, Switching by Aaron Conaway

We ran into this today, and, though I knew it existed, I never actually saw it in the wild.  I’m talking about MAC access-lists.

In the example setup, we have a DMZ off of a firewall that contains a whole mess of servers — email, web, ftp, etc.  These should all be in the DMZ for sure, but they shouldn’t talk to each other.  If a bad guy was able to own my FTP server, he would have a nice platform to use to attack my email server.  That’s not cool, so we’ve put in MAC access-lists to help out.

MAC access-lists do just what you think they do; they put access controls on which MAC addresses can talk to which others on a particular port.  You build a list of permit and deny lines that you want to use to control access and apply them to a port.  Sound familiar?  Yeah…it does sound a lot like IP ACLs, doesn’t it?  Here’s some technical detail for those that care.

  • MAC ACLs can be numbered or named.  The range for numbered ACLs is 700-799.  The naming conventions follow the same rules as an IP list.
  • A port must be in access mode before you can apply a MAC ACL.  You can’t apply them to trunks.
  • You can use the any and host directives instead of using MAC/mask combos.
  • If you’re feeling froggy, you can actually do a MAC/mask combo, but, since the MACs on your hosts probably aren’t sequential, I don’t see the point.

Let’s build one.  Suppose we have a host with the MAC 1111.2222.3333 plugged into a switch on port F0/1.  It is a web server and needs access to the mail server on the same network (4444.5555.6666) and the firewall (9999.8888.7777) as a gateway.  How do we set it up so that the web server can only speak to those guys?

mac access-list extended WEBSERVER
permit host 1111.2222.3333 host 4444.5555.6666
permit host 1111.2222.3333 host 9999.8888.7777

int f0/1
mac access-group WEBSERVER in

Now that host can only speak to those two MACs on layer 2.  Pretty simple yet again.  There are some things to note, though.

First of all, this just keeps this host from talking to other MACs on the same network and does nothing to keep packets from other hosts from reaching our webserver.  Though the webserver won’t be able to respond, one could argue that it’s best practice to apply an outbound MAC ACL that mirrors the inbound.

There’s also an issue of broadcast.  Say that the webserver is now trying to send a packet to the mail server to send an alert.  One of the first things that it will do is to send a broadcast (ffff.ffff.ffff) asking for an ARP reply.  Guess what?  That MAC isn’t in the ACL.  You could put static ARP entries on the boxes, but it may be easier just to allow the host to talk to the broadcast.
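
Here’s a small Python sketch of why the ARP broadcast gets dropped.  It models the WEBSERVER list as plain (source, destination) pairs with an implicit deny at the end; the function name permitted is made up for the example, and real switches obviously do this in hardware.

```python
BROADCAST = "ffff.ffff.ffff"
WEB = "1111.2222.3333"

# The WEBSERVER list from above as (source MAC, destination MAC) permit pairs.
WEBSERVER_ACL = [
    (WEB, "4444.5555.6666"),  # mail server
    (WEB, "9999.8888.7777"),  # firewall / gateway
]


def permitted(acl, src, dst):
    """Check a frame against the permit entries; anything unmatched hits the implicit deny."""
    for acl_src, acl_dst in acl:
        if src == acl_src and dst == acl_dst:
            return True
    return False  # implicit deny: no entry matched


if __name__ == "__main__":
    print(permitted(WEBSERVER_ACL, WEB, "4444.5555.6666"))  # frame to the mail server
    print(permitted(WEBSERVER_ACL, WEB, BROADCAST))         # the ARP request
    with_broadcast = WEBSERVER_ACL + [(WEB, BROADCAST)]
    print(permitted(with_broadcast, WEB, BROADCAST))        # after allowing broadcast
```

The unicast frame to the mail server is permitted, but the ARP request never gets out until an entry for the broadcast MAC is added to the list.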

Don’t get layer-2 security mixed up with layer-3 security.  This restricts access on (usually) a single IP network where you don’t have routers or firewalls between hosts.  Use the old-fashioned IP ACL for between networks and MAC ACLs for between hosts on a network.

Questions?  Comments?  Bribes?  Free money?  Send them all my way.

Configuring Fault Tolerance on the CSM

Posted on October 10th, 2008 in CSM, Catalyst, IOS by Aaron Conaway

Like (nearly) everything in the Cisco world, you can set up your CSM to fail over to another module when the primary dies a horrible death.  You can have two in the same chassis or even have them in separate chassis — the process is the same no matter how you have it set up.  Either way, you have a primary and a secondary module in fault tolerance (FT) mode.

First, we’ll establish a VLAN that the CSM will use to do its configuration and state syncing over.  This is just an ordinary VLAN; there’s nothing special about it, really, but it should be dedicated for the CSM to use for syncing.  Let’s randomly choose VLAN 83.

vlan 83
name CSM-Sync

You will, of course, have to do this on every switch that holds a CSM, so, if you’re using them in two different chassis, you’ll put the same VLAN on each making sure they can see each other through a trunk.  Cisco recommends that you dedicate a trunk between the two switches for the sync VLAN in order to remove the chance of other traffic stepping on the sync packets, but I’m not convinced that’s necessary.  Use your judgement on that one.

Back to it.  Next, you need to decide on a FT group ID.  This is similar to a HSRP group and lets you run multiple FT groups on the same VLAN.  The group ID needs to be in the range of 1 to 256, so, since this is the first one, let’s just use 1.  Get into config mode for the CSM that you want to be the primary and do this.

ft group 1 vlan 83

This takes you to the config-slb-ft prompt.  Just like HSRP, we need to set priorities for each device and whether or not it should preempt, so let’s configure.  Yes, we want to preempt, right?  Let’s set the priorities to 100 and 90, too.

priority 100 alt 90
preempt

This sets the primary CSM to priority 100 and the secondary to 90; both will preempt.

What about configuring the secondary for FT? That’s easy.  Go into CSM config mode on the secondary and enter the ft group 1 vlan 83 command.  That’s it. The two CSMs will do a little arguing and come back as the primary and secondary.  After that, all configuration is done on the primary, which is synced over to the secondary just like an ASA.  Pretty cool, eh?

When configuring things like IP addresses, though, you’ll need to make provisions for the secondary with the alt directive (remember that one from the priority).  I won’t go into much, but you’ll need it mostly when setting IPs on VLANs.  Here’s an example of setting an IP address on client VLAN 100 for both the primary and secondary.

vlan 100 client
ip address 192.168.0.11 255.255.255.0 alt 192.168.0.12 255.255.255.0

Alright…one more thing.  The configurations don’t sync automagically (at least not on my old version of code).  If you make a change to the primary CSM, you’ll see an out-of-sync message when you look at the FT status.

Switch#sh mod csm X ft
FT group 1, vlan 83
This box is active
Configuration is out-of-sync
priority 100, heartbeat 1, failover 3, preemption is on
alternate priority 90

If the primary goes down now and the secondary takes over, the changes you just made won’t be reflected on the secondary.  You fix this with the hw-module contentSwitchingModule X standby config-sync command (where X is the module slot in the chassis).  Alternatively, you can just type hw c X s c as a shortcut.  It’ll take a few minutes depending on your configuration, so check your logs for when it’s finished.  Note that the secondary does not save the new configuration to its startup-config; you’ll have to log in and save that manually (or automatically through CiscoWorks or something) to save changes there.

Let me know if you have any questions and check out my page on getting output from Cisco’s fine mid-tier load balancer.  :)

Back to Basics — CAM Table Population

Posted on July 14th, 2008 in Catalyst, LAN, Switching by Aaron Conaway

At the office, we reprovision servers like it’s going out of style.  It happens so often that my cabling documentation rarely matches what’s actually out in the field, which is a pretty big problem when you’re trying to find the switch port to which a server is connected.  I finally resigned myself to asking for the MAC address of the server, having the admin ping something, and then tracing it down through the CAM table entries of the switches.  It works, but the guys really don’t know how a switch populates its CAM table, so they always say “Why can’t you just look on the switch?  I shouldn’t have to ping anything.”  Here’s one just for the aspiring system admin.

The Content Addressable Memory (CAM) table on a switch keeps track of MAC addresses and on what port they appear, along with some other stuff like age.  When a device that’s plugged into a particular port sends a frame to the switch, the switch makes note of the source MAC and the port and checks the CAM table.  If it’s a new MAC, it adds an entry in the CAM table; if it’s an existing MAC on a different port, it removes the old entry and adds a new one; if it’s an old MAC on the same port, it updates the age.  By default, Cisco switches keep CAM entries for 300 seconds, so they don’t stay there forever.
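
If it helps to see those three cases spelled out, here’s a toy Python model of the learning and aging logic.  It’s a teaching sketch, not how switch hardware works; the CamTable class and its method names are invented for the example, and time is just a number of seconds passed in by the caller.

```python
AGING_SECONDS = 300  # default aging time mentioned in the text


class CamTable:
    """Toy CAM table mapping MAC -> (port, last_seen). A teaching model, not switch code."""

    def __init__(self, aging=AGING_SECONDS):
        self.aging = aging
        self.entries = {}

    def expire(self, now):
        """Drop entries that have been silent for longer than the aging time."""
        self.entries = {
            mac: (port, seen)
            for mac, (port, seen) in self.entries.items()
            if now - seen <= self.aging
        }

    def learn(self, src_mac, port, now):
        """Record the source MAC of a received frame and report which case applied."""
        self.expire(now)
        previous = self.entries.get(src_mac)
        self.entries[src_mac] = (port, now)
        if previous is None:
            return "added"       # new MAC
        if previous[0] != port:
            return "moved"       # known MAC on a different port
        return "refreshed"       # known MAC on the same port, age reset


if __name__ == "__main__":
    cam = CamTable()
    mac = "001c.0cbb.ada2"
    print(cam.learn(mac, "Fa0/1", now=0))    # added
    print(cam.learn(mac, "Fa0/1", now=100))  # refreshed
    print(cam.learn(mac, "Fa0/2", now=200))  # moved
    print(cam.learn(mac, "Fa0/2", now=600))  # added (the old entry aged out)
```

Note that learn only ever looks at the source MAC and the port the frame arrived on.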

What about the destination MAC?  Good question.  That’s a pretty important field when sending a packet, but, when generating a CAM entry, the destination MAC is ignored.  If a host talks, a switch knows exactly from where the frame came, but there’s no way to know exactly where it should go without the destination first speaking up.

Let’s set up an example.  You have a Cisco 2950 switch that you’ve just powered on with nothing plugged into it except the console cable. If you do a show mac-address-table, you’ll see the CAM table, which is the table of MAC addresses that the switch knows.  You would think that it would be empty since nothing’s plugged in, but the switch has its own MACs, so it always knows those guys.  There’s not much to see here yet, though, since we haven’t hooked anything up to the switch.

Switch#sh mac-address-table
Mac Address Table
-------------------------------------------
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
 All    000a.f43b.ddc0    STATIC      CPU
 All    0100.0ccc.cccc    STATIC      CPU
 All    0100.0ccc.cccd    STATIC      CPU
 All    0100.0cdd.dddd    STATIC      CPU
Switch#

Next, let’s plug a Linux desktop up to it.  Once that box has booted, what should you see in the CAM table?  If you guessed the MAC of the Linux box, you may be right; it all depends on whether the box has sent a frame or not.  There are lots of things that run on a Linux box that could send frames on startup (DHCP requests, multicast services, network-based storage), so, more than likely, a frame did get sent.  The only way to know is to take a gander.

Switch#sh mac-address-table
Mac Address Table
-------------------------------------------
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
...
   1    001c.0cbb.ada2    DYNAMIC     Fa0/1
Switch#

Ah…it worked. That’s good, but it’s boring with only a single device. Let’s plug in a simply-configured and fully-booted Cisco router and see what happens. More than likely the router won’t speak until spoken to, so the CAM table won’t update, and the switch won’t know where to send the frame, right? Yes, but the frame still gets sent. If the switch doesn’t know where the destination MAC lives (i.e., it’s not in the CAM table), then it floods the frame out every port except the one on which it was received.
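
The forward-or-flood decision is simple enough to sketch in a few lines of Python.  This is only an illustration: the CAM table is a plain dict of MAC to port, the function name egress_ports is made up, and the router MAC below is a hypothetical address.

```python
def egress_ports(cam, dst_mac, ingress_port, all_ports):
    """Return the list of ports a frame is sent out of, given a dict of MAC -> port."""
    known = cam.get(dst_mac)
    if known is not None:
        # Known destination: send out that one port (or nowhere if it is the ingress port).
        return [] if known == ingress_port else [known]
    # Unknown destination: flood out every port except the one the frame arrived on.
    return [port for port in all_ports if port != ingress_port]


if __name__ == "__main__":
    ports = ["Fa0/1", "Fa0/2", "Fa0/3"]
    cam = {"001c.0cbb.ada2": "Fa0/1"}   # only the Linux box has spoken so far
    router = "0000.0c12.3456"           # hypothetical MAC for the silent router

    print(egress_ports(cam, router, "Fa0/1", ports))  # ['Fa0/2', 'Fa0/3'] (flooded)
    cam[router] = "Fa0/2"               # the router replies, so the switch learns it
    print(egress_ports(cam, router, "Fa0/1", ports))  # ['Fa0/2']
```

Once the router answers, its source MAC lands in the table and the flooding for that destination stops.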

When I first learned that this is how it worked, I immediately wondered why LANs weren’t flooded out constantly. I didn’t think long enough to realize that the host being sent the packet will actually respond within several milliseconds (hopefully), so the CAM table will then have an entry for that guy. In reality, it’s even simpler than that. Since most of the world runs TCP/IP, we have this wonderful thing called ARP. When a host needs to talk to another host on the same network segment (IP subnet), it checks its ARP table, and, if it doesn’t know a MAC for that IP, it will actually ask what MAC it should use via a broadcast message. In a well-behaved network, the mystery host will answer with a “Here I am!” type message, which causes the switch to generate a CAM entry. In a “perfect world”, you should only have a few floods on a switch per day/week/month/year/decade.

Here’s a couple items of note.

  • A trunk interface will have a whole bunch of MACs listed as attached to that port.  This is quite normal, so don’t freak out.
  • If someone plugs a switch or hub into one of your ports, you will see multiple CAM entries for the same port.  This is a good way to see who brought in their Linksys hub from home.
  • If a host hasn’t sent a frame in more than 5 minutes, it disappears from the CAM table, so the whole discovery process starts over again.
  • There’s a limit to the size of a CAM table, so it’s possible to fill it up and then every new destination gets flooded.  Wow, I can see your packets.

Have fun.  Be safe.  Practice safe computing.  Lock down your network.

Storm Control

Posted on May 15th, 2008 in Catalyst, LAN, Switching by Aaron Conaway

We run a large number of LANs all over the country that are “controlled” by the particular business unit. We manage the gear, but, since they have the money and have to pay for anything we do, they make the final decision on what gets put in. Sometimes that gets out of hand, as you can well imagine.

A good terrible example came up a few months ago. It seems that, at some time in the past, one site needed some more LAN ports, but, instead of calling us and having us send them another switch, one of the “technical people” there brought in a hub from home. It really irks me to see a hub on the switched LAN, but we really have no control over those decisions. They plugged the hub into one of the existing drops somewhere in the building and plugged everyone in. It worked…until somebody moved one of the machines. The machine was at a desk near the hub, and the network cable, still with one end plugged into the hub, was just left lying there. A good Samaritan came by, saw that the hub was not plugged into the network (though it was through another path), and plugged it back in for us — providing a nice second link from the hub to the switch stack in the closet. Take one switch stack, add a hub, insert a switching loop, bake at 350F for a few milliseconds, and you have a broadcast storm. If you don’t know already, broadcast storms are bad and eat switch CPU like the yummy cookies we baked. In this case, several 3750s were taken completely down.

How does one prevent such from happening again? Well, the first thing to do is to get the CTO to tell everyone that they can’t plug hubs into the network. That works about 0% of the time, though, so we had to find a solution that was enforceable. One of my coworkers found the traffic storm control mechanism built into Cisco switches. This mechanism allows you to set thresholds based on broadcast, multicast, and unicast traffic and take action when those are reached.

Here are the gory details. I need to mention, though, that storm-control is configured very differently across platforms and IOS versions. I would say your mileage may vary, but it’s probably more accurate to say that this won’t work on your switch. A 6500 is configured differently than a 4500. A 2900XL is different from a 2950. This will get you going, but you’re going to have to do some research on your own to find out what works on your platform.

interface FastEthernet 0/1
storm-control broadcast level 50
storm-control action shutdown

What just happened? Good question to ask. If broadcast traffic on F0/1 reaches 50% of the port’s available bandwidth, the port is shut down. On a 100Mbps port, that means that once broadcast traffic takes up 50Mbps, the switch puts the port into the error-disabled state, which takes it offline much as if you had done a shutdown on it. You should probably do the same for multicast and unicast as well to make sure you don’t get bitten by those. If you don’t want to shut down the port, you can also use the trap action to just send an SNMP trap with the port and information, but that doesn’t prevent very much; the storm will probably wreak havoc before an email for the trap lands on your Crackberry.
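
The arithmetic behind the level is worth a quick sketch.  The Python below only shows how a percentage level maps to a bits-per-second threshold on a given port speed; the function names are made up, and how a real switch samples traffic and compares it to the threshold is platform-specific, so treat the comparison as an assumption.

```python
def storm_threshold_bps(port_speed_bps: int, level_percent: float) -> float:
    """Convert a storm-control level (percent of port bandwidth) into bits per second."""
    return port_speed_bps * level_percent / 100


def storm_action(traffic_bps: float, port_speed_bps: int, level_percent: float,
                 action: str = "shutdown") -> str:
    """Return the configured action once traffic reaches the threshold, else 'forward'."""
    if traffic_bps >= storm_threshold_bps(port_speed_bps, level_percent):
        return action
    return "forward"


if __name__ == "__main__":
    fast_ethernet = 100_000_000  # 100Mbps port
    print(storm_threshold_bps(fast_ethernet, 50))        # 50000000.0
    print(storm_action(10_000_000, fast_ethernet, 50))   # forward
    print(storm_action(60_000_000, fast_ethernet, 50))   # shutdown
```

The same level means a very different amount of traffic on a gigabit port, which is one more reason a single number rarely fits every port.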

Here’s another big disclaimer. Finding a good level for a port can be very, very difficult. A Linux box is going to have very different broadcast/multicast/unicast traffic than a Windows box, which is different from a Mac. You may have to spend a lot of time analyzing SNMP counters to find out what a good level is. God help you if you have a hub like we did with mixed computer platforms on it.

Getting Started with EtherChannel

Posted on April 18th, 2008 in Catalyst, LAN, Switching by Aaron Conaway

In my professional life at some point, I came across someone who had a stack of Catalyst 2950 switches all trunked together with their Internet routers connected to the top of the stack. This was all well and good until they kept adding hosts to the “middle” of the stack, then they had all sorts of latency and packet loss.

The old adage of your chain only being as strong as your weakest link holds true in this case. Here, the weakest link is actually the most-congested trunk, though. Let’s step through to see. A 2950 is a 10/100 switch, so a single trunk can handle 100Mbps of traffic. We have 10 of these guys, Switch1 to Switch10, all trunked to the one above and below. If a server in the center of the stack on Switch5 is sending a lot of data to the Internet routers on Switch1, the trunks off of Switch5 will start to get saturated. Switch4 has a few hosts doing the same thing, so traffic from both Switch4 and Switch5 heads towards Switch1, further filling the trunks. Same for Switch3. Same for Switch2. Next thing you know, there’s 184Mbps or so trying to go across a 100Mbps link.

I fixed the problem using EtherChannel. EtherChannel, sometimes called trunking (not Cisco trunks…don’t get confused) or bonding, takes up to 8 interfaces of a switch and binds them logically as one Port-channel interface. The switch then load-balances (not really, but can be pretty close) the traffic across all the links, so, in essence, if you have 2 100Mbps ports in an EtherChannel, you get 200Mbps of bandwidth. If you put in 8 ports, you’ll get 800Mbps. Or, at least, close to it.
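
Why “not really” load-balanced?  The switch picks a member link per conversation rather than per frame.  Here’s a hedged Python sketch of that idea using an XOR of the source and destination MACs; the real hash differs by platform and configuration, so this is an illustration of the principle and not Cisco’s algorithm.  The function names are made up.

```python
def mac_to_int(mac: str) -> int:
    """Convert Cisco dotted notation (aaaa.bbbb.cccc) to an integer."""
    return int(mac.replace(".", ""), 16)


def pick_link(src_mac: str, dst_mac: str, links: list) -> str:
    """Pick a member link from an XOR of the two MACs (illustrative hash only)."""
    index = (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % len(links)
    return links[index]


if __name__ == "__main__":
    bundle = ["Fa0/1", "Fa0/2"]
    print(pick_link("1111.2222.3333", "4444.5555.6666", bundle))  # Fa0/2
    print(pick_link("1111.2222.3333", "9999.8888.7777", bundle))  # Fa0/1
    # The same pair of hosts always hashes to the same link.
    print(pick_link("1111.2222.3333", "4444.5555.6666", bundle))  # Fa0/2
```

Because a given pair of hosts always lands on the same link, one big conversation can never go faster than a single member port, and an unlucky mix of hosts can leave one link busier than the others.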

There are, as usual, stipulations for using it.

  • All ports on the EtherChannel must terminate to the same device on both ends. You can’t have one port go to one switch and another to another switch.
  • The ports must use identical media. You can’t have a copper port and a fiber port in the same group.
  • The ports must have the same capabilities. You can’t put a 100Mbps port and a 1Gbps port in the same group.
  • The ports must be configured identically or it won’t work. This is actually pretty easy to maintain since configuring the Port-channel actually sets the config on all participating ports.
  • Both ends of the ports must be EtherChannel. You can’t run it on one switch but not the other.

Enough of that. Let’s configure. The scenario is that you have SwitchA and you want to bind F0/1 and F0/2 into an EtherChannel. Of course, you want to carry traffic from all VLANs, so let’s make it a trunk.

! "mode on" bundles the port without negotiation; the other end must match
int F0/1
 channel-group 1 mode on

int F0/2
 channel-group 1 mode on

int Port-channel 1
speed 100
duplex full
switchport trunk encapsulation dot1q
switchport mode trunk

Easy as pie. Do the same thing on the other end and you have a 200Mbps, bonded trunk.

Trunking on a Catalyst Switch

Posted on March 21st, 2008 in Catalyst, LAN, Switching by Aaron Conaway

If you didn’t know already, trunks are connections between switches that carry traffic for all VLANs. They allow you to have, say, VLAN 10 and VLAN 20 on two switches appear as the same network. Unless you’re a really small shop, you’ve already dealt with trunks, so there’s no need for an introduction.

Let’s say we have a Catalyst 2950 switch with multiple VLANs connected to another 2950 configured with those same VLANs. We’ll say we have VLANs 10, 20, and 30 and that the switches are connected to each other through port F0/24 on each switch. First, let’s turn on the trunk.

interface F0/24
switchport trunk encapsulation dot1q
switchport mode trunk

Quite easy there. With this configuration on each switch, the connection between them will carry traffic for all VLANs. The encapsulation directive tells the switches to use the IEEE standard 802.1Q for the trunk, which is VLAN tagging. Cisco has its own trunk encapsulation called ISL, but that’s not compatible with non-Cisco gear. If you have a mix of switches, just use the dot1q encapsulation so you don’t hurt yourself later.

A note on the word “encapsulation” here. Dot1q does not actually encapsulate; it adds 4 bytes to the frame header that marks the VLAN the frame is for. ISL, however, does encapsulate; it takes the whole frame, shoves it into an ISL frame, and sends it on. Since Cisco’s preferred method for a trunk is an encapsulation method, we have the directive “encapsulation” in the configs.
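
To show what “adds 4 bytes” looks like on the wire, here’s a short Python sketch that inserts an 802.1Q tag into a raw Ethernet frame.  The function name add_dot1q_tag and the sample MACs are made up for the example, and the frame here leaves out the trailing checksum that a real switch would recompute after tagging.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def add_dot1q_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag right after the destination and source MACs."""
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    # Tag control field: 3-bit priority, 1-bit drop-eligible flag (left at 0), 12-bit VLAN ID.
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


if __name__ == "__main__":
    dst = bytes.fromhex("444455556666")
    src = bytes.fromhex("111122223333")
    ethertype_ipv4 = bytes.fromhex("0800")
    frame = dst + src + ethertype_ipv4 + b"payload"

    tagged = add_dot1q_tag(frame, vlan_id=10)
    print(len(tagged) - len(frame))  # 4
    print(tagged[12:16].hex())       # 8100000a
```

The original frame is otherwise untouched: the MACs stay in front, and the original EtherType and payload simply slide back four bytes.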

At this point, all VLANs are being carried across the trunk, but what if you want to use multiple trunks and send different traffic across each one? For example, let’s say that you want to have VLAN 10 traffic use a second trunk while the other VLANs use our original trunk. To do that, you get into pruning.

interface F0/24
 switchport trunk allowed vlan 20,30

interface F0/23
switchport trunk encapsulation dot1q
switchport mode trunk
switchport trunk allowed vlan 10

The switchport trunk allowed vlan directive says that only traffic on VLANs 20 and 30 is allowed across F0/24 and only VLAN 10 across F0/23. I use this type of setup to give high-bandwidth VLANs (like VLANs for backups) their own trunk so they won’t eat all the bandwidth of the other VLANs. To use the terminology, F0/24 is pruned to VLANs 20 and 30, while F0/23 is pruned to VLAN 10.

I also want to mention that the word trunking is used differently across different platforms. We have a nearly-totally Cisco LAN, and trunks are the connections that carry all VLANs as described. On other LAN gear, trunking is actually the act of combining ports, links, cables, whatever, together to form a single logical connection (Cisco calls this EtherChannel). VLAN tagging is what other manufacturers call a Cisco trunk. It makes sense if you remember that 802.1Q simply tags the frame.