Posts tagged ‘switch’

Catalyst 3750s – Bad Luck with a Cisco Logo

Last week, @fletcherjoyce posted an article on his blog about his positive experiences with Cisco’s 3750 switches.  If you follow my complaints tweets, you know that I’ve had quite the opposite experience with them.  I would never pick on anyone, but I had to throw in my 2 cents.

I’m guessing here, but we have about 50 3750 stacks in the enterprise.  Most of them are pairs and a few are bigger, so you wind up with roughly 120 switches.  Since we’ve done about 20 replacements over the last 5 years, that means we have a 17% failure rate.  That’s pretty horrible, isn’t it?
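Here is that back-of-the-envelope math as a quick Python sketch. The counts are the rough guesses from the paragraph above, not audited inventory numbers, and the variable names are only for illustration.

```python
# Rough estimates from the post, not audited inventory numbers.
switches_deployed = 120   # approximate number of 3750s across all stacks
replacements = 20         # approximate replacements over the period
years = 5

failure_rate = replacements / switches_deployed
per_year = replacements / years

print(f"Failure rate over {years} years: {failure_rate:.0%}")
print(f"Average replacements per year: {per_year:.0f}")
```

Put another way, that is a replacement every three months or so, which matches how it feels from the RMA queue.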

For the most part and with few (if any) exceptions, we use the 3750s as aggregation points for our access switches.  We don’t do QoS on them.  We don’t do any access control on them.  We don’t even do routing on them.  They’re simply used to connect all the access switches in the closet to the core, so they’re not doing anything funky or burdensome.  The CPU and memory are always well within normal operating parameters.  They just fail and fail repeatedly.

The flies started dropping in closets at our corporate headquarters a few years ago.  It was the middle of summer, and the temperatures kept rising to over 90F (32C) until we lost 3 switches in 3 weeks.  If you could stand to make it into the closet, you could feel that the sheet metal of the switches was hot enough to make you pull your hand back!  When the facilities team added more cooling, the temperatures dropped to around 82F there (28C), but we continued losing switches.  I figured the newly-failed switches were feeling the effects of the earlier heat wave and were just getting around to giving up the ghost.  Surely the heat was the culprit.

A few months after our headquarters meltdown, a tech for a satellite office called and asked if we could help with some latency issues.  He showed me the switch stacks throughout the building, and I noticed that only one of the 10 switches actually had a label.  The tech said that he never got around to relabeling them after they were replaced.  Some, he said, had been replaced multiple times.  The closets were running about 76F (24C), so heat didn’t seem to be the problem at this location.  The closets were clean as a whistle, and everything in the racks was on building UPS.  I couldn’t find a pattern at all.  For the record, all their latency issues were related to two unrelated 3750s.  Two RMAs later, and their problems were gone.

I’ve been trying to find patterns for the failures, but I can’t think of any.  If it’s heat, humidity, power, dust, etc., then why are we not replacing 2950s as well?  There are 4-10 of them for every 3750 stack we have.  We’re replacing them, but at a rate of less than 1%.  If it is environment, then the 2950s are English hooligans compared to the 3750s being French aristocracy.  Maybe it’s sabotage.  I still don’t know after years of watching RMA after RMA come in.

I have noticed one pattern, though.  The only deployments of 3750s that have never had a problem are in data centers.  They seem to love any room that has an ambient temperature of 62F (16C) with less than 40% humidity and large volumes of air flow.  If only we could install micro-data centers in all our closets, then I would be a happy network dude.

Send any wooden shoes questions my way.

Edit:  I went back and checked our TAC cases to see what switches we actually replaced.  It turns out that we’ve done 19 replacements, and they’ve all been 3750G-12S-S switches.

Stubby Post – VTP Clients Send Updates

VTP clients send VLAN updates.  Did you know that?

I had a VTP server and client in the same VTP domain, and, when I cabled up the trunk, the client overwrote the VLAN database on the server.

The moral of the story is that the highest configuration revision number will win no matter what the operating mode of the switch.
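A toy model of that rule, sketched in Python. This is a deliberate simplification: the class and function names are made up, the revision numbers and VLAN lists are hypothetical, and it ignores everything else VTP checks (domain name, password, transparent mode). The only point is that the comparison never looks at the mode.

```python
from dataclasses import dataclass


@dataclass
class VtpSwitch:
    name: str
    mode: str          # "server" or "client"; never consulted below
    revision: int      # VTP configuration revision number
    vlans: frozenset   # VLAN IDs in the local VLAN database


def sync(a, b):
    """Simplified rule: the higher revision overwrites the other database."""
    if a.revision == b.revision:
        return
    winner, loser = (a, b) if a.revision > b.revision else (b, a)
    loser.vlans = winner.vlans
    loser.revision = winner.revision


# Hypothetical values: the client happens to carry the higher revision.
server = VtpSwitch("server", "server", revision=3, vlans=frozenset({1, 10, 20}))
client = VtpSwitch("client", "client", revision=7, vlans=frozenset({1, 99}))

sync(server, client)
print(sorted(server.vlans), server.revision)
```

After the trunk comes up in this sketch, the server holds the client's VLAN list, which is exactly what bit me.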

SWITCH – Epic Regression

Just because I like giving more money to Pearson VUE, I took the BCMSN test today to see how I would do.  I passed with no problem.

In my mind, the CCNP is a technical certification, so I expect to be tested on technical topics.  Are there topics beyond technology that P-levels should know?  Of course there are, but I really don’t think whole chunks of the test should be about a preparation plan and rollback procedures.  The BCMSN had a lot more technical questions at a much higher level of expertise; it seems much better suited to the CCNP track than the SWITCH test did.

I was really surprised at how many questions today were repeats from the SWITCH test last week.  Of the three lab exercises I worked, two of them were exactly the same as last week.  I would venture to guess that there were also 8 to 10 repeated multiple choice questions.  It seems that this is going against my argument of being more technical, though, doesn’t it?  If you mix in the remaining questions that were at a much higher technical level, you wind up with a pretty darn good test.

I’ve really got nothing more to say about the BCMSN.  It’s a good test with an appropriate level of technical (and paper-pushing) detail.  I’m very glad I was able to take it before the 31 July 2010 deadline, and I advise anyone who needs the SWITCH test to try and do the same.

The next stop is ROUTE (642-902) for me.  I’m taking a class on that one soon, so I’m confident I can pass it in the next 11 weeks we have left until the deadline.

SWITCH – Epic Fail

I did my standard 2ish-hour drive to the closest testing center today to take the SWITCH test (642-813).  Utter failure.  That’s 3 for those scoring at home.

The test was the absolute worst I’ve ever taken.  I know that I complain a lot, but this is totally justified in my eyes.  My 4th grade spelling tests were better than this.  I’ve seen kindergarten plays with better production value.

First of all, it was poorly written.  Whoever wrote those questions has a few pieces of information about English sentence structure missing from their skill set.  A sentence needs a verb, right?  Well, a lot of the sentences were missing those.  It’s kind of important to know what the whole point of the sentence is, or is that too much to ask?  The “drag this over here” exercise questions all started with the same 13-word phrase that left the question so long that it was unreadable.  A couple of commas would have been nice in some.  Others I just had to infer from the answers what they were trying to ask.

There were lots of spelling errors as well.  Most of them were just stupid stuff like switched letters or missing characters, but, at one point, I had to figure out that I needed to look at the “router” instead of the “route”.  That’s not really cool.  The misspellings were so bad that they were actually misspelling the hostnames on the diagrams provided.  Does anyone even try any more?

Let’s talk about the technical level of the test.  If I didn’t know any better, I would swear I was taking a CCNA test.  The technical material was so elementary that it bordered on comical.  If I recall correctly (which I never do), there were about 3 questions on trunking which were so easy that my wife could answer them.  There were about 4 FHRP questions that were out of the “Cisco for Dummies” book.  I could go on, but I have better things about which to complain.

“So,” you might ask, “why did you fail it if it was so easy?”  That’s a great question.  I failed it because the name of the test is misleading.  When Cisco says “Implementing Cisco IP Switched Networks”, they really mean “Collecting Documentation About VLANs.”  There were at least four questions on this test that asked what information you need to collect before implementing some unknown step of a project involving VLANs.  Sometimes, the reference was to rollback plans.  Sometimes it discussed IP assignments.  Sometimes it even talked about collecting user requirements.  It seemed that nearly half of the questions on the test discussed planning for making changes or preparing change documentation.  There was very little “implementing.”

To top it all off, too, one of my labs froze.  I entered a command into a router, and it didn’t come back.  I couldn’t change to the other lab windows, either (the “Scenario” or “Topology” windows included), but my timer kept ticking.  I could click around in the testing software, but the lab itself was toast.  I got the administrator who helped me out a bit after the machine was rebooted.  I didn’t run out of time or anything, but getting up to find help to troubleshoot a problem really throws you off.

How about some closing words?  First of all, I have given up on the Cisco Press books and other materials.  Each time I use them they have little to no coverage about topics on the test itself.  The ISCW was that way, and we all know about my problems with the ONT.  I figured that those were just aged text, but SWITCH is only a month or two old, isn’t it?  That means the test hasn’t had that much time to change, but the materials are totally different already.

I actually have an example of the books leading the reader directly away from the test materials.  I’m reading from the “CCNP SWITCH 642-813 Quick Reference” book by Donohue.  On page 8, it discusses the PPDIOO lifecycle approach.

Network engineers at the CCNP level will likely be involved at the implementation and following phases.  They can also participate in the design phase.

That doesn’t make any sense, does it?  Didn’t I just say that there were a good number of questions on preparation (the first P) and planning (the second P)?  Both of those come before the design phase.

Somebody help me out here.  What am I missing?  Is there some magical book series that has the answers?

I should have bought testing vouchers in bulk when they were $150.

UPDATE:  It seems that the problem of seeing topics on the exam that aren’t in the study materials goes beyond just me.  I’m getting in touch with as many people related to the SWITCH book as I can to let them know that this is a serious problem.  I’m sure I’ll have a post or two on the outcome of that effort.

Stubby Post – UplinkFast

I’ve got a few switches daisy chained together with single links and have enabled UplinkFast on them.  This switch is not the root bridge; F0/24 is the root port and F0/23 is a blocked alternate port. I’ve got debug spanning-tree uplinkfast on to help out.

SW3#sh span | incl 0/2[34]
Fa0/23           Altn BLK 3019      128.23   P2p
Fa0/24           Root FWD 3019      128.24   P2p

Now let’s unplug F0/24 and see what happens.

19:05:05: STP FAST: UPLINKFAST: make_forwarding on VLAN0001 FastEthernet0/23 root port id new: 128.23 prev: 128.24

19:05:05: %SPANTREE_FAST-7-PORT_FWD_UPLINK: VLAN0001 FastEthernet0/23 moved to Forwarding (UplinkFast).
19:05:05: STP: UFAST: removing prev root port Fa0/24 VLAN0001 port-id 8018
SW3#
19:05:06: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/24, changed state to down
SW3#
19:05:07: %LINK-3-UPDOWN: Interface FastEthernet0/24, changed state to down

Before the switch even reports that F0/24 is down, F0/23 is brought into the forwarding state. Now let’s plug F0/24 back in.

19:07:16: %LINK-3-UPDOWN: Interface FastEthernet0/24, changed state to up
SW3#
19:07:17: STP FAST: make_forwarding: via UPLINKFAST: NOT: port FastEthernet0/23 VLAN0001 is: uplink enabled new root FastEthernet0/23 (me)prev root exists(8018/) cur state forwarding role uplink
19:07:17: STP FAST: make_forwarding: via UPLINKFAST: NOT: port FastEthernet0/24 VLAN0001 is: uplink enabled new root FastEthernet0/23 (not me)prev root exists(8018/) cur state blocking role looped
19:07:18: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/24, changed state to up
SW3#
19:07:18: STP FAST: make_forwarding: via UPLINKFAST: NOT: port FastEthernet0/23 VLAN0001 is: uplink enabled new root FastEthernet0/23 (me)prev root exists(8018/) cur state forwarding role uplink
SW3#sh span | incl 0/2[34]
Fa0/23           Root FWD 3019      128.23   P2p
Fa0/24           Altn BLK 3019      128.24   P2p

Notice that the port comes back up, but it isn’t returned as the root port immediately. It should be, though, right? The original STP convergence said that it was the closest to the root bridge, so it makes sense that it should be the root port again, right? Since the port just came up, STP still has to make sure there’s no loop, so it has to step through all the states like any good port does. If we wait a few more seconds, we see this.

19:07:53: STP FAST: UPLINKFAST: make_forwarding on VLAN0001 FastEthernet0/24 root port id new: 128.24 prev: 128.23

19:07:53: %SPANTREE_FAST-7-PORT_FWD_UPLINK: VLAN0001 FastEthernet0/24 moved to Forwarding (UplinkFast).

SW3#sh span | incl 0/2[34]
Fa0/23           Altn BLK 3019      128.23   P2p
Fa0/24           Root FWD 3019      128.24   P2p

Now we’re back to where we were originally. The moral of the story is that UplinkFast already knew the status of both ports, so it could quickly move the blocked port to forwarding when the root port failed. Traditional STP would have to send a TCN message to the root bridge, which would then forward it out to the rest of the switches so they can reconverge. UplinkFast skips the whole reconverging thing.

Send any questions my way.

Stubby Post – Path Cost of EtherChannels

I was doing some STP labs tonight and found something that caught me off guard a bit.  I had been meddling with some EtherChannels between a pair of 3750s earlier today, and I forgot to reset the configs before starting on the STP stuff.  On my secondary root switch, I ran a show spanning-tree vlan 1 to see what status the ports were in, and I noticed the root path cost.

VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    24577
             Address     001b.d4fa.bb00
             Cost        12

This switch is directly connected to the root bridge via a pair of EtherChanneled FastEthernets, so I just assumed I’d get a cost of 19.  I surely didn’t expect a cost of 12.  I added a third interface to the channel-group and wound up with this.

VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    24577
             Address     001b.d4fa.bb00
             Cost        9

Obviously there’s some internal math going on with the EtherChannel and STP.  Guess what happens when I add a fourth link?

VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    24577
             Address     001b.d4fa.bb00
             Cost        8

It’s interesting to see how the path cost changes in a way that seems disproportionate to the bandwidth.
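To put numbers on "disproportionate," here is a small Python sketch that compares the costs I saw against what you would get if the cost simply divided by the number of links. The observed values come straight from the outputs above; the "proportional" column is only a naive guess for comparison, not the formula the switch actually uses.

```python
FAST_ETHERNET_COST = 19  # the cost I expected for a single 100 Mbps link

# Root path costs observed in the lab output above, keyed by member link count.
observed = {2: 12, 3: 9, 4: 8}

for links, cost in observed.items():
    # What the cost would be if it scaled inversely with the link count.
    proportional = FAST_ETHERNET_COST / links
    print(
        f"{links} links ({links * 100} Mbps): "
        f"observed {cost}, proportional {proportional:.1f}"
    )
```

In every case the observed cost is higher than the naive estimate, and each extra link buys less than the one before it.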

Send any new math formulae comments this way.

SWITCH – STP Exercise #1

Here’s an STP exercise for you.  Given the bridge priorities, MAC addresses, and interface types in the diagram, calculate the root bridge, root ports, designated ports, and blocked ports.  You can click on the image to enlarge it.  I’ll post a solution in the next few days.  As always, feel free to comment and ridicule my utter idiocy.  Be gentle, though; I don’t usually post exercises like this.
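If you want to check your first step, the root bridge election boils down to "lowest priority wins, lowest MAC address breaks ties." Here is a minimal Python sketch of that comparison. The switch names, priorities, and MAC addresses are made up for illustration and are not the ones in the diagram, so it won't spoil the answer.

```python
def bridge_id(priority, mac):
    """Return a sortable bridge ID: priority first, then MAC as an integer."""
    return (priority, int(mac.replace(".", ""), 16))


# Hypothetical switches, NOT the values from the exercise diagram.
bridges = {
    "SW-A": (32768, "0019.aa11.2200"),
    "SW-B": (24576, "001b.d4fa.bb00"),
    "SW-C": (24576, "0017.5a3c.9900"),
}

# The lowest bridge ID becomes the root bridge.
root = min(bridges, key=lambda name: bridge_id(*bridges[name]))
print(f"Root bridge: {root}")
```

SW-B and SW-C tie on priority here, so the lower MAC address decides it. Root ports and designated ports take more work, since you have to add up path costs from each switch back to the root.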

Send any configuration BPDUs questions my way.

STP Exercise #1

BCMSN Notes — STP States

I’ve decided to take on the CCNP certification, so I’m going to wind up with a few posts that will be more my own notes than anything.  :)

A switch port on a 2960 comes up with a default configuration on VLAN 1.  What happens from the perspective of spanning-tree?

  • First, the port comes up in the blocking state.  This is to make sure that loops aren’t created without first listening to the network to see what’s going on.
  • Next, if the port may be a root or designated port, the port is moved to the listening state.  In this state, the port can send and receive BPDUs only.  It can’t send traffic, but it can discover the other switches participating in STP.
  • After the forward delay, the port goes into the learning state.   In this state, the port can send and receive BPDUs as in listening, but it can now receive traffic.  It can’t yet send any.
  • After the forward delay again, the port goes into the forwarding state.  The port can now send and receive data.

If the port is configured with spanning-tree portfast, the port goes from blocking directly to forwarding without going through these steps.  Obviously you don’t want a switch plugged into a port configured for portfast since you may wind up with a loop.

Here’s the debug spanning-tree events output from one of my labs.  F0/3 is configured for portfast.  I shut/no shut it to see what happens.

*Mar  8 18:09:51.163: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/3, changed state to down
sw01#
*Mar  8 18:09:51.747: set portid: VLAN0007 Fa0/3: new port id 8003
*Mar  8 18:09:51.747: STP: VLAN0007 Fa0/3 ->jump to forwarding from blocking
sw01#
*Mar  8 18:09:53.739: %LINK-3-UPDOWN: Interface FastEthernet0/3, changed state to up
*Mar  8 18:09:54.739: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/3, changed state to up

Notice the “jump to forwarding from blocking”.

Here’s the same output when the port is not in portfast mode.  Notice the timestamps.  It takes about 30 seconds (2 x default forward delay) to go from blocking to listening to learning to forwarding.

*Mar  8 18:13:05.313: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/3, changed state to down
sw01#
*Mar  8 18:13:06.013: set portid: VLAN0007 Fa0/3: new port id 8003
*Mar  8 18:13:06.013: STP: VLAN0007 Fa0/3 -> listening
sw01#
*Mar  8 18:13:06.381: %LINK-3-UPDOWN: Interface FastEthernet0/3, changed state to up
*Mar  8 18:13:07.381: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/3, changed state to up
sw01#
*Mar  8 18:13:21.013: STP: VLAN0007 Fa0/3 -> learning
sw01#
*Mar  8 18:13:36.013: STP: VLAN0007 Fa0/3 -> forwarding
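If you don't feel like doing the subtraction in your head, here is a short Python sketch that works out the delays from the three state transitions in that last debug output. The timestamps are copied from the log; nothing else is assumed.

```python
from datetime import datetime

# State transitions copied from the non-portfast debug output above.
transitions = [
    ("listening", "18:13:06.013"),
    ("learning", "18:13:21.013"),
    ("forwarding", "18:13:36.013"),
]

times = [datetime.strptime(stamp, "%H:%M:%S.%f") for _, stamp in transitions]

# Pair each later state with the time spent getting there from the previous one.
for (state, _), earlier, later in zip(transitions[1:], times, times[1:]):
    print(f"-> {state} after {(later - earlier).total_seconds():.0f} s")

total = (times[-1] - times[0]).total_seconds()
print(f"listening to forwarding: {total:.0f} s")
```

Each step takes one forward delay, and the two together account for the roughly 30 seconds before the port passes traffic.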

Send any obvious corrections and questions my way.

Using SSH to Run Commands on a Router or Switch

SSH is more than just a shell.  You can copy files from and to a server or piece of network gear with it.  You can use it to tunnel traffic.  Possibly my favorite, though, is to use SSH to run a command on a remote box without interacting with a shell.

One of my biggest pet peeves with IOS (or pretty much any Cisco OS) is the lack of complex filtering.  Let’s say I want to look at all the downed ports and interfaces on modules 3 and 6 of my 6509.  I can’t easily do that with commands from IOS, but, on my Linux box, I can use multiple grep commands to get exactly what I want really easily.  Let’s work through the example, shall we?

To start with, let’s just do a show ip int brief without getting a shell on the switch.

ssh my.switch.com "show ip int brief"

When you run this and give your password, you see the output we’ve all learned to love, and, now that you’ve got it in STDOUT on your Linux box, you can start filtering. Now, let’s use grep to find the downed ports and interfaces on modules 3 and 6.

ssh my.switch.com "show ip int brief" | grep down | grep 'Ethernet[36]'

How about downed ports and interfaces on modules 3 and 6 that are not administratively down?

ssh my.switch.com "show ip int brief" | grep down | grep 'Ethernet[36]' | grep -v admin

I’ll stop there, but it can go on and on.  Read up on regular expressions and/or grep if you don’t know what we’re doing here.
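If grep isn't your thing, the same three filters can be sketched in a few lines of Python. The sample output below is hypothetical (the interface names and addresses are made up), and downed_ports is just a name I picked for illustration.

```python
import re

# Hypothetical sample of "show ip int brief" output; interfaces are made up.
sample = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet3/1     unassigned      YES unset  down                  down
GigabitEthernet3/2     unassigned      YES unset  administratively down down
GigabitEthernet4/1     unassigned      YES unset  down                  down
GigabitEthernet6/5     10.1.1.1        YES manual up                    up
GigabitEthernet6/6     unassigned      YES unset  down                  down
"""


def downed_ports(output, modules="36"):
    """Mimic: grep down | grep 'Ethernet[36]' | grep -v admin"""
    pattern = re.compile(rf"Ethernet[{modules}]")
    return [
        line
        for line in output.splitlines()
        if "down" in line and pattern.search(line) and "admin" not in line
    ]


for line in downed_ports(sample):
    print(line)
```

To run it against a live switch, you could capture the text with something like subprocess.run(["ssh", "my.switch.com", "show ip int brief"], capture_output=True, text=True).stdout and pass that in place of the sample.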

What’s really happening is that we’re taking the output of the command “ssh ….” and piping it (with |) to the command grep.  We can send it to whatever command we want, though, so don’t be shy.  I’ve actually written several scripts that take output of commands like show int description on a router to generate some reports.  When I want to run one of those, I do something like this.

ssh my.switch.com "show int desc" | parseOutput.pl

There’s always a gotcha or two to watch for, isn’t there?  I’ve found a couple.

First, your command runs at your privilege level, so, if your user is priv 1, you’re not going to be able to do a show run or reload.  You could just ignore security for a bit and set your privilege to 15, but I don’t recommend doing anything like that.  Before you say it, you’ll probably have a hard time with enabling as well.  You can only run one command at a time, so you would just enable yourself and get kicked off.  Not very helpful.

Another problem I see is the lack of public/private key pair support on Cisco devices.  On a Linux box, you can copy your keys around, and those are presented in lieu of a password.  Since (most) Cisco devices don’t have home directories, there’s no place to drop the keys, and we’re left with just using passwords.  Support for this would be nice, but the security problems associated with keeping SSH keys and user home directories are probably too much to even think about.

What else?  Oh, yeah.  The PIX/FWSM/ASA family supports SSH, but it acts differently from the IOS guys.  When you run a command through SSH, you actually get an interactive shell with the command already on the CLI for you. This is probably by design; the only thing you can really do from a non-priv prompt is to enable.

Anyway, send any grilling tips questions my way.

Server NIC Aggregation to a Cisco Switch

Have you ever noticed that your new servers all have 2 NICs on the board?  At least all of them that I’ve seen in the last 3 years have.  A lot of server admins actually use them in a NIC teaming scenario where both NICs are used as one logical device, much the same as EtherChannel on a switch.  This provides some fault tolerance and availability in case of failure, which is a good idea in most cases.

There are a few different ways to configure teaming on the box (usually called bonding in Linux), and each has its own advantages and disadvantages.  The network dude(tte) may have to do some things on the switch side for some of them to work, though.  If you want to run in link aggregation mode (mode 4), for example, the switch ports need to be in the same channel group to work appropriately.

Let’s look at mode 4 a little closer to see what we need to do.  The scenario is that you have eth0 plugged into F0/15 of a 2950 and eth1 is in F0/16.  You’ve seen the configuration for channelling between switches before, so you know the basics.  Put the ports in the same channel-group and configure the proper Port-channel interface to do the work.  In this case, we’re just configuring the ports to house a host instead of being trunks.

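! Linux bonding mode 4 is 802.3ad (LACP), so the member ports run LACP in active mode
int F0/15
 channel-group 1 mode active

int F0/16
 channel-group 1 mode active

! The logical interface is an access port for the server, not a trunk
int Port-channel 1
 speed 100
 duplex full
 switchport
 switchport mode access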

I detect at least one problem with our setup, though.  Both NICs are plugged into the same switch; what happens when the switch goes down?  The server goes away.  Logic should tell you, then, to put the NICs in different switches to fix that, but you can’t do EtherChannel on two different switches.   The ports have to be in the same device for the aggregation to work.  What’s the fix?

You can look at getting a nice chassis switch and putting each NIC in a different module.  Modern IOS versions allow etherchanneling across modules, so, if one module fails, you still have the other.  That would do it, but I’m sure you don’t have the money for a 4500 in the budget, right?

Another solution is to use a couple of 3750s which, when connected using the StackWise cable, are one logical device.  That gives you two separate switches that you can configure with the same channel group.  An upgrade to this solution is to use a pair of 6500s with VSS 1440 modules in them so that you have a stack of 6500s!  I’m sure that’s not expensive at all, though.

Send any white shoes questions my way.