No ISSU for you!

The Nexus 5000 series has the capability to do ISSU, an In-Service Software Upgrade. You can upgrade a vPC pair of Nexus 5000s without impacting any hosts (assuming everything is dual connected). Well, that is, unless you are using Windows Server 2012 with LACP. Let’s take a look at some show command output, shall we?

Nexus6# sh lacp issu-impact
For ISSU to Proceed, Check the following:
1. All port-channel member port should be in a steady state.
2. LACP rate fast should not be enabled on satellite member ports.

The following ports are not ISSU ready
Eth1/28     ,

OK, so Eth1/28 is going to prevent an ISSU because of an LACP issue, let’s check out the LACP details.

Nexu6# sh lacp interface e1/28
Interface Ethernet1/28 is up
  Channel group is 28 port channel is Po28
  PDUs sent: 21679
  PDUs rcvd: 750
  Markers sent: 0
  Markers rcvd: 0
  Marker response sent: 0
  Marker response rcvd: 0
  Unknown packets rcvd: 0
  Illegal packets rcvd: 0
Lag Id: [ [(0, 90-e2-ba-23-9e-8c, 0, 0, 100), (7f9b, 0-23-4-ee-be-2,
801c, 8000, 11c)] ]
Operational as aggregated link since Tue Jul 23 15:00:18 2013

Local Port: Eth1/28   MAC Address= 54-7f-ee-ef-cd-ab
  System Identifier=0x8000,54-7f-ee-ef-cd-ab
  Port Identifier=0x8000,0x11c
  Operational key=32796
  LACP_Activity=active
  LACP_Timeout=Long Timeout (30s)
  Synchronization=IN_SYNC
  Collecting=true
  Distributing=true
  Partner information refresh timeout=Short Timeout (3s)
Actor Admin State=(Ac-1:To-1:Ag-1:Sy-0:Co-0:Di-0:De-0:Ex-0)
Actor Oper State=(Ac-1:To-0:Ag-1:Sy-1:Co-1:Di-1:De-0:Ex-0)
Neighbor: 0x100
  MAC Address= 90-e2-ba-23-9e-8c
  System Identifier=0x0,90-e2-ba-23-9e-8c
  Port Identifier=0x0,0x100
  Operational key=0
  LACP_Activity=active
  LACP_Timeout=short Timeout (1s)
  Synchronization=IN_SYNC
  Collecting=true
  Distributing=true
Partner Admin State=(Ac-0:To-1:Ag-0:Sy-0:Co-0:Di-0:De-0:Ex-0)
Partner Oper State=(Ac-1:To-1:Ag-1:Sy-1:Co-1:Di-1:De-0:Ex-0)

You’ll notice that the LACP_Timeout values in bold do not match between the Local Port and the Neighbor. We need both ends set to the long timeout for ISSU to be happy. The LACP neighbor is a Windows Server 2012 box using switch dependent NIC teaming (LACP). We researched and were unable to find a way to tweak this timeout setting. This means you either run your Windows Server 2012 NIC team in switch independent mode (works the same as ESXi without LACP) or you don’t get to do ISSU… Somewhat defeats the whole point, no?

[UPDATE: Microsoft has since released a hotfix that allows you to change the timeout: https://support.microsoft.com/en-us/kb/3109099]

This problem also crops up with Linux hosts, but at least Linux lets you change the timeout. Getting your Linux admins to make the change may be another issue…

FIN

12 thoughts on “No ISSU for you!

  1. Oh yeah I missed that, sorry. It looks like it’s “not guaranteed” to be non-disruptive … I’m wondering what happens if you force the ISSU. It also seems to be difficult (impossible?) to change the timeout on the Windows 2012 side. Hmm.

    • We’ve looked around and there are a number of people who have tried to solve this problem. No one has, yet. We’ve also put the question in through some of our Microsoft contacts. We will be testing the the update today to see what the actual impact is, and that will result in another blog post. I intend to test to see what the real difference is between the non-disruptive and disruptive upgrades. You cannot force the system to attempt an ISSU, but you can force a disruptive upgrade.

    • I did eventually test the update and as best I can tell the main difference between a non-disruptive and disruptive upgrade is whether the FEXen are upgraded all at once (disruptive) or in a rolling upgrade (theoretically non-disruptive if your hosts are multi-homed).

  2. Do you have any notions of downtime during software upgrade? Are we talking about 30sec or a few minutes?

  3. I had the exact same experience with a 7010; the ISSU warns that some aspects might be disruptive (the SUP’s) .. but linecards wont be,, Trust me.. when the whole thing reboots the linecards go offline :(

    Long story short –> even though you have read the doco etc.. make sure you give yourself a MUCH longer outage window when using ISSU

  4. You can on course shutdown the port that is blocking the ISSU (as it should be redundant) and the ISSU will work.

Leave a Reply