The Nexus 5000 series has the capability to do ISSU, an In-Service Software Upgrade. You can upgrade a vPC pair of Nexus 5000s without impacting any hosts (assuming everything is dual connected). Well, that is, unless you are using Windows Server 2012 with LACP. Let’s take a look at some show command output, shall we?
Nexus6# sh lacp issu-impact
For ISSU to Proceed, Check the following:
1. All port-channel member port should be in a steady state.
2. LACP rate fast should not be enabled on satellite member ports.
The following ports are not ISSU ready
Eth1/28 ,
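If you have more than a handful of ports to check, the list of offenders can be pulled out of that output with a few lines of Python. This is only a rough sketch: it assumes the text has already been captured somehow (the SAMPLE string below is copied from the output above), and the function name ports_blocking_issu is made up for illustration.

```python
import re

# Captured text of "sh lacp issu-impact" (copied from the output shown above).
SAMPLE = """\
For ISSU to Proceed, Check the following:
1. All port-channel member port should be in a steady state.
2. LACP rate fast should not be enabled on satellite member ports.
The following ports are not ISSU ready
Eth1/28 ,
"""


def ports_blocking_issu(output):
    """Return the Eth interface names listed after the 'not ISSU ready' line."""
    marker = "not ISSU ready"
    idx = output.find(marker)
    if idx == -1:
        # Marker line missing: nothing is reported as blocking.
        return []
    tail = output[idx + len(marker):]
    # Matches names such as Eth1/28 (assumption: ports are printed in Eth form).
    return re.findall(r"\bEth\d+(?:/\d+)+\b", tail)


if __name__ == "__main__":
    print(ports_blocking_issu(SAMPLE))
```

Each port it returns is a candidate for the per-interface LACP check.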
OK, so Eth1/28 is going to prevent an ISSU because of an LACP issue. Let’s check out the LACP details.
Nexus6# sh lacp interface e1/28
Interface Ethernet1/28 is up
  Channel group is 28 port channel is Po28
  PDUs sent: 21679
  PDUs rcvd: 750
  Markers sent: 0
  Markers rcvd: 0
  Marker response sent: 0
  Marker response rcvd: 0
  Unknown packets rcvd: 0
  Illegal packets rcvd: 0
Lag Id: [ [(0, 90-e2-ba-23-9e-8c, 0, 0, 100), (7f9b, 0-23-4-ee-be-2, 801c, 8000, 11c)] ]
Operational as aggregated link since Tue Jul 23 15:00:18 2013

Local Port: Eth1/28 MAC Address= 54-7f-ee-ef-cd-ab
  System Identifier=0x8000,54-7f-ee-ef-cd-ab
  Port Identifier=0x8000,0x11c
  Operational key=32796
  LACP_Activity=active
  LACP_Timeout=Long Timeout (30s)
  Synchronization=IN_SYNC
  Collecting=true
  Distributing=true
  Partner information refresh timeout=Short Timeout (3s)
Actor Admin State=(Ac-1:To-1:Ag-1:Sy-0:Co-0:Di-0:De-0:Ex-0)
Actor Oper State=(Ac-1:To-0:Ag-1:Sy-1:Co-1:Di-1:De-0:Ex-0)
Neighbor: 0x100
  MAC Address= 90-e2-ba-23-9e-8c
  System Identifier=0x0,90-e2-ba-23-9e-8c
  Port Identifier=0x0,0x100
  Operational key=0
  LACP_Activity=active
  LACP_Timeout=short Timeout (1s)
  Synchronization=IN_SYNC
  Collecting=true
  Distributing=true
Partner Admin State=(Ac-0:To-1:Ag-0:Sy-0:Co-0:Di-0:De-0:Ex-0)
Partner Oper State=(Ac-1:To-1:Ag-1:Sy-1:Co-1:Di-1:De-0:Ex-0)
You’ll notice that the LACP_Timeout values do not match between the Local Port (Long Timeout, 30s) and the Neighbor (short Timeout, 1s). We need both ends set to the long timeout for ISSU to be happy. The LACP neighbor is a Windows Server 2012 box using switch-dependent NIC teaming (LACP). We researched and were unable to find a way to tweak this timeout setting. This means you either run your Windows Server 2012 NIC team in switch-independent mode (works the same as ESXi without LACP) or you don’t get to do ISSU… Somewhat defeats the whole point, no?
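The same comparison can be scripted if you would rather not eyeball it on every port. The sketch below is a rough illustration, not a tool we ran: it assumes the sh lacp interface output is available as text with one field per line, the SAMPLE string is trimmed down from the output above, and the names lacp_timeouts and issu_ready are invented here.

```python
import re

# Trimmed copy of the "sh lacp interface e1/28" output shown above.
SAMPLE = """\
Local Port: Eth1/28 MAC Address= 54-7f-ee-ef-cd-ab
  LACP_Activity=active
  LACP_Timeout=Long Timeout (30s)
Neighbor: 0x100
  MAC Address= 90-e2-ba-23-9e-8c
  LACP_Activity=active
  LACP_Timeout=short Timeout (1s)
"""


def lacp_timeouts(output):
    """Map 'local' and 'neighbor' to the first word of their LACP_Timeout value."""
    result = {}
    section = None
    for line in output.splitlines():
        stripped = line.strip()
        # Track which block of the output the following fields belong to.
        if stripped.startswith("Local Port:"):
            section = "local"
        elif stripped.startswith("Neighbor:"):
            section = "neighbor"
        match = re.search(r"LACP_Timeout=(\w+)", stripped)
        if match and section:
            result[section] = match.group(1).lower()
    return result


def issu_ready(timeouts):
    """Both ends must report the long timeout, per the requirement described above."""
    return timeouts.get("local") == "long" and timeouts.get("neighbor") == "long"


if __name__ == "__main__":
    found = lacp_timeouts(SAMPLE)
    print(found, "ISSU ready:", issu_ready(found))
```

A port where either side is missing or reports the short timeout is treated as not ready, which errs on the side of caution.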
[UPDATE: Microsoft has since released a hotfix that allows you to change the timeout: https://support.microsoft.com/en-us/kb/3109099]
This problem also crops up with Linux hosts, but at least Linux lets you change the timeout. Getting your Linux admins to make the change may be another issue…
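On the Linux side, and assuming the host uses the kernel bonding driver in 802.3ad mode (the post does not say which mechanism our hosts used), the setting in question is the bond's lacp_rate option, where slow asks the partner for LACPDUs every 30 seconds and fast every second. The driver exposes the current value under sysfs, so a quick check could look like the sketch below. The bond name bond0 is a placeholder and the helper names are made up for illustration.

```python
from pathlib import Path


def parse_lacp_rate(raw):
    """Parse bonding sysfs text such as 'fast 1' and return the rate name."""
    fields = raw.split()
    return fields[0] if fields else ""


def bond_lacp_rate(bond="bond0"):
    """Read the current lacp_rate of a bond (bond0 is a placeholder name)."""
    path = Path("/sys/class/net") / bond / "bonding" / "lacp_rate"
    return parse_lacp_rate(path.read_text())


if __name__ == "__main__":
    try:
        rate = bond_lacp_rate("bond0")
        print("bond0 lacp_rate:", rate)
        if rate != "slow":
            print("The switch would likely flag this port as not ISSU ready")
    except OSError:
        # No such bond on this machine, or the bonding driver is not loaded.
        print("bond0 not found")
```

Changing the value is a job for whatever manages the bond on that distribution; the script only reads it.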
lacp rate fast ?
“You can change the timeout rate from the default rate (30 seconds) to the fast rate (1 second). This command is supported only on LACP-enabled interfaces.”
You missed the “not” in the “LACP rate fast should not be enabled on satellite member ports” requirement. :)
You must be set to slow.
Oh yeah I missed that, sorry. It looks like it’s “not guaranteed” to be non-disruptive … I’m wondering what happens if you force the ISSU. It also seems to be difficult (impossible?) to change the timeout on the Windows 2012 side. Hmm.
We’ve looked around and there are a number of people who have tried to solve this problem. No one has, yet. We’ve also put the question in through some of our Microsoft contacts. We will be testing the update today to see what the actual impact is, and that will result in another blog post. I intend to test to see what the real difference is between the non-disruptive and disruptive upgrades. You cannot force the system to attempt an ISSU, but you can force a disruptive upgrade.
I did eventually test the update and as best I can tell the main difference between a non-disruptive and disruptive upgrade is whether the FEXen are upgraded all at once (disruptive) or in a rolling upgrade (theoretically non-disruptive if your hosts are multi-homed).
Do you have any notions of downtime during software upgrade? Are we talking about 30sec or a few minutes?
For everything to be up and running again, including attached FEX, about 30m…
I had the exact same experience with a 7010; the ISSU warns that some aspects might be disruptive (the SUPs) but linecards won’t be. Trust me, when the whole thing reboots the linecards go offline :(
Long story short –> even though you have read the doco etc.. make sure you give yourself a MUCH longer outage window when using ISSU
Great post. Pointed me in the right direction and that is a cool command
Microsoft have released a hotfix
That’s great news, thanks for letting us know!
You can of course shut down the port that is blocking the ISSU (as it should be redundant) and the ISSU will work.