Nexus 5k NTP Failure

Ran into this bug today. I logged in to a pair of Nexus 5500s to debug a vPC link and noticed the timestamps were off, which was odd. I ran “show ntp peer-status” and got no output at all, which was even odder. After poking at NTP for a while I decided it had to be a bug. I found it in the release notes, and it was fixed in 5.2(1)N1(6). Here’s the Cisco bug report:

Symptom:
Nexus 5k acting as an NTP Client can’t sync with any NTP server(s).
when issuing a “show ntp peer-status” or a “show ntp peers” it does not display any of the servers/peers configured.

Conditions:
Nexus 5500/5000 running 5.2(1)N1(5).

Workaround:
Proactive workaround to prevent from this issue is none.
Reactive workaround to recover this issue is below. However, after reloading system, same issue may happen again.

#conf t
#clock protocol none
#clock protocol ntp
#copy run start

Fun. At least it can be fixed without reloading, which is a good thing in a data center switch.

FIN

No ISSU for you!

The Nexus 5000 series has the capability to do ISSU, an In-Service Software Upgrade. You can upgrade a vPC pair of Nexus 5000s without impacting any hosts (assuming everything is dual connected). Well, that is, unless you are using Windows Server 2012 with LACP. Let’s take a look at some show command output, shall we?

Nexus6# sh lacp issu-impact
For ISSU to Proceed, Check the following:
1. All port-channel member port should be in a steady state.
2. LACP rate fast should not be enabled on satellite member ports.

The following ports are not ISSU ready
Eth1/28     ,
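
If you have a lot of switches to check before an upgrade, output like this is easy to scan with a script. Here is a minimal sketch in Python, assuming the output keeps the shape shown above (a "not ISSU ready" banner followed by comma-separated interface names). The function name ports_blocking_issu and the sample text are made up for illustration and are not part of any Cisco tooling.

```python
import re
from typing import List

# Sample text copied from the "sh lacp issu-impact" output above.
SAMPLE_ISSU_IMPACT = """\
For ISSU to Proceed, Check the following:
1. All port-channel member port should be in a steady state.
2. LACP rate fast should not be enabled on satellite member ports.

The following ports are not ISSU ready
Eth1/28     ,
"""


def ports_blocking_issu(show_output: str) -> List[str]:
    """Return the interface names listed after the 'not ISSU ready' banner.

    Hypothetical helper: it assumes interface names look like Eth<n>/<n>
    and only appear after the banner line.
    """
    ports: List[str] = []
    in_list = False
    for line in show_output.splitlines():
        if "not ISSU ready" in line:
            in_list = True
            continue
        if in_list:
            # Pick out names such as Eth1/28 and ignore padding and commas.
            ports.extend(re.findall(r"Eth\d+(?:/\d+)+", line))
    return ports


print(ports_blocking_issu(SAMPLE_ISSU_IMPACT))
```

An empty list would mean the switch reported no ports blocking the ISSU, at least as far as LACP is concerned.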

OK, so Eth1/28 is going to prevent an ISSU because of an LACP issue. Let’s check out the LACP details.

Nexus6# sh lacp interface e1/28
Interface Ethernet1/28 is up
  Channel group is 28 port channel is Po28
  PDUs sent: 21679
  PDUs rcvd: 750
  Markers sent: 0
  Markers rcvd: 0
  Marker response sent: 0
  Marker response rcvd: 0
  Unknown packets rcvd: 0
  Illegal packets rcvd: 0
Lag Id: [ [(0, 90-e2-ba-23-9e-8c, 0, 0, 100), (7f9b, 0-23-4-ee-be-2, 801c, 8000, 11c)] ]
Operational as aggregated link since Tue Jul 23 15:00:18 2013

Local Port: Eth1/28   MAC Address= 54-7f-ee-ef-cd-ab
  System Identifier=0x8000,54-7f-ee-ef-cd-ab
  Port Identifier=0x8000,0x11c
  Operational key=32796
  LACP_Activity=active
  LACP_Timeout=Long Timeout (30s)
  Synchronization=IN_SYNC
  Collecting=true
  Distributing=true
  Partner information refresh timeout=Short Timeout (3s)
Actor Admin State=(Ac-1:To-1:Ag-1:Sy-0:Co-0:Di-0:De-0:Ex-0)
Actor Oper State=(Ac-1:To-0:Ag-1:Sy-1:Co-1:Di-1:De-0:Ex-0)
Neighbor: 0x100
  MAC Address= 90-e2-ba-23-9e-8c
  System Identifier=0x0,90-e2-ba-23-9e-8c
  Port Identifier=0x0,0x100
  Operational key=0
  LACP_Activity=active
  LACP_Timeout=short Timeout (1s)
  Synchronization=IN_SYNC
  Collecting=true
  Distributing=true
Partner Admin State=(Ac-0:To-1:Ag-0:Sy-0:Co-0:Di-0:De-0:Ex-0)
Partner Oper State=(Ac-1:To-1:Ag-1:Sy-1:Co-1:Di-1:De-0:Ex-0)

You’ll notice that the LACP_Timeout values do not match: the Local Port uses the long timeout (30s), while the Neighbor uses the short timeout (1s). We need both ends set to the long timeout for ISSU to be happy. The LACP neighbor is a Windows Server 2012 box using switch-dependent NIC teaming (LACP). We researched this and were unable to find a way to change the timeout on the Windows side. This means you either run your Windows Server 2012 NIC team in switch-independent mode (which works the same as ESXi without LACP) or you don’t get to do ISSU… Somewhat defeats the whole point, no?
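
The same check can be scripted if you need to audit many port channels. Below is a rough Python sketch, assuming the "sh lacp interface" output keeps the layout shown above, with a "Local Port:" section followed by a "Neighbor:" section, each containing one LACP_Timeout line. The names lacp_timeouts and both_ends_long are hypothetical helpers written for this post, not an existing library.

```python
import re
from typing import Dict

# Trimmed excerpt of the "sh lacp interface e1/28" output above.
SAMPLE_LACP = """\
Local Port: Eth1/28   MAC Address= 54-7f-ee-ef-cd-ab
  LACP_Activity=active
  LACP_Timeout=Long Timeout (30s)
  Partner information refresh timeout=Short Timeout (3s)
Neighbor: 0x100
  LACP_Activity=active
  LACP_Timeout=short Timeout (1s)
"""


def lacp_timeouts(show_output: str) -> Dict[str, str]:
    """Map 'local' and 'neighbor' to 'long' or 'short'."""
    timeouts: Dict[str, str] = {}
    section = None
    for line in show_output.splitlines():
        stripped = line.strip()
        # Track which end of the link the following lines describe.
        if stripped.startswith("Local Port:"):
            section = "local"
        elif stripped.startswith("Neighbor:"):
            section = "neighbor"
        # Only lines that start with LACP_Timeout count; the
        # "Partner information refresh timeout" line is skipped.
        match = re.match(r"LACP_Timeout=(\w+)\s+Timeout", stripped, re.IGNORECASE)
        if match and section:
            timeouts[section] = match.group(1).lower()
    return timeouts


def both_ends_long(timeouts: Dict[str, str]) -> bool:
    """True only when both ends report the long timeout."""
    return timeouts.get("local") == "long" and timeouts.get("neighbor") == "long"


result = lacp_timeouts(SAMPLE_LACP)
print(result, both_ends_long(result))
```

For the output above, the neighbor comes back as short, so the port fails the check.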

[UPDATE: Microsoft has since released a hotfix that allows you to change the timeout: https://support.microsoft.com/en-us/kb/3109099]

This problem also crops up with Linux hosts, but at least Linux lets you change the timeout (the bonding driver’s lacp_rate option, set to slow). Getting your Linux admins to make the change may be another issue…

FIN