Adam Raffe - Cloud, Data Centre & Networking

Useful Nexus 1000V 'vemcmd' Commands

For anyone using the Nexus 1000V virtual switch, it’s sometimes useful to see what is happening directly on the Virtual Ethernet Module (VEM) residing on the host, rather than just having visibility from the Virtual Supervisor Module (VSM). The method to achieve this is via the ‘vemcmd’ syntax on the ESXi host. Unfortunately these commands are not well documented, so I thought it would be useful to provide a list of some of the more common ones.

**vemcmd show card** - shows a multitude of information about the VEM, including the domain, switch name, ‘slot’ number, which control mode you are using (L2 or L3) and much more.

~ # vemcmd show card
Card UUID type  2: 4225e800-d334-720d-e73e-d3a8d500a201
Card name: lab.cisco.com
Switch name: N1KV
Switch alias: DvsPortset-0
Switch uuid: b6 e9 19 50 c5 4b 78 ff-c3 f3 a1 13 61 d2 8a d5
Card domain: 10
Card slot: 3
VEM Tunnel Mode: L3 Mode
L3 Ctrl Index: 49
L3 Ctrl VLAN: 666
VEM Control (AIPC) MAC: 00:02:3d:10:0a:02
VEM Packet (Inband) MAC: 00:02:3d:20:0a:02
VEM Control Agent (DPA) MAC: 00:02:3d:40:0a:02
VEM SPAN MAC: 00:02:3d:30:0a:02
Primary VSM MAC : 00:50:56:99:a7:82
Primary VSM PKT MAC : 00:50:56:99:94:df
Primary VSM MGMT MAC : 00:50:56:99:eb:a7
Standby VSM CTRL MAC : ff:ff:ff:ff:ff:ff
Management IPv4 address: 192.168.66.55
Management IPv6 address: 0000:0000:0000:0000:0000:0000:0000:0000
Primary L3 Control IPv4 address: 192.168.66.57
Secondary VSM MAC : 00:00:00:00:00:00
Secondary L3 Control IPv4 address: 0.0.0.0
Upgrade : Default
Max physical ports: 32
Max virtual ports: 300
Card control VLAN: 1
Card packet VLAN: 1
Control type multicast: No
Card Headless Mode : No
       Processors: 2
  Processor Cores: 2
Processor Sockets: 1
  Kernel Memory:   8388084
Port link-up delay: 5s
Global UUFB: DISABLED
Heartbeat Set: True
PC LB Algo: source-mac
Datapath portset event in progress : no
Licensed: Yes

**vemcmd show port** - shows a list of the ports on this VEM and the VMs or VMKs connected to them.

~ # vemcmd show port
  LTL   VSM Port  Admin Link  State  PC-LTL  SGID  Vem Port  Type
   20     Eth3/4     UP   UP    FWD     561     3    vmnic3
   49      Veth1     UP   UP    FWD       0     3      vmk1
   50      Veth3     UP   UP    FWD       0     3      vmk2  VXLAN
   51      Veth5     UP   UP    FWD       0        Ubuntu-1.eth0
  561        Po1     UP   UP    FWD       0

**vemcmd show port vlans** - shows a list of the ports on this VEM and the VLANs associated with them.

~ # vemcmd show port vlans
                          Native  VLAN   Allowed
  LTL   VSM Port  Mode    VLAN    State* Vlans
   20     Eth3/4   T          1   FWD    1,666,800
   49      Veth1   A        666   FWD    666
   50      Veth3   A        800   FWD    800
  561        Po1   T          1   FWD    1,666,800

**vemcmd show stats** - displays some general statistics (bytes sent & received, etc.) associated with each port.

~ # vemcmd show stats
 LTL  Received        Bytes      Sent        Bytes   Txflood    Rxdrop    Txdrop  Name
 8    853658    163106173   1706853    285044028    853377         0         0
 9   1706853    285044028    853658    163106173    853658         0         0
 10        16          960        42         2520        42         0         0
 12        16          960        42         2520        42         0         0
 15         0            0   3403853    806578120   1682744         0         0
 16      1661       152812         0            0         0         0         0  ar
 20  19831000   2862819019    851081    124246350       653  12789591         2  vmnic3
 49     24916      1494960   2806579    393405034   2781688         0         0  vmk1
 50    813566    119314022    832174    125090824       199         0         0  vmk2
 51    812802     78710878    831405     83473094     18639         0         0  Ubuntu-1.eth0

**vemcmd show vlan** - displays VLAN information and which ports are associated with each VLAN.

~ # vemcmd show vlan
 Number of valid VLANs: 8
 VLAN 1, vdc 1, swbd 1, hwbd 1, 4 ports
Portlist:
 VLAN 666, vdc 1, swbd 666, hwbd 7, 3 ports
Portlist:
 20  vmnic3
 49  vmk1
 561
VLAN 800, vdc 1, swbd 800, hwbd 8, 3 ports
Portlist:
 20  vmnic3
 50  vmk2
 561
VLAN 3968, vdc 1, swbd 3968, hwbd 5, 3 ports
Portlist:
 1  inban
 5  inband port securit
 11
VLAN 3969, vdc 1, swbd 3969, hwbd 4, 2 ports
Portlist:
 8
 9
VLAN 3970, vdc 1, swbd 3970, hwbd 3, 0 ports
Portlist:
 VLAN 3971, vdc 1, swbd 3971, hwbd 6, 2 ports
Portlist:
 14
 15

**vemcmd show l2 all** - shows MAC address and other information for all bridge domains and VLANs.

~ # vemcmd show l2 all
 Bridge domain    1 brtmax 4096, brtcnt 6, timeout 300
 VLAN 1, swbd 1, ""
 Flags:  P - PVLAN  S - Secure  D - Drop
 Type         MAC Address   LTL   timeout   Flags    PVLAN
 Static   00:02:3d:80:0a:02     6         0
 Static   00:02:3d:40:0a:02    10         0
 Static   00:02:3d:30:0a:02     3         0
 Static   00:02:3d:60:0a:00     5         0
 Static   00:02:3d:20:0a:02    12         0
 Static   00:02:3d:10:0a:02     2         0

**vemcmd show l2 bd-name <name>** - shows MAC address and other information for a specific bridge domain (useful if using VXLANs).

~ # vemcmd show l2 bd-name test-vm
Bridge domain    9 brtmax 4096, brtcnt 2, timeout 300
Segment ID 5000, swbd 4096, "test-vm"
Flags:  P - PVLAN  S - Secure  D - Drop
       Type         MAC Address   LTL   timeout   Flags    PVLAN    Remote IP
    SwInsta   00:50:56:99:3e:45   561         0                   172.16.1.2
     Static   00:50:56:99:1b:c3    51         0                      0.0.0.0

**vemcmd show packets** - shows traffic statistics for broadcast / unicast / multicast.

~ # vemcmd show packets
  LTL   RxUcast   TxUcast   RxMcast   TxMcast   RxBcast   TxBcast   Txflood    Rxdrop    Txdrop  Name
    8    854192    854171         0         0       161    854072    854072         0         0
    9    854171    854192         0         0    854072       161    854353         0         0
   10         0         0         0         0        16        42        42         0         0
   12         0         0         0         0        16        42        42         0         0
   15         0   3406656         0         0         0         0   1684145         0         0
   16      1661         0         0         0         0         0         0         0         0  ar
   20   4269784    836945     60064     14332  15537893       753       653  12800775         2  vmnic3
   49     25007     24969        18      3483       148   2789944   2782799         0         0  vmk1
   50    813658    832692        20         0       718       200       200         0         0  vmk2
   51    813522    813495      2183     15737       766      2890     18639         0         0  Ubuntu-1.eth0
  561   4269784    836945     60064     14332  15537893       753       653  12800775         2

**vemcmd show arp all** - shows ARP information (can be useful for viewing VTEP information).

~ # vemcmd show arp all
Flags: D-Dynamic S-Static d-Delete s-Sticky
       P-Proxy B-Public C-Create X-Exclusive
VLAN/SEGID     IP Address           MAC Address          Flags            Expiry
800            172.16.1.1           00:50:56:68:e9:94    D                550
800            172.16.1.2           00:50:56:60:9a:7c    D                550

**vemcmd show vxlan interfaces** - shows which interfaces (i.e. VMKs and their associated vEths) are configured as VXLAN interfaces.

~ # vemcmd show vxlan interfaces
LTL     VSM Port      IP       Seconds since Last   Vem Port
                               IGMP Query Received
(* = IGMP Join Interface/Designated VTEP)
--------------------------------------------------------------
 50        Veth3    172.16.1.1    856205             vmk2         *

That will do for now - you can actually get a full list of the commands available by doing ‘vem-support all’ on the VEM, but the ones above are some of the most useful. Hope this helps!

Peer-Gateway Doesn't Solve L3 Over vPC

On a recent Packet Pushers podcast, use of the Peer-Gateway feature on the Nexus 7000 and whether it resolves the lack of support for L3 over vPC was briefly discussed. The whole topic has been quite a big source of confusion, so let’s answer it straight away: using Peer-Gateway to try and resolve L3 over vPC issues is not supported, but more importantly in most cases it doesn’t actually work. The question is, why not? There are actually two reasons.

Issue #1: Data Plane

The first issue is the one that many people who have implemented vPC know about. vPC on the Nexus 7000 has a built-in rule stating that a packet received over the Peer-Link cannot be forwarded out onto another vPC. This rule exists to prevent loops and duplicate packets; however, it has a knock-on effect on the ability to run routing to and from vPC-connected devices, as shown in the following example.

[Diagram: Peer-Gateway-L3-1]

In the above example, the layer 3 next hop (from the point of view of the router at the bottom) is Nexus 7000-2; however, the vPC hashing has chosen to send the frame to Nexus 7000-1. As a result, the packet passes over the Peer-Link, where it is prevented from being sent out onto another vPC member port. In this scenario, roughly 50% of your traffic would therefore be lost. So if we now enable the Peer-Gateway feature, does that help?

Peer-Gateway was introduced to get around a specific problem found on certain NAS and load balancing platforms that have a habit of replying to traffic using the MAC address of the sending device, rather than the HSRP MAC. In non-vPC environments this never caused an issue (in fact most people probably never even noticed it happening), but when using vPC, traffic gets lost for similar reasons to the example above. Peer-Gateway fixes this by allowing a vPC peer to route packets that were originally destined for the MAC address of the other peer. So the question is, can't we enable Peer-Gateway to get around the L3 over vPC issue described above? The answer is yes - for the data plane. Unfortunately, the same feature has completely the wrong effect on the control plane for dynamic routing.
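
For reference, Peer-Gateway is a single command under the vPC domain configuration. A minimal sketch (the domain ID and hostname are illustrative, not taken from the example above):

N7K-1(config)# vpc domain 10
N7K-1(config-vpc-domain)# peer-gateway

When enabling Peer-Gateway it is also generally recommended to disable IP redirects on the SVIs of the vPC VLANs - check the configuration guide for your release.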

Issue #2: Control Plane

Let’s now add a routing protocol - OSPF - to our example. We enable OSPF on both Nexus 7000 SVIs as well as the router at the bottom of our diagram with the intention of having adjacencies between all three. Peer-Gateway is enabled to get around the data plane issue described above.

[Diagram: Peer-Gateway-L3-2]

In the above diagram, I am showing one OSPF adjacency between our router and Nexus 7000-2. In reality, there are two more adjacencies (one between the router and Nexus 7000-1, and one between the two Nexus 7Ks) but I’m not showing them here for clarity. In order to form those adjacencies, let’s say that the OSPF packets between our router and Nexus 7000-2 take the alternate layer 2 path through Nexus 7000-1 as shown in the following diagram.

[Diagram: Peer-Gateway-L3-3]

Now the important thing to remember is that we have Peer-Gateway enabled in the above topology. Peer-Gateway works by forcing routing to take place locally, which means that any OSPF packets passing through Nexus 7000-1 are routed rather than simply bridged. Routing decrements the TTL, and because most routing protocols send their packets with a TTL of 1, the TTL reaches 0 and the packet is dropped. The result is that the OSPF adjacencies, in many cases, will never come up. Of course, you might be lucky and see all your packets hashed to the ‘correct’ peer, but this is fairly unlikely.

So the bottom line is that enabling Peer-Gateway will solve one problem only to introduce another - the net result is that we cannot use this feature to support dynamic routing over vPC.

Storm Control on Nexus 2000 NIFs

NX-OS release 5.2(1)N1(2) added support for storm control on Nexus 2000 NIFs / FEX Fabric Interfaces (this is also available on 6.0(2)N2(1) for the Nexus 6000) - these are the interfaces used to connect the parent Nexus 5500 or 6000 to the Fabric Extender. I looked into this feature recently for a customer so thought a quick overview might be useful as there are a couple of things to be aware of.

Firstly, the storm control percentage value that you configure gets implemented as a percentage of the total speed of the port-channel between 5K / 6K and FEX. Here’s an example:

[Diagram: NIF-SC-1]

In the above drawing, we have four 10GE interfaces used as FEX Fabric ports (i.e. the links between the 5K and 2K). All four links within this port-channel on the 5K are on the same port ASIC (this matters!!), and broadcast storm control is configured on the FEX Fabric port-channel, with a value of 50%. So in this case, a total of 20Gbps of broadcast traffic would be allowed into the parent Nexus 5500 before the threshold is reached and we start dropping traffic. This is pretty straightforward and what you would expect to see.
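
For context, the storm control configuration itself is applied to the FEX fabric port-channel on the parent switch. A rough sketch matching the example above, assuming FEX 101 is attached via port-channel 101 (the interface and FEX numbers are illustrative):

interface port-channel101
  switchport mode fex-fabric
  fex associate 101
  storm-control broadcast level 50.00

With four 10GE members in the channel, the 50% level works out to the 20Gbps aggregate figure described above.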

I said in the example above that the port ASIC allocations matter - this is because the behaviour is slightly different if the FEX Fabric port-channel is spread across multiple port ASICs on the Nexus 5K. Here’s another example:

[Diagram: NIF-SC-2]

In the second example, two of the member links in the FEX Fabric port-channel are connected to ports serviced by the first UPC (Unified Port Controller) ASIC. The other two ports in the channel are using a different UPC ASIC. The storm control percentage is still calculated based on the total bandwidth in the port-channel (10%, or 4Gbps in this case) - however, importantly, each UPC port ASIC _enforces the threshold independently_. What that means in practice is that each UPC will allow 4Gbps of broadcast traffic, for a total of 8Gbps into the switch. If you aren't aware of this then you could end up with more traffic than you expect before the switch takes action.

Note that the above behaviour is applicable to normal port-channels as well (not just FEX NIF ports).

FabricPath for Layer 2 DC Interconnect?

The requirement for layer 2 interconnect between data centre sites is very common these days. The pros and cons of doing L2 DCI have been discussed many times in other blogs / forums so I won’t revisit that here, however there are a number of technology options for achieving this, including EoMPLS, VPLS, back-to-back vPC and OTV. All of these technologies have their advantages and disadvantages, so the decision often comes down to factors such as scalability, skillset and platform choice.

Now that FabricPath is becoming more widely deployed, it is also starting to be considered by some as a potential L2 DCI technology. In theory, this looks like a good bet - easy configuration, no Spanning-Tree extended between sites, should be a no brainer, right? Of course, things are never that simple - let’s look at some things you need to consider if looking at FabricPath as a DCI solution.

**1: FabricPath requires direct point-to-point WAN links.**

A technology such as OTV uses MAC-in-IP tunnelling to transport layer 2 frames between sites, so you simply need to ensure that end-to-end IP connectivity is available. As a result, OTV is very flexible and can run over practically any network as long as it is IP enabled. FabricPath, on the other hand, requires a direct layer 1 link between the sites (e.g. dark fibre), so it is somewhat less flexible. Bear in mind that you also lose some of the features associated with an IP network - for example, there is currently no support for BFD over FabricPath.

**2: Your multi-destination traffic will be ‘hairpinned’ between sites.**

In order to forward broadcast, unknown unicast and multicast traffic through a FabricPath network, a multi-destination tree is built. This tree generally needs to ‘touch’ each and every FP node so that multi-destination traffic is correctly forwarded. Each multi-destination tree in a FabricPath network must elect a root switch (this is controllable through root priorities, and it’s good practice to use this), and all multi-destination traffic must flow through this root. How does this affect things in a DCI environment? The main thing to remember is that there will generally be a single multi-destination tree spanning both sites, and that the root for that tree will exist on one site or the other. The following diagram shows an example.

[Diagram: FP-DCI-1]

In the above example, there are two sites, each with two spine switches and two edge switches. The root for the multi-destination tree is on Spine-3 in Site B. For the hosts connected to the two edge switches in site A, broadcast traffic could follow the path from Edge-1 up to Spine-1, then over to Spine-3 in Site B, then to Spine-4, and then back down to the Spine-2 and Edge-2 switches in Site A before reaching the other host. Obviously there could be slightly different paths depending on topology, e.g. if the Spine switches are not directly interconnected. In future releases of NX-OS, the ability to create multiple FabricPath topologies will alleviate this issue to a certain extent, in that groups of ‘local’ VLANs can be constrained to a particular site, while allowing ‘cross-site’ VLANs across the DCI link.
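
Because the location of the multi-destination tree root determines where this hairpinning occurs, it is worth setting the root priority deliberately rather than relying on the default election. A rough sketch, as I recall the syntax from the FabricPath configuration guide (the priority value is illustrative; a higher value is preferred in the root election - verify against your NX-OS release):

fabricpath domain default
  root-priority 255

In a DCI design you would typically want the root on a spine at the site where most of the multi-destination traffic originates, to limit the hairpinning described above.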

**3: First Hop Redundancy Protocol (FHRP) localisation support is limited with FabricPath.**

When stretching L2 between sites, it’s sometimes desirable to implement ‘FHRP localisation’ - this usually involves blocking HSRP using port ACLs or similar, so that hosts at each site use their local gateways rather than traversing the DCI link and being routed at the other site. The final point to be aware of is that when using FabricPath for layer 2 DCI, achieving FHRP localisation is slightly more difficult. On the Nexus 5500, FHRP localisation is supported using ‘mismatched’ HSRP passwords at each site (you can’t use port ACLs for this purpose on the 5K). However, if you have any other FabricPath switches in your domain which aren’t acting as an L3 gateway (e.g. at a third site), then that approach won’t work and is not supported.

This is because FabricPath will send HSRP packets from the virtual MAC address at each site with the local switch ID as a source. Other FabricPath switches in the domain will see the same vMAC from two source switch IDs and will toggle between them, making the solution unusable. Also, bear in mind that FHRP localisation with FabricPath isn’t (at the time of writing) supported on the Nexus 7000.
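
For illustration, the ‘mismatched password’ approach is just standard HSRP authentication configured with a different string at each site, so that the gateways at site A never accept hellos from site B (and vice versa) and each site elects its own active gateway. A rough sketch on a Nexus 5500 SVI, assuming the hsrp and interface-vlan features are already enabled - the VLAN, group, addressing and strings are all illustrative:

Site A:
interface Vlan100
  ip address 10.1.100.2/24
  hsrp 100
    authentication text SITE-A
    ip 10.1.100.1

Site B:
interface Vlan100
  ip address 10.1.100.4/24
  hsrp 100
    authentication text SITE-B
    ip 10.1.100.1

Both sites keep the same group number and virtual IP (and therefore the same vMAC), which is exactly why the vMAC appears from multiple switch IDs in the scenario described above.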

The issues noted above do not mean that FabricPath cannot be used as a method for extending layer 2 between sites. In some scenarios, it can be a viable alternative to the other DCI technologies as long as you are aware of the caveats above.

Hope this is useful - thanks for reading!

L3 over vPC: Nexus 7000 vs 5000

The lack of support for running layer 3 routing protocols over vPC on the Nexus 7000 is well documented - less well known however is that the Nexus 5500 platform operates in a slightly different way which does actually allow layer 3 routing over vPC for unicast traffic. Some recent testing and subsequent discussions with one of my colleagues on this topic reminded me that there is still (somewhat understandably) a degree of confusion around this.

Let’s start with a reminder of what doesn’t work on the Nexus 7000:

[Diagram: L3-over-vPC-5Kvs7K1]

In the diagram above, a router or similar device is connected to the two Nexus 7000 switches using a vPC and is attempting to form adjacencies with each peer. This design does not work on the Nexus 7000, as traffic may need to traverse the Peer-Link in order to reach its ultimate destination. The Nexus 7000 has a rule which says "any traffic received from the Peer-Link cannot be forwarded out via another vPC", so such traffic would be dropped. Here's a slightly different example:

[Diagram: L3-over-vPC-5Kvs7K2]

In the above example, the routers or firewalls are single-attached to the Nexus 7000 switches and do not connect using a vPC. This design doesn't work on the Nexus 7000 either, as each router / firewall will form an adjacency with both Nexus 7000s and traffic may still traverse the Peer-Link. The common solution to this problem is to run a separate link between the two Nexus 7000s to handle the non-vPC traffic, as shown below:

[Diagram: L3-over-vPC-5Kvs7K3]
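
As a rough illustration, the dedicated non-vPC link is simply an ordinary trunk between the two peers (not a vPC member and not the Peer-Link) carrying the routed VLAN(s) - the interface and VLAN numbers here are illustrative:

interface Ethernet1/20
  description Non-vPC link for routed VLANs
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 100
  no shutdown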

No surprises so far - the above restrictions are well documented in various guides on cisco.com.

What may surprise you is that the Nexus 5500 works in a slightly different way - as a result, there are some topologies that don’t work on the Nexus 7000 but which do work on the Nexus 5500 (when running a layer 3 module). Let’s go back to our first example above, but this time let’s replace the Nexus 7000s with Nexus 5500s running with layer 3 modules:

[Diagram: L3-over-vPC-5Kvs7K4]

The above design is supported and works for unicast traffic _only_. Why is this? The reason is that the Nexus 5500 does not handle traffic received from the Peer-Link in the same way as the Nexus 7000, so this traffic can be forwarded out on another vPC. Note that traffic flows may still be suboptimal (i.e. traffic may arrive at Nexus 5500-1, only to have to traverse the Peer-Link to reach Nexus 5500-2); the Peer-Gateway feature can be enabled to resolve this. I should note at this point that officially, the above design still isn't recommended (although it works and is supported) - one of the reasons for this is that multicast traffic is subject to the same restrictions as on the Nexus 7000 and therefore does not work properly in this scenario.

If this hasn’t completely blown your mind, then I have one final topology to show you:

[Diagram: L3-over-vPC-5Kvs7K5]

In the above example, two routers are single-attached to the Nexus 5500s. There is a separate link for non-vPC traffic and the Peer-Gateway feature is enabled. This should work, right? Actually it doesn't - in this scenario, OSPF adjacencies will not form properly. Note that this is specific to the scenario where there is a non-vPC link and Peer-Gateway is turned on. The solution is to revert to carrying the routed VLAN over the Peer-Link - which does work (for unicast traffic only), as sketched below.
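
For illustration, 'carrying the routed VLAN over the Peer-Link' simply means allowing that VLAN on the Peer-Link trunk (a minimal sketch, assuming a restricted allowed-VLAN list; VLAN 100 and port-channel 1 are illustrative):

interface port-channel1
  vpc peer-link
  switchport trunk allowed vlan add 100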

Hopefully this will help to clear up some of the confusion - thanks for reading!