ACI Multi-Site Control Plane and Data Plane
This post consists of some notes from the Cisco ACI Multi-Site white paper.
ACI Multi-Site Control Plane
I- ACI Multi-Site Underlay Control Plane:
OSPF is used as the underlay control plane to exchange, between sites, routing information for specific IP addresses defined on the spine nodes:
- BGP-EVPN Router-ID (EVPN-RID): This unique IP address is defined on each spine node belonging to the fabric and is used to establish MP-BGP EVPN and VPNv4 adjacencies with the spine nodes in remote sites.
- Overlay Unicast TEP (O-UTEP): This common anycast address is shared by all the spine nodes in the same pod and is used to source and receive unicast VXLAN data-plane traffic. Each pod is characterized by an O-UTEP address.
- Overlay Multicast TEP (O-MTEP): This common anycast address is shared by all the spine nodes in the same site and is used to perform head-end replication for BUM traffic. BUM traffic is sourced from the O-UTEP address defined on the local spine nodes and destined for the O-MTEP of the remote sites to which the given bridge domain is stretched.
The EVPN-RID, O-UTEP, and O-MTEP addresses are the only prefixes that must be exchanged across sites to enable the intersite EVPN control plane and the VXLAN data plane. Consequently, they are the only prefixes that should be learned in the ISN routing domain.
This implies that those IP addresses must be globally routable across the ISN, which should normally not be a problem, because they are independent of the original TEP pools associated with each fabric and assigned separately on the Cisco Multi-Site Orchestrator at the time of Multi-Site deployment.
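As a rough illustration of the point above, the sketch below models one site and derives the host routes it would need to advertise into the ISN. The class name, field names, and every address are assumptions made up for this example (they are not an ACI or Orchestrator API), and the site is assumed to have a single pod, so it has one O-UTEP.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network


@dataclass(frozen=True)
class SiteUnderlay:
    """Hypothetical per-site record; field names are illustrative only."""
    name: str
    evpn_rids: tuple          # one EVPN-RID per spine node
    o_utep: IPv4Address       # anycast, unicast VXLAN traffic
    o_mtep: IPv4Address       # anycast, head-end replicated BUM traffic
    tep_pool: IPv4Network     # fabric-internal TEP pool, kept out of the ISN


def isn_prefixes(site):
    """Return the host routes this site needs to advertise into the ISN."""
    addresses = [*site.evpn_rids, site.o_utep, site.o_mtep]
    return sorted(IPv4Network(f"{addr}/32") for addr in addresses)


# Example addresses are invented for the sketch.
site1 = SiteUnderlay(
    name="site1",
    evpn_rids=(IPv4Address("172.16.100.1"), IPv4Address("172.16.100.2")),
    o_utep=IPv4Address("172.16.100.100"),
    o_mtep=IPv4Address("172.16.100.200"),
    tep_pool=IPv4Network("10.1.0.0/16"),
)

for prefix in isn_prefixes(site1):
    print(prefix)
```

The TEP pool is carried in the record only to show that it plays no part in what is advertised: the function never reads it.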
II- ACI Multi-Site Overlay Control Plane:
When an endpoint is discovered as locally connected to a given leaf node, the leaf node originates a COOP control-plane message to communicate the endpoint information (IPv4/IPv6 and MAC addresses) to the spine nodes.
In a Cisco ACI Multi-Site deployment, host information for discovered endpoints must be exchanged between spine nodes that are part of separate fabrics to allow east-west communication between endpoints. This intersite exchange of host information is required only for the endpoints that actually need to communicate: intra-EPG communication and inter-EPG communication permitted by a contract.
MP-BGP EVPN adjacencies are established between spine nodes belonging to different fabrics by using the EVPN-RID addresses. Both MP Interior BGP (MP-iBGP) and MP External BGP (MP-eBGP) sessions are supported, depending on the specific BGP autonomous system to which each site belongs.
When iBGP is used across sites, you can decide whether to use a full mesh or to introduce route-reflector nodes, usually referred to as External-RRs (Ext-RRs).
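To make the iBGP/eBGP distinction concrete, here is a minimal sketch that enumerates a full mesh of intersite EVPN peerings and labels each one by comparing the autonomous system numbers of the two sites. The function name, the input layout, and the ASNs and addresses are all assumptions for illustration. Ext-RR nodes are not modeled.

```python
from itertools import combinations


def intersite_peerings(sites):
    """Enumerate full-mesh EVPN sessions between spines of different sites.

    sites: dict mapping site name -> (asn, [evpn_rid, ...]).
    Returns a list of (local_rid, remote_rid, session_type) tuples.
    """
    sessions = []
    for (_, (asn_a, rids_a)), (_, (asn_b, rids_b)) in combinations(sites.items(), 2):
        # Same AS on both sides means MP-iBGP, otherwise MP-eBGP.
        kind = "MP-iBGP" if asn_a == asn_b else "MP-eBGP"
        for rid_a in rids_a:
            for rid_b in rids_b:
                sessions.append((rid_a, rid_b, kind))
    return sessions


# Invented example: two sites share AS 65001, a third uses AS 65002.
example_sites = {
    "site1": (65001, ["172.16.100.1", "172.16.100.2"]),
    "site2": (65001, ["172.16.200.1", "172.16.200.2"]),
    "site3": (65002, ["172.16.30.1"]),
}

for local, remote, kind in intersite_peerings(example_sites):
    print(f"{local} <-> {remote}: {kind}")
```

Spines in the same site are never paired with each other here, because the sessions in question are only the ones established across sites.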
Note – External RR & Internal RR in ACI Multi-Site
The Ext-RR nodes discussed above are used for the MP-BGP EVPN peerings established between spine nodes deployed in separate sites. They serve a different function from that of the internal RR nodes, which are always deployed to distribute the external IPv4/IPv6 prefixes learned on the L3Out logical connections to all of the leaf nodes that are part of the same fabric.
- Multi-Site overlay control-plane steps for intersite communication:
1- Endpoints EP1 and EP2 connect to separate Sites 1 and 2.
2- A COOP notification is generated inside each fabric from the leaf nodes on which EP1 and EP2 are discovered and sent to the local spine nodes.
3- An intersite policy is defined in Cisco Multi-Site Orchestrator and is then pushed and rendered in the two sites. The creation of the intersite policy triggers Type-2 EVPN updates across sites to exchange EP1 and EP2 host route information.
Note: The endpoint information is always associated with the O-UTEP address, which uniquely identifies the site at which each specific endpoint was discovered. Therefore, no additional EVPN updates are required when an endpoint moves between leaf nodes that are part of the same fabric; a new update is needed only when the endpoint is migrated to a different site.
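The three steps and the note above can be sketched as a toy model. Everything here is hypothetical: the class, its methods, the MAC addresses, and the leaf names are invented for illustration, and the model ignores route withdrawal and cross-site migration. It only shows that a move between leaf nodes of the same site changes the local COOP entry without producing a new Type-2 update.

```python
class SiteSpines:
    """Toy model of the spine nodes of one site (not an ACI API)."""

    def __init__(self, name, o_utep):
        self.name = name
        self.o_utep = o_utep
        self.coop_db = {}       # endpoint MAC -> local leaf, learned via COOP
        self.evpn_routes = {}   # endpoint MAC -> remote O-UTEP, learned via EVPN
        self.peers = []         # remote sites permitted by an intersite policy
        self.updates_sent = 0   # Type-2 EVPN updates originated by this site

    def coop_notify(self, mac, leaf):
        """Step 2: a leaf reports a locally discovered endpoint."""
        first_seen = mac not in self.coop_db
        self.coop_db[mac] = leaf
        if first_seen:
            for peer in self.peers:
                self._send_type2(peer, mac)

    def apply_policy(self, peer):
        """Step 3: an intersite policy triggers updates for known endpoints."""
        self.peers.append(peer)
        for mac in self.coop_db:
            self._send_type2(peer, mac)

    def _send_type2(self, peer, mac):
        # The next hop is the site's O-UTEP, never an individual leaf.
        peer.evpn_routes[mac] = self.o_utep
        self.updates_sent += 1


site1 = SiteSpines("site1", "172.16.100.100")
site2 = SiteSpines("site2", "172.16.200.100")

site1.coop_notify("00:00:00:00:00:01", "leaf101")   # EP1 discovered in site 1
site2.coop_notify("00:00:00:00:00:02", "leaf201")   # EP2 discovered in site 2
site1.apply_policy(site2)
site2.apply_policy(site1)

site1.coop_notify("00:00:00:00:00:01", "leaf102")   # EP1 moves inside site 1
print(site1.updates_sent, site2.evpn_routes)
```

Because the remote site stores only the O-UTEP as next hop, the intra-site move leaves its view of EP1 unchanged.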
ACI Multi-Site Overlay Data Plane
I- How L2 BUM traffic is handled in ACI Multi-Site:
Forwarding of this type of traffic across sites can be achieved in one of two ways:
- Native multicast replication functions offered by the Layer 3 infrastructure interconnecting the endpoints (the approach adopted in the Cisco ACI Multi-Pod architecture)
- Ingress replication functions on the source VXLAN TEP (VTEP) devices (that is, the spines in the source fabric), which create multiple unicast copies of each BUM frame to be sent to all the remote VTEPs to which endpoints that are part of the same Layer 2 domain are connected (the approach adopted in Cisco ACI Multi-Site).
The transmission of Layer 2 BUM frames across sites is required only for the specific bridge domains that are stretched with flooding enabled.
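A minimal sketch of ingress (head-end) replication under these rules follows. The function name, the dictionary layout, the bridge domain names, and the addresses are assumptions for illustration only. Each copy is sourced from the local O-UTEP and sent to the O-MTEP of a remote site to which the bridge domain is stretched with flooding enabled.

```python
def replicate_bum(frame, bd, local_site, stretched_bds, sites):
    """Return one unicast VXLAN copy per remote site where bd is stretched.

    stretched_bds: dict bridge domain -> list of sites it is stretched to
                   with flooding enabled.
    sites:         dict site name -> {"o_utep": ..., "o_mtep": ...}.
    """
    copies = []
    for site in stretched_bds.get(bd, ()):
        if site == local_site:
            continue  # local flooding stays inside the fabric
        copies.append({
            "outer_src": sites[local_site]["o_utep"],
            "outer_dst": sites[site]["o_mtep"],
            "payload": frame,
        })
    return copies


# Invented example topology.
sites = {
    "site1": {"o_utep": "172.16.100.100", "o_mtep": "172.16.100.200"},
    "site2": {"o_utep": "172.16.200.100", "o_mtep": "172.16.200.200"},
    "site3": {"o_utep": "172.16.30.100", "o_mtep": "172.16.30.200"},
}
stretched_bds = {"BD-Web": ["site1", "site2"]}

for copy in replicate_bum("arp-request", "BD-Web", "site1", stretched_bds, sites):
    print(copy["outer_src"], "->", copy["outer_dst"])
```

A bridge domain that is absent from the mapping produces no copies at all, which matches the statement that BUM frames cross sites only for stretched bridge domains with flooding enabled.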
There are three different types of Layer 2 BUM traffic:
- Layer 2 Broadcast frames (B): Those are always forwarded across sites. A special type of Layer 2 broadcast traffic is ARP.
- Layer 2 Unknown Unicast frames (U): Those frames, by default, are not flooded across sites but are instead forwarded in unicast mode, assuming that the destination MAC is known in the COOP database of the local spines (otherwise the traffic is dropped by the receiving spine). However, this behavior can be changed in the bridge-domain-specific configuration in Cisco Multi-Site Orchestrator by selecting the “flood” option associated with the “L2 UNKNOWN UNICAST” traffic.
- Layer 2 Multicast frames (M): The same forwarding behavior applies both to intra-bridge-domain Layer 3 multicast frames (that is, the source and receivers are in the same or different IP subnets but part of the same bridge domain) and to “true” Layer 2 multicast frames (that is, the destination MAC address is multicast and there is no IP header in the packet). In both cases, the traffic is forwarded across the sites where the bridge domain is stretched once BUM forwarding is enabled for that bridge domain.
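The per-type behavior described in this list can be summarized as a small decision function. This is a sketch under stated assumptions, not the actual forwarding logic: the names, the return strings, and the parameters are invented, and the code assumes that the “flood” option for unknown unicast only takes effect when BUM forwarding is enabled for the stretched bridge domain.

```python
from enum import Enum


class FrameType(Enum):
    BROADCAST = "B"
    UNKNOWN_UNICAST = "U"
    MULTICAST = "M"


def intersite_action(frame_type, bum_enabled,
                     flood_unknown_unicast=False, dst_known_to_spines=False):
    """Return how a Layer 2 frame is treated toward remote sites.

    bum_enabled:           bridge domain is stretched with flooding enabled.
    flood_unknown_unicast: the "flood" option is selected for the bridge domain.
    dst_known_to_spines:   destination MAC is in the local spines' COOP database.
    """
    if frame_type is FrameType.UNKNOWN_UNICAST and not flood_unknown_unicast:
        # Default: no flooding; forward as unicast only if the MAC is known.
        return "unicast" if dst_known_to_spines else "drop"
    # Broadcast, multicast, and unknown unicast in flood mode (assumption:
    # flood mode relies on BUM forwarding being enabled).
    return "flood" if bum_enabled else "not-forwarded"


print(intersite_action(FrameType.BROADCAST, bum_enabled=True))
print(intersite_action(FrameType.UNKNOWN_UNICAST, bum_enabled=True))
print(intersite_action(FrameType.UNKNOWN_UNICAST, bum_enabled=True,
                       flood_unknown_unicast=True))
```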