Multi-protocol BGP

I ran across a setup the other day that had me shaking my head. My customer has a router with multiple incoming connections, each in its own VRF. Where it gets strange is the downlink to the core. The core is a pair of Nexus 93180YCs with vPCs configured, and the connection between the router and the core is a port-channeled layer 2 trunk. So for some reason, we’re terminating WAN connections on a very expensive ASR, then trunking multiple VLANs down to the 9K and routing the VLANs there. This is a perfect example of when to use Multi-protocol BGP (MP-BGP). Referencing the diagram above, we have three connections coming in that each go to a different business partner. We want segmentation between the partners so they can’t talk to each other, but we still want a common link to the core, and a routed link to boot.

The * depicts what is advertised by the partner and the IP/mask is the local WAN connection. From the core, we should be able to ping all three partners’ WAN links and advertised networks. In this case only partner-1 has a loopback with an IP from its advertised space.

partner-1 config

interface Loopback0
 ip address 12.181.161.1 255.255.255.255
!
interface GigabitEthernet0/0
 ip address 1.1.1.1 255.255.255.252
 duplex auto
 speed auto
 media-type rj45
!
router bgp 65011
 bgp router-id 1.1.1.2
 bgp log-neighbor-changes
 network 12.181.161.0 mask 255.255.255.0
 redistribute connected
 neighbor 1.1.1.2 remote-as 65000
 neighbor 1.1.1.2 soft-reconfiguration inbound

The partner-2 and partner-3 configs are the same, just with different IPs and advertisements. The real work is done on the EDGE-RTR device. First let’s look at the VRF config. We create a VRF for each partner and then a shared or common VRF, which provides the connectivity down to the core. The rd is the route distinguisher and should be different from any other one on the router. It is prepended to each prefix so that routes from different VRFs stay unique inside BGP, which lets the router keep separate routing tables. The route-target export command tags the routes leaving this VRF with the given route target. The route-target import command pulls into this VRF any route carrying a matching route target. In VRF common, we are exporting our routes (10.0.0.0/30) and importing all the routes from VRF partner-1, partner-2, and partner-3. Partner-1 is exporting its routes and importing the routes from VRF ‘common’. The same goes for partner-2 and partner-3.

ip vrf common
 rd 65000:1
 route-target export 65000:1
 route-target import 65000:11
 route-target import 65000:12
 route-target import 65000:13
!
ip vrf partner-1
 rd 65000:11
 route-target export 65000:11
 route-target import 65000:1
!
ip vrf partner-2
 rd 65000:12
 route-target export 65000:12
 route-target import 65000:1
!
ip vrf partner-3
 rd 65000:13
 route-target export 65000:13
 route-target import 65000:1
!
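
To make the import/export logic easier to reason about, here is a small Python sketch of it. This is only a toy model, not how IOS implements anything: the route targets and prefixes come from the configs and outputs in this post, while the dictionaries and the vrf_table function are names I made up for illustration. It assumes that imported routes are not re-exported, which is why the partners never see each other through VRF common.

```python
# Toy model of route-target based leaking between VRFs on one router.
# RTs and prefixes come from this post; the structures and names are hypothetical.

VRFS = {
    "common":    {"export": {"65000:1"},  "import": {"65000:11", "65000:12", "65000:13"}},
    "partner-1": {"export": {"65000:11"}, "import": {"65000:1"}},
    "partner-2": {"export": {"65000:12"}, "import": {"65000:1"}},
    "partner-3": {"export": {"65000:13"}, "import": {"65000:1"}},
}

# Routes that originate in each VRF: the connected WAN link plus what the partner advertises.
LOCAL_ROUTES = {
    "common":    ["10.0.0.0/30"],
    "partner-1": ["1.1.1.0/30", "12.181.161.0/24", "12.181.161.1/32"],
    "partner-2": ["2.2.2.0/30", "69.222.73.0/24"],
    "partner-3": ["3.3.3.0/30", "198.51.0.0/16"],
}


def vrf_table(name, vrfs=VRFS, local_routes=LOCAL_ROUTES):
    """Return the sorted prefixes visible in VRF `name` after route-target import."""
    visible = set(local_routes[name])
    for other, cfg in vrfs.items():
        if other == name:
            continue
        # Import the other VRF's own routes when one of its export RTs
        # matches one of our import RTs. Imported routes are not re-exported.
        if cfg["export"] & vrfs[name]["import"]:
            visible.update(local_routes[other])
    return sorted(visible)


for name in VRFS:
    print(name, vrf_table(name))
```

In this model VRF common ends up with all eight prefixes, while each partner VRF only holds its own routes plus 10.0.0.0/30.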

Now let’s take a look at the BGP configuration on EDGE-RTR. The BGP config is pretty straightforward. We have multiple VRFs that we want to run BGP in, so we have to create an address family for each one.

router bgp 65000
 bgp router-id 1.1.1.1
 bgp log-neighbor-changes
 !
 address-family ipv4 vrf common
  network 10.0.0.0 mask 255.255.255.252
  network 12.181.161.1 mask 255.255.255.255
  redistribute connected
  neighbor 10.0.0.1 remote-as 65000
  neighbor 10.0.0.1 activate
  neighbor 10.0.0.1 soft-reconfiguration inbound
 exit-address-family
 !
 address-family ipv4 vrf partner-1
  neighbor 1.1.1.1 remote-as 65011
  neighbor 1.1.1.1 activate
  neighbor 1.1.1.1 soft-reconfiguration inbound
 exit-address-family
 !
 address-family ipv4 vrf partner-2
  neighbor 2.2.2.1 remote-as 65012
  neighbor 2.2.2.1 activate
  neighbor 2.2.2.1 soft-reconfiguration inbound
 exit-address-family
 !
 address-family ipv4 vrf partner-3
  neighbor 3.3.3.1 remote-as 65013
  neighbor 3.3.3.1 activate
  neighbor 3.3.3.1 soft-reconfiguration inbound
 exit-address-family

Let’s take a look at the routing table and see if we have some connectivity.

CORE#sh ip route
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR

Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        10.0.0.0/30 is directly connected, GigabitEthernet0/0
L        10.0.0.1/32 is directly connected, GigabitEthernet0/0
CORE#sh ip bgp sum
BGP router identifier 10.0.0.1, local AS number 65000
BGP table version is 2, main routing table version 2
8 network entries using 1152 bytes of memory
8 path entries using 640 bytes of memory
7/1 BGP path/bestpath attribute entries using 1064 bytes of memory
3 BGP AS-PATH entries using 72 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2928 total bytes of memory
BGP activity 24/16 prefixes, 26/18 paths, scan interval 60 secs

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.2        4        65000      12       4        2    0    0 00:00:25        8

We have no BGP routes in the routing table, but we are learning 8 prefixes from the EDGE-RTR. Let’s take a closer look at BGP.

CORE#sh ip bgp
BGP table version is 2, local router ID is 10.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 * i 1.1.1.0/30       1.1.1.1                  0    100      0 65011 ?
 * i 2.2.2.0/30       2.2.2.1                  0    100      0 65012 ?
 * i 3.3.3.0/30       3.3.3.1                  0    100      0 65013 ?
 r>i 10.0.0.0/30      10.0.0.2                 0    100      0 ?
 * i 12.181.161.0/24  1.1.1.1                  0    100      0 65011 i
 * i 12.181.161.1/32  1.1.1.1                  0    100      0 65011 ?
 * i 69.222.73.0/24   2.2.2.1                  0    100      0 65012 i
 * i 198.51.0.0/16    3.3.3.1                  0    100      0 65013 i

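Aha. Take a look at the next hop. The remote networks are using their real next hops (1.1.1.1, 2.2.2.1, and 3.3.3.1), which iBGP passes along unchanged by default. The core has no route to those addresses, so none of the partner routes is marked best (>) or installed in the routing table. We need to make the EDGE-RTR the next hop for all those prefixes. How do we do that? With next-hop-self.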

EDGE-RTR#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
EDGE-RTR(config)#router bgp 65000
EDGE-RTR(config-router)# address-family ipv4 vrf common
EDGE-RTR(config-router-af)# neighbor 10.0.0.1 next-hop-self
EDGE-RTR(config-router-af)#do clear ip bgp * soft
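
If it helps to see why the next hop matters, here is a rough Python sketch of the check the core is effectively making. It is a simplification (real BGP next-hop resolution is recursive and has more rules), and the function names are my own. The prefixes and next hops are taken from the CORE outputs in this post.

```python
import ipaddress


def next_hop_reachable(next_hop, rib):
    """Return True if any prefix in `rib` covers the address `next_hop`."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(prefix) for prefix in rib)


def usable(paths, rib):
    """Return the prefixes whose next hop can be resolved through `rib`."""
    return [prefix for prefix, nh in paths if next_hop_reachable(nh, rib)]


# What CORE has in its routing table before any BGP route is installed.
core_rib = ["10.0.0.0/30", "10.0.0.1/32"]

# (prefix, next hop) as CORE receives them before next-hop-self ...
before = [
    ("1.1.1.0/30", "1.1.1.1"),
    ("69.222.73.0/24", "2.2.2.1"),
    ("198.51.0.0/16", "3.3.3.1"),
]
# ... and after, when EDGE-RTR rewrites the next hop to its own address.
after = [(prefix, "10.0.0.2") for prefix, _ in before]

print(usable(before, core_rib))  # []
print(usable(after, core_rib))   # all three prefixes
```

Before the change none of the next hops falls inside 10.0.0.0/30, so nothing is usable. After the change every path points at 10.0.0.2, which is directly connected.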

Now let’s go back to the core and take a look.

CORE#sh ip bgp
BGP table version is 9, local router ID is 10.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i 1.1.1.0/30       10.0.0.2                 0    100      0 65011 ?
 *>i 2.2.2.0/30       10.0.0.2                 0    100      0 65012 ?
 *>i 3.3.3.0/30       10.0.0.2                 0    100      0 65013 ?
 r>i 10.0.0.0/30      10.0.0.2                 0    100      0 ?
 *>i 12.181.161.0/24  10.0.0.2                 0    100      0 65011 i
 *>i 12.181.161.1/32  10.0.0.2                 0    100      0 65011 ?
 *>i 69.222.73.0/24   10.0.0.2                 0    100      0 65012 i
 *>i 198.51.0.0/16    10.0.0.2                 0    100      0 65013 i
CORE#sh ip route
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR

Gateway of last resort is not set

      1.0.0.0/30 is subnetted, 1 subnets
B        1.1.1.0 [200/0] via 10.0.0.2, 00:01:49
      2.0.0.0/30 is subnetted, 1 subnets
B        2.2.2.0 [200/0] via 10.0.0.2, 00:01:49
      3.0.0.0/30 is subnetted, 1 subnets
B        3.3.3.0 [200/0] via 10.0.0.2, 00:01:49
      10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        10.0.0.0/30 is directly connected, GigabitEthernet0/0
L        10.0.0.1/32 is directly connected, GigabitEthernet0/0
      12.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
B        12.181.161.0/24 [200/0] via 10.0.0.2, 00:01:49
B        12.181.161.1/32 [200/0] via 10.0.0.2, 00:01:49
      69.0.0.0/24 is subnetted, 1 subnets
B        69.222.73.0 [200/0] via 10.0.0.2, 00:01:49
B     198.51.0.0/16 [200/0] via 10.0.0.2, 00:01:49
CORE#trace 12.181.161.1
Type escape sequence to abort.
Tracing the route to 12.181.161.1
VRF info: (vrf in name/id, vrf out name/id)
  1 10.0.0.2 5 msec 2 msec 3 msec
  2 1.1.1.1 [AS 65011] 2 msec *  5 msec
CORE#

That looks much better. We now have routes in the routing table and we have connectivity! Let’s make sure partner-1 cannot talk to partner-2 though.

Gateway of last resort is not set

      1.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        1.1.1.0/30 is directly connected, GigabitEthernet0/0
L        1.1.1.1/32 is directly connected, GigabitEthernet0/0
      10.0.0.0/30 is subnetted, 1 subnets
B        10.0.0.0 [20/0] via 1.1.1.2, 19:17:19
      12.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
S        12.181.161.0/24 is directly connected, Null0
C        12.181.161.1/32 is directly connected, Loopback0
partner-1# ping 2.2.2.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
partner-1#

partner-1 has no route to partner-2 in its routing table (and vice versa), so there is no connectivity. What if we put a static route on partner-1 pointing to partner-2 and vice versa?

partner-1(config)#ip route 2.2.2.0 255.255.255.252 1.1.1.2
partner-2(config)#ip route 1.1.1.0 255.255.255.252 2.2.2.2

partner-1#ping 2.2.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
U.U.U
*May 15 22:39:44.663: ICMP: dst (1.1.1.1) host unreachable rcv from 1.1.1.2

Perfect, no connectivity! EDGE-RTR answers with host unreachable because the partner-1 VRF has no route to partner-2. But let’s say for some crazy reason we do want partner-1 and partner-2 to be able to communicate.

EDGE-RTR(config)#ip vrf partner-1
EDGE-RTR(config-vrf)#route-target import 65000:12
EDGE-RTR(config-vrf)#ip vrf partner-2            
EDGE-RTR(config-vrf)#route-target import 65000:11
EDGE-RTR(config-vrf)#end
EDGE-RTR#

EDGE-RTR#sh run | b vrf 
ip vrf common
 rd 65000:1
 route-target export 65000:1
 route-target import 65000:11
 route-target import 65000:12
 route-target import 65000:13
!
ip vrf partner-1
 rd 65000:11
 route-target export 65000:11
 route-target import 65000:1
 route-target import 65000:12
!
ip vrf partner-2
 rd 65000:12
 route-target export 65000:12
 route-target import 65000:1
 route-target import 65000:11
!
ip vrf partner-3
 rd 65000:13
 route-target export 65000:13
 route-target import 65000:1

We import! In the partner-1 VRF we imported partner-2’s route target (65000:12), and in the partner-2 VRF we imported partner-1’s (65000:11). We now have connectivity between the partners!

partner-1#sh ip route
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR

Gateway of last resort is not set

      1.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        1.1.1.0/30 is directly connected, GigabitEthernet0/0
L        1.1.1.1/32 is directly connected, GigabitEthernet0/0
      2.0.0.0/30 is subnetted, 1 subnets
B        2.2.2.0 [20/0] via 1.1.1.2, 00:01:29
      10.0.0.0/30 is subnetted, 1 subnets
B        10.0.0.0 [20/0] via 1.1.1.2, 19:25:59
      12.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
S        12.181.161.0/24 is directly connected, Null0
C        12.181.161.1/32 is directly connected, Loopback0
      69.0.0.0/24 is subnetted, 1 subnets
B        69.222.73.0 [20/0] via 1.1.1.2, 00:01:29
partner-1#ping 2.2.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
!!!!!

It works. More importantly though, you should never do this. You’re now acting as a transit network between the two partners!
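
Since this kind of leak is easy to add by accident, a quick sanity check over the VRF definitions can help. Below is a minimal Python sketch of one, assuming the VRF config has already been parsed into dictionaries. The transit_pairs function and the data layout are hypothetical, not part of any Cisco tool.

```python
import copy


def transit_pairs(vrfs, shared="common"):
    """Return (importer, exporter) pairs of non-shared VRFs that leak into each other."""
    names = [name for name in vrfs if name != shared]
    pairs = []
    for a in names:
        for b in names:
            # VRF a imports at least one route target that VRF b exports.
            if a != b and vrfs[a]["import"] & vrfs[b]["export"]:
                pairs.append((a, b))
    return pairs


# Route targets from the original EDGE-RTR config in this post.
BEFORE = {
    "common":    {"export": {"65000:1"},  "import": {"65000:11", "65000:12", "65000:13"}},
    "partner-1": {"export": {"65000:11"}, "import": {"65000:1"}},
    "partner-2": {"export": {"65000:12"}, "import": {"65000:1"}},
    "partner-3": {"export": {"65000:13"}, "import": {"65000:1"}},
}

# The same config after the two extra route-target import lines.
AFTER = copy.deepcopy(BEFORE)
AFTER["partner-1"]["import"].add("65000:12")
AFTER["partner-2"]["import"].add("65000:11")

print(transit_pairs(BEFORE))  # []
print(transit_pairs(AFTER))   # [('partner-1', 'partner-2'), ('partner-2', 'partner-1')]
```

An empty list means the partner VRFs only exchange routes with the shared VRF, which is what we want here.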

This is what MP-BGP is all about. It’s not that hard; you just need to plan it all out. Have fun!