[RFC,net-next,3/3] rcv path changes for vrf traffic

Message ID 557634F3.4070700@gmail.com
State RFC, archived
Delegated to: David Miller
Commit Message

David Ahern June 9, 2015, 12:36 a.m. UTC
On 6/8/15 1:58 PM, Hannes Frederic Sowa wrote:
> Hi Shrijeet,
>
> On Mo, 2015-06-08 at 11:35 -0700, Shrijeet Mukherjee wrote:
>> From: Shrijeet Mukherjee <shm@cumulusnetworks.com>
>>
>> Incoming frames for IP protocol stacks need the IIF to be changed
>> from the actual interface to the VRF device. This allows the IIF
>> rule to be used to select tables (or do regular PBR)
>>
>> This change selects the iif to be the VRF device if it exists and
>> the incoming iif is enslaved to the VRF device.
>>
>> Since VRF aware sockets are always bound to the VRF device this
>> system allows return traffic to find the socket of origin.
>>
>> changes are in the arp_rcv, icmp_rcv and ip_rcv paths
>>
>> Question : I did not wrap the rcv modifications, in CONFIG_NET_VRF
>> as it would create code variations and the vrf_ptr check is there
>> I can make that whole thing modular.
>
>  From an architectural level I think the output path looks good. For the
> input path I would also to propose my (I think) more flexible solution:
>

Something is still not right on the output path. For example, I see the
wrong source address showing up with ping -I vrf0:

# ping -I vrf0 1.1.1.254
ping: Warning: source address might be selected on device other than vrf0.
PING 1.1.1.254 (1.1.1.254) from 172.16.1.52 vrf0: 56(84) bytes of data.
64 bytes from 1.1.1.254: icmp_seq=1 ttl=64 time=0.215 ms
...

The reason is that the datagram connect function fails to look up the
outbound route in the VRF table and falls back to the main table. (As an
aside, falling back to other tables is something that should not happen
for VRFs; you want to use only the table specific to the VRF.)

The route lookup fails because it passes in oif = vrf device (this VRF 
design relies on bind to device which sets oif in the flow). That is 
good for selecting the table to use for the lookups, but not good for 
selecting the route within the table.

This is one way to fix the connect problem:

  }


The change sets FLOWI_FLAG_VRFSRC when the bound device is a VRF device,
which essentially tells fib_table_lookup to skip the OIF comparison once
the table has been selected, per this change in the patch Shrijeet posted:

                         if (!(flp->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
                                 if (flp->flowi4_oif &&
                                     flp->flowi4_oif != nh->nh_oif)
                                         continue;
                         }
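
To illustrate the effect of that check outside the kernel, here is a
minimal userspace sketch. The struct names, the function
sketch_select_nexthop and the flag value SKETCH_FLAG_VRFSRC are made up
for illustration and are not kernel API; only the comparison itself
mirrors the snippet above.

```c
#include <stddef.h>

/* Hypothetical flag bit for this sketch; the real value is whatever
 * the patch series defines for FLOWI_FLAG_VRFSRC. */
#define SKETCH_FLAG_VRFSRC 0x04u

/* Simplified stand-in for a FIB nexthop: only the egress ifindex. */
struct sketch_nexthop {
	int nh_oif;
};

/* Simplified stand-in for the flow key used in the lookup. */
struct sketch_flow {
	int oif;             /* set by bind-to-device, e.g. the VRF ifindex */
	unsigned int flags;
};

/* Return the index of the first nexthop acceptable for the flow,
 * or -1 if none matches. */
static int sketch_select_nexthop(const struct sketch_flow *fl,
				 const struct sketch_nexthop *nhs,
				 size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Without the VRF flag, a non-zero oif must match the
		 * nexthop's egress device. With the flag, oif was only
		 * used to pick the table, so the comparison is skipped. */
		if (!(fl->flags & SKETCH_FLAG_VRFSRC)) {
			if (fl->oif && fl->oif != nhs[i].nh_oif)
				continue;
		}
		return (int)i;
	}
	return -1;
}
```

With oif set to the VRF device index and no flag, no nexthop in the VRF
table can match, because routes egress via the enslaved interfaces and
not via the VRF device itself. With the flag set, the first nexthop is
accepted.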


Patch

diff --git a/include/net/route.h b/include/net/route.h
index fe22d03afb6a..a18798caec25 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -245,11 +245,18 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32
                      __be16 sport, __be16 dport,
                      struct sock *sk)
  {
+   struct net_device *dev = dev_get_by_index(sock_net(sk), oif);
     __u8 flow_flags = 0;

     if (inet_sk(sk)->transparent)
         flow_flags |= FLOWI_FLAG_ANYSRC;

+   if (dev) {
+       if (netif_is_vrf(dev))
+           flow_flags |= FLOWI_FLAG_VRFSRC;
+       dev_put(dev);
+   }
+
     flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
                protocol, flow_flags, dst, src, dport, sport);