[ovs-dev,0/3] Add VxLAN encap support for tc offload.

Message ID 1594224636-42337-1-git-send-email-u9012063@gmail.com

Message

William Tu July 8, 2020, 4:10 p.m. UTC
The patch adds VxLAN encap tc-offload support.  The flow format of the userspace
datapath, dpif-netdev, differs from that of the kernel datapath in the case of tunnel
encap.  Unlike the kernel, dpif-netdev does not use separate set and output actions, but
a single clone action with all the tunnel info nested inside.  An example is below:
actions:clone(tnl_push(tnl_port(5),
  header(size=50,type=4,eth(dst=06:1d:6e:a3:f1:61,src=26:df:25:f6:7b:4f,dl_type=0x0800),
    ipv4(src=172.31.1.100,dst=172.31.1.1,proto=17,tos=0,ttl=64,frag=0x4000),
    udp(src=0,dst=4789,csum=0x0),
    vxlan(flags=0x8000000,vni=0x0)),out_port(2)
  ), 3)

The patch parses the above tunnel encap format and passes it to tc for
offloading the VxLAN tunnel. The idea is similar to the recent dpdk
offload patchset:
  netdev-offload-dpdk: Support offload of clone tnl_push/output actions

Example of tc format:
$ tc -s filter show dev ovs-p1 ingress
filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  dst_mac 56:2a:1f:3c:bb:f2
  src_mac 96:0c:a7:b0:60:a4
  eth_type ipv4
  ip_tos 0/0x3
  ip_flags nofrag
  skip_hw
  not_in_hw
    action order 1: tunnel_key  set
    src_ip 172.31.1.100
    dst_ip 172.31.1.1
    key_id 0
    dst_port 4789
    nocsum
    ttl 64 pipe
     index 2 ref 1 bind 1 installed 0 sec used 0 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    no_percpu

    action order 2: mirred (Egress Redirect to device ovs-p0) stolen
    index 2 ref 1 bind 1 installed 0 sec used 0 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie b46e99079448ce581d0fe7a9853c0bb5
    no_percpu
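
To make the mapping between the two formats concrete, here is a minimal Python sketch (not the patch's C code) that rebuilds the 50-byte header from the tnl_push example above and extracts the fields that show up in the tunnel_key action. The function names and the dictionary keys are made up for illustration, and treating a zero UDP checksum as "nocsum" is an assumption.

```python
import socket
import struct

def build_example_header() -> bytes:
    """Rebuild the 50-byte eth/ipv4/udp/vxlan header from the example flow."""
    eth = (bytes.fromhex("061d6ea3f161")      # eth dst
           + bytes.fromhex("26df25f67b4f")    # eth src
           + struct.pack("!H", 0x0800))       # dl_type: IPv4
    ipv4 = struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, 0, 0,         # version/IHL, tos, length, id
                       0x4000, 64, 17, 0,     # frag, ttl, proto (UDP), csum
                       socket.inet_aton("172.31.1.100"),
                       socket.inet_aton("172.31.1.1"))
    udp = struct.pack("!HHHH", 0, 4789, 0, 0)  # src, dst, length, csum
    vxlan = struct.pack("!II", 0x08000000, 0)  # flags word, VNI << 8
    return eth + ipv4 + udp + vxlan

def parse_vxlan_encap(header: bytes) -> dict:
    """Hypothetical helper: pull tunnel_key-style fields out of the header."""
    if len(header) != 50:
        raise ValueError("expected a 50-byte eth/ipv4/udp/vxlan header")
    (dl_type,) = struct.unpack_from("!H", header, 12)
    (ver_ihl, tos, _len, _id, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack_from("!BBHHHBBH4s4s", header, 14)
    if dl_type != 0x0800 or ver_ihl != 0x45 or proto != 17:
        raise ValueError("only plain IPv4/UDP encap is handled in this sketch")
    _sport, dport, _ulen, udp_csum = struct.unpack_from("!HHHH", header, 34)
    _flags, vni_word = struct.unpack_from("!II", header, 42)
    return {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "key_id": vni_word >> 8,       # VNI is the upper 24 bits of the word
        "dst_port": dport,
        "tos": tos,
        "ttl": ttl,
        "nocsum": udp_csum == 0,       # assumption: zero UDP csum means nocsum
    }

tunnel_key = parse_vxlan_encap(build_example_header())
```

The resulting values line up with the src_ip, dst_ip, key_id, dst_port, nocsum and ttl fields in the tc output above.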

Ilya Maximets (2):
  netdev: Allow storing dpif type into netdev structure.
  netdev-offload: Use dpif type instead of class.

William Tu (1):
  netdev-offload-tc: Add VxLAN encap support.

 lib/dpif-netdev.c             |  15 ++---
 lib/dpif-netlink.c            |  23 ++++----
 lib/dpif.c                    |  21 ++++---
 lib/netdev-offload-dpdk.c     |  17 +++---
 lib/netdev-offload-tc.c       | 124 +++++++++++++++++++++++++++++++++++++++++-
 lib/netdev-offload.c          |  52 +++++++++---------
 lib/netdev-offload.h          |  16 +++---
 lib/netdev-provider.h         |   3 +-
 lib/netdev.c                  |  16 ++++++
 lib/netdev.h                  |   2 +
 ofproto/ofproto-dpif-upcall.c |   5 +-
 11 files changed, 217 insertions(+), 77 deletions(-)

Comments

Ilya Maximets July 8, 2020, 5:55 p.m. UTC | #1
Hi.

That is an interesting thing.  I didn't look at the code, but I have a question.
IIUC, you're running the userspace datapath with some linux ports and the linux_tc
offloading provider enabled for them.  I tried this combination previously
and it has an issue: with a RAW socket open, even if a packet was redirected
by TC to another OVS port, we will still receive it via the RAW socket, at least
on the destination port.  I'm not sure how to work around this issue.
Do you have any thoughts?

Or are you using HW offloading with afxdp and/or the skip_sw flag?  I guess there
should be no such issue in this case if the packet never reaches the kernel tc.

BTW, I merged the patch-set from Eli, so the first two patches are in the repository
now.

Best regards, Ilya Maximets.

William Tu July 8, 2020, 7:07 p.m. UTC | #2
On Wed, Jul 08, 2020 at 07:55:58PM +0200, Ilya Maximets wrote:
> 
> Hi.
> 
> That is an interesting thing.  I didn't look at the code, but I have a question.
> IIUC, you're running the userspace datapath with some linux ports and the linux_tc
> offloading provider enabled for them.  I tried this combination previously
> and it has an issue: with a RAW socket open, even if a packet was redirected
> by TC to another OVS port, we will still receive it via the RAW socket, at least
> on the destination port.  I'm not sure how to work around this issue.
> Do you have any thoughts?

Yes, I encountered the same issue.
IIUC, the reason is that with a raw socket registered, the kernel's
__netif_receive_skb_core() delivers the packet to the raw socket first and only
then calls sch_handle_ingress().  So even though we return TC_ACT_SHOT at the TC
layer, the packet has already been delivered to the raw socket and seen by OVS.
This causes my ping test to report duplicates:

64 bytes from 10.1.1.2: icmp_seq=7 ttl=64 time=0.503 ms (DUP!)
64 bytes from 10.1.1.2: icmp_seq=7 ttl=64 time=0.508 ms (DUP!)

Using an afxdp socket has the same issue, because its skb delivery point,
do_xdp_generic(), is also before tc.
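
The ordering problem can be shown with a toy Python model (not kernel code). The function `receive_path` and the list standing in for the OVS raw socket are hypothetical names; the only thing taken from the discussion above is the order: taps such as raw sockets get the packet before the tc ingress hook decides what to do with it.

```python
from typing import List

# Verdict values as defined in the kernel's linux/pkt_cls.h.
TC_ACT_OK = 0
TC_ACT_SHOT = 2

def receive_path(packet: str, taps: List[list], tc_verdict: int) -> bool:
    """Toy model of the receive ordering.

    Returns True if the packet continues up the stack after tc ingress.
    """
    # Step 1: every registered tap (e.g. a raw socket) receives the packet.
    for tap in taps:
        tap.append(packet)
    # Step 2: only now does the tc ingress hook run; dropping the packet
    # here cannot take it back from the taps.
    return tc_verdict != TC_ACT_SHOT

ovs_raw_socket: list = []
continued = receive_path("icmp_seq=7", [ovs_raw_socket], TC_ACT_SHOT)
print(continued, ovs_raw_socket)
```

In this model tc stops the packet, yet the tap has already seen it, which is consistent with the packet being forwarded twice and ping reporting (DUP!).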

> 
> Or are you using HW offloading with afxdp and/or the skip_sw flag?  I guess there
> should be no such issue in this case if the packet never reaches the kernel tc.
> 

I don't have a solution to this problem yet.  For testing, I'm using
software tc-flower (skip_hw).  My plan is that once everything works,
we should use HW offload (skip_sw) with afxdp.

> BTW, I merged the patch-set from Eli, so the first two patches are in the repository
> now.
> 

Thanks for your comment, I will work on v2.
William