mbox series

[bpf-next,0/5] AF_XDP: bug fixes and descriptor changes

Message ID 20180604115715.17895-1-bjorn.topel@gmail.com
Headers show
Series AF_XDP: bug fixes and descriptor changes | expand

Message

Björn Töpel June 4, 2018, 11:57 a.m. UTC
From: Björn Töpel <bjorn.topel@intel.com>

An issue with the current AF_XDP uapi raised by Mykyta Iziumtsev (see
https://www.spinics.net/lists/netdev/msg503664.html) is that it does
not support NICs that have a "type-writer" model in an efficient
way. In this model, a memory window is passed to the hardware and
multiple frames might be filled into that window, instead of just one
that we have in the current fixed frame-size model.

This patch set fixes two bugs in the current implementation and then
changes the uapi so that the type-writer model can be supported
efficiently by a possible future extension of AF_XDP.

These are the uapi changes in this patch:

* Change the "u32 idx" in the descriptors to "u64 addr". The current
  idx based format does NOT work for the type-writer model (as packets
  can start anywhere within a frame) but that a relative address
  pointer (the u64 addr) works well for both models in the prototype
  code we have that supports both models. We increased it from u32 to
  u64 to support umems larger than 4G. We have also removed the u16
  offset when having a "u64 addr" since that information is already
  carried in the least significant bits of the address.

* We want to use "u8 padding[5]" for something useful in the future
  (since we are not allowed to change its name), so we now call it
  just options so it can be extended for various purposes in the
  future. It is an u32 as that it what is left of the 16 byte
  descriptor.

* We changed the name of frame_size in the UMEM_REG setsockopt to
  chunk_size since this naming also makes sense to the type-writer
  model.

With these changes to the uapi, we believe the type-writer model can
be supported without having to resort to a new descriptor format. The
type-writer model could then be supported, from the uapi point of
view, by setting a flag at bind time and providing a new flag bit in
the options field of the descriptor that signals to user space that
all packets have been written in a chunk. Or with a new chunk
completion queue as suggested by Mykyta in his latest feedback mail on
the list.

We based this patch set on bpf-next commit bd3a08aaa9a3 ("bpf:
flowlabel in bpf_fib_lookup should be flowinfo")

The structure of the patch set is as follows:

Patches 1-2: Fixes two bugs in the current implementation.
Patches 3-4: Prepares the uapi for a "type-writer" model and modifies
             the sample application so that it works with the new
	     uapi.
Patch 5: Small performance improvement patch for the sample application.

Cheers: Magnus and Björn

Björn Töpel (4):
  xsk: proper fill queue descriptor validation
  xsk: proper Rx drop statistics update
  xsk: new descriptor addressing scheme
  samples/bpf: adapted to new uapi

Magnus Karlsson (1):
  samples/bpf: minor *_nb_free performance fix

 Documentation/networking/af_xdp.rst | 101 +++++++++++++++++++++---------------
 include/uapi/linux/if_xdp.h         |  12 ++---
 net/xdp/xdp_umem.c                  |  33 ++++++------
 net/xdp/xdp_umem.h                  |  27 +++-------
 net/xdp/xdp_umem_props.h            |   4 +-
 net/xdp/xsk.c                       |  41 ++++++++-------
 net/xdp/xsk_queue.c                 |   2 +-
 net/xdp/xsk_queue.h                 |  63 ++++++++--------------
 samples/bpf/xdpsock_user.c          |  92 +++++++++++++++-----------------
 9 files changed, 172 insertions(+), 203 deletions(-)

Comments

Alexei Starovoitov June 4, 2018, 4:24 p.m. UTC | #1
On Mon, Jun 04, 2018 at 01:57:10PM +0200, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> An issue with the current AF_XDP uapi raised by Mykyta Iziumtsev (see
> https://www.spinics.net/lists/netdev/msg503664.html) is that it does
> not support NICs that have a "type-writer" model in an efficient
> way. In this model, a memory window is passed to the hardware and
> multiple frames might be filled into that window, instead of just one
> that we have in the current fixed frame-size model.
> 
> This patch set fixes two bugs in the current implementation and then
> changes the uapi so that the type-writer model can be supported
> efficiently by a possible future extension of AF_XDP.
> 
> These are the uapi changes in this patch:
> 
> * Change the "u32 idx" in the descriptors to "u64 addr". The current
>   idx based format does NOT work for the type-writer model (as packets
>   can start anywhere within a frame) but that a relative address
>   pointer (the u64 addr) works well for both models in the prototype
>   code we have that supports both models. We increased it from u32 to
>   u64 to support umems larger than 4G. We have also removed the u16
>   offset when having a "u64 addr" since that information is already
>   carried in the least significant bits of the address.
> 
> * We want to use "u8 padding[5]" for something useful in the future
>   (since we are not allowed to change its name), so we now call it
>   just options so it can be extended for various purposes in the
>   future. It is an u32 as that it what is left of the 16 byte
>   descriptor.
> 
> * We changed the name of frame_size in the UMEM_REG setsockopt to
>   chunk_size since this naming also makes sense to the type-writer
>   model.
> 
> With these changes to the uapi, we believe the type-writer model can
> be supported without having to resort to a new descriptor format. The
> type-writer model could then be supported, from the uapi point of
> view, by setting a flag at bind time and providing a new flag bit in
> the options field of the descriptor that signals to user space that
> all packets have been written in a chunk. Or with a new chunk
> completion queue as suggested by Mykyta in his latest feedback mail on
> the list.

for the set:
Acked-by: Alexei Starovoitov <ast@kernel.org>
Thank you for these fixes.
According to unofficial feedback from brcm and netronome folks
the descriptor format should work for these nics too.
At some point we may consider second format, but I think SW
should drive HW requirements and not the other way around.
Daniel Borkmann June 4, 2018, 7:51 p.m. UTC | #2
On 06/04/2018 06:24 PM, Alexei Starovoitov wrote:
> On Mon, Jun 04, 2018 at 01:57:10PM +0200, Björn Töpel wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> An issue with the current AF_XDP uapi raised by Mykyta Iziumtsev (see
>> https://www.spinics.net/lists/netdev/msg503664.html) is that it does
>> not support NICs that have a "type-writer" model in an efficient
>> way. In this model, a memory window is passed to the hardware and
>> multiple frames might be filled into that window, instead of just one
>> that we have in the current fixed frame-size model.
>>
>> This patch set fixes two bugs in the current implementation and then
>> changes the uapi so that the type-writer model can be supported
>> efficiently by a possible future extension of AF_XDP.
>>
>> These are the uapi changes in this patch:
>>
>> * Change the "u32 idx" in the descriptors to "u64 addr". The current
>>   idx based format does NOT work for the type-writer model (as packets
>>   can start anywhere within a frame) but that a relative address
>>   pointer (the u64 addr) works well for both models in the prototype
>>   code we have that supports both models. We increased it from u32 to
>>   u64 to support umems larger than 4G. We have also removed the u16
>>   offset when having a "u64 addr" since that information is already
>>   carried in the least significant bits of the address.
>>
>> * We want to use "u8 padding[5]" for something useful in the future
>>   (since we are not allowed to change its name), so we now call it
>>   just options so it can be extended for various purposes in the
>>   future. It is an u32 as that it what is left of the 16 byte
>>   descriptor.
>>
>> * We changed the name of frame_size in the UMEM_REG setsockopt to
>>   chunk_size since this naming also makes sense to the type-writer
>>   model.
>>
>> With these changes to the uapi, we believe the type-writer model can
>> be supported without having to resort to a new descriptor format. The
>> type-writer model could then be supported, from the uapi point of
>> view, by setting a flag at bind time and providing a new flag bit in
>> the options field of the descriptor that signals to user space that
>> all packets have been written in a chunk. Or with a new chunk
>> completion queue as suggested by Mykyta in his latest feedback mail on
>> the list.
> 
> for the set:
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Thank you for these fixes.
> According to unofficial feedback from brcm and netronome folks
> the descriptor format should work for these nics too.
> At some point we may consider second format, but I think SW
> should drive HW requirements and not the other way around.

LGTM as well, applied to bpf-next, thanks!