
[00/11] XDP unaligned chunk placement support

Message ID 20190620090958.2135-1-kevin.laatz@intel.com

Message

Laatz, Kevin June 20, 2019, 9:09 a.m. UTC
This patchset adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as limiting the packet sizes that are
supported.

The changes in this patchset remove these restrictions, allowing XDP to be
more flexible in where it can place a chunk within a umem. By relaxing where
the chunks can be placed, we can use an arbitrary buffer size and
place it wherever we have a free address in the umem. These changes add the
ability to support jumbo frames and make it easy to integrate with other
existing frameworks that have their own memory management systems, such as
DPDK.

Structure of the patchset:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Adds an offset parameter to zero_copy_allocator. This change will
    enable us to calculate the original handle in zca_free. This will be
    required for unaligned chunk mode since we can't easily mask back to
    the original handle.

Patch 4:
  - Adds the offset parameter to i40e_zca_free. This change is needed for
    calculating the handle since we can't easily mask back to the original
    handle like we can in the aligned case.

Patch 5:
  - Adds the offset parameter to ixgbe_zca_free. This change is needed for
    calculating the handle since we can't easily mask back to the original
    handle like we can in the aligned case.


Patch 6:
  - Add infrastructure for unaligned chunks. Since we are dealing
    with unaligned chunks that could potentially cross a physical page
    boundary, we add checks to keep track of that information. We can
    later use this information to correctly handle buffers that are
    placed at an address where they cross a page boundary.
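
    As an illustration, a minimal sketch of the kind of page-boundary check
    this involves (the helper name is hypothetical, not the one added by
    the patch):

      #include <stdbool.h>
      #include <stdint.h>

      #define PAGE_SIZE 4096UL

      /* Hypothetical helper: does a chunk starting at 'addr' with length
       * 'len' cross a page boundary? For illustration only.
       */
      static bool chunk_crosses_page(uint64_t addr, uint64_t len)
      {
              return (addr / PAGE_SIZE) != ((addr + len - 1) / PAGE_SIZE);
      }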

Patch 7:
  - Add flags for umem configuration to libbpf

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Add a command line argument for passing in a desired buffer size, and
    add buffer recycling for unaligned mode. Passing in a buffer size
    allows the application to use arbitrarily sized buffers in the
    unaligned chunk mode. Since we are now using unaligned chunks, we need
    to recycle our buffers in a slightly different way.

Patch 10:
  - Adds hugepage support to the xdpsock application
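
    As a rough illustration of what hugepage backing for the umem area can
    look like, assuming an mmap-based allocation (the sample's actual
    option handling may differ):

      #define _GNU_SOURCE
      #include <sys/mman.h>
      #include <stddef.h>

      /* Sketch only: back the umem area with anonymous hugepages when
       * requested. Error handling is omitted.
       */
      static void *alloc_umem_area(size_t size, int use_hugepages)
      {
              int flags = MAP_PRIVATE | MAP_ANONYMOUS;

              if (use_hugepages)
                      flags |= MAP_HUGETLB;   /* 2MB pages on x86 */

              return mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
      }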

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xdp: add offset param to zero_copy_allocator
  i40e: add offset to zca_free
  ixgbe: add offset to zca_free
  xsk: add support to allow unaligned chunk placement
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++--
 drivers/net/ethernet/intel/i40e/i40e_xsk.h    |  3 +-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 21 ++--
 include/net/xdp.h                             |  3 +-
 include/net/xdp_sock.h                        |  2 +
 include/uapi/linux/if_xdp.h                   |  4 +
 net/core/xdp.c                                | 11 ++-
 net/xdp/xdp_umem.c                            | 17 ++--
 net/xdp/xsk.c                                 | 60 +++++++++--
 net/xdp/xsk_queue.h                           | 60 +++++++++--
 samples/bpf/xdpsock_user.c                    | 99 ++++++++++++++-----
 tools/include/uapi/linux/if_xdp.h             |  4 +
 tools/lib/bpf/xsk.c                           |  7 ++
 tools/lib/bpf/xsk.h                           |  2 +
 16 files changed, 241 insertions(+), 86 deletions(-)

Comments

Laatz, Kevin July 16, 2019, 3:06 a.m. UTC | #1
This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as limiting the packet sizes that are
supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a power of 2. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
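
To make the bit layout concrete, here is a minimal sketch of the 48-bit
address / 16-bit offset split described above (the names below are
illustrative, not the exact macros this patch set defines):

  #include <stdint.h>

  #define ADDR_BITS  48
  #define ADDR_MASK  ((1ULL << ADDR_BITS) - 1)

  /* Store the offset (headroom etc.) in the upper 16 bits of the addr. */
  static inline uint64_t pack_addr(uint64_t base, uint64_t offset)
  {
          return (base & ADDR_MASK) | (offset << ADDR_BITS);
  }

  /* Recover the original chunk address, e.g. when recycling a buffer. */
  static inline uint64_t addr_base(uint64_t addr)
  {
          return addr & ADDR_MASK;
  }

  /* Get the address of the actual packet data (base + offset). */
  static inline uint64_t addr_data(uint64_t addr)
  {
          return (addr & ADDR_MASK) + (addr >> ADDR_BITS);
  }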

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)

These are solely for comparing performance with and without the patches.
The largest drop was ~1% (in zero-copy mode).

+-------------------------+------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
+-------------------------+------------+-----------------+-------------+
| Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+ 

NOTE: We are currently working on the changes required in the Mellanox
driver. We will include these in the v3.

Structure of the patchset:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format. The new format is particularly useful here since we can now
    retrieve the original address in places like i40e_zca_free with ease.
    This saves us doing various calculations to get the original address
    back.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format. The new format is particularly useful here since we can now
    retrieve the original address in places like ixgbe_zca_free with ease.
    This saves us doing various calculations to get the original address
    back.

Patch 6:
  - Add flags for umem configuration to libbpf

Patch 7:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 8:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 9:
  - Adds hugepage support to the xdpsock application

Patch 10:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

Kevin Laatz (10):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst          | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
 include/net/xdp_sock.h                       |  2 +
 include/uapi/linux/if_xdp.h                  |  9 ++
 net/xdp/xdp_umem.c                           | 17 ++--
 net/xdp/xsk.c                                | 89 ++++++++++++++++----
 net/xdp/xsk_diag.c                           |  2 +-
 net/xdp/xsk_queue.h                          | 70 +++++++++++++--
 samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
 tools/include/uapi/linux/if_xdp.h            |  4 +
 tools/lib/bpf/xsk.c                          |  3 +
 tools/lib/bpf/xsk.h                          |  2 +
 13 files changed, 266 insertions(+), 81 deletions(-)
Alexei Starovoitov July 23, 2019, 9:08 p.m. UTC | #2
Jonathan, Bjorn, Jakub,
Please review!
The patch set has been pending for a week.

On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> This patch set adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
> place chunks within the umem as well as limiting the packet sizes that are
> supported.
>
> The changes in this patch set removes these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
>
> Since we are now dealing with arbitrary frame sizes, we need also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.
>
> The numbers below were recorded with the following set up:
>   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
>   - Driver: i40e
>   - Application: xdpsock with l2fwd (single interface)
>
> These are solely for comparing performance with and without the patches.
> The largest drop was ~1% (in zero-copy mode).
>
> +-------------------------+------------+-----------------+-------------+
> | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
> +-------------------------+------------+-----------------+-------------+
> | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
> | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
> | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
> +-------------------------+------------+-----------------+-------------+
>
> NOTE: We are currently working on the changes required in the Mellanox
> driver. We will include these in the v3.
>
> Structure of the patchset:
> Patch 1:
>   - Remove unnecessary masking and headroom addition during zero-copy Rx
>     buffer recycling in i40e. This change is required in order for the
>     buffer recycling to work in the unaligned chunk mode.
>
> Patch 2:
>   - Remove unnecessary masking and headroom addition during
>     zero-copy Rx buffer recycling in ixgbe. This change is required in
>     order for the  buffer recycling to work in the unaligned chunk mode.
>
> Patch 3:
>   - Add infrastructure for unaligned chunks. Since we are dealing with
>     unaligned chunks that could potentially cross a physical page boundary,
>     we add checks to keep track of that information. We can later use this
>     information to correctly handle buffers that are placed at an address
>     where they cross a page boundary.  This patch also modifies the
>     existing Rx and Tx functions to use the new descriptor format. To
>     handle addresses correctly, we need to mask appropriately based on
>     whether we are in aligned or unaligned mode.
>
> Patch 4:
>   - This patch updates the i40e driver to make use of the new descriptor
>     format. The new format is particularly useful here since we can now
>     retrieve the original address in places like i40e_zca_free with ease.
>     This saves us doing various calculations to get the original address
>     back.
>
> Patch 5:
>   - This patch updates the ixgbe driver to make use of the new descriptor
>     format. The new format is particularly useful here since we can now
>     retrieve the original address in places like ixgbe_zca_free with ease.
>     This saves us doing various calculations to get the original address
>     back.
>
> Patch 6:
>   - Add flags for umem configuration to libbpf
>
> Patch 7:
>   - Modify xdpsock application to add a command line option for
>     unaligned chunks
>
> Patch 8:
>   - Since we can now run the application in unaligned chunk mode, we need
>     to make sure we recycle the buffers appropriately.
>
> Patch 9:
>   - Adds hugepage support to the xdpsock application
>
> Patch 10:
>   - Documentation update to include the unaligned chunk scenario. We need
>     to explicitly state that the incoming addresses are only masked in the
>     aligned chunk mode and not the unaligned chunk mode.
>
> ---
> v2:
>   - fixed checkpatch issues
>   - fixed Rx buffer recycling for unaligned chunks in xdpsock
>   - removed unused defines
>   - fixed how chunk_size is calculated in xsk_diag.c
>   - added some performance numbers to cover letter
>   - modified descriptor format to make it easier to retrieve original
>     address
>   - removed patch adding off_t off to the zero copy allocator. This is no
>     longer needed with the new descriptor format.
>
> Kevin Laatz (10):
>   i40e: simplify Rx buffer recycle
>   ixgbe: simplify Rx buffer recycle
>   xsk: add support to allow unaligned chunk placement
>   i40e: modify driver for handling offsets
>   ixgbe: modify driver for handling offsets
>   libbpf: add flags to umem config
>   samples/bpf: add unaligned chunks mode support to xdpsock
>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>   samples/bpf: use hugepages in xdpsock app
>   doc/af_xdp: include unaligned chunk case
>
>  Documentation/networking/af_xdp.rst          | 10 ++-
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
>  include/net/xdp_sock.h                       |  2 +
>  include/uapi/linux/if_xdp.h                  |  9 ++
>  net/xdp/xdp_umem.c                           | 17 ++--
>  net/xdp/xsk.c                                | 89 ++++++++++++++++----
>  net/xdp/xsk_diag.c                           |  2 +-
>  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
>  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
>  tools/include/uapi/linux/if_xdp.h            |  4 +
>  tools/lib/bpf/xsk.c                          |  3 +
>  tools/lib/bpf/xsk.h                          |  2 +
>  13 files changed, 266 insertions(+), 81 deletions(-)
>
> --
> 2.17.1
>
Laatz, Kevin July 24, 2019, 5:10 a.m. UTC | #3
This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a power of 2. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)

These are solely for comparing performance with and without the patches.
The largest drop was ~1% (in zero-copy mode).

+-------------------------+------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
+-------------------------+------------+-----------------+-------------+
| Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+
| Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
+-------------------------+------------+-----------------+-------------+

This patch set has been applied against commit 66b5f1c43984
("net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - Add flags for umem configuration to libbpf

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 33 +++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 33 +++----
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  9 +-
 include/net/xdp_sock.h                        | 17 ++++
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 18 ++--
 net/xdp/xsk.c                                 | 86 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 68 +++++++++++++--
 samples/bpf/xdpsock_user.c                    | 56 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  4 +
 tools/lib/bpf/xsk.c                           |  3 +
 tools/lib/bpf/xsk.h                           |  2 +
 15 files changed, 274 insertions(+), 84 deletions(-)
Magnus Karlsson July 24, 2019, 1:25 p.m. UTC | #4
On Tue, Jul 23, 2019 at 11:08 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Johnathan, Bjorn, Jakub,
> Please review!
> The patch set has been pending for a week.

There is a v3 coming out shortly so I suggest we wait for that. It
will have Mellanox support for this feature too and some cleanups. I
refrained from posting a review on the mailing list due to the merge
window being closed last week, but maybe that was not correct. Should
I still post reviews for new features submitted during the closed
merge window period? I am happy to do it since I do not have any other
tasks during that time. It is a quiet period for me. Just let me know.

/Magnus

> On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
> >
> > This patch set adds the ability to use unaligned chunks in the XDP umem.
> >
> > Currently, all chunk addresses passed to the umem are masked to be chunk
> > size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
> > place chunks within the umem as well as limiting the packet sizes that are
> > supported.
> >
> > The changes in this patch set removes these restrictions, allowing XDP to
> > be more flexible in where it can place a chunk within a umem. By relaxing
> > where the chunks can be placed, it allows us to use an arbitrary buffer
> > size and place that wherever we have a free address in the umem. These
> > changes add the ability to support arbitrary frame sizes up to 4k
> > (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> > that have their own memory management systems, such as DPDK.
> >
> > Since we are now dealing with arbitrary frame sizes, we need also need to
> > update how we pass around addresses. Currently, the addresses can simply be
> > masked to 2k to get back to the original address. This becomes less trivial
> > when using frame sizes that are not a 'power of 2' size. This patch set
> > modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> > field for an offset value, leaving the lower 48-bits for the address (this
> > leaves us with 256 Terabytes, which should be enough!). We only need to use
> > the upper 16-bits to store the offset when running in unaligned mode.
> > Rather than adding the offset (headroom etc) to the address, we will store
> > it in the upper 16-bits of the address field. This way, we can easily add
> > the offset to the address where we need it, using some bit manipulation and
> > addition, and we can also easily get the original address wherever we need
> > it (for example in i40e_zca_free) by simply masking to get the lower
> > 48-bits of the address field.
> >
> > The numbers below were recorded with the following set up:
> >   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
> >   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
> >   - Driver: i40e
> >   - Application: xdpsock with l2fwd (single interface)
> >
> > These are solely for comparing performance with and without the patches.
> > The largest drop was ~1% (in zero-copy mode).
> >
> > +-------------------------+------------+-----------------+-------------+
> > | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy        |
> > +-------------------------+------------+-----------------+-------------+
> > | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> > | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> > | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps   |
> > +-------------------------+------------+-----------------+-------------+
> >
> > NOTE: We are currently working on the changes required in the Mellanox
> > driver. We will include these in the v3.
> >
> > Structure of the patchset:
> > Patch 1:
> >   - Remove unnecessary masking and headroom addition during zero-copy Rx
> >     buffer recycling in i40e. This change is required in order for the
> >     buffer recycling to work in the unaligned chunk mode.
> >
> > Patch 2:
> >   - Remove unnecessary masking and headroom addition during
> >     zero-copy Rx buffer recycling in ixgbe. This change is required in
> >     order for the  buffer recycling to work in the unaligned chunk mode.
> >
> > Patch 3:
> >   - Add infrastructure for unaligned chunks. Since we are dealing with
> >     unaligned chunks that could potentially cross a physical page boundary,
> >     we add checks to keep track of that information. We can later use this
> >     information to correctly handle buffers that are placed at an address
> >     where they cross a page boundary.  This patch also modifies the
> >     existing Rx and Tx functions to use the new descriptor format. To
> >     handle addresses correctly, we need to mask appropriately based on
> >     whether we are in aligned or unaligned mode.
> >
> > Patch 4:
> >   - This patch updates the i40e driver to make use of the new descriptor
> >     format. The new format is particularly useful here since we can now
> >     retrieve the original address in places like i40e_zca_free with ease.
> >     This saves us doing various calculations to get the original address
> >     back.
> >
> > Patch 5:
> >   - This patch updates the ixgbe driver to make use of the new descriptor
> >     format. The new format is particularly useful here since we can now
> >     retrieve the original address in places like ixgbe_zca_free with ease.
> >     This saves us doing various calculations to get the original address
> >     back.
> >
> > Patch 6:
> >   - Add flags for umem configuration to libbpf
> >
> > Patch 7:
> >   - Modify xdpsock application to add a command line option for
> >     unaligned chunks
> >
> > Patch 8:
> >   - Since we can now run the application in unaligned chunk mode, we need
> >     to make sure we recycle the buffers appropriately.
> >
> > Patch 9:
> >   - Adds hugepage support to the xdpsock application
> >
> > Patch 10:
> >   - Documentation update to include the unaligned chunk scenario. We need
> >     to explicitly state that the incoming addresses are only masked in the
> >     aligned chunk mode and not the unaligned chunk mode.
> >
> > ---
> > v2:
> >   - fixed checkpatch issues
> >   - fixed Rx buffer recycling for unaligned chunks in xdpsock
> >   - removed unused defines
> >   - fixed how chunk_size is calculated in xsk_diag.c
> >   - added some performance numbers to cover letter
> >   - modified descriptor format to make it easier to retrieve original
> >     address
> >   - removed patch adding off_t off to the zero copy allocator. This is no
> >     longer needed with the new descriptor format.
> >
> > Kevin Laatz (10):
> >   i40e: simplify Rx buffer recycle
> >   ixgbe: simplify Rx buffer recycle
> >   xsk: add support to allow unaligned chunk placement
> >   i40e: modify driver for handling offsets
> >   ixgbe: modify driver for handling offsets
> >   libbpf: add flags to umem config
> >   samples/bpf: add unaligned chunks mode support to xdpsock
> >   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
> >   samples/bpf: use hugepages in xdpsock app
> >   doc/af_xdp: include unaligned chunk case
> >
> >  Documentation/networking/af_xdp.rst          | 10 ++-
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
> >  include/net/xdp_sock.h                       |  2 +
> >  include/uapi/linux/if_xdp.h                  |  9 ++
> >  net/xdp/xdp_umem.c                           | 17 ++--
> >  net/xdp/xsk.c                                | 89 ++++++++++++++++----
> >  net/xdp/xsk_diag.c                           |  2 +-
> >  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
> >  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
> >  tools/include/uapi/linux/if_xdp.h            |  4 +
> >  tools/lib/bpf/xsk.c                          |  3 +
> >  tools/lib/bpf/xsk.h                          |  2 +
> >  13 files changed, 266 insertions(+), 81 deletions(-)
> >
> > --
> > 2.17.1
> >
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Jonathan Lemon July 25, 2019, 3:39 p.m. UTC | #5
On 23 Jul 2019, at 22:10, Kevin Laatz wrote:

> This patch set adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (max is PAGE_SIZE). This limits where we can place chunks
> within the umem as well as limiting the packet sizes that are supported.
>
> The changes in this patch set removes these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
> In DPDK, for example, there is already support for AF_XDP with zero-copy.
> However, with this patch set the integration will be much more seamless.
> You can find the DPDK AF_XDP driver at:
> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>
> Since we are now dealing with arbitrary frame sizes, we need also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.

I wonder if it would be better to break backwards compatibility here and
say that a handle is going to change from [addr] to [base | offset], or
even [index | offset], where address = (index * chunk size) + offset, and
then use accessor macros to manipulate the queue entries.

This way, the XDP hotpath can adjust the handle with simple arithmetic,
bypassing the "if (unaligned)" check, as it changes the offset directly.

Using a chunk index instead of a base address is safer, otherwise it is
too easy to corrupt things.
Jonathan Lemon July 25, 2019, 3:43 p.m. UTC | #6
On 24 Jul 2019, at 6:25, Magnus Karlsson wrote:

> On Tue, Jul 23, 2019 at 11:08 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> Johnathan, Bjorn, Jakub,
>> Please review!
>> The patch set has been pending for a week.
>
> There is a v3 coming out shortly so I suggest we wait for that. It
> will have Mellanox support for this feature too and some clean ups. I
> refrained from posting a review on the mailing list due to the merge
> window being closed last week, but maybe that was not correct. Should
> I still post reviews for new features submitted during the closed
> merge window period? I am happy to do it since I do not have any other
> tasks during that time. It is a quite period for me. Just let me know.

Same here - last time I posted a patch when the merge window was closed,
it was signaled to me (Hi Jakub!) to wait until it reopened.
Richardson, Bruce July 25, 2019, 3:56 p.m. UTC | #7
> -----Original Message-----
> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
> Sent: Thursday, July 25, 2019 4:39 PM
> To: Laatz, Kevin <kevin.laatz@intel.com>
> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org
> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
> support
> 
> 
> 
> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
> 
> > This patch set adds the ability to use unaligned chunks in the XDP umem.
> >
> > Currently, all chunk addresses passed to the umem are masked to be
> > chunk size aligned (max is PAGE_SIZE). This limits where we can place
> > chunks within the umem as well as limiting the packet sizes that are
> supported.
> >
> > The changes in this patch set removes these restrictions, allowing XDP
> > to be more flexible in where it can place a chunk within a umem. By
> > relaxing where the chunks can be placed, it allows us to use an
> > arbitrary buffer size and place that wherever we have a free address
> > in the umem. These changes add the ability to support arbitrary frame
> > sizes up to 4k
> > (PAGE_SIZE) and make it easy to integrate with other existing
> > frameworks that have their own memory management systems, such as DPDK.
> > In DPDK, for example, there is already support for AF_XDP with zero-
> copy.
> > However, with this patch set the integration will be much more seamless.
> > You can find the DPDK AF_XDP driver at:
> > https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> >
> > Since we are now dealing with arbitrary frame sizes, we need also need
> > to update how we pass around addresses. Currently, the addresses can
> > simply be masked to 2k to get back to the original address. This
> > becomes less trivial when using frame sizes that are not a 'power of
> > 2' size. This patch set modifies the Rx/Tx descriptor format to use
> > the upper 16-bits of the addr field for an offset value, leaving the
> > lower 48-bits for the address (this leaves us with 256 Terabytes,
> > which should be enough!). We only need to use the upper 16-bits to store
> the offset when running in unaligned mode.
> > Rather than adding the offset (headroom etc) to the address, we will
> > store it in the upper 16-bits of the address field. This way, we can
> > easily add the offset to the address where we need it, using some bit
> > manipulation and addition, and we can also easily get the original
> > address wherever we need it (for example in i40e_zca_free) by
> > simply masking to get the lower 48-bits of the address field.
> 
> I wonder if it would be better to break backwards compatibility here and
> say that a handle is going to change from [addr] to [base | offset], or
> even [index | offset], where address = (index * chunk size) + offset, and
> then use accessor macros to manipulate the queue entries.
> 
> This way, the XDP hotpath can adjust the handle with simple arithmetic,
> bypassing the "if (unaligned)", check, as it changes the offset directly.
> 
> Using a chunk index instead of a base address is safer, otherwise it is
> too easy to corrupt things.
> --

The trouble with using a chunk index is that it assumes that all chunks are
contiguous, which is not always going to be the case. For example, for
userspace apps the easiest way to get memory that is IOVA/physically 
contiguous is to use hugepages, but even then we still need to skip space
when crossing a 2MB barrier.

Specifically in this example case, with a 3k buffer size and 2MB hugepages,
we'd get 666 buffers on a single page, but then waste a few KB before
starting the 667th buffer at byte 0 on the second 2MB page.
This is the setup used in DPDK, for instance, when we allocate memory for
use in buffer pools. 
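
A small sketch of the layout described above, i.e. fixed-size buffers
packed into 2MB hugepages with the tail of each page skipped so that no
buffer straddles a hugepage boundary (illustrative only):

  #include <stdint.h>
  #include <stddef.h>

  #define HUGEPAGE_SIZE (2UL * 1024 * 1024)

  /* Offset of the n-th buffer within the umem area when buffers never
   * cross a hugepage boundary. Leftover bytes at the end of each page
   * are simply skipped.
   */
  static uint64_t nth_buffer_offset(size_t n, size_t buf_size)
  {
          size_t bufs_per_page = HUGEPAGE_SIZE / buf_size;
          size_t page = n / bufs_per_page;
          size_t slot = n % bufs_per_page;

          return page * HUGEPAGE_SIZE + slot * buf_size;
  }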

Therefore, I think it's better to just have the kernel sanity checking the
request for safety and leave userspace the freedom to decide where in its
memory area it wants to place the buffers.

Regards,
/Bruce
Jonathan Lemon July 25, 2019, 5:30 p.m. UTC | #8
On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:

>> -----Original Message-----
>> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
>> Sent: Thursday, July 25, 2019 4:39 PM
>> To: Laatz, Kevin <kevin.laatz@intel.com>
>> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
>> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
>> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
>> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
>> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
>> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
>> lan@lists.osuosl.org
>> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
>> support
>>
>>
>>
>> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
>>
>>> This patch set adds the ability to use unaligned chunks in the XDP umem.
>>>
>>> Currently, all chunk addresses passed to the umem are masked to be
>>> chunk size aligned (max is PAGE_SIZE). This limits where we can place
>>> chunks within the umem as well as limiting the packet sizes that are
>> supported.
>>>
>>> The changes in this patch set removes these restrictions, allowing XDP
>>> to be more flexible in where it can place a chunk within a umem. By
>>> relaxing where the chunks can be placed, it allows us to use an
>>> arbitrary buffer size and place that wherever we have a free address
>>> in the umem. These changes add the ability to support arbitrary frame
>>> sizes up to 4k
>>> (PAGE_SIZE) and make it easy to integrate with other existing
>>> frameworks that have their own memory management systems, such as DPDK.
>>> In DPDK, for example, there is already support for AF_XDP with zero-
>> copy.
>>> However, with this patch set the integration will be much more seamless.
>>> You can find the DPDK AF_XDP driver at:
>>> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>>>
>>> Since we are now dealing with arbitrary frame sizes, we need also need
>>> to update how we pass around addresses. Currently, the addresses can
>>> simply be masked to 2k to get back to the original address. This
>>> becomes less trivial when using frame sizes that are not a 'power of
>>> 2' size. This patch set modifies the Rx/Tx descriptor format to use
>>> the upper 16-bits of the addr field for an offset value, leaving the
>>> lower 48-bits for the address (this leaves us with 256 Terabytes,
>>> which should be enough!). We only need to use the upper 16-bits to store
>> the offset when running in unaligned mode.
>>> Rather than adding the offset (headroom etc) to the address, we will
>>> store it in the upper 16-bits of the address field. This way, we can
>>> easily add the offset to the address where we need it, using some bit
>>> manipulation and addition, and we can also easily get the original
>>> address wherever we need it (for example in i40e_zca_free) by
>>> simply masking to get the lower 48-bits of the address field.
>>
>> I wonder if it would be better to break backwards compatibility here and
>> say that a handle is going to change from [addr] to [base | offset], or
>> even [index | offset], where address = (index * chunk size) + offset, and
>> then use accessor macros to manipulate the queue entries.
>>
>> This way, the XDP hotpath can adjust the handle with simple arithmetic,
>> bypassing the "if (unaligned)", check, as it changes the offset directly.
>>
>> Using a chunk index instead of a base address is safer, otherwise it is
>> too easy to corrupt things.
>> --
>
> The trouble with using a chunk index is that it assumes that all chunks are
> contiguous, which is not always going to be the case. For example, for
> userspace apps the easiest way to get memory that is IOVA/physically
> contiguous is to use hugepages, but even then we still need to skip space
> when crossing a 2MB barrier.
>
> Specifically in this example case, with a 3k buffer size and 2MB hugepages,
> we'd get 666 buffers on a single page, but then waste a few KB before
> starting the 667th buffer at byte 0 on the second 2MB page.
> This is the setup used in DPDK, for instance, when we allocate memory for
> use in buffer pools.
>
> Therefore, I think it's better to just have the kernel sanity checking the
> request for safety and leave userspace the freedom to decide where in its
> memory area it wants to place the buffers.

That makes sense, thanks.  My concern is that this makes it difficult for
the kernel to validate the buffer start address, but perhaps that isn't a
concern.

I still believe it would simplify things if the format was [base | offset],
any opinion on that?
Richardson, Bruce July 26, 2019, 8:41 a.m. UTC | #9
On Thu, Jul 25, 2019 at 10:30:51AM -0700, Jonathan Lemon wrote:
> 
> 
> On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:
> 
> >> -----Original Message-----
> >> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
> >> Sent: Thursday, July 25, 2019 4:39 PM
> >> To: Laatz, Kevin <kevin.laatz@intel.com>
> >> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> >> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> >> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> >> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
> >> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> >> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
> >> lan@lists.osuosl.org
> >> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
> >> support
> >>
> >>
> >>
> >> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
> >>
> >>> This patch set adds the ability to use unaligned chunks in the XDP umem.
> >>>
> >>> Currently, all chunk addresses passed to the umem are masked to be
> >>> chunk size aligned (max is PAGE_SIZE). This limits where we can place
> >>> chunks within the umem as well as limiting the packet sizes that are
> >> supported.
> >>>
> >>> The changes in this patch set removes these restrictions, allowing XDP
> >>> to be more flexible in where it can place a chunk within a umem. By
> >>> relaxing where the chunks can be placed, it allows us to use an
> >>> arbitrary buffer size and place that wherever we have a free address
> >>> in the umem. These changes add the ability to support arbitrary frame
> >>> sizes up to 4k
> >>> (PAGE_SIZE) and make it easy to integrate with other existing
> >>> frameworks that have their own memory management systems, such as DPDK.
> >>> In DPDK, for example, there is already support for AF_XDP with zero-
> >> copy.
> >>> However, with this patch set the integration will be much more seamless.
> >>> You can find the DPDK AF_XDP driver at:
> >>> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> >>>
> >>> Since we are now dealing with arbitrary frame sizes, we need also need
> >>> to update how we pass around addresses. Currently, the addresses can
> >>> simply be masked to 2k to get back to the original address. This
> >>> becomes less trivial when using frame sizes that are not a 'power of
> >>> 2' size. This patch set modifies the Rx/Tx descriptor format to use
> >>> the upper 16-bits of the addr field for an offset value, leaving the
> >>> lower 48-bits for the address (this leaves us with 256 Terabytes,
> >>> which should be enough!). We only need to use the upper 16-bits to store
> >> the offset when running in unaligned mode.
> >>> Rather than adding the offset (headroom etc) to the address, we will
> >>> store it in the upper 16-bits of the address field. This way, we can
> >>> easily add the offset to the address where we need it, using some bit
> >>> manipulation and addition, and we can also easily get the original
> >>> address wherever we need it (for example in i40e_zca_free) by
> >>> simply masking to get the lower 48-bits of the address field.
> >>
> >> I wonder if it would be better to break backwards compatibility here and
> >> say that a handle is going to change from [addr] to [base | offset], or
> >> even [index | offset], where address = (index * chunk size) + offset, and
> >> then use accessor macros to manipulate the queue entries.
> >>
> >> This way, the XDP hotpath can adjust the handle with simple arithmetic,
> >> bypassing the "if (unaligned)", check, as it changes the offset directly.
> >>
> >> Using a chunk index instead of a base address is safer, otherwise it is
> >> too easy to corrupt things.
> >> --
> >
> > The trouble with using a chunk index is that it assumes that all chunks are
> > contiguous, which is not always going to be the case. For example, for
> > userspace apps the easiest way to get memory that is IOVA/physically
> > contiguous is to use hugepages, but even then we still need to skip space
> > when crossing a 2MB barrier.
> >
> > Specifically in this example case, with a 3k buffer size and 2MB hugepages,
> > we'd get 666 buffers on a single page, but then waste a few KB before
> > starting the 667th buffer at byte 0 on the second 2MB page.
> > This is the setup used in DPDK, for instance, when we allocate memory for
> > use in buffer pools.
> >
> > Therefore, I think it's better to just have the kernel sanity checking the
> > request for safety and leave userspace the freedom to decide where in its
> > memory area it wants to place the buffers.
> 
> That makes sense, thanks.  My concern is that this makes it difficult for
> the kernel to validate the buffer start address, but perhaps that isn't a
> concern.
> 
I don't think that's a problem right now, since the start address is still
an offset into the umem, and so must be between 0 and size (or size -
chunk_size, more specifically). Are there other checks that should be
performed on it, do you think?

> I still believe it would simplify things if the format was [base | offset],
> any opinion on that?

Can you perhaps clarify what you mean here? Are you suggesting globally
changing the address format to always use this, irrespective of chunk
alignment, or something else?

Regards,
/Bruce
Laatz, Kevin July 30, 2019, 8:53 a.m. UTC | #10
This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a power of 2. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

These are solely for comparing performance with and without the patches.
The largest drop was ~1.5% (in zero-copy mode).

+-------------------------+-------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode    | Zero-copy       | Copy        |
+-------------------------+-------------+-----------------+-------------+
| Aligned (baseline)      | 1.25 Mpps   | 10.0 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Aligned (with patches)  | 1.25 Mpps   | 9.85 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Unaligned               | 1.25 Mpps   | 9.65 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+

This patch set has been applied against
commit 475e31f8da1b ("Merge branch 'revamp-test_progs'")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add flags for umem configuration to libbpf
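
    A rough sketch of how an application might request unaligned chunk
    mode through the new umem flags (the flag and struct field are the
    ones this series adds, but exact usage may differ in the final
    version; error handling is omitted):

      #include <bpf/xsk.h>

      static int create_unaligned_umem(void *buf, size_t size,
                                       struct xsk_umem **umem,
                                       struct xsk_ring_prod *fq,
                                       struct xsk_ring_cons *cq)
      {
              struct xsk_umem_config cfg = {
                      .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                      .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                      .frame_size = 3072,   /* arbitrary, non-power-of-2 */
                      .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
                      .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
              };

              return xsk_umem__create(umem, buf, size, fq, cq, &cfg);
      }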

Patch 4:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary (a minimal sketch of such a check
    follows this patch list). This patch also modifies the existing Rx and
    Tx functions to use the new descriptor format. To handle addresses
    correctly, we need to mask appropriately based on whether we are in
    aligned or unaligned mode.

Patch 5:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 7:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.
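
As referenced in Patch 4 above, a buffer in unaligned chunk mode may
straddle a physical page boundary, so the umem bookkeeping needs to know
whether the affected pages are physically contiguous. The sketch below
only illustrates the shape of such a check; the function name and the
contiguity bookkeeping are assumptions, not the kernel code itself:

    /* Sketch: does a buffer of 'len' bytes starting at 'addr' (with any
     * offset already applied) cross into the following 4k page?
     */
    static bool sketch_crosses_page(__u64 addr, __u32 len)
    {
            return (addr & (PAGE_SIZE - 1)) + len > PAGE_SIZE;
    }

    /* A crossing buffer is only usable for zero-copy if the next page of
     * the umem is physically contiguous with this one; otherwise the
     * descriptor has to be rejected.
     */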

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  libbpf: add flags to umem config
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++---
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 +-
 include/net/xdp_sock.h                        | 40 +++++++++-
 include/uapi/linux/if_xdp.h                   | 14 +++-
 net/xdp/xdp_umem.c                            | 18 +++--
 net/xdp/xsk.c                                 | 79 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 69 ++++++++++++++--
 samples/bpf/xdpsock_user.c                    | 59 ++++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++-
 tools/lib/bpf/xsk.c                           |  3 +
 tools/lib/bpf/xsk.h                           |  2 +
 15 files changed, 280 insertions(+), 88 deletions(-)
Laatz, Kevin Aug. 22, 2019, 1:44 a.m. UTC | #11
This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked back to the 2k chunk boundary to recover the original address. This
becomes less trivial when using frame sizes that are not a power of two.
This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

Note: The mlx5e patch only adds support for the new offset handling. Maxim
Mikityanskiy <maximmi@mellanox.com> is working on an additional patch that
enables the unaligned chunks feature (making mlx5e accept arbitrary frame
sizes), which will be added later and applies on top of this patch.

The numbers below were recorded with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

These are solely for comparing performance with and without the patches.
The largest drop was ~1.5% (in zero-copy mode).

+-------------------------+-------------+-----------------+-------------+
| Buffer size: 2048       | SKB mode    | Zero-copy       | Copy        |
+-------------------------+-------------+-----------------+-------------+
| Aligned (baseline)      | 1.25 Mpps   | 10.0 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Aligned (with patches)  | 1.25 Mpps   | 9.85 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+
| Unaligned               | 1.25 Mpps   | 9.65 Mpps       | 1.66 Mpps   |
+-------------------------+-------------+-----------------+-------------+

This patch set has been applied against
commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - Add flags for umem configuration to libbpf. Since we increase the size
    of the struct by adding flags, we also need to add the ABI versioning
    in this patch (a short usage sketch follows this patch list).

Patch 8:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 9:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 10:
  - Adds hugepage support to the xdpsock application

Patch 11:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.
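
To show how an application might request unaligned chunk mode through the
extended libbpf configuration described in Patch 7 above, here is a
minimal usage sketch. It assumes the flag and field names added by this
series (XDP_UMEM_UNALIGNED_CHUNK_FLAG and a new 'flags' member in struct
xsk_umem_config); check the final headers for the exact spelling:

    #include <linux/if_xdp.h>
    #include <bpf/xsk.h>

    static struct xsk_umem *sketch_create_umem(void *bufs, __u64 size)
    {
            struct xsk_ring_prod fq;
            struct xsk_ring_cons cq;
            struct xsk_umem *umem;
            struct xsk_umem_config cfg = {
                    .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                    .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                    .frame_size = 2048,    /* arbitrary sizes allowed */
                    .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
                    .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG,
            };

            if (xsk_umem__create(&umem, bufs, size, &fq, &cq, &cfg))
                    return NULL;
            return umem;
    }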

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

v5:
  - Added ABI versioning in the libbpf patch
  - Removed bitfields in the xdp_umem_reg struct. Added a new flags field
    instead.
  - Added accessors for getting addr and offset.
  - Added helper function for adding the offset to the addr.
  - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
  - Fixed typo in mlx driver patch.
  - Moved libbpf patch to later in the set (7/11, just before the sample
    app changes)

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++--
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 +-
 include/net/xdp_sock.h                        | 75 ++++++++++++++-
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 19 +++-
 net/xdp/xsk.c                                 | 96 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 68 +++++++++++--
 samples/bpf/xdpsock_user.c                    | 61 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++
 tools/lib/bpf/Makefile                        |  5 +-
 tools/lib/bpf/libbpf.map                      |  1 +
 tools/lib/bpf/xsk.c                           | 33 ++++++-
 tools/lib/bpf/xsk.h                           | 27 ++++++
 17 files changed, 384 insertions(+), 94 deletions(-)
Laatz, Kevin Aug. 27, 2019, 2:25 a.m. UTC | #12
This patch set adds the ability to use unaligned chunks in the XDP umem.

Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.

The changes in this patch set remove these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp

Since we are now dealing with arbitrary frame sizes, we also need to
update how we pass around addresses. Currently, the addresses can simply be
masked back to the 2k chunk boundary to recover the original address. This
becomes less trivial when using frame sizes that are not a power of two.
This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.

The patch set was tested with the following set up:
  - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
  - Driver: i40e
  - Application: xdpsock with l2fwd (single interface)
  - Turbo disabled in BIOS

There are no changes to performance before and after these patches for SKB
mode and Copy mode. Zero-copy mode saw a performance degradation of ~1.5%.

This patch set has been applied against
commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")

Structure of the patch set:
Patch 1:
  - Remove unnecessary masking and headroom addition during zero-copy Rx
    buffer recycling in i40e. This change is required in order for the
    buffer recycling to work in the unaligned chunk mode.

Patch 2:
  - Remove unnecessary masking and headroom addition during
    zero-copy Rx buffer recycling in ixgbe. This change is required in
    order for the  buffer recycling to work in the unaligned chunk mode.

Patch 3:
  - Add infrastructure for unaligned chunks. Since we are dealing with
    unaligned chunks that could potentially cross a physical page boundary,
    we add checks to keep track of that information. We can later use this
    information to correctly handle buffers that are placed at an address
    where they cross a page boundary.  This patch also modifies the
    existing Rx and Tx functions to use the new descriptor format. To
    handle addresses correctly, we need to mask appropriately based on
    whether we are in aligned or unaligned mode.

Patch 4:
  - This patch updates the i40e driver to make use of the new descriptor
    format.

Patch 5:
  - This patch updates the ixgbe driver to make use of the new descriptor
    format.

Patch 6:
  - This patch updates the mlx5e driver to make use of the new descriptor
    format. These changes are required to handle the new descriptor format
    and for unaligned chunks support.

Patch 7:
  - This patch allows XSK frames smaller than page size in the mlx5e
    driver. It relaxes the requirements on the XSK frame size, allowing it
    to be smaller than a page and even not a power of two. The current
    implementation can work in this mode, both with Striding RQ and without
    it.

Patch 8:
  - Add flags for umem configuration to libbpf. Since we increase the size
    of the struct by adding flags, we also need to add the ABI versioning
    in this patch.

Patch 9:
  - Modify xdpsock application to add a command line option for
    unaligned chunks

Patch 10:
  - Since we can now run the application in unaligned chunk mode, we need
    to make sure we recycle the buffers appropriately.

Patch 11:
  - Adds hugepage support to the xdpsock application (a rough allocation
    sketch follows this patch list)

Patch 12:
  - Documentation update to include the unaligned chunk scenario. We need
    to explicitly state that the incoming addresses are only masked in the
    aligned chunk mode and not the unaligned chunk mode.
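
Patch 11 above adds hugepage support to xdpsock; the rough allocation
sketch below shows one way the umem area could be reserved from
hugepages. It is a simplified illustration and not the exact code in
xdpsock_user.c:

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Back the umem area with hugepages so large, unaligned buffers span
     * fewer physical pages.
     */
    static void *sketch_alloc_umem_area(size_t size)
    {
            void *bufs = mmap(NULL, size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                              -1, 0);

            if (bufs == MAP_FAILED) {
                    fprintf(stderr, "mmap of hugepages failed\n");
                    exit(EXIT_FAILURE);
            }
            return bufs;
    }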

---
v2:
  - fixed checkpatch issues
  - fixed Rx buffer recycling for unaligned chunks in xdpsock
  - removed unused defines
  - fixed how chunk_size is calculated in xsk_diag.c
  - added some performance numbers to cover letter
  - modified descriptor format to make it easier to retrieve original
    address
  - removed patch adding off_t off to the zero copy allocator. This is no
    longer needed with the new descriptor format.

v3:
  - added patch for mlx5 driver changes needed for unaligned chunks
  - moved offset handling to new helper function
  - changed value used for the umem chunk_mask. Now using the new
    descriptor format to save us doing the calculations in a number of
    places meaning more of the code is left unchanged while adding
    unaligned chunk support.

v4:
  - reworked the next_pg_contig field in the xdp_umem_page struct. We now
    use the low 12 bits of the addr for flags rather than adding an extra
    field in the struct.
  - modified unaligned chunks flag define
  - fixed page_start calculation in __xsk_rcv_memcpy().
  - move offset handling to the xdp_umem_get_* functions
  - modified the len field in xdp_umem_reg struct. We now use 16 bits from
    this for the flags field.
  - fixed headroom addition to handle in the mlx5e driver
  - other minor changes based on review comments

v5:
  - Added ABI versioning in the libbpf patch
  - Removed bitfields in the xdp_umem_reg struct. Added a new flags field
    instead.
  - Added accessors for getting addr and offset.
  - Added helper function for adding the offset to the addr.
  - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
  - Fixed typo in mlx driver patch.
  - Moved libbpf patch to later in the set (7/11, just before the sample
    app changes)

v6:
  - Added support for XSK frames smaller than a page in the mlx5e driver
    (Maxim Mikityanskiy <maximmi@mellanox.com>).
  - Fixed offset handling in xsk_generic_rcv.
  - Added check for base address in xskq_is_valid_addr_unaligned.
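
The base-address check mentioned in the last item can be pictured with
the simplified sketch below. The function and parameter names are
assumptions for illustration; the real xskq_is_valid_addr_unaligned()
differs in detail:

    /* Simplified sketch: validate an unaligned descriptor address.
     * 'addr' already has the offset applied; 'next_pg_contig' records,
     * per umem page, whether the following page is physically contiguous.
     */
    static bool sketch_valid_unaligned(__u64 addr, __u32 len,
                                       __u64 umem_size,
                                       const bool *next_pg_contig)
    {
            if (len > umem_size || addr >= umem_size ||
                addr + len > umem_size)
                    return false;

            /* A buffer crossing a page boundary is only valid if the two
             * pages are physically contiguous.
             */
            if ((addr & (PAGE_SIZE - 1)) + len > PAGE_SIZE &&
                !next_pg_contig[addr >> PAGE_SHIFT])
                    return false;

            return true;
    }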

Kevin Laatz (11):
  i40e: simplify Rx buffer recycle
  ixgbe: simplify Rx buffer recycle
  xsk: add support to allow unaligned chunk placement
  i40e: modify driver for handling offsets
  ixgbe: modify driver for handling offsets
  mlx5e: modify driver for handling offsets
  libbpf: add flags to umem config
  samples/bpf: add unaligned chunks mode support to xdpsock
  samples/bpf: add buffer recycling for unaligned chunks to xdpsock
  samples/bpf: use hugepages in xdpsock app
  doc/af_xdp: include unaligned chunk case

Maxim Mikityanskiy (1):
  net/mlx5e: Allow XSK frames smaller than a page

 Documentation/networking/af_xdp.rst           | 10 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 26 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 26 +++--
 .../ethernet/mellanox/mlx5/core/en/params.c   | 23 ++++-
 .../ethernet/mellanox/mlx5/core/en/params.h   |  2 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  8 +-
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  5 +-
 .../mellanox/mlx5/core/en/xsk/setup.c         | 15 ++-
 include/net/xdp_sock.h                        | 75 ++++++++++++++-
 include/uapi/linux/if_xdp.h                   |  9 ++
 net/xdp/xdp_umem.c                            | 19 +++-
 net/xdp/xsk.c                                 | 94 +++++++++++++++----
 net/xdp/xsk_diag.c                            |  2 +-
 net/xdp/xsk_queue.h                           | 70 ++++++++++++--
 samples/bpf/xdpsock_user.c                    | 61 ++++++++----
 tools/include/uapi/linux/if_xdp.h             |  9 ++
 tools/lib/bpf/Makefile                        |  5 +-
 tools/lib/bpf/libbpf.map                      |  1 +
 tools/lib/bpf/xsk.c                           | 33 ++++++-
 tools/lib/bpf/xsk.h                           | 27 ++++++
 20 files changed, 417 insertions(+), 103 deletions(-)
Jonathan Lemon Aug. 30, 2019, 3:52 p.m. UTC | #13
On 26 Aug 2019, at 19:25, Kevin Laatz wrote:

> This patch set adds the ability to use unaligned chunks in the XDP umem.

Thanks, Kevin, for your hard work and perseverance on this patch set!
Daniel Borkmann Aug. 30, 2019, 11:29 p.m. UTC | #14
On 8/27/19 4:25 AM, Kevin Laatz wrote:
> This patch set adds the ability to use unaligned chunks in the XDP umem.
> 
> [...]

Applied, thanks!