Message ID: 20190612155605.22450-1-maximmi@mellanox.com
Series: AF_XDP infrastructure improvements and mlx5e support
On 12 Jun 2019, at 8:56, Maxim Mikityanskiy wrote:

> This series contains improvements to the AF_XDP kernel infrastructure
> and AF_XDP support in mlx5e. The infrastructure improvements are
> required for mlx5e, but some of them also benefit all drivers, and
> some can be useful for other drivers that want to implement AF_XDP.
>
> The performance testing was performed on a machine with the following
> configuration:
>
> - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
> - Mellanox ConnectX-5 Ex with 100 Gbit/s link
>
> The results with retpoline disabled, single stream:
>
> txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU)
> rxdrop: 12.2 Mpps
> l2fwd: 9.4 Mpps
>
> The results with retpoline enabled, single stream:
>
> txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU)
> rxdrop: 9.9 Mpps
> l2fwd: 6.8 Mpps
>
> v2 changes:
>
> Added patches for mlx5e and addressed the comments for v1. Rebased for
> bpf-next.
>
> v3 changes:
>
> Rebased for the newer bpf-next, resolved conflicts in libbpf. Addressed
> Björn's comments for coding style. Fixed a bug in the error handling
> flow in mlx5e_open_xsk.
>
> v4 changes:
>
> UAPI is not changed, XSK RX queues are exposed to the kernel. The lower
> half of the available amount of RX queues are regular queues, and the
> upper half are XSK RX queues. The patch "xsk: Extend channels to support
> combined XSK/non-XSK traffic" was dropped. The final patch was reworked
> accordingly.
>
> Added "net/mlx5e: Attach/detach XDP program safely", as the changes
> introduced in the XSK patch are based on that one.
>
> Added "libbpf: Support drivers with non-combined channels", which aligns
> the condition in libbpf with the condition in the kernel.
>
> Rebased over the newer bpf-next.

Very nice change for the RX queues!

For the series:
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
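For context, single-stream numbers like the txonly/rxdrop/l2fwd figures in the cover letter are typically collected with the xdpsock sample application from the kernel tree (samples/bpf). The interface name and queue id below are placeholders, not taken from the thread; this is a usage sketch, not the exact command line used for these measurements:

```shell
# xdpsock microbenchmarks (samples/bpf/xdpsock_user.c):
# -i selects the interface, -q the queue to bind to, -z requests zero-copy.
./xdpsock -i eth0 -q 42 -t -z   # txonly: generate packets as fast as possible
./xdpsock -i eth0 -q 42 -r -z   # rxdrop: receive and drop
./xdpsock -i eth0 -q 42 -l -z   # l2fwd: swap MACs and transmit back
```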
On Wed, 12 Jun 2019 15:56:33 +0000, Maxim Mikityanskiy wrote:
> UAPI is not changed, XSK RX queues are exposed to the kernel. The lower
> half of the available amount of RX queues are regular queues, and the
> upper half are XSK RX queues.

If I have 32 queues enabled on the NIC and I install an AF_XDP socket on
queue 10, does the NIC now have 64 RQs, but only the first 32 are in the
normal RSS map?

> The patch "xsk: Extend channels to support combined XSK/non-XSK
> traffic" was dropped. The final patch was reworked accordingly.

The final patch has 2k LoC, kind of hard to digest. You can also
post the cleanup patches separately, no need for a large series here.
On Wed, 12 Jun 2019 at 22:49, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Wed, 12 Jun 2019 15:56:33 +0000, Maxim Mikityanskiy wrote:
> > UAPI is not changed, XSK RX queues are exposed to the kernel. The lower
> > half of the available amount of RX queues are regular queues, and the
> > upper half are XSK RX queues.
>
> If I have 32 queues enabled on the NIC and I install AF_XDP socket on
> queue 10, does the NIC now have 64 RQs, but only first 32 are in the
> normal RSS map?
>

Additional, related, question to Jakub's: Say that I'd like to hijack
all 32 Rx queues of the NIC. I create 32 AF_XDP sockets and attach them
in zero-copy mode to the device. What's the result?

> > The patch "xsk: Extend channels to support combined XSK/non-XSK
> > traffic" was dropped. The final patch was reworked accordingly.
>
> The final patches has 2k LoC, kind of hard to digest. You can also
> post the clean up patches separately, no need for large series here.
On 2019-06-12 23:48, Jakub Kicinski wrote:
> On Wed, 12 Jun 2019 15:56:33 +0000, Maxim Mikityanskiy wrote:
>> UAPI is not changed, XSK RX queues are exposed to the kernel. The lower
>> half of the available amount of RX queues are regular queues, and the
>> upper half are XSK RX queues.
>
> If I have 32 queues enabled on the NIC

Let's say we have 32 combined channels. In this case RX queues 0..31 are
regular ones, and 32..63 are XSK-ZC-enabled.

> and I install AF_XDP socket on queue 10

It'll trigger the compatibility mode of AF_XDP (without zero copy). You
should use queue 42, which is in the 32..63 set.

> , does the NIC now have 64 RQs, but only first 32 are in the
> normal RSS map?

Only the regular 0..31 RX queues are part of RSS.

>> The patch "xsk: Extend channels to support combined XSK/non-XSK
>> traffic" was dropped. The final patch was reworked accordingly.
>
> The final patches has 2k LoC, kind of hard to digest. You can also
> post the clean up patches separately, no need for large series here.

I used to have the final patch as three patches (add XSK stubs, add RX
support and add TX support), but I prefer not to have this separation,
because it doesn't look right to add empty stub functions with
/* TODO: implement */ comments in one patch and to add the
implementations immediately in the next patch.

Thanks for reviewing!
Max
On 2019-06-13 15:58, Björn Töpel wrote:
> On Wed, 12 Jun 2019 at 22:49, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>>
>> On Wed, 12 Jun 2019 15:56:33 +0000, Maxim Mikityanskiy wrote:
>>> UAPI is not changed, XSK RX queues are exposed to the kernel. The lower
>>> half of the available amount of RX queues are regular queues, and the
>>> upper half are XSK RX queues.
>>
>> If I have 32 queues enabled on the NIC and I install AF_XDP socket on
>> queue 10, does the NIC now have 64 RQs, but only first 32 are in the
>> normal RSS map?
>>
>
> Additional, related, question to Jakub's: Say that I'd like to hijack
> all 32 Rx queues of the NIC. I create 32 AF_XDP socket and attach them
> in zero-copy mode to the device. What's the result?

There are 32 regular RX queues (0..31) and 32 XSK RX queues (32..63). If
you want 32 zero-copy AF_XDP sockets, you can attach them to queues
32..63, and the regular traffic won't be affected at all.

>>> The patch "xsk: Extend channels to support combined XSK/non-XSK
>>> traffic" was dropped. The final patch was reworked accordingly.
>>
>> The final patches has 2k LoC, kind of hard to digest. You can also
>> post the clean up patches separately, no need for large series here.
On 2019-06-13 16:01, Maxim Mikityanskiy wrote:
> On 2019-06-13 15:58, Björn Töpel wrote:
>> On Wed, 12 Jun 2019 at 22:49, Jakub Kicinski
>> <jakub.kicinski@netronome.com> wrote:
>>>
>>> On Wed, 12 Jun 2019 15:56:33 +0000, Maxim Mikityanskiy wrote:
>>>> UAPI is not changed, XSK RX queues are exposed to the kernel. The lower
>>>> half of the available amount of RX queues are regular queues, and the
>>>> upper half are XSK RX queues.
>>>
>>> If I have 32 queues enabled on the NIC and I install AF_XDP socket on
>>> queue 10, does the NIC now have 64 RQs, but only first 32 are in the
>>> normal RSS map?
>>>
>>
>> Additional, related, question to Jakub's: Say that I'd like to hijack
>> all 32 Rx queues of the NIC. I create 32 AF_XDP socket and attach them
>> in zero-copy mode to the device. What's the result?
>
> There are 32 regular RX queues (0..31) and 32 XSK RX queues (32..63). If
> you want 32 zero-copy AF_XDP sockets, you can attach them to queues
> 32..63, and the regular traffic won't be affected at all.
>

Thanks for getting back! More questions!

Ok, so I cannot (with zero-copy) get the regular traffic into AF_XDP
sockets?

How do the qids map? Can I only bind a zero-copy socket to qid 32..63 in
the example above?

Say that I have a copy-mode AF_XDP socket bound to queue 2. In this
case I will receive the regular traffic from queue 2. Enabling zero-copy
for the same queue, will this give an error, or receive AF_XDP specific
traffic from queue 2+32? Or return an error, and require an explicit
bind to any of the queues 32..63?

Thanks,
Björn
On Thu, 13 Jun 2019 at 16:11, Björn Töpel <bjorn.topel@intel.com> wrote:
>
> On 2019-06-13 16:01, Maxim Mikityanskiy wrote:
> > On 2019-06-13 15:58, Björn Töpel wrote:
> >> On Wed, 12 Jun 2019 at 22:49, Jakub Kicinski
> >> <jakub.kicinski@netronome.com> wrote:
> >>>
> >>> On Wed, 12 Jun 2019 15:56:33 +0000, Maxim Mikityanskiy wrote:
> >>>> UAPI is not changed, XSK RX queues are exposed to the kernel. The lower
> >>>> half of the available amount of RX queues are regular queues, and the
> >>>> upper half are XSK RX queues.
> >>>
> >>> If I have 32 queues enabled on the NIC and I install AF_XDP socket on
> >>> queue 10, does the NIC now have 64 RQs, but only first 32 are in the
> >>> normal RSS map?
> >>>
> >>
> >> Additional, related, question to Jakub's: Say that I'd like to hijack
> >> all 32 Rx queues of the NIC. I create 32 AF_XDP socket and attach them
> >> in zero-copy mode to the device. What's the result?
> >
> > There are 32 regular RX queues (0..31) and 32 XSK RX queues (32..63). If
> > you want 32 zero-copy AF_XDP sockets, you can attach them to queues
> > 32..63, and the regular traffic won't be affected at all.
> >
> Thanks for getting back! More questions!
>
> Ok, so I cannot (with zero-copy) get the regular traffic into AF_XDP
> sockets?
>
> How does qids map? Can I only bind a zero-copy socket to qid 32..63 in
> the example above?
>
> Say that I have a a copy-mode AF_XDP socket bound to queue 2. In this
> case I will receive the regular traffic from queue 2. Enabling zero-copy
> for the same queue, will this give an error, or receive AF_XDP specific
> traffic from queue 2+32? Or return an error, and require an explicit
> bind to any of the queues 32..63?
>

Let me expand a bit on why I'm asking these qid questions.

It's unfortunate that vendors have different views/mappings of "qids".
For Intel, we allow binding a zero-copy socket to all Rx qids. For
Mellanox, a certain set of qids is allowed for zero-copy sockets.

This highlights a need for a better abstraction for queues than "some
queue id from ethtool". This will take some time, and I think that we
have to accept for now that we'll have different behavior/mapping for
zero-copy sockets on different NICs. Let's address this need for a
better queue abstraction, but that shouldn't block this series IMO.

Other than patch "[PATCH bpf-next v4 07/17] libbpf: Support drivers
with non-combined channels", which I'd like to see a bit more discussion
on, I'm OK with this series. I haven't been able to test it (no hardware
"hint, hint"), but I know Jonathan has been running it.

Thanks for working on this, Max!
Björn