diff mbox series

[bpf-next,3/6] xsk: introduce xsk_do_redirect_rx_full() helper

Message ID 20200904135332.60259-4-bjorn.topel@gmail.com
State Changes Requested
Delegated to: BPF Maintainers
Series xsk: exit NAPI loop when AF_XDP Rx ring is full

Commit Message

Björn Töpel Sept. 4, 2020, 1:53 p.m. UTC
From: Björn Töpel <bjorn.topel@intel.com>

The xsk_do_redirect_rx_full() helper can be used to check whether a
failure of xdp_do_redirect() was due to the AF_XDP socket's Rx ring
being full.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/net/xdp_sock_drv.h | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Jesper Dangaard Brouer Sept. 4, 2020, 3:11 p.m. UTC | #1
On Fri,  4 Sep 2020 15:53:28 +0200
Björn Töpel <bjorn.topel@gmail.com> wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The xsk_do_redirect_rx_full() helper can be used to check if a failure
> of xdp_do_redirect() was due to the AF_XDP socket had a full Rx ring.

This is very AF_XDP specific.  I think that the cpumap could likely
benefit from a similar approach, e.g. if the cpumap kthread is scheduled
on the same CPU?

But for cpumap we only want this behavior if scheduled on the same CPU
as the RX-NAPI.  This could be "seen" by the cpumap code itself: in the
case where bq_flush_to_queue() drops packets, check whether rcpu->cpu
equals smp_processor_id().  Maybe I'm taking this too far?
Björn Töpel Sept. 4, 2020, 3:39 p.m. UTC | #2
On 2020-09-04 17:11, Jesper Dangaard Brouer wrote:
> On Fri,  4 Sep 2020 15:53:28 +0200 Björn Töpel
> <bjorn.topel@gmail.com> wrote:
> 
>> From: Björn Töpel <bjorn.topel@intel.com>
>> 
>> The xsk_do_redirect_rx_full() helper can be used to check whether a
>> failure of xdp_do_redirect() was due to the AF_XDP socket's Rx ring
>> being full.
> 
> This is very AF_XDP specific.  I think that the cpumap could likely
> benefit from a similar approach, e.g. if the cpumap kthread is
> scheduled on the same CPU?
> 

At least I thought this was *very* AF_XDP specific, since the kernel
depends on what userland does: allocation (source) and Rx ring (sink).
Maybe I was wrong! :-)

The thing with AF_XDP zero-copy is that we sort of assume that if a user
enables it, most packets will be XDP_REDIRECTed to an AF_XDP socket.


> But for cpumap we only want this behavior if scheduled on the same CPU
> as the RX-NAPI.  This could be "seen" by the cpumap code itself: in the
> case where bq_flush_to_queue() drops packets, check whether rcpu->cpu
> equals smp_processor_id().  Maybe I'm taking this too far?
> 

Interesting. So, if you're running on the same core, and a redirect fails
for CPUMAP, you'd like to yield the NAPI loop? Is that really OK from a
fairness perspective? I mean, with AF_XDP zero-copy we pretty much know
that all actions will be redirects to a socket. For CPUMAP-type
applications, can that assumption be made?


Björn
Jesper Dangaard Brouer Sept. 7, 2020, 12:45 p.m. UTC | #3
On Fri, 4 Sep 2020 17:39:17 +0200
Björn Töpel <bjorn.topel@intel.com> wrote:

> On 2020-09-04 17:11, Jesper Dangaard Brouer wrote:
> > On Fri,  4 Sep 2020 15:53:28 +0200 Björn Töpel
> > <bjorn.topel@gmail.com> wrote:
> >   
> >> From: Björn Töpel <bjorn.topel@intel.com>
> >> 
> >> The xsk_do_redirect_rx_full() helper can be used to check whether a
> >> failure of xdp_do_redirect() was due to the AF_XDP socket's Rx ring
> >> being full.
> > 
> > This is very AF_XDP specific.  I think that the cpumap could likely
> > benefit from a similar approach, e.g. if the cpumap kthread is
> > scheduled on the same CPU?
> >   
> 
> At least I thought this was *very* AF_XDP specific, since the kernel
> depends on what userland does: allocation (source) and Rx ring (sink).
> Maybe I was wrong! :-)
> 
> The thing with AF_XDP zero-copy is that we sort of assume that if a user
> enables it, most packets will be XDP_REDIRECTed to an AF_XDP socket.
> 
> 
> > But for cpumap we only want this behavior if scheduled on the same CPU
> > as the RX-NAPI.  This could be "seen" by the cpumap code itself: in the
> > case where bq_flush_to_queue() drops packets, check whether rcpu->cpu
> > equals smp_processor_id().  Maybe I'm taking this too far?
> >   
> 
> Interesting. So, if you're running on the same core, and a redirect fails
> for CPUMAP, you'd like to yield the NAPI loop? Is that really OK from a
> fairness perspective? I mean, with AF_XDP zero-copy we pretty much know
> that all actions will be redirects to a socket. For CPUMAP-type
> applications, can that assumption be made?

Yes, you are right.  The RX NAPI loop could be doing something else,
and yielding the NAPI loop upon detecting that same-CPU delivery to
cpumap is stalling might not be the correct action.

I just tested the same-CPU processing case for cpumap (result below
signature), and it doesn't exhibit the bad 'dropping-off-edge'
performance slowdown.  The cpumap code also already tries to mitigate
this, by calling wake_up_process() for every 8 packets (CPU_MAP_BULK_SIZE).

I find your patchset very interesting, as I believe we do need some
kind of general push-back "flow-control" mechanism for XDP. Maybe I
should solve this differently in our XDP-TX-QoS pipe dream ;-)

Patch

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 5b1ee8a9976d..34c58b5fbc28 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -116,6 +116,11 @@  static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 	xp_dma_sync_for_device(pool, dma, size);
 }
 
+static inline bool xsk_do_redirect_rx_full(int err, enum bpf_map_type map_type)
+{
+	return err == -ENOBUFS && map_type == BPF_MAP_TYPE_XSKMAP;
+}
+
 #else
 
 static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
@@ -235,6 +240,10 @@  static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 {
 }
 
+static inline bool xsk_do_redirect_rx_full(int err, enum bpf_map_type map_type)
+{
+	return false;
+}
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_DRV_H */