
[bpf] xsk: add missing memory barrier in xskq_has_addrs()

Message ID 1575021070-28873-1-git-send-email-magnus.karlsson@intel.com
State Changes Requested
Delegated to: BPF Maintainers
Series [bpf] xsk: add missing memory barrier in xskq_has_addrs()

Commit Message

Magnus Karlsson Nov. 29, 2019, 9:51 a.m. UTC
The rings in AF_XDP between user space and kernel space have the
following semantics:

producer                         consumer

if (LOAD ->consumer) {           LOAD ->producer
                   (A)           smp_rmb()       (C)
   STORE $data                   LOAD $data
   smp_wmb()       (B)           smp_mb()        (D)
   STORE ->producer              STORE ->consumer
}
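The diagram above can be made concrete with a user-space sketch. It uses C11 atomics as stand-ins for the kernel primitives: an acquire fence replaces smp_rmb() at (C), and release fences replace smp_wmb() at (B) and smp_mb() at (D). The diagram shows no explicit barrier at (A), which is presumably a control dependency on the `if`. C11 gives no ordering guarantee for control dependencies, so the sketch uses an acquire load of ->consumer there instead. The names demo_ring, demo_produce() and demo_consume() are hypothetical and are not part of the AF_XDP code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPSC ring for illustration only; names are not the kernel's. */
#define RING_SIZE 8u
#define RING_MASK (RING_SIZE - 1u)
static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

struct demo_ring {
	_Atomic uint32_t producer;  /* written only by the producer */
	_Atomic uint32_t consumer;  /* written only by the consumer */
	uint64_t desc[RING_SIZE];   /* $data */
};

static void demo_ring_init(struct demo_ring *r)
{
	atomic_init(&r->producer, 0);
	atomic_init(&r->consumer, 0);
}

/* Producer side: returns false when the ring is full. */
static bool demo_produce(struct demo_ring *r, uint64_t val)
{
	uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);
	/* LOAD ->consumer; the acquire load stands in for (A) and pairs with (D). */
	uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return false;
	r->desc[prod & RING_MASK] = val;            /* STORE $data */
	atomic_thread_fence(memory_order_release);   /* (B), matches (C) */
	atomic_store_explicit(&r->producer, prod + 1, memory_order_relaxed);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool demo_consume(struct demo_ring *r, uint64_t *val)
{
	uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);

	if (prod == cons)
		return false;
	atomic_thread_fence(memory_order_acquire);   /* (C), matches (B) */
	*val = r->desc[cons & RING_MASK];            /* LOAD $data */
	atomic_thread_fence(memory_order_release);   /* (D), matches (A) */
	atomic_store_explicit(&r->consumer, cons + 1, memory_order_relaxed);
	return true;
}
```

The pairing is (B) with (C), which keeps the consumer from reading a slot before its data is visible, and (A) with (D), which keeps the producer from overwriting a slot the consumer is still reading.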

The consumer function xskq_has_addrs() below loads the producer
pointer and updates the locally cached copy of it. However, it does
not issue the smp_rmb() operation required by the lockless ring. This
would have been ok had the function not updated the locally cached
copy, as that could not have resulted in new data being read from the
ring. But as it updates the local producer pointer, a subsequent peek
operation, such as xskq_peek_addr(), might load data from the ring
without issuing the required smp_rmb() memory barrier.

static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
{
        u32 entries = q->prod_tail - q->cons_tail;

        if (entries >= cnt)
                return true;

        /* Refresh the local pointer. */
        q->prod_tail = READ_ONCE(q->ring->producer);
	*** MISSING MEMORY BARRIER ***
        entries = q->prod_tail - q->cons_tail;

        return entries >= cnt;
}

Fix this by adding the missing memory barrier at the indicated point
above.

Fixes: d57d76428ae9 ("Add API to check for available entries in FQ")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Maxim Mikityanskiy Dec. 2, 2019, 9:30 a.m. UTC | #1
On 2019-11-29 11:51, Magnus Karlsson wrote:
> The rings in AF_XDP between user space and kernel space have the
> following semantics:
> 
> producer                         consumer
> 
> if (LOAD ->consumer) {           LOAD ->producer
>                     (A)           smp_rmb()       (C)
>     STORE $data                   LOAD $data
>     smp_wmb()       (B)           smp_mb()        (D)
>     STORE ->producer              STORE ->consumer
> }
> 
> The consumer function xskq_has_addrs() below loads the producer
> pointer and updates the locally cached copy of it. However, it does
> not issue the smp_rmb() operation required by the lockless ring. This
> would have been ok had the function not updated the locally cached
> copy, as that could not have resulted in new data being read from the
> ring. But as it updates the local producer pointer, a subsequent peek
> operation, such as xskq_peek_addr(), might load data from the ring
> without issuing the required smp_rmb() memory barrier.

Thanks for paying attention to it, but I don't think it can really 
happen. xskq_has_addrs only updates prod_tail, but xskq_peek_addr 
doesn't use prod_tail, it reads from cons_tail to cons_head, and every 
cons_head update has the necessary smp_rmb.

Actually, the same thing happens with xskq_nb_avail. In xskq_full_desc, 
we don't have any barrier after xskq_nb_avail, and xskq_peek_desc can be 
called after xskq_full_desc, but it's absolutely fine, because 
xskq_nb_avail doesn't touch cons_head. The same happens with 
xskq_has_addrs and xskq_peek_addr.

So, I don't think this change is required. Please correct me if I'm wrong.
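The argument above can be checked against a simplified model of the four-pointer queue. This is a sketch built on several assumptions. The struct layout, DEMO_BATCH (standing in for the batch size passed to xskq_nb_avail()) and all demo_* names are hypothetical, and C11 fences stand in for smp_rmb() and smp_mb(). demo_has_addrs() deliberately omits the barrier, as the unpatched code does. Data is only read between cons_tail and cons_head, and cons_head is only advanced in demo_peek(), immediately followed by an acquire fence. A producer load done earlier without a barrier is therefore still ordered before the data loads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four-pointer consumer queue (illustrative names). */
#define DEMO_SIZE  8u
#define DEMO_MASK  (DEMO_SIZE - 1u)
#define DEMO_BATCH 4u   /* assumed stand-in for the peek batch size */

struct demo_ring {
	_Atomic uint32_t producer;
	_Atomic uint32_t consumer;
	uint64_t desc[DEMO_SIZE];
};

struct demo_queue {
	struct demo_ring *ring;
	uint32_t prod_tail;   /* cached producer, only used for counting */
	uint32_t cons_tail;   /* next entry to read */
	uint32_t cons_head;   /* end of the batch that peek may read */
};

/* Unpatched has_addrs(): refreshes prod_tail with no barrier. */
static bool demo_has_addrs(struct demo_queue *q, uint32_t cnt)
{
	if (q->prod_tail - q->cons_tail >= cnt)
		return true;
	q->prod_tail = atomic_load_explicit(&q->ring->producer,
					    memory_order_relaxed);
	return q->prod_tail - q->cons_tail >= cnt;
}

/* Modelled on xskq_nb_avail(): also refreshes prod_tail with no barrier. */
static uint32_t demo_nb_avail(struct demo_queue *q, uint32_t max)
{
	uint32_t entries = q->prod_tail - q->cons_tail;

	if (entries == 0) {
		q->prod_tail = atomic_load_explicit(&q->ring->producer,
						    memory_order_relaxed);
		entries = q->prod_tail - q->cons_tail;
	}
	return entries > max ? max : entries;
}

/* Peek reads only [cons_tail, cons_head), and cons_head moves only here,
 * always followed by barrier (C). Any earlier relaxed load of ->producer,
 * including the one in demo_has_addrs(), is ordered by that fence. */
static bool demo_peek(struct demo_queue *q, uint64_t *addr)
{
	if (q->cons_tail == q->cons_head) {
		atomic_thread_fence(memory_order_release);   /* (D) */
		atomic_store_explicit(&q->ring->consumer, q->cons_tail,
				      memory_order_relaxed);
		q->cons_head = q->cons_tail + demo_nb_avail(q, DEMO_BATCH);
		atomic_thread_fence(memory_order_acquire);   /* (C) */
	}
	if (q->cons_tail == q->cons_head)
		return false;
	*addr = q->ring->desc[q->cons_tail & DEMO_MASK];
	return true;
}

static void demo_release(struct demo_queue *q)
{
	q->cons_tail++;
}
```

Even when demo_has_addrs() refreshes prod_tail first, the next demo_peek() still passes through the acquire fence before it touches desc[], because cons_head has not yet moved.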

> static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
> {
>          u32 entries = q->prod_tail - q->cons_tail;
> 
>          if (entries >= cnt)
>                  return true;
> 
>          /* Refresh the local pointer. */
>          q->prod_tail = READ_ONCE(q->ring->producer);
> 	*** MISSING MEMORY BARRIER ***
>          entries = q->prod_tail - q->cons_tail;
> 
>          return entries >= cnt;
> }
> 
> Fix this by adding the missing memory barrier at the indicated point
> above.
> 
> Fixes: d57d76428ae9 ("Add API to check for available entries in FQ")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>   net/xdp/xsk_queue.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index eddae46..b5492c3 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -127,6 +127,7 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
>   
>   	/* Refresh the local pointer. */
>   	q->prod_tail = READ_ONCE(q->ring->producer);
> +	smp_rmb(); /* C, matches B */
>   	entries = q->prod_tail - q->cons_tail;
>   
>   	return entries >= cnt;
>

Magnus Karlsson Dec. 2, 2019, 12:31 p.m. UTC | #2
On Mon, Dec 2, 2019 at 10:30 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> On 2019-11-29 11:51, Magnus Karlsson wrote:
> > The rings in AF_XDP between user space and kernel space have the
> > following semantics:
> >
> > producer                         consumer
> >
> > if (LOAD ->consumer) {           LOAD ->producer
> >                     (A)           smp_rmb()       (C)
> >     STORE $data                   LOAD $data
> >     smp_wmb()       (B)           smp_mb()        (D)
> >     STORE ->producer              STORE ->consumer
> > }
> >
> > The consumer function xskq_has_addrs() below loads the producer
> > pointer and updates the locally cached copy of it. However, it does
> > not issue the smp_rmb() operation required by the lockless ring. This
> > would have been ok had the function not updated the locally cached
> > copy, as that could not have resulted in new data being read from the
> > ring. But as it updates the local producer pointer, a subsequent peek
> > operation, such as xskq_peek_addr(), might load data from the ring
> > without issuing the required smp_rmb() memory barrier.
>
> Thanks for paying attention to it, but I don't think it can really
> happen. xskq_has_addrs only updates prod_tail, but xskq_peek_addr
> doesn't use prod_tail, it reads from cons_tail to cons_head, and every
> cons_head update has the necessary smp_rmb.

You are correct, it cannot happen. I am working on a 10-part patch set
that simplifies the rings and was staring blindly at that. In that
patch set it can happen, since I only have two cached pointers instead
of four, so there is a dependency; in the current code there is not. I will
include this barrier in my patch set at the appropriate place. Thanks
for looking into this, Maxim.
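The dependency described above appears once the consumer keeps only two cached pointers. The following is a hypothetical sketch of such a layout, not the actual patch set. Entries in [cached_cons, cached_prod) are treated as readable, so any function that refreshes cached_prod also extends the range that peek may read. For that reason the sketch places barrier (C) inside the single refresh helper. Without the fence in demo_refresh_prod(), demo_peek2() could read desc[] with no barrier at all after demo_has_entries() had advanced cached_prod. As before, C11 fences stand in for the kernel barriers, and all demo_* names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SIZE 8u
#define DEMO_MASK (DEMO_SIZE - 1u)

struct demo_ring {
	_Atomic uint32_t producer;
	_Atomic uint32_t consumer;
	uint64_t desc[DEMO_SIZE];
};

/* Hypothetical two-pointer consumer queue. */
struct demo_queue2 {
	struct demo_ring *ring;
	uint32_t cached_prod;   /* readable entries end here */
	uint32_t cached_cons;   /* next entry to read */
};

/* The only place cached_prod is refreshed. Peek trusts cached_prod
 * directly, so barrier (C) must sit here. */
static void demo_refresh_prod(struct demo_queue2 *q)
{
	q->cached_prod = atomic_load_explicit(&q->ring->producer,
					      memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);   /* C, matches B */
}

static bool demo_has_entries(struct demo_queue2 *q, uint32_t cnt)
{
	if (q->cached_prod - q->cached_cons >= cnt)
		return true;
	demo_refresh_prod(q);
	return q->cached_prod - q->cached_cons >= cnt;
}

static bool demo_peek2(struct demo_queue2 *q, uint64_t *addr)
{
	if (q->cached_cons == q->cached_prod) {
		demo_refresh_prod(q);
		if (q->cached_cons == q->cached_prod)
			return false;
	}
	/* No fence on this path: safety relies on the one in demo_refresh_prod(). */
	*addr = q->ring->desc[q->cached_cons & DEMO_MASK];
	return true;
}

static void demo_release2(struct demo_queue2 *q)
{
	q->cached_cons++;
	atomic_thread_fence(memory_order_release);   /* (D) */
	atomic_store_explicit(&q->ring->consumer, q->cached_cons,
			      memory_order_relaxed);
}
```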

Please drop this patch.

/Magnus

Patch

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index eddae46..b5492c3 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -127,6 +127,7 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 
 	/* Refresh the local pointer. */
 	q->prod_tail = READ_ONCE(q->ring->producer);
+	smp_rmb(); /* C, matches B */
 	entries = q->prod_tail - q->cons_tail;
 
 	return entries >= cnt;
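For reference, the one-line change in the hunk above can be modelled in user space. The sketch assumes C11 atomics: READ_ONCE() becomes a relaxed atomic load, and smp_rmb() becomes an acquire fence, which orders at least the same loads. The demo_* types are hypothetical stand-ins that model only the fields xskq_has_addrs() touches. As the discussion concludes, the barrier is not needed with the current four-pointer layout, so the sketch only shows where barrier (C) would sit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel types. */
struct demo_xdp_ring {
	_Atomic uint32_t producer;
};

struct demo_xsk_queue {
	struct demo_xdp_ring *ring;
	uint32_t prod_tail;   /* locally cached copy of ring->producer */
	uint32_t cons_tail;
};

/* Mirrors the patched xskq_has_addrs(). */
static bool demo_has_addrs(struct demo_xsk_queue *q, uint32_t cnt)
{
	uint32_t entries = q->prod_tail - q->cons_tail;

	if (entries >= cnt)
		return true;

	/* Refresh the local pointer (READ_ONCE() in the kernel). */
	q->prod_tail = atomic_load_explicit(&q->ring->producer,
					    memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);   /* C, matches B */
	entries = q->prod_tail - q->cons_tail;

	return entries >= cnt;
}
```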