
[v3,bpf-next,1/6] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY

Message ID 20190528034907.1957536-2-brakmo@fb.com
State: Changes Requested
Delegated to: BPF Maintainers
Series: bpf: Propagate cn to TCP

Commit Message

Lawrence Brakmo May 28, 2019, 3:49 a.m. UTC
Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by
__cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can
request cwr for TCP packets.

Currently, cgroup skb programs can only return 0 or 1 (0 to drop the
packet, 1 to keep it). This macro changes the behavior so the low
order bit indicates whether the packet should be dropped (0) or kept
(1), and the next bit is used for congestion notification (cn).

Hence, new allowed return values of CGROUP EGRESS BPF programs are:
  0: drop packet
  1: keep packet
  2: drop packet and call cwr
  3: keep packet and call cwr

This macro then converts the program's return value to one of the
NET_XMIT values, or to -EPERM, which has the effect of dropping the
packet with no cn:
  0 -> -EPERM                skb is dropped (no cn)
  1 -> NET_XMIT_SUCCESS (0)  skb is transmitted (no cn)
  2 -> NET_XMIT_DROP (1)     skb is dropped and cwr is called
  3 -> NET_XMIT_CN (2)       skb is transmitted and cwr is called

Note that when more than one BPF program is called, the packet is
dropped if at least one of the programs requests it be dropped, and
there is cn if at least one program returns cn. For example, if one
program returns 1 (keep) and another returns 2 (drop and cn), the
combined result is drop with cn, i.e. NET_XMIT_DROP.
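
As an illustration of the new convention, a minimal cgroup skb egress
program might look like the following sketch (the program name, the
SEC() definition, and the length threshold are illustrative, not part
of this patch):

    /* Sketch: bit 0 = keep (1) / drop (0), bit 1 = request cn (cwr). */
    #include <linux/bpf.h>

    #define SEC(name) __attribute__((section(name), used))

    SEC("cgroup_skb/egress")
    int egress_cn(struct __sk_buff *skb)
    {
            /* Hypothetical policy: keep every packet, but request
             * congestion notification for large ones.
             */
            if (skb->len > 1400)
                    return 3;       /* keep packet and call cwr */
            return 1;               /* keep packet */
    }

    char _license[] SEC("license") = "GPL";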

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/linux/bpf.h | 50 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Comments

Eric Dumazet May 28, 2019, 1:42 p.m. UTC | #1
On 5/27/19 8:49 PM, brakmo wrote:
> Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by
> __cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can
> request cwr for TCP packets.
> 

...

> +#define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
> +	({						\
> +		struct bpf_prog_array_item *_item;	\
> +		struct bpf_prog *_prog;			\
> +		struct bpf_prog_array *_array;		\
> +		u32 ret;				\
> +		u32 _ret = 1;				\
> +		u32 _cn = 0;				\
> +		preempt_disable();			\
> +		rcu_read_lock();			\
> +		_array = rcu_dereference(array);	\

Why _array can not be NULL here ?

> +		_item = &_array->items[0];		\
> +		while ((_prog = READ_ONCE(_item->prog))) {		\
> +			bpf_cgroup_storage_set(_item->cgroup_storage);	\
> +			ret = func(_prog, ctx);		\
> +			_ret &= (ret & 1);		\
> +			_cn |= (ret & 2);		\
> +			_item++;			\
> +		}					\
> +		rcu_read_unlock();			\
> +		preempt_enable_no_resched();	

Why are you using preempt_enable_no_resched() here ?

> +		if (_ret)				\
> +			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
> +		else					\
> +			_ret = (_cn ? NET_XMIT_DROP : -EPERM);		\
> +		_ret;					\
> +	})
> +
>  #define BPF_PROG_RUN_ARRAY(array, ctx, func)		\
>  	__BPF_PROG_RUN_ARRAY(array, ctx, func, false)
>  
>
Lawrence Brakmo May 28, 2019, 6:54 p.m. UTC | #2
On 5/28/19, 6:43 AM, "netdev-owner@vger.kernel.org on behalf of Eric Dumazet" <netdev-owner@vger.kernel.org on behalf of eric.dumazet@gmail.com> wrote:

    
    
    On 5/27/19 8:49 PM, brakmo wrote:
    > Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by
    > __cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can
    > request cwr for TCP packets.
    > 
    
    ...
    
    > +#define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
    > +	({						\
    > +		struct bpf_prog_array_item *_item;	\
    > +		struct bpf_prog *_prog;			\
    > +		struct bpf_prog_array *_array;		\
    > +		u32 ret;				\
    > +		u32 _ret = 1;				\
    > +		u32 _cn = 0;				\
    > +		preempt_disable();			\
    > +		rcu_read_lock();			\
    > +		_array = rcu_dereference(array);	\
    
    Why _array can not be NULL here ?

I replaced the call chain 
  BPF_CGROUP_RUN_PROG_INET_EGRESS()
    __cgroup_bpf_run_filter_skb()
      BPF_PROG_RUN_ARRAY()
        __BPF_PROG_RUN_ARRAY( , , false)

with
  BPF_CGROUP_RUN_PROG_INET_EGRESS()
    __cgroup_bpf_run_filter_skb()
      BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY()

BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() is an instantiation of
__BPF_PROG_RUN_ARRAY() with the argument "check_non_null" = false.
Hence I removed the test "if (unlikely(check_non_null && !_array))",
since its condition is always false in that case. I have assumed the
existing code is correct.
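
For reference, __BPF_PROG_RUN_ARRAY() in bpf-next looks roughly like
the following (a paraphrased sketch, not a verbatim copy of the tree;
note it pairs preempt_disable() with plain preempt_enable(), which is
relevant to the second question below):

    #define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null)	\
    	({						\
    		struct bpf_prog_array_item *_item;	\
    		struct bpf_prog *_prog;			\
    		struct bpf_prog_array *_array;		\
    		u32 _ret = 1;				\
    		preempt_disable();			\
    		rcu_read_lock();			\
    		_array = rcu_dereference(array);	\
    		/* Guard used only by the checking variant. */	\
    		if (unlikely(check_non_null && !_array))\
    			goto _out;			\
    		_item = &_array->items[0];		\
    		while ((_prog = READ_ONCE(_item->prog))) {		\
    			bpf_cgroup_storage_set(_item->cgroup_storage);	\
    			_ret &= func(_prog, ctx);	\
    			_item++;			\
    		}					\
    _out:							\
    		rcu_read_unlock();			\
    		preempt_enable();			\
    		_ret;					\
    	})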

    
    > +		_item = &_array->items[0];		\
    > +		while ((_prog = READ_ONCE(_item->prog))) {		\
    > +			bpf_cgroup_storage_set(_item->cgroup_storage);	\
    > +			ret = func(_prog, ctx);		\
    > +			_ret &= (ret & 1);		\
    > +			_cn |= (ret & 2);		\
    > +			_item++;			\
    > +		}					\
    > +		rcu_read_unlock();			\
    > +		preempt_enable_no_resched();	
    
    Why are you using preempt_enable_no_resched() here ?

Because that is what __BPF_PROG_RUN_ARRAY() calls and the macro
BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() is an instantiation of it
(with minor changes in the return value).
    
    > +		if (_ret)				\
    > +			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
    > +		else					\
    > +			_ret = (_cn ? NET_XMIT_DROP : -EPERM);		\
    > +		_ret;					\
    > +	})
    > +
    >  #define BPF_PROG_RUN_ARRAY(array, ctx, func)		\
    >  	__BPF_PROG_RUN_ARRAY(array, ctx, func, false)
    >  
    >
Eric Dumazet May 28, 2019, 8:43 p.m. UTC | #3
On 5/28/19 11:54 AM, Lawrence Brakmo wrote:
> On 5/28/19, 6:43 AM, "netdev-owner@vger.kernel.org on behalf of Eric Dumazet" <netdev-owner@vger.kernel.org on behalf of eric.dumazet@gmail.com> wrote:
> 

>     Why are you using preempt_enable_no_resched() here ?
> 
> Because that is what __BPF_PROG_RUN_ARRAY() calls and the macro
> BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() is an instantiation of it
> (with minor changes in the return value).

I do not see this in my tree.

Please rebase your tree, do not bring back an issue that was solved already.
Lawrence Brakmo May 28, 2019, 9:23 p.m. UTC | #4
On 5/28/19, 1:43 PM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:

     
    On 5/28/19 11:54 AM, Lawrence Brakmo wrote:
    > On 5/28/19, 6:43 AM, "netdev-owner@vger.kernel.org on behalf of Eric Dumazet" <netdev-owner@vger.kernel.org on behalf of eric.dumazet@gmail.com> wrote:
    > 
    
    >     Why are you using preempt_enable_no_resched() here ?
    > 
    > Because that is what __BPF_PROG_RUN_ARRAY() calls and the macro
    > BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() is an instantiation of it
    > (with minor changes in the return value).
    
    I do not see this in my tree.
    
    Please rebase your tree, do not bring back an issue that was solved already.

My mistake. My tree is up to date; I looked in the wrong place and did
not realize the call had changed. I will fix it and resubmit.
Thank you for the feedback.

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d98141edb74b..49be4f88454c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -552,6 +552,56 @@  _out:							\
 		_ret;					\
 	 })
 
+/* To be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs
+ * so BPF programs can request cwr for TCP packets.
+ *
+ * Currently, cgroup skb programs can only return 0 or 1 (0 to drop
+ * the packet, 1 to keep it). This macro changes the behavior so the
+ * low order bit indicates whether the packet should be dropped (0)
+ * or kept (1), and the next bit is a congestion notification bit,
+ * which could be used by TCP to call tcp_enter_cwr().
+ *
+ * Hence, new allowed return values of CGROUP EGRESS BPF programs are:
+ *   0: drop packet
+ *   1: keep packet
+ *   2: drop packet and cn
+ *   3: keep packet and cn
+ *
+ * This macro converts the program's return value to one of the
+ * NET_XMIT values or to -EPERM (interpreted as drop packet, no cn):
+ *   0 -> -EPERM                skb is dropped (no cn)
+ *   1 -> NET_XMIT_SUCCESS (0)  skb is transmitted (no cn)
+ *   2 -> NET_XMIT_DROP (1)     skb is dropped and cn
+ *   3 -> NET_XMIT_CN (2)       skb is transmitted and cn
+ */
+#define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
+	({						\
+		struct bpf_prog_array_item *_item;	\
+		struct bpf_prog *_prog;			\
+		struct bpf_prog_array *_array;		\
+		u32 ret;				\
+		u32 _ret = 1;				\
+		u32 _cn = 0;				\
+		preempt_disable();			\
+		rcu_read_lock();			\
+		_array = rcu_dereference(array);	\
+		_item = &_array->items[0];		\
+		while ((_prog = READ_ONCE(_item->prog))) {		\
+			bpf_cgroup_storage_set(_item->cgroup_storage);	\
+			ret = func(_prog, ctx);		\
+			_ret &= (ret & 1);		\
+			_cn |= (ret & 2);		\
+			_item++;			\
+		}					\
+		rcu_read_unlock();			\
+		preempt_enable_no_resched();		\
+		if (_ret)				\
+			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
+		else					\
+			_ret = (_cn ? NET_XMIT_DROP : -EPERM);		\
+		_ret;					\
+	})
+
 #define BPF_PROG_RUN_ARRAY(array, ctx, func)		\
 	__BPF_PROG_RUN_ARRAY(array, ctx, func, false)
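
For context, a later patch in this series wires the new macro into
__cgroup_bpf_run_filter_skb() for the egress direction; paraphrased,
the call site is expected to look roughly like this (a sketch, not
the exact hunk from that patch):

    if (type == BPF_CGROUP_INET_EGRESS) {
    	/* Egress: pass the 2-bit drop/cn result through. */
    	ret = BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(
    		cgrp->bpf.effective[type], skb,
    		__bpf_prog_run_save_cb);
    } else {
    	/* All other attach types keep the 0/1 convention. */
    	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
    				 __bpf_prog_run_save_cb);
    	ret = (ret == 1 ? 0 : -EPERM);
    }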