[bpf-next,v2,06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper

Message ID 20180312192334.8039.98220.stgit@john-Precision-Tower-5810
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series
  • bpf,sockmap: sendmsg/sendfile ULP
Related show

Commit Message

John Fastabend March 12, 2018, 7:23 p.m.
A single sendmsg or sendfile system call can contain multiple logical
messages that a BPF program may want to read and apply a verdict. But,
without an apply_bytes helper any verdict on the data applies to all
bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
care to read the first N bytes of a msg. If the payload is large say
MB or even GB setting up and calling the BPF program repeatedly for
all bytes, even though the verdict is already known, creates
unnecessary overhead.

To allow BPF programs to control how many bytes a given verdict
applies to we implement a bpf_msg_apply_bytes() helper. When called
from within a BPF program this sets a counter, internal to the
BPF infrastructure, that applies the last verdict to the next N
bytes. If the N is smaller than the current data being processed
from a sendmsg/sendfile call, the first N bytes will be sent and
the BPF program will be re-run with start_data pointing to the N+1
byte. If N is larger than the current data being processed the
BPF verdict will be applied to multiple sendmsg/sendfile calls
until N bytes are consumed.

Note1 if a socket closes with apply_bytes counter non-zero this
is not a problem because data is not being buffered for N bytes
and is sent as its received.

Note2 if this is operating in the sendpage context the data
pointers may be zeroed after this call if the apply walks beyond
a msg_pull_data() call specified data range. (helper implemented
shortly in this series).

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf.h |    3 ++-
 net/core/filter.c        |   16 ++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

Comments

David Miller March 15, 2018, 6:41 p.m. | #1
From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:34 -0700

> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict. But,
> without an apply_bytes helper any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large say
> MB or even GB setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note1 if a socket closes with apply_bytes counter non-zero this
> is not a problem because data is not being buffered for N bytes
> and is sent as its received.
> 
> Note2 if this is operating in the sendpage context the data
> pointers may be zeroed after this call if the apply walks beyond
> a msg_pull_data() call specified data range. (helper implemented
> shortly in this series).
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann March 15, 2018, 8:32 p.m. | #2
On 03/12/2018 08:23 PM, John Fastabend wrote:
> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict. But,
> without an apply_bytes helper any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large say
> MB or even GB setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note1 if a socket closes with apply_bytes counter non-zero this
> is not a problem because data is not being buffered for N bytes
> and is sent as its received.
> 
> Note2 if this is operating in the sendpage context the data
> pointers may be zeroed after this call if the apply walks beyond
> a msg_pull_data() call specified data range. (helper implemented
> shortly in this series).
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  include/uapi/linux/bpf.h |    3 ++-
>  net/core/filter.c        |   16 ++++++++++++++++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b8275f0..e50c61f 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -769,7 +769,8 @@ enum bpf_attach_type {
>  	FN(getsockopt),			\
>  	FN(override_return),		\
>  	FN(sock_ops_cb_flags_set),	\
> -	FN(msg_redirect_map),
> +	FN(msg_redirect_map),		\
> +	FN(msg_apply_bytes),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 314c311..df2a8f4 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1928,6 +1928,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>  	.arg4_type      = ARG_ANYTHING,
>  };
>  
> +BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg_buff *, msg, u64, bytes)
> +{
> +	msg->apply_bytes = bytes;

Here in bpf_msg_apply_bytes() but also in bpf_msg_cork_bytes() the signature
is u64, but in struct sk_msg_buff and struct smap_psock it's type int, so
user provided u64 will make these negative. Is there a reason to have this
allow a negative value and not being of type u32 everywhere?

> +	return 0;
> +}
> +
> +static const struct bpf_func_proto bpf_msg_apply_bytes_proto = {
> +	.func           = bpf_msg_apply_bytes,
> +	.gpl_only       = false,
> +	.ret_type       = RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_ANYTHING,
> +};
> +
>  BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
>  {
>  	return task_get_classid(skb);
> @@ -3634,6 +3648,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
>  	switch (func_id) {
>  	case BPF_FUNC_msg_redirect_map:
>  		return &bpf_msg_redirect_map_proto;
> +	case BPF_FUNC_msg_apply_bytes:
> +		return &bpf_msg_apply_bytes_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
>
Alexei Starovoitov March 15, 2018, 9:45 p.m. | #3
On Mon, Mar 12, 2018 at 12:23:34PM -0700, John Fastabend wrote:
> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict. But,
> without an apply_bytes helper any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large say
> MB or even GB setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note1 if a socket closes with apply_bytes counter non-zero this
> is not a problem because data is not being buffered for N bytes
> and is sent as its received.
> 
> Note2 if this is operating in the sendpage context the data
> pointers may be zeroed after this call if the apply walks beyond
> a msg_pull_data() call specified data range. (helper implemented
> shortly in this series).

instead of 'shortly in this seris' you meant 'implemented earlier'?
patch 5 handles it, but it's set here, right?

The semantics of the helper looks great.
John Fastabend March 15, 2018, 9:59 p.m. | #4
On 03/15/2018 02:45 PM, Alexei Starovoitov wrote:
> On Mon, Mar 12, 2018 at 12:23:34PM -0700, John Fastabend wrote:
>> A single sendmsg or sendfile system call can contain multiple logical
>> messages that a BPF program may want to read and apply a verdict. But,
>> without an apply_bytes helper any verdict on the data applies to all
>> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
>> care to read the first N bytes of a msg. If the payload is large say
>> MB or even GB setting up and calling the BPF program repeatedly for
>> all bytes, even though the verdict is already known, creates
>> unnecessary overhead.
>>
>> To allow BPF programs to control how many bytes a given verdict
>> applies to we implement a bpf_msg_apply_bytes() helper. When called
>> from within a BPF program this sets a counter, internal to the
>> BPF infrastructure, that applies the last verdict to the next N
>> bytes. If the N is smaller than the current data being processed
>> from a sendmsg/sendfile call, the first N bytes will be sent and
>> the BPF program will be re-run with start_data pointing to the N+1
>> byte. If N is larger than the current data being processed the
>> BPF verdict will be applied to multiple sendmsg/sendfile calls
>> until N bytes are consumed.
>>
>> Note1 if a socket closes with apply_bytes counter non-zero this
>> is not a problem because data is not being buffered for N bytes
>> and is sent as its received.
>>
>> Note2 if this is operating in the sendpage context the data
>> pointers may be zeroed after this call if the apply walks beyond
>> a msg_pull_data() call specified data range. (helper implemented
>> shortly in this series).
> 
> instead of 'shortly in this seris' you meant 'implemented earlier'?
> patch 5 handles it, but it's set here, right?
> 

Yep just a hold-over from an earlier patch description. I'll remove
that entire note2 and fixup a couple small things Daniel noticed
with a v3.

> The semantics of the helper looks great.
> 

Great!
John Fastabend March 15, 2018, 10:02 p.m. | #5
On 03/15/2018 01:32 PM, Daniel Borkmann wrote:
> On 03/12/2018 08:23 PM, John Fastabend wrote:
>> A single sendmsg or sendfile system call can contain multiple logical
>> messages that a BPF program may want to read and apply a verdict. But,
>> without an apply_bytes helper any verdict on the data applies to all
>> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
>> care to read the first N bytes of a msg. If the payload is large say
>> MB or even GB setting up and calling the BPF program repeatedly for
>> all bytes, even though the verdict is already known, creates
>> unnecessary overhead.
>>
>> To allow BPF programs to control how many bytes a given verdict
>> applies to we implement a bpf_msg_apply_bytes() helper. When called
>> from within a BPF program this sets a counter, internal to the
>> BPF infrastructure, that applies the last verdict to the next N
>> bytes. If the N is smaller than the current data being processed
>> from a sendmsg/sendfile call, the first N bytes will be sent and
>> the BPF program will be re-run with start_data pointing to the N+1
>> byte. If N is larger than the current data being processed the
>> BPF verdict will be applied to multiple sendmsg/sendfile calls
>> until N bytes are consumed.
>>
>> Note1 if a socket closes with apply_bytes counter non-zero this
>> is not a problem because data is not being buffered for N bytes
>> and is sent as its received.
>>
>> Note2 if this is operating in the sendpage context the data
>> pointers may be zeroed after this call if the apply walks beyond
>> a msg_pull_data() call specified data range. (helper implemented
>> shortly in this series).
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>  include/uapi/linux/bpf.h |    3 ++-
>>  net/core/filter.c        |   16 ++++++++++++++++
>>  2 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index b8275f0..e50c61f 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -769,7 +769,8 @@ enum bpf_attach_type {
>>  	FN(getsockopt),			\
>>  	FN(override_return),		\
>>  	FN(sock_ops_cb_flags_set),	\
>> -	FN(msg_redirect_map),
>> +	FN(msg_redirect_map),		\
>> +	FN(msg_apply_bytes),
>>  
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>   * function eBPF program intends to call
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 314c311..df2a8f4 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -1928,6 +1928,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>  	.arg4_type      = ARG_ANYTHING,
>>  };
>>  
>> +BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg_buff *, msg, u64, bytes)
>> +{
>> +	msg->apply_bytes = bytes;
> 
> Here in bpf_msg_apply_bytes() but also in bpf_msg_cork_bytes() the signature
> is u64, but in struct sk_msg_buff and struct smap_psock it's type int, so
> user provided u64 will make these negative. Is there a reason to have this
> allow a negative value and not being of type u32 everywhere?
> 

Nope no reason for negative values, we can make it consistently
u32.

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b8275f0..e50c61f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -769,7 +769,8 @@  enum bpf_attach_type {
 	FN(getsockopt),			\
 	FN(override_return),		\
 	FN(sock_ops_cb_flags_set),	\
-	FN(msg_redirect_map),
+	FN(msg_redirect_map),		\
+	FN(msg_apply_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 314c311..df2a8f4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1928,6 +1928,20 @@  struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 	.arg4_type      = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg_buff *, msg, u64, bytes)
+{
+	msg->apply_bytes = bytes;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_msg_apply_bytes_proto = {
+	.func           = bpf_msg_apply_bytes,
+	.gpl_only       = false,
+	.ret_type       = RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_ANYTHING,
+};
+
 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
 {
 	return task_get_classid(skb);
@@ -3634,6 +3648,8 @@  static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
 	switch (func_id) {
 	case BPF_FUNC_msg_redirect_map:
 		return &bpf_msg_redirect_map_proto;
+	case BPF_FUNC_msg_apply_bytes:
+		return &bpf_msg_apply_bytes_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}