
[bpf-next,1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB

Message ID 20181017055606.353449-2-songliubraving@fb.com
State Changes Requested, archived
Delegated to: BPF Maintainers
Series bpf: add cg_skb_is_valid_access

Commit Message

Song Liu Oct. 17, 2018, 5:56 a.m. UTC
BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access packet headers in
the skb. This patch enables direct packet access (via __sk_buff data and
data_end) for these programs.

In __cgroup_bpf_run_filter_skb(), bpf_compute_data_pointers() is called
to compute the proper data_end for the BPF program.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/cgroup.c |  4 ++++
 net/core/filter.c   | 26 +++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

Comments

Alexei Starovoitov Oct. 17, 2018, 5:26 p.m. UTC | #1
On Tue, Oct 16, 2018 at 10:56:05PM -0700, Song Liu wrote:
> BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
> skb. This patch enables direct access of skb for these programs.

The lack of direct packet access in CGROUP_SKB progs was
an unpleasant surprise to me, so thank you for fixing it,
but there are a few issues with the patch. See below.

> In __cgroup_bpf_run_filter_skb(), bpf_compute_data_pointers() is called
> to compute proper data_end for the BPF program.
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  kernel/bpf/cgroup.c |  4 ++++
>  net/core/filter.c   | 26 +++++++++++++++++++++++++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 00f6ed2e4f9a..340d496f35bd 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -566,6 +566,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
>  	save_sk = skb->sk;
>  	skb->sk = sk;
>  	__skb_push(skb, offset);
> +
> +	/* compute pointers for the bpf prog */
> +	bpf_compute_data_pointers(skb);
> +
>  	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
>  				 bpf_prog_run_save_cb);
>  	__skb_pull(skb, offset);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1a3ac6c46873..8b5a502e241f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5346,6 +5346,30 @@ static bool sk_filter_is_valid_access(int off, int size,
>  	return bpf_skb_is_valid_access(off, size, type, prog, info);
>  }
>  
> +static bool cg_skb_is_valid_access(int off, int size,
> +				   enum bpf_access_type type,
> +				   const struct bpf_prog *prog,
> +				   struct bpf_insn_access_aux *info)
> +{
> +	if (type == BPF_WRITE)
> +		return false;

This disables writes into cb[0..4], which were previously allowed for cgroup_inet_* progs.
One can argue that this may break existing progs,
but looking at the place where BPF_CGROUP_RUN_PROG_INET_INGRESS is called,
it seems it's actually not correct in all cases to access cb there.
Just a few lines down we call bpf_prog_run_save_cb(), which saves/restores
these 24 bytes.
So we have two options: either add save/restore for INET_INGRESS only,
or disable read and write access to cb[0..4] for CGROUP_SKB progs.
I prefer the former.

> +
> +	switch (off) {
> +	case bpf_ctx_range(struct __sk_buff, len):
> +		break;
> +	case bpf_ctx_range(struct __sk_buff, data):
> +		info->reg_type = PTR_TO_PACKET;
> +		break;
> +	case bpf_ctx_range(struct __sk_buff, data_end):
> +		info->reg_type = PTR_TO_PACKET_END;
> +		break;
> +	default:
> +		return false;
> +	}

This also enables access to the range of fields family..local_port.
That's ok for egress, but not for ingress unless we
add code similar to the bottom of sk_filter_trim_cap() that
initializes skb->sk.

The above change also allows access to data_meta and flow_keys,
which is not correct.

Considering all that, I'm proposing to fix the INET_INGRESS call site
similarly to the code below it in sk_filter_trim_cap().
In particular:

    struct sock *save_sk = skb->sk;
    skb->sk = sk;
    /* save and clear cb */
    BPF_CGROUP_RUN_PROG_INET_INGRESS
    /* restore cb */
    skb->sk = save_sk;

All of the above can probably live inside the BPF_CGROUP_RUN_PROG_INET_INGRESS macro.
Then cg_skb_is_valid_access() can allow access to data/data_end
and the family..local_port range as well,
while disallowing access to flow_keys and data_meta.

Patch 2 should add tests for all of these fields.

Thoughts?

> +
> +	return bpf_skb_is_valid_access(off, size, type, prog, info);
> +}
> +
>  static bool lwt_is_valid_access(int off, int size,
>  				enum bpf_access_type type,
>  				const struct bpf_prog *prog,
> @@ -7038,7 +7062,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
>  
>  const struct bpf_verifier_ops cg_skb_verifier_ops = {
>  	.get_func_proto		= cg_skb_func_proto,
> -	.is_valid_access	= sk_filter_is_valid_access,
> +	.is_valid_access	= cg_skb_is_valid_access,
>  	.convert_ctx_access	= bpf_convert_ctx_access,
>  };
>  
> -- 
> 2.17.1
>
Alexei Starovoitov Oct. 17, 2018, 7:02 p.m. UTC | #2

I chatted with Song offline.
I completely misread 'return false' in the above as 'break'.
The patch actually disables access to pkt_type, mark, queue_mapping
and so on, which is not correct either.
Since no tests failed, we really need to improve this aspect
of test coverage in test_verifier.c.

Also, I missed that __cgroup_bpf_run_filter_skb() already
does save_sk = skb->sk; skb->sk = sk;
and calls bpf_prog_run_save_cb(),
so there is no issue in the existing code. That was a false alarm.

Revising the proposal:
I think cg_skb_is_valid_access() can be made similar to
lwt_is_valid_access(), allowing writes into mark, priority and cb[0..4],
and reads of data/data_end.
In addition, it's also ok to allow the family..local_port range
(unlike lwt, where sk may not be present),
with no access to data_meta and flow_keys.
Song Liu Oct. 17, 2018, 7:07 p.m. UTC | #3

Thanks Alexei! I will send v2 shortly. 

Song

Patch

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 00f6ed2e4f9a..340d496f35bd 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -566,6 +566,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	save_sk = skb->sk;
 	skb->sk = sk;
 	__skb_push(skb, offset);
+
+	/* compute pointers for the bpf prog */
+	bpf_compute_data_pointers(skb);
+
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
 				 bpf_prog_run_save_cb);
 	__skb_pull(skb, offset);
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..8b5a502e241f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5346,6 +5346,30 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return bpf_skb_is_valid_access(off, size, type, prog, info);
 }
 
+static bool cg_skb_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   const struct bpf_prog *prog,
+				   struct bpf_insn_access_aux *info)
+{
+	if (type == BPF_WRITE)
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, len):
+		break;
+	case bpf_ctx_range(struct __sk_buff, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case bpf_ctx_range(struct __sk_buff, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	default:
+		return false;
+	}
+
+	return bpf_skb_is_valid_access(off, size, type, prog, info);
+}
+
 static bool lwt_is_valid_access(int off, int size,
 				enum bpf_access_type type,
 				const struct bpf_prog *prog,
@@ -7038,7 +7062,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
 
 const struct bpf_verifier_ops cg_skb_verifier_ops = {
 	.get_func_proto		= cg_skb_func_proto,
-	.is_valid_access	= sk_filter_is_valid_access,
+	.is_valid_access	= cg_skb_is_valid_access,
 	.convert_ctx_access	= bpf_convert_ctx_access,
 };