diff mbox series

[bpf-next,v4,1/4] bpf: add BPF_CGROUP_INET_SOCK_RELEASE hook

Message ID 20200706230128.4073544-2-sdf@google.com
State Accepted
Delegated to: BPF Maintainers
Headers show
Series bpf: add BPF_CGROUP_INET_SOCK_RELEASE hook | expand

Commit Message

Stanislav Fomichev July 6, 2020, 11:01 p.m. UTC
Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers
on inet socket release. It triggers only for userspace
sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE.

The only questionable part here is the sock->sk check
in the inet_release. Looking at the places where we
do 'sock->sk = NULL', I don't understand how it can race
with inet_release and why the check is there (it's been
there since the initial git import). Otherwise, the
change itself is pretty simple, we add a BPF hook
to the inet_release and avoid calling it for kernel
sockets.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf-cgroup.h | 4 ++++
 include/uapi/linux/bpf.h   | 1 +
 kernel/bpf/syscall.c       | 3 +++
 net/core/filter.c          | 1 +
 net/ipv4/af_inet.c         | 3 +++
 5 files changed, 12 insertions(+)

Comments

Andrii Nakryiko July 6, 2020, 11:42 p.m. UTC | #1
On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers
> on inet socket release. It triggers only for userspace
> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE.
>
> The only questionable part here is the sock->sk check
> in the inet_release. Looking at the places where we
> do 'sock->sk = NULL', I don't understand how it can race
> with inet_release and why the check is there (it's been
> there since the initial git import). Otherwise, the
> change itself is pretty simple, we add a BPF hook
> to the inet_release and avoid calling it for kernel
> sockets.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/bpf-cgroup.h | 4 ++++
>  include/uapi/linux/bpf.h   | 1 +
>  kernel/bpf/syscall.c       | 3 +++
>  net/core/filter.c          | 1 +
>  net/ipv4/af_inet.c         | 3 +++
>  5 files changed, 12 insertions(+)
>

Looks good overall, but I have no idea about sock->sk NULL case.

Acked-by: Andrii Nakryiko <andriin@fb.com>

> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index c66c545e161a..2c6f26670acc 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h

[...]
Daniel Borkmann July 7, 2020, 9:41 p.m. UTC | #2
On 7/7/20 1:42 AM, Andrii Nakryiko wrote:
> On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote:
>>
>> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers
>> on inet socket release. It triggers only for userspace
>> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE.
>>
>> The only questionable part here is the sock->sk check
>> in the inet_release. Looking at the places where we
>> do 'sock->sk = NULL', I don't understand how it can race
>> with inet_release and why the check is there (it's been
>> there since the initial git import). Otherwise, the
>> change itself is pretty simple, we add a BPF hook
>> to the inet_release and avoid calling it for kernel
>> sockets.
>>
>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> ---
>>   include/linux/bpf-cgroup.h | 4 ++++
>>   include/uapi/linux/bpf.h   | 1 +
>>   kernel/bpf/syscall.c       | 3 +++
>>   net/core/filter.c          | 1 +
>>   net/ipv4/af_inet.c         | 3 +++
>>   5 files changed, 12 insertions(+)
>>
> 
> Looks good overall, but I have no idea about sock->sk NULL case.

+1, looks good & very useful hook. For the sock->sk NULL case here's a related
discussion on why it's needed [0].

   [0] https://lore.kernel.org/netdev/20190221221356.173485-1-ebiggers@kernel.org/

> Acked-by: Andrii Nakryiko <andriin@fb.com>
> 
>> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
>> index c66c545e161a..2c6f26670acc 100644
>> --- a/include/linux/bpf-cgroup.h
>> +++ b/include/linux/bpf-cgroup.h
> 
> [...]
>
Stanislav Fomichev July 7, 2020, 11:43 p.m. UTC | #3
On Tue, Jul 7, 2020 at 2:42 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/7/20 1:42 AM, Andrii Nakryiko wrote:
> > On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote:
> >>
> >> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers
> >> on inet socket release. It triggers only for userspace
> >> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE.
> >>
> >> The only questionable part here is the sock->sk check
> >> in the inet_release. Looking at the places where we
> >> do 'sock->sk = NULL', I don't understand how it can race
> >> with inet_release and why the check is there (it's been
> >> there since the initial git import). Otherwise, the
> >> change itself is pretty simple, we add a BPF hook
> >> to the inet_release and avoid calling it for kernel
> >> sockets.
> >>
> >> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >> ---
> >>   include/linux/bpf-cgroup.h | 4 ++++
> >>   include/uapi/linux/bpf.h   | 1 +
> >>   kernel/bpf/syscall.c       | 3 +++
> >>   net/core/filter.c          | 1 +
> >>   net/ipv4/af_inet.c         | 3 +++
> >>   5 files changed, 12 insertions(+)
> >>
> >
> > Looks good overall, but I have no idea about sock->sk NULL case.
>
> +1, looks good & very useful hook. For the sock->sk NULL case here's a related
> discussion on why it's needed [0].
Thanks for the pointer! I'll resend a v5 with s/sock/sock_create/ you
mentioned and will clean up the commit description a bit.
Daniel Borkmann July 7, 2020, 11:56 p.m. UTC | #4
On 7/8/20 1:43 AM, Stanislav Fomichev wrote:
> On Tue, Jul 7, 2020 at 2:42 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> On 7/7/20 1:42 AM, Andrii Nakryiko wrote:
>>> On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote:
>>>>
>>>> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers
>>>> on inet socket release. It triggers only for userspace
>>>> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE.
>>>>
>>>> The only questionable part here is the sock->sk check
>>>> in the inet_release. Looking at the places where we
>>>> do 'sock->sk = NULL', I don't understand how it can race
>>>> with inet_release and why the check is there (it's been
>>>> there since the initial git import). Otherwise, the
>>>> change itself is pretty simple, we add a BPF hook
>>>> to the inet_release and avoid calling it for kernel
>>>> sockets.
>>>>
>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>>> ---
>>>>    include/linux/bpf-cgroup.h | 4 ++++
>>>>    include/uapi/linux/bpf.h   | 1 +
>>>>    kernel/bpf/syscall.c       | 3 +++
>>>>    net/core/filter.c          | 1 +
>>>>    net/ipv4/af_inet.c         | 3 +++
>>>>    5 files changed, 12 insertions(+)
>>>>
>>>
>>> Looks good overall, but I have no idea about sock->sk NULL case.
>>
>> +1, looks good & very useful hook. For the sock->sk NULL case here's a related
>> discussion on why it's needed [0].
> Thanks for the pointer! I'll resend a v5 with s/sock/sock_create/ you
> mentioned and will clean up the commit description a bit.

Already fixed up the selftest and a typo in the commit desc there & applied it. Let
me know if you prefer a respin though and I can toss it taking the respin which would
work just as well. :)

Thanks,
Daniel
Stanislav Fomichev July 7, 2020, 11:59 p.m. UTC | #5
On Tue, Jul 7, 2020 at 4:56 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/8/20 1:43 AM, Stanislav Fomichev wrote:
> > On Tue, Jul 7, 2020 at 2:42 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> On 7/7/20 1:42 AM, Andrii Nakryiko wrote:
> >>> On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote:
> >>>>
> >>>> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers
> >>>> on inet socket release. It triggers only for userspace
> >>>> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE.
> >>>>
> >>>> The only questionable part here is the sock->sk check
> >>>> in the inet_release. Looking at the places where we
> >>>> do 'sock->sk = NULL', I don't understand how it can race
> >>>> with inet_release and why the check is there (it's been
> >>>> there since the initial git import). Otherwise, the
> >>>> change itself is pretty simple, we add a BPF hook
> >>>> to the inet_release and avoid calling it for kernel
> >>>> sockets.
> >>>>
> >>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>>> ---
> >>>>    include/linux/bpf-cgroup.h | 4 ++++
> >>>>    include/uapi/linux/bpf.h   | 1 +
> >>>>    kernel/bpf/syscall.c       | 3 +++
> >>>>    net/core/filter.c          | 1 +
> >>>>    net/ipv4/af_inet.c         | 3 +++
> >>>>    5 files changed, 12 insertions(+)
> >>>>
> >>>
> >>> Looks good overall, but I have no idea about sock->sk NULL case.
> >>
> >> +1, looks good & very useful hook. For the sock->sk NULL case here's a related
> >> discussion on why it's needed [0].
> > Thanks for the pointer! I'll resend a v5 with s/sock/sock_create/ you
> > mentioned and will clean up the commit description a bit.
>
> Already fixed up the selftest and a typo in the commit desc there & applied it.
Oh, awesome, thanks!
diff mbox series

Patch

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index c66c545e161a..2c6f26670acc 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -210,6 +210,9 @@  int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk)				       \
 	BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_CREATE)
 
+#define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk)			       \
+	BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_RELEASE)
+
 #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk)				       \
 	BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET4_POST_BIND)
 
@@ -401,6 +404,7 @@  static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_BIND(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_BIND(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index da9bf35a26f8..548a749aebb3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -226,6 +226,7 @@  enum bpf_attach_type {
 	BPF_CGROUP_INET4_GETSOCKNAME,
 	BPF_CGROUP_INET6_GETSOCKNAME,
 	BPF_XDP_DEVMAP,
+	BPF_CGROUP_INET_SOCK_RELEASE,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8da159936bab..156f51ffada2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1981,6 +1981,7 @@  bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 	case BPF_PROG_TYPE_CGROUP_SOCK:
 		switch (expected_attach_type) {
 		case BPF_CGROUP_INET_SOCK_CREATE:
+		case BPF_CGROUP_INET_SOCK_RELEASE:
 		case BPF_CGROUP_INET4_POST_BIND:
 		case BPF_CGROUP_INET6_POST_BIND:
 			return 0;
@@ -2779,6 +2780,7 @@  attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_CGROUP_SKB;
 		break;
 	case BPF_CGROUP_INET_SOCK_CREATE:
+	case BPF_CGROUP_INET_SOCK_RELEASE:
 	case BPF_CGROUP_INET4_POST_BIND:
 	case BPF_CGROUP_INET6_POST_BIND:
 		return BPF_PROG_TYPE_CGROUP_SOCK;
@@ -2929,6 +2931,7 @@  static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
 	case BPF_CGROUP_INET_SOCK_CREATE:
+	case BPF_CGROUP_INET_SOCK_RELEASE:
 	case BPF_CGROUP_INET4_BIND:
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_POST_BIND:
diff --git a/net/core/filter.c b/net/core/filter.c
index c5e696e6c315..ddcc0d6209e1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6890,6 +6890,7 @@  static bool __sock_filter_check_attach_type(int off,
 	case offsetof(struct bpf_sock, priority):
 		switch (attach_type) {
 		case BPF_CGROUP_INET_SOCK_CREATE:
+		case BPF_CGROUP_INET_SOCK_RELEASE:
 			goto full_access;
 		default:
 			return false;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index ea6ed6d487ed..ff141d630bdf 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -411,6 +411,9 @@  int inet_release(struct socket *sock)
 	if (sk) {
 		long timeout;
 
+		if (!sk->sk_kern_sock)
+			BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk);
+
 		/* Applications forget to leave groups before exiting */
 		ip_mc_drop_socket(sk);