mbox series

[bpf-next,0/5] add bpf cgroup hooks that trigger on socket close

Message ID 20190118004106.163825-1-sdf@google.com
Headers show
Series add bpf cgroup hooks that trigger on socket close | expand

Message

Stanislav Fomichev Jan. 18, 2019, 12:41 a.m. UTC
Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on
socket creation and there is no way to know when the socket is being
closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE
that trigger when the socket is closed.

Initial intended usecase is to cleanup statistics after POST{4,6}_BIND.
Hooks have read-only access to all fields of struct bpf_sock.

First patch adds hooks, the rest of the patches add uapi and tests to make
sure these hooks work.

Stanislav Fomichev (5):
  bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE hooks
  tools: bpf: support BPF_CGROUP_INET{4,6}_SOCK_RELEASE in
    libbpf/bpftool
  selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
    test_section_names.c
  selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to test_sock.c
  selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
    test_sock_addr.c

 include/linux/bpf-cgroup.h                    |   6 +
 include/net/inet_common.h                     |   1 +
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/syscall.c                          |   8 ++
 net/core/filter.c                             |   7 +
 net/ipv4/af_inet.c                            |  13 +-
 net/ipv6/af_inet6.c                           |   5 +-
 tools/bpf/bpftool/cgroup.c                    |   2 +
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/lib/bpf/libbpf.c                        |   4 +
 .../selftests/bpf/test_section_names.c        |  10 ++
 tools/testing/selftests/bpf/test_sock.c       | 119 ++++++++++++++++
 tools/testing/selftests/bpf/test_sock_addr.c  | 131 +++++++++++++++++-
 13 files changed, 307 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Jan. 18, 2019, 1:02 a.m. UTC | #1
On 01/17/2019 04:41 PM, Stanislav Fomichev wrote:
> Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on
> socket creation and there is no way to know when the socket is being
> closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE
> that trigger when the socket is closed.
> 

Are these hooks enough to capture a disconnect() operation ?

A socket can be reused (different flows) without inet_release() being ever called.


> Initial intended usecase is to cleanup statistics after POST{4,6}_BIND.
> Hooks have read-only access to all fields of struct bpf_sock.
> 
> First patch adds hooks, the rest of the patches add uapi and tests to make
> sure these hooks work.
>
Stanislav Fomichev Jan. 18, 2019, 1:58 a.m. UTC | #2
On Thu, Jan 17, 2019 at 5:02 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 01/17/2019 04:41 PM, Stanislav Fomichev wrote:
> > Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on
> > socket creation and there is no way to know when the socket is being
> > closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE
> > that trigger when the socket is closed.
> >
>
> Are these hooks enough to capture a disconnect() operation ?
>
> A socket can be reused (different flows) without inet_release() being ever called.
I didn't know about reuse, let me spend some time looking into this.

I mostly thought about the following usecase, where user manually calls close():
sys_close() -> sock_close() -> __sock_release -> inet[6]_release

(I also expected stack to call the release whenever process exits)
>
>
> > Initial intended usecase is to cleanup statistics after POST{4,6}_BIND.
> > Hooks have read-only access to all fields of struct bpf_sock.
> >
> > First patch adds hooks, the rest of the patches add uapi and tests to make
> > sure these hooks work.
> >
Andrey Ignatov Jan. 18, 2019, 2:36 a.m. UTC | #3
Stanislav Fomichev <sdf@google.com> [Thu, 2019-01-17 16:41 -0800]:
> Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on
> socket creation and there is no way to know when the socket is being
> closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE
> that trigger when the socket is closed.
> 
> Initial intended usecase is to cleanup statistics after POST{4,6}_BIND.
> Hooks have read-only access to all fields of struct bpf_sock.

Do you need it for both TCP and UDP?

I was thinking about this hook earlier but since in my case only TCP was
needed I ended up using TCP-BPF. E.g. be BPF_SOCK_OPS_TCP_LISTEN_CB or
BPF_SOCK_OPS_TCP_CONNECT_CB can be used instead of POST{4,6}_BIND to
enable something, and then BPF_SOCK_OPS_STATE_CB can be used instead of
SOCK_RELEASE to disable that something when socket transisions to
BPF_TCP_CLOSE (e.g. BPF_TCP_LISTEN -> BPF_TCP_CLOSE).

That turned out to be much cleaner than POST{4,6}_BIND and also works
fine when socket is disconnected with AF_UNSPEC and then connected again
(what Eric mentioned).


> First patch adds hooks, the rest of the patches add uapi and tests to make
> sure these hooks work.
> 
> Stanislav Fomichev (5):
>   bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE hooks
>   tools: bpf: support BPF_CGROUP_INET{4,6}_SOCK_RELEASE in
>     libbpf/bpftool
>   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
>     test_section_names.c
>   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to test_sock.c
>   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
>     test_sock_addr.c
> 
>  include/linux/bpf-cgroup.h                    |   6 +
>  include/net/inet_common.h                     |   1 +
>  include/uapi/linux/bpf.h                      |   2 +
>  kernel/bpf/syscall.c                          |   8 ++
>  net/core/filter.c                             |   7 +
>  net/ipv4/af_inet.c                            |  13 +-
>  net/ipv6/af_inet6.c                           |   5 +-
>  tools/bpf/bpftool/cgroup.c                    |   2 +
>  tools/include/uapi/linux/bpf.h                |   2 +
>  tools/lib/bpf/libbpf.c                        |   4 +
>  .../selftests/bpf/test_section_names.c        |  10 ++
>  tools/testing/selftests/bpf/test_sock.c       | 119 ++++++++++++++++
>  tools/testing/selftests/bpf/test_sock_addr.c  | 131 +++++++++++++++++-
>  13 files changed, 307 insertions(+), 3 deletions(-)
> 
> -- 
> 2.20.1.321.g9e740568ce-goog
>
Stanislav Fomichev Jan. 18, 2019, 4:50 p.m. UTC | #4
On 01/18, Andrey Ignatov wrote:
> Stanislav Fomichev <sdf@google.com> [Thu, 2019-01-17 16:41 -0800]:
> > Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on
> > socket creation and there is no way to know when the socket is being
> > closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE
> > that trigger when the socket is closed.
> > 
> > Initial intended usecase is to cleanup statistics after POST{4,6}_BIND.
> > Hooks have read-only access to all fields of struct bpf_sock.
> 
> Do you need it for both TCP and UDP?
Yes, we need both TCP and UDP. Although, UDP is tricky in general with
the connected/unconnected cases.

> I was thinking about this hook earlier but since in my case only TCP was
> needed I ended up using TCP-BPF. E.g. be BPF_SOCK_OPS_TCP_LISTEN_CB or
> BPF_SOCK_OPS_TCP_CONNECT_CB can be used instead of POST{4,6}_BIND to
> enable something, and then BPF_SOCK_OPS_STATE_CB can be used instead of
> SOCK_RELEASE to disable that something when socket transisions to
> BPF_TCP_CLOSE (e.g. BPF_TCP_LISTEN -> BPF_TCP_CLOSE).
> 
> That turned out to be much cleaner than POST{4,6}_BIND and also works
> fine when socket is disconnected with AF_UNSPEC and then connected again
> (what Eric mentioned).

What if we do something like the patch below? Add pre_release hook (like we
currently have for pre_connect) and call it from connect(AF_UNSPEC) and
from inet_release? Any concerns here?

(I agree that TCP is probably better handled via BPF_SOCK_OPS_TCP_XYZ hooks,
but we need something for UDP as well)
Andrey Ignatov Jan. 18, 2019, 10:39 p.m. UTC | #5
Stanislav Fomichev <sdf@fomichev.me> [Fri, 2019-01-18 08:50 -0800]:
> On 01/18, Andrey Ignatov wrote:
> > Stanislav Fomichev <sdf@google.com> [Thu, 2019-01-17 16:41 -0800]:
> > > Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on
> > > socket creation and there is no way to know when the socket is being
> > > closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE
> > > that trigger when the socket is closed.
> > > 
> > > Initial intended usecase is to cleanup statistics after POST{4,6}_BIND.
> > > Hooks have read-only access to all fields of struct bpf_sock.
> > 
> > Do you need it for both TCP and UDP?
> Yes, we need both TCP and UDP. Although, UDP is tricky in general with
> the connected/unconnected cases.
> 
> > I was thinking about this hook earlier but since in my case only TCP was
> > needed I ended up using TCP-BPF. E.g. be BPF_SOCK_OPS_TCP_LISTEN_CB or
> > BPF_SOCK_OPS_TCP_CONNECT_CB can be used instead of POST{4,6}_BIND to
> > enable something, and then BPF_SOCK_OPS_STATE_CB can be used instead of
> > SOCK_RELEASE to disable that something when socket transisions to
> > BPF_TCP_CLOSE (e.g. BPF_TCP_LISTEN -> BPF_TCP_CLOSE).
> > 
> > That turned out to be much cleaner than POST{4,6}_BIND and also works
> > fine when socket is disconnected with AF_UNSPEC and then connected again
> > (what Eric mentioned).
> 
> What if we do something like the patch below? Add pre_release hook (like we
> currently have for pre_connect) and call it from connect(AF_UNSPEC) and
> from inet_release? Any concerns here?

That's hard to say w/o fully understanding the use-case. Could you
provide more details on it please?

Is my understanding correct that some statistics is needed only for
_client_ sockets (since we're talking about connect) and these client
sockets are always bound to local IP:port (or maybe IP only with
IP_BIND_ADDRESS_NO_PORT?) before sending data and that's why
POST{4,6}_BIND is used to start gathering statistics?

And then later, there should be some event to cleanup statistics and you
chose ops->release for this? If so what's kind of statistics is needed?
Is it per-destination statistics (since AF_UNSPEC matters?)? If so then
it should be cleaned up whenever destination is changed or when socket
is disconnected the last time.

For UDP socket connect(2) can be called many times with different
destinations even w/o AF_UNSPEC in between (in contrast to TCP). So your
patch with pre_release below won't handle it. Furthermore even if
connect(2) was used to set destination, sendmsg(2) can override it for
individual datagram. 

If per-destination statistics is needed for UDP client socket then IMO
both "start events" should be handled: BPF_CGROUP_INET{4,6}_CONNECT
(connected UDP) and BPF_CGROUP_UDP{4,6}_SENDMSG (unconnected UDP). The
former can be used to "close" previous statistics "window" (if it's not
the first connect) as well. The latter can be used to created "separate"
statistics "window" for destination of current datagram only.  And the
only case left, from what I see, is when socket is stopped being used
completely. That's, again, hard to say how to handle it better w/o
understanding use-case (e.g. since it's only UDP problem it may make
sense to implement in UDP stack).

> (I agree that TCP is probably better handled via BPF_SOCK_OPS_TCP_XYZ hooks,
> but we need something for UDP as well)

Do you think you may handle TCP with TCP-BPF and this patch set may
focus on UDP only if there is no good attach point that covers both TCP
and UDP?  That may lead to different BPF programs for TCP and UDP but
that may be hard to avoid anyway (BPF_CGROUP_UDP{4,6}_SENDMSG is a good
example here).

> 
> -- 
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index b703ad242365..ee3dc181df8f 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -568,8 +568,11 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr,
>  
>         if (addr_len < sizeof(uaddr->sa_family))
>                 return -EINVAL;
> -       if (uaddr->sa_family == AF_UNSPEC)
> +       if (uaddr->sa_family == AF_UNSPEC) {
> +               if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk))
> +                       sk->sk_prot->pre_release(sk);
>                 return sk->sk_prot->disconnect(sk, flags);
> +       }
>  
>         if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) {
>                 err = sk->sk_prot->pre_connect(sk, uaddr, addr_len);
> @@ -632,6 +635,8 @@ int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
>                         return -EINVAL;
>  
>                 if (uaddr->sa_family == AF_UNSPEC) {
> +                       if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk))
> +                               sk->sk_prot->pre_release(sk);
>                         err = sk->sk_prot->disconnect(sk, flags);
>                         sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
>                         goto out; 
> 
> > > First patch adds hooks, the rest of the patches add uapi and tests to make
> > > sure these hooks work.
> > > 
> > > Stanislav Fomichev (5):
> > >   bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE hooks
> > >   tools: bpf: support BPF_CGROUP_INET{4,6}_SOCK_RELEASE in
> > >     libbpf/bpftool
> > >   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
> > >     test_section_names.c
> > >   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to test_sock.c
> > >   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
> > >     test_sock_addr.c
> > > 
> > >  include/linux/bpf-cgroup.h                    |   6 +
> > >  include/net/inet_common.h                     |   1 +
> > >  include/uapi/linux/bpf.h                      |   2 +
> > >  kernel/bpf/syscall.c                          |   8 ++
> > >  net/core/filter.c                             |   7 +
> > >  net/ipv4/af_inet.c                            |  13 +-
> > >  net/ipv6/af_inet6.c                           |   5 +-
> > >  tools/bpf/bpftool/cgroup.c                    |   2 +
> > >  tools/include/uapi/linux/bpf.h                |   2 +
> > >  tools/lib/bpf/libbpf.c                        |   4 +
> > >  .../selftests/bpf/test_section_names.c        |  10 ++
> > >  tools/testing/selftests/bpf/test_sock.c       | 119 ++++++++++++++++
> > >  tools/testing/selftests/bpf/test_sock_addr.c  | 131 +++++++++++++++++-
> > >  13 files changed, 307 insertions(+), 3 deletions(-)
> > > 
> > > -- 
> > > 2.20.1.321.g9e740568ce-goog
> > > 
> > 
> > -- 
> > Andrey Ignatov
Stanislav Fomichev Jan. 22, 2019, 5 p.m. UTC | #6
On 01/18, Andrey Ignatov wrote:
> Stanislav Fomichev <sdf@fomichev.me> [Fri, 2019-01-18 08:50 -0800]:
> > On 01/18, Andrey Ignatov wrote:
> > > Stanislav Fomichev <sdf@google.com> [Thu, 2019-01-17 16:41 -0800]:
> > > > Currently, we have BPF_CGROUP_INET_SOCK_CREATE hook that triggers on
> > > > socket creation and there is no way to know when the socket is being
> > > > closed. Add new set of hooks BPF_CGROUP_INET{4,6}_SOCK_RELEASE
> > > > that trigger when the socket is closed.
> > > > 
> > > > Initial intended usecase is to cleanup statistics after POST{4,6}_BIND.
> > > > Hooks have read-only access to all fields of struct bpf_sock.
> > > 
> > > Do you need it for both TCP and UDP?
> > Yes, we need both TCP and UDP. Although, UDP is tricky in general with
> > the connected/unconnected cases.
> > 
> > > I was thinking about this hook earlier but since in my case only TCP was
> > > needed I ended up using TCP-BPF. E.g. be BPF_SOCK_OPS_TCP_LISTEN_CB or
> > > BPF_SOCK_OPS_TCP_CONNECT_CB can be used instead of POST{4,6}_BIND to
> > > enable something, and then BPF_SOCK_OPS_STATE_CB can be used instead of
> > > SOCK_RELEASE to disable that something when socket transisions to
> > > BPF_TCP_CLOSE (e.g. BPF_TCP_LISTEN -> BPF_TCP_CLOSE).
> > > 
> > > That turned out to be much cleaner than POST{4,6}_BIND and also works
> > > fine when socket is disconnected with AF_UNSPEC and then connected again
> > > (what Eric mentioned).
> > 
> > What if we do something like the patch below? Add pre_release hook (like we
> > currently have for pre_connect) and call it from connect(AF_UNSPEC) and
> > from inet_release? Any concerns here?
> 
> That's hard to say w/o fully understanding the use-case. Could you
> provide more details on it please?
The use-case is a simple lightweight per-cgroup bpf audit (without involving
audit subsystem).
Basically record the following events: when the socket is created, when (and
where) it's connected/bound and when it's torn down (plus, _maybe_ rx/tx
stats).

> Is my understanding correct that some statistics is needed only for
> _client_ sockets (since we're talking about connect) and these client
> sockets are always bound to local IP:port (or maybe IP only with
> IP_BIND_ADDRESS_NO_PORT?) before sending data and that's why
> POST{4,6}_BIND is used to start gathering statistics?
> 
> And then later, there should be some event to cleanup statistics and you
> chose ops->release for this? If so what's kind of statistics is needed?
> Is it per-destination statistics (since AF_UNSPEC matters?)? If so then
> it should be cleaned up whenever destination is changed or when socket
> is disconnected the last time.
> 
> For UDP socket connect(2) can be called many times with different
> destinations even w/o AF_UNSPEC in between (in contrast to TCP). So your
> patch with pre_release below won't handle it. Furthermore even if
> connect(2) was used to set destination, sendmsg(2) can override it for
> individual datagram. 
> 
> If per-destination statistics is needed for UDP client socket then IMO
> both "start events" should be handled: BPF_CGROUP_INET{4,6}_CONNECT
> (connected UDP) and BPF_CGROUP_UDP{4,6}_SENDMSG (unconnected UDP). The
> former can be used to "close" previous statistics "window" (if it's not
> the first connect) as well. The latter can be used to created "separate"
> statistics "window" for destination of current datagram only.  And the
> only case left, from what I see, is when socket is stopped being used
> completely. That's, again, hard to say how to handle it better w/o
> understanding use-case (e.g. since it's only UDP problem it may make
> sense to implement in UDP stack).
> 
> > (I agree that TCP is probably better handled via BPF_SOCK_OPS_TCP_XYZ hooks,
> > but we need something for UDP as well)
> 
> Do you think you may handle TCP with TCP-BPF and this patch set may
> focus on UDP only if there is no good attach point that covers both TCP
> and UDP?  That may lead to different BPF programs for TCP and UDP but
> that may be hard to avoid anyway (BPF_CGROUP_UDP{4,6}_SENDMSG is a good
> example here).
Let me look closer into TCP-BPF hooks, it looks like that would be
enough for TCP at least. I'll get back to you in case there is an issue.

I'll also think about UDP use case. Maybe we should do as you suggested
and support release only in UDP. (Or maybe have release for TCP+UDP and an
additional 'disconnect' hook to handle connect(AF_UNSPEC), that might
be more generic).

But thanks for you input, really appreciate it!

> 
> > 
> > -- 
> > 
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index b703ad242365..ee3dc181df8f 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -568,8 +568,11 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr,
> >  
> >         if (addr_len < sizeof(uaddr->sa_family))
> >                 return -EINVAL;
> > -       if (uaddr->sa_family == AF_UNSPEC)
> > +       if (uaddr->sa_family == AF_UNSPEC) {
> > +               if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk))
> > +                       sk->sk_prot->pre_release(sk);
> >                 return sk->sk_prot->disconnect(sk, flags);
> > +       }
> >  
> >         if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) {
> >                 err = sk->sk_prot->pre_connect(sk, uaddr, addr_len);
> > @@ -632,6 +635,8 @@ int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> >                         return -EINVAL;
> >  
> >                 if (uaddr->sa_family == AF_UNSPEC) {
> > +                       if (BPF_CGROUP_PRE_RELEASE_ENABLED(sk))
> > +                               sk->sk_prot->pre_release(sk);
> >                         err = sk->sk_prot->disconnect(sk, flags);
> >                         sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
> >                         goto out; 
> > 
> > > > First patch adds hooks, the rest of the patches add uapi and tests to make
> > > > sure these hooks work.
> > > > 
> > > > Stanislav Fomichev (5):
> > > >   bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE hooks
> > > >   tools: bpf: support BPF_CGROUP_INET{4,6}_SOCK_RELEASE in
> > > >     libbpf/bpftool
> > > >   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
> > > >     test_section_names.c
> > > >   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to test_sock.c
> > > >   selftests/bpf: add BPF_CGROUP_INET{4,6}_SOCK_RELEASE to
> > > >     test_sock_addr.c
> > > > 
> > > >  include/linux/bpf-cgroup.h                    |   6 +
> > > >  include/net/inet_common.h                     |   1 +
> > > >  include/uapi/linux/bpf.h                      |   2 +
> > > >  kernel/bpf/syscall.c                          |   8 ++
> > > >  net/core/filter.c                             |   7 +
> > > >  net/ipv4/af_inet.c                            |  13 +-
> > > >  net/ipv6/af_inet6.c                           |   5 +-
> > > >  tools/bpf/bpftool/cgroup.c                    |   2 +
> > > >  tools/include/uapi/linux/bpf.h                |   2 +
> > > >  tools/lib/bpf/libbpf.c                        |   4 +
> > > >  .../selftests/bpf/test_section_names.c        |  10 ++
> > > >  tools/testing/selftests/bpf/test_sock.c       | 119 ++++++++++++++++
> > > >  tools/testing/selftests/bpf/test_sock_addr.c  | 131 +++++++++++++++++-
> > > >  13 files changed, 307 insertions(+), 3 deletions(-)
> > > > 
> > > > -- 
> > > > 2.20.1.321.g9e740568ce-goog
> > > > 
> > > 
> > > -- 
> > > Andrey Ignatov
> 
> -- 
> Andrey Ignatov