[v3,net] inet_diag: add cgroup id attribute

Message ID 20200403095627.GA85072@yandex-team.ru
State Deferred
Delegated to: David Miller

Commit Message

Dmitry Yakunin April 3, 2020, 9:56 a.m. UTC
This patch adds the cgroup v2 ID to the common inet diag message attributes.
The cgroup v2 ID is the kernfs ID (ino or ino+gen). This attribute allows
filtering inet diag output by the cgroup ID obtained via the name_to_handle_at()
syscall. When the net_cls or net_prio cgroup is activated, this ID is equal to 1
(the root cgroup ID) for newly created sockets.
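
For illustration only, here is a minimal userspace sketch of obtaining such an
ID with name_to_handle_at(). The helper name cgroup2_id_from_path() is
hypothetical, and the sketch assumes that the handle returned for a path on a
cgroup2 mount is exactly 8 bytes holding the kernfs ID; it is not part of this
patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up the 64-bit cgroup v2 ID of a cgroup directory.
 * Assumption: the file handle for a cgroup2 path is 8 bytes (the kernfs ID).
 * Returns 0 on success and -1 on any failure.
 */
static int cgroup2_id_from_path(const char *path, uint64_t *id)
{
	struct file_handle *fh;
	int mount_id;
	int ret = -1;

	assert(path != NULL && id != NULL);

	/* Room for the handle header plus an 8-byte opaque handle. */
	fh = calloc(1, sizeof(*fh) + sizeof(uint64_t));
	if (!fh)
		return -1;
	fh->handle_bytes = sizeof(uint64_t);

	if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) == 0 &&
	    fh->handle_bytes == sizeof(uint64_t)) {
		/* Copy the opaque handle bytes out as the cgroup ID. */
		memcpy(id, fh->f_handle, sizeof(*id));
		ret = 0;
	}

	free(fh);
	return ret;
}
```

A caller would pass the path of a cgroup directory under its cgroup2 mount
point and compare the result with the value reported in INET_DIAG_CGROUP_ID.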

Some notes about this ID:

1) gets initialized in socket() syscall
2) incoming socket gets ID from listening socket
   (not during accept() syscall)
3) not changed when the process gets moved to another cgroup
4) can point to a deleted cgroup (refcounting)

v2:
  - use CONFIG_SOCK_CGROUP_DATA instead of CONFIG_CGROUPS

v3:
  - fix attr size by using nla_total_size_64bit() (Eric Dumazet)
  - more detailed commit message (Konstantin Khlebnikov)
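
As a worked example of the size fix, the sketch below reproduces the arithmetic
in userspace. The sketch_* names are hypothetical and only mirror, as far as I
understand them, the kernel helpers nla_total_size() and
nla_total_size_64bit(); the may_pad flag stands in for kernels built without
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, where nla_put_u64_64bit() may emit an
extra empty padding attribute (INET_DIAG_PAD here) to align the payload.

```c
#include <assert.h>

/* Netlink attributes are aligned to 4 bytes and carry a 4-byte header. */
#define SKETCH_NLA_ALIGNTO 4
#define SKETCH_NLA_ALIGN(len) \
	(((len) + SKETCH_NLA_ALIGNTO - 1) & ~(SKETCH_NLA_ALIGNTO - 1))
#define SKETCH_NLA_HDRLEN 4

/* Header plus payload, rounded up to the attribute alignment. */
static int sketch_nla_total_size(int payload)
{
	assert(payload >= 0);
	return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload);
}

/*
 * Same as above, plus room for one empty padding attribute when the
 * architecture may need it to keep the 64-bit payload 8-byte aligned.
 */
static int sketch_nla_total_size_64bit(int payload, int may_pad)
{
	int size = sketch_nla_total_size(payload);

	if (may_pad)
		size += SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN);
	return size;
}
```

For an 8-byte payload this gives 12 bytes without padding and 16 bytes with
it, so a size estimate based on the plain helper can come up 4 bytes short.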

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 include/linux/inet_diag.h      | 6 +++++-
 include/uapi/linux/inet_diag.h | 1 +
 net/ipv4/inet_diag.c           | 7 +++++++
 3 files changed, 13 insertions(+), 1 deletion(-)
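
For readers who want to consume the new attribute from userspace, the
following is a minimal sketch of extracting a 64-bit value from the attribute
area that follows struct inet_diag_msg in a reply. find_u64_attr() is a
hypothetical helper, not part of this patch or of any library. The attribute
type should be taken from the patched include/uapi/linux/inet_diag.h
(INET_DIAG_CGROUP_ID) rather than hard-coded, because the enum value depends
on the tree the patch lands in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>

/*
 * Hypothetical helper: walk a netlink attribute area of 'len' bytes and
 * copy out the first attribute of the given type as a u64.
 * Returns 0 if found, -1 if the attribute is absent or too short.
 */
static int find_u64_attr(void *attrs, int len, unsigned short type,
			 uint64_t *out)
{
	struct rtattr *rta = attrs;

	assert(out != NULL);

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type != type)
			continue;	/* skips padding and unrelated attributes */
		if ((size_t)RTA_PAYLOAD(rta) < sizeof(uint64_t))
			return -1;
		/* memcpy avoids assuming the payload is 8-byte aligned. */
		memcpy(out, RTA_DATA(rta), sizeof(*out));
		return 0;
	}
	return -1;
}
```

The value read this way can then be compared with a cgroup ID obtained
through name_to_handle_at().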

Comments

Tejun Heo April 3, 2020, 1:38 p.m. UTC | #1
On Fri, Apr 03, 2020 at 12:56:27PM +0300, Dmitry Yakunin wrote:
> This patch adds the cgroup v2 ID to the common inet diag message attributes.
> The cgroup v2 ID is the kernfs ID (ino or ino+gen). This attribute allows
> filtering inet diag output by the cgroup ID obtained via the name_to_handle_at()
> syscall. When the net_cls or net_prio cgroup is activated, this ID is equal to 1
> (the root cgroup ID) for newly created sockets.
> 
> Some notes about this ID:
> 
> 1) gets initialized in socket() syscall
> 2) incoming socket gets ID from listening socket
>    (not during accept() syscall)

How would this work with things like inetd? Would it make sense to associate the
socket on the first actual send/recv?

Thanks.
Tejun Heo April 3, 2020, 1:41 p.m. UTC | #2
On Fri, Apr 03, 2020 at 09:38:17AM -0400, Tejun Heo wrote:
> On Fri, Apr 03, 2020 at 12:56:27PM +0300, Dmitry Yakunin wrote:
> > This patch adds the cgroup v2 ID to the common inet diag message attributes.
> > The cgroup v2 ID is the kernfs ID (ino or ino+gen). This attribute allows
> > filtering inet diag output by the cgroup ID obtained via the name_to_handle_at()
> > syscall. When the net_cls or net_prio cgroup is activated, this ID is equal to 1
> > (the root cgroup ID) for newly created sockets.
> > 
> > Some notes about this ID:
> > 
> > 1) gets initialized in socket() syscall
> > 2) incoming socket gets ID from listening socket
> >    (not during accept() syscall)
> 
> How would this work with things like inetd? Would it make sense to associate the
> socket on the first actual send/recv?

Oh, it's not a problem with your patch as you're just following the associated
ptr, so we can have that discussion separately. Looks good to me from cgroup
side. Please feel free to add my acked-by.

Thanks.
Konstantin Khlebnikov April 3, 2020, 2:37 p.m. UTC | #3
On 03/04/2020 16.38, Tejun Heo wrote:
> On Fri, Apr 03, 2020 at 12:56:27PM +0300, Dmitry Yakunin wrote:
>> This patch adds the cgroup v2 ID to the common inet diag message attributes.
>> The cgroup v2 ID is the kernfs ID (ino or ino+gen). This attribute allows
>> filtering inet diag output by the cgroup ID obtained via the name_to_handle_at()
>> syscall. When the net_cls or net_prio cgroup is activated, this ID is equal to 1
>> (the root cgroup ID) for newly created sockets.
>>
>> Some notes about this ID:
>>
>> 1) gets initialized in socket() syscall
>> 2) incoming socket gets ID from listening socket
>>     (not during accept() syscall)
> 
> How would this work with things like inetd? Would it make sense to associate the
> socket on the first actual send/recv?

First send/recv seems too intrusive.
A setsockopt to change the association to the current cgroup (or by ID) seems
more reasonable.

The systemd variant of inetd handles sockets as separate units and probably
creates its own cgroups for them.

> 
> Thanks.
>
Tejun Heo April 3, 2020, 2:45 p.m. UTC | #4
On Fri, Apr 03, 2020 at 05:37:17PM +0300, Konstantin Khlebnikov wrote:
> > How would this work with things like inetd? Would it make sense to associate the
> > socket on the first actual send/recv?
> 
> First send/recv seems too intrusive.

Intrusive in terms of?

> A setsockopt to change the association to the current cgroup (or by ID) seems
> more reasonable.

I'm not sure about exposing it as an explicit interface.

Thanks.
Konstantin Khlebnikov April 3, 2020, 2:56 p.m. UTC | #5
On 03/04/2020 17.45, Tejun Heo wrote:
> On Fri, Apr 03, 2020 at 05:37:17PM +0300, Konstantin Khlebnikov wrote:
>>> How would this work with things like inetd? Would it make sense to associate the
>>> socket on the first actual send/recv?
>>
>> First send/recv seems too intrusive.
> 
> Intrusive in terms of?

In terms of adding more code to the networking fast paths.

> 
>> A setsockopt to change the association to the current cgroup (or by ID) seems
>> more reasonable.
> 
> I'm not sure about exposing it as an explicit interface.

Yep, it's better to create things in the right place from the beginning.
The current behaviour isn't bad, just not obvious (and barely documented).
That's why I've asked Dmitry to add these notes.

> 
> Thanks.
>
David Miller April 3, 2020, 11:08 p.m. UTC | #6
From: Dmitry Yakunin <zeil@yandex-team.ru>
Date: Fri, 3 Apr 2020 12:56:27 +0300

> This patch adds the cgroup v2 ID to the common inet diag message attributes.
> The cgroup v2 ID is the kernfs ID (ino or ino+gen). This attribute allows
> filtering inet diag output by the cgroup ID obtained via the name_to_handle_at()
> syscall. When the net_cls or net_prio cgroup is activated, this ID is equal to 1
> (the root cgroup ID) for newly created sockets.
> 
> Some notes about this ID:
> 
> 1) gets initialized in socket() syscall
> 2) incoming socket gets ID from listening socket
>    (not during accept() syscall)
> 3) not changed when the process gets moved to another cgroup
> 4) can point to a deleted cgroup (refcounting)
> 
> v2:
>   - use CONFIG_SOCK_CGROUP_DATA instead of CONFIG_CGROUPS
> 
> v3:
>   - fix attr size by using nla_total_size_64bit() (Eric Dumazet)
>   - more detailed commit message (Konstantin Khlebnikov)
> 
> Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

As a new feature, this should be resubmitted when net-next opens back
up.  Thank you.

Patch

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index c91cf2d..0696f86 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -66,7 +66,11 @@  static inline size_t inet_diag_msg_attrs_size(void)
 		+ nla_total_size(1)  /* INET_DIAG_SKV6ONLY */
 #endif
 		+ nla_total_size(4)  /* INET_DIAG_MARK */
-		+ nla_total_size(4); /* INET_DIAG_CLASS_ID */
+		+ nla_total_size(4)  /* INET_DIAG_CLASS_ID */
+#ifdef CONFIG_SOCK_CGROUP_DATA
+		+ nla_total_size_64bit(sizeof(u64))  /* INET_DIAG_CGROUP_ID */
+#endif
+		;
 }
 int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
 			     struct inet_diag_msg *r, int ext,
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index a1ff345..dc87ad6 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -154,6 +154,7 @@  enum {
 	INET_DIAG_CLASS_ID,	/* request as INET_DIAG_TCLASS */
 	INET_DIAG_MD5SIG,
 	INET_DIAG_ULP_INFO,
+	INET_DIAG_CGROUP_ID,
 	__INET_DIAG_MAX,
 };
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 8c83775..17e3c52 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -161,6 +161,13 @@  int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
 			goto errout;
 	}
 
+#ifdef CONFIG_SOCK_CGROUP_DATA
+	if (nla_put_u64_64bit(skb, INET_DIAG_CGROUP_ID,
+			      cgroup_id(sock_cgroup_ptr(&sk->sk_cgrp_data)),
+			      INET_DIAG_PAD))
+		goto errout;
+#endif
+
 	r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
 	r->idiag_inode = sock_i_ino(sk);