inet_diag: fix reporting cgroup classid and fallback to priority

Message ID 154970855279.305165.13649851988934332761.stgit@buzz
State Accepted
Delegated to: David Miller

Commit Message

Konstantin Khlebnikov Feb. 9, 2019, 10:35 a.m. UTC
The idiag_ext field in struct inet_diag_req_v2, used as a bitmap of requested
extensions, has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them are included in the response
unconditionally or hooked into one of the lower 8 bits.
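
To illustrate the 8-bit limit described above, here is a minimal userspace
sketch. It assumes the request bit for extension N is 1 << (N - 1), as in the
kernel check further down in this patch, and it uses a hypothetical local enum
(EXT_*) whose values are assumed to mirror the uapi numbering. The helper
ext_request_bit() is not a kernel or libc function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of a few INET_DIAG_* extension numbers.
 * The values are assumed to match include/uapi/linux/inet_diag.h. */
enum {
	EXT_VEGASINFO = 3,
	EXT_TCLASS    = 6,
	EXT_SHUTDOWN  = 8,	/* last extension that fits in 8 bits */
	EXT_DCTCPINFO = 9,	/* first one that does not */
	EXT_CLASS_ID  = 17,
};

/* Return the request bit for an extension as it would survive in an
 * 8-bit idiag_ext field. A result of 0 means the extension cannot be
 * requested directly. */
static uint8_t ext_request_bit(int ext)
{
	assert(ext >= 1 && ext < 32);
	return (uint8_t)(1u << (ext - 1));
}
```

With these assumptions, EXT_TCLASS maps to 0x20 and EXT_SHUTDOWN to 0x80,
while EXT_DCTCPINFO and EXT_CLASS_ID both truncate to 0.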

Extension INET_DIAG_CLASS_ID has had no way to be requested from the beginning.

This patch bundles it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents the behavior of other extensions.

This patch also adds a fallback to reporting socket priority. This field
is more widely used for traffic classification because ipv4 sockets
automatically map TOS to priority and the default qdisc pfifo_fast knows
about that. But priority can be changed via setsockopt SO_PRIORITY, so
INET_DIAG_TOS isn't enough for predicting the class.

Also, cgroup2 obsoletes net_cls classid (it is always zero), but we cannot
reuse this field for reporting the cgroup2 id because it is 64-bit (ino+gen).

So, after this patch INET_DIAG_CLASS_ID will report socket priority
for the most common setup, when net_cls isn't set and/or cgroup2 is in use.
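
The fallback can be sketched in plain C as follows. This is a userspace model
of the rule, not the kernel code: reported_class_id() and split_handle() are
hypothetical helpers, and the major:minor split assumes the usual tc handle
layout (upper 16 bits major, lower 16 bits minor).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the value reported as INET_DIAG_CLASS_ID after this patch:
 * the net_cls classid if set, otherwise the socket priority. */
static uint32_t reported_class_id(uint32_t cgroup_classid, uint32_t sk_priority)
{
	return cgroup_classid ? cgroup_classid : sk_priority;
}

/* Split a 32-bit handle into major:minor, assuming the tc layout. */
static void split_handle(uint32_t handle, uint16_t *major, uint16_t *minor)
{
	assert(major && minor);
	*major = (uint16_t)(handle >> 16);
	*minor = (uint16_t)(handle & 0xffff);
}
```

For example, a socket with classid 0 (cgroup2, or net_cls unset) and priority
0x10002 would be reported as 0x10002, which splits into handle 1:2.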

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
---
 include/uapi/linux/inet_diag.h |   16 +++++++++++-----
 net/ipv4/inet_diag.c           |   10 +++++++++-
 net/sctp/diag.c                |    1 +
 3 files changed, 21 insertions(+), 6 deletions(-)

Comments

David Miller Feb. 12, 2019, 6:37 p.m. UTC | #1
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Sat, 09 Feb 2019 13:35:52 +0300

> Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
> extensions has only 8 bits. Thus extensions starting from DCTCPINFO
> cannot be requested directly. Some of them included into response
> unconditionally or hook into some of lower 8 bits.
> 
> Extension INET_DIAG_CLASS_ID has not way to request from the beginning.
> 
> This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
> reservation, and documents behavior for other extensions.
> 
> Also this patch adds fallback to reporting socket priority. This filed
> is more widely used for traffic classification because ipv4 sockets
> automatically maps TOS to priority and default qdisc pfifo_fast knows
> about that. But priority could be changed via setsockopt SO_PRIORITY so
> INET_DIAG_TOS isn't enough for predicting class.
> 
> Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
> reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
> 
> So, after this patch INET_DIAG_CLASS_ID will report socket priority
> for most common setup when net_cls isn't set and/or cgroup2 in use.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")

Applied, and queued up for -stable.

Please always put the Fixes: tag first in the list of tags.  I fixed
it up for you this time.

Thanks.
Eric Dumazet Feb. 12, 2019, 11:03 p.m. UTC | #2
On 02/09/2019 02:35 AM, Konstantin Khlebnikov wrote:
> Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
> extensions has only 8 bits. Thus extensions starting from DCTCPINFO
> cannot be requested directly. Some of them included into response
> unconditionally or hook into some of lower 8 bits.
> 
> Extension INET_DIAG_CLASS_ID has not way to request from the beginning.
> 
> This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
> reservation, and documents behavior for other extensions.
> 
> Also this patch adds fallback to reporting socket priority. This filed
> is more widely used for traffic classification because ipv4 sockets
> automatically maps TOS to priority and default qdisc pfifo_fast knows
> about that. But priority could be changed via setsockopt SO_PRIORITY so
> INET_DIAG_TOS isn't enough for predicting class.
> 
> Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
> reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
> 
> So, after this patch INET_DIAG_CLASS_ID will report socket priority
> for most common setup when net_cls isn't set and/or cgroup2 in use.
>

Nice catch, Konstantin.

Are you planning to send an iproute2/ss patch to output INET_DIAG_CLASS_ID values, then?

Thanks.
Konstantin Khlebnikov Feb. 13, 2019, 12:28 p.m. UTC | #3
On 12.02.2019 21:37, David Miller wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Date: Sat, 09 Feb 2019 13:35:52 +0300
> 
>> Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
>> extensions has only 8 bits. Thus extensions starting from DCTCPINFO
>> cannot be requested directly. Some of them included into response
>> unconditionally or hook into some of lower 8 bits.
>>
>> Extension INET_DIAG_CLASS_ID has not way to request from the beginning.
>>
>> This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
>> reservation, and documents behavior for other extensions.
>>
>> Also this patch adds fallback to reporting socket priority. This filed
>> is more widely used for traffic classification because ipv4 sockets
>> automatically maps TOS to priority and default qdisc pfifo_fast knows
>> about that. But priority could be changed via setsockopt SO_PRIORITY so
>> INET_DIAG_TOS isn't enough for predicting class.
>>
>> Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
>> reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
>>
>> So, after this patch INET_DIAG_CLASS_ID will report socket priority
>> for most common setup when net_cls isn't set and/or cgroup2 in use.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
> 
> Applied, and queued up for -stable.
> 
> Please always put the Fixes: tag first in the list of tags.  I fixed
> it up for you this time.

OK. I had never heard of that rule, and checkpatch.pl doesn't complain about it either.

> 
> Thanks.
>

Patch

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 14565d703291..e8baca85bac6 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -137,15 +137,21 @@  enum {
 	INET_DIAG_TCLASS,
 	INET_DIAG_SKMEMINFO,
 	INET_DIAG_SHUTDOWN,
-	INET_DIAG_DCTCPINFO,
-	INET_DIAG_PROTOCOL,  /* response attribute only */
+
+	/*
+	 * Next extensions cannot be requested in struct inet_diag_req_v2:
+	 * its field idiag_ext has only 8 bits.
+	 */
+
+	INET_DIAG_DCTCPINFO,	/* request as INET_DIAG_VEGASINFO */
+	INET_DIAG_PROTOCOL,	/* response attribute only */
 	INET_DIAG_SKV6ONLY,
 	INET_DIAG_LOCALS,
 	INET_DIAG_PEERS,
 	INET_DIAG_PAD,
-	INET_DIAG_MARK,
-	INET_DIAG_BBRINFO,
-	INET_DIAG_CLASS_ID,
+	INET_DIAG_MARK,		/* only with CAP_NET_ADMIN */
+	INET_DIAG_BBRINFO,	/* request as INET_DIAG_VEGASINFO */
+	INET_DIAG_CLASS_ID,	/* request as INET_DIAG_TCLASS */
 	INET_DIAG_MD5SIG,
 	__INET_DIAG_MAX,
 };
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 1a4e9ff02762..5731670c560b 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -108,6 +108,7 @@  static size_t inet_sk_attr_size(struct sock *sk,
 		+ nla_total_size(1) /* INET_DIAG_TOS */
 		+ nla_total_size(1) /* INET_DIAG_TCLASS */
 		+ nla_total_size(4) /* INET_DIAG_MARK */
+		+ nla_total_size(4) /* INET_DIAG_CLASS_ID */
 		+ nla_total_size(sizeof(struct inet_diag_meminfo))
 		+ nla_total_size(sizeof(struct inet_diag_msg))
 		+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
@@ -287,12 +288,19 @@  int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 			goto errout;
 	}
 
-	if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+	if (ext & (1 << (INET_DIAG_CLASS_ID - 1)) ||
+	    ext & (1 << (INET_DIAG_TCLASS - 1))) {
 		u32 classid = 0;
 
 #ifdef CONFIG_SOCK_CGROUP_DATA
 		classid = sock_cgroup_classid(&sk->sk_cgrp_data);
 #endif
+		/* Fallback to socket priority if class id isn't set.
+		 * Classful qdiscs use it as direct reference to class.
+		 * For cgroup2 classid is always zero.
+		 */
+		if (!classid)
+			classid = sk->sk_priority;
 
 		if (nla_put_u32(skb, INET_DIAG_CLASS_ID, classid))
 			goto errout;
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index 078f01a8d582..435847d98b51 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -256,6 +256,7 @@  static size_t inet_assoc_attr_size(struct sctp_association *asoc)
 		+ nla_total_size(1) /* INET_DIAG_TOS */
 		+ nla_total_size(1) /* INET_DIAG_TCLASS */
 		+ nla_total_size(4) /* INET_DIAG_MARK */
+		+ nla_total_size(4) /* INET_DIAG_CLASS_ID */
 		+ nla_total_size(addrlen * asoc->peer.transport_count)
 		+ nla_total_size(addrlen * addrcnt)
 		+ nla_total_size(sizeof(struct inet_diag_meminfo))