diff mbox series

[net-next] net: DSCP in IPv4 routing v2

Message ID 20201121182250.661bfee5@192-168-1-16.tpgi.com.au
State Superseded
Headers show
Series [net-next] net: DSCP in IPv4 routing v2 | expand

Commit Message

Russell Strong Nov. 21, 2020, 8:24 a.m. UTC
From 2f27f92d5a6f4dd69ac4af32cdb51ba8d2083606 Mon Sep 17 00:00:00 2001
From: Russell Strong <russell@strong.id.au>
Date: Sat, 21 Nov 2020 18:12:43 +1000
Subject: [PATCH] DSCP in IPv4 routing v2

This patch allows the use of DSCP values in routing

The following macros define the scope of IPv4 TOS
bits used in routing decisions:

IPTOS_TOS (include/uapi/linux/ip.h)
IPTOS_TOS_MASK (include/uapi/linux/ip.h)
IPTOS_RT_MASK (include/net/route.h)
RT_TOS (include/net/in_route.h)

To expand the number of bits out to the set of DSCP
bits, two macros are added:

(hash)define IP_DSCP_MASK 0xfc
(hash)define IP_DSCP ((dscp)&IP_DSCP_MASK)

Use of TOS macros are replaced with DSCP macros
where the change does not change the user space API
with one exception:

net/ipv4/fib_rules.c has been changed to accept a
wider range of values ( dscp values ).  Previously
this would have returned an error.

iproute2 already supports setting dscp values through
ip route add dsfield <dscp value> lookup ......

Signed-off-by: Russell Strong <russell@strong.id.au>
---
 .../ethernet/mellanox/mlx5/core/en/tc_tun.c   |  2 +-
 drivers/net/geneve.c                          |  4 ++--
 drivers/net/ipvlan/ipvlan_core.c              |  2 +-
 drivers/net/ppp/pptp.c                        |  2 +-
 drivers/net/vrf.c                             |  2 +-
 drivers/net/vxlan.c                           |  4 ++--
 include/net/ip.h                              |  2 +-
 include/net/route.h                           |  6 ++----
 include/uapi/linux/ip.h                       |  2 ++
 net/bridge/br_netfilter_hooks.c               |  2 +-
 net/core/filter.c                             |  4 ++--
 net/core/lwt_bpf.c                            |  2 +-
 net/ipv4/fib_frontend.c                       |  2 +-
 net/ipv4/fib_rules.c                          |  2 +-
 net/ipv4/icmp.c                               |  6 +++---
 net/ipv4/ip_gre.c                             |  2 +-
 net/ipv4/ip_output.c                          |  2 +-
 net/ipv4/ip_tunnel.c                          |  6 +++---
 net/ipv4/ipmr.c                               |  6 +++---
 net/ipv4/netfilter.c                          |  2 +-
 net/ipv4/netfilter/ipt_rpfilter.c             |  2 +-
 net/ipv4/netfilter/nf_dup_ipv4.c              |  2 +-
 net/ipv4/route.c                              | 20 +++++++++----------
 net/ipv6/ip6_output.c                         |  2 +-
 net/ipv6/ip6_tunnel.c                         |  4 ++--
 net/ipv6/sit.c                                |  4 ++--
 net/xfrm/xfrm_policy.c                        |  2 +-
 27 files changed, 49 insertions(+), 49 deletions(-)

Comments

Guillaume Nault Nov. 23, 2020, 10:55 p.m. UTC | #1
On Sat, Nov 21, 2020 at 06:24:46PM +1000, Russell Strong wrote:
> From 2f27f92d5a6f4dd69ac4af32cdb51ba8d2083606 Mon Sep 17 00:00:00 2001
> From: Russell Strong <russell@strong.id.au>
> Date: Sat, 21 Nov 2020 18:12:43 +1000
> Subject: [PATCH] DSCP in IPv4 routing v2
> 
> This patch allows the use of DSCP values in routing

Thanks. There are some problems with this patch though.

About the email:
  * Why did you duplicate email headers in the body?
  * For the subject, please put the "v2" in the "[PATCH ... ]" part.
  * You're modifying many files, but haven't Cc-ed any of their authors
    or maintainers.
  * The patch content is corrupted.

> Use of TOS macros are replaced with DSCP macros
> where the change does not change the user space API
> with one exception:
> 
> net/ipv4/fib_rules.c has been changed to accept a
> wider range of values ( dscp values ).  Previously
> this would have returned an error.

Have you really verified that replacing each of these RT_TOS calls had
no unwanted side effect?

RT_TOS didn't clear the second lowest bit, while the new IP_DSCP does.
Therefore, there's no guarantee that such a blanket replacement isn't
going to change existing behaviours. Replacements have to be done
step by step and accompanied by an explanation of why they're safe.

BTW, I think there are some problems with RT_TOS that need to be fixed
separately first.

For example some of the ip6_make_flowinfo() calls can probably
erroneously mark some packets with ECT(0). Instead of masking the
problem in this patch, I think it'd be better to have an explicit fix
that'd mask the ECN bits in ip6_make_flowinfo() and drop the buggy
RT_TOS() in the callers.

Another example is inet_rtm_getroute(). It calls
ip_route_output_key_hash_rcu() without masking the tos field first.
Therefore it can return a different route than what the routing code
would actually use. Like for the ip6_make_flowinfo() case, it might
be better to stop relying on the callers to mask ECN bits and do that
in ip_route_output_key_hash_rcu() instead.

I'll verify that these two problems can actually happen in practice
and will send patches if necessary.

> iproute2 already supports setting dscp values through
> ip route add dsfield <dscp value> lookup ......
> 
> Signed-off-by: Russell Strong <russell@strong.id.au>
> ---
>  .../ethernet/mellanox/mlx5/core/en/tc_tun.c   |  2 +-
>  drivers/net/geneve.c                          |  4 ++--
>  drivers/net/ipvlan/ipvlan_core.c              |  2 +-
>  drivers/net/ppp/pptp.c                        |  2 +-
>  drivers/net/vrf.c                             |  2 +-
>  drivers/net/vxlan.c                           |  4 ++--
>  include/net/ip.h                              |  2 +-
>  include/net/route.h                           |  6 ++----
>  include/uapi/linux/ip.h                       |  2 ++
>  net/bridge/br_netfilter_hooks.c               |  2 +-
>  net/core/filter.c                             |  4 ++--
>  net/core/lwt_bpf.c                            |  2 +-
>  net/ipv4/fib_frontend.c                       |  2 +-
>  net/ipv4/fib_rules.c                          |  2 +-
>  net/ipv4/icmp.c                               |  6 +++---
>  net/ipv4/ip_gre.c                             |  2 +-
>  net/ipv4/ip_output.c                          |  2 +-
>  net/ipv4/ip_tunnel.c                          |  6 +++---
>  net/ipv4/ipmr.c                               |  6 +++---
>  net/ipv4/netfilter.c                          |  2 +-
>  net/ipv4/netfilter/ipt_rpfilter.c             |  2 +-
>  net/ipv4/netfilter/nf_dup_ipv4.c              |  2 +-
>  net/ipv4/route.c                              | 20 +++++++++----------
>  net/ipv6/ip6_output.c                         |  2 +-
>  net/ipv6/ip6_tunnel.c                         |  4 ++--
>  net/ipv6/sit.c                                |  4 ++--
>  net/xfrm/xfrm_policy.c                        |  2 +-
>  27 files changed, 49 insertions(+), 49 deletions(-)
Russell Strong Nov. 24, 2020, 2:41 a.m. UTC | #2
On Mon, 23 Nov 2020 23:55:05 +0100
Guillaume Nault <gnault@redhat.com> wrote:

> On Sat, Nov 21, 2020 at 06:24:46PM +1000, Russell Strong wrote:
> > From 2f27f92d5a6f4dd69ac4af32cdb51ba8d2083606 Mon Sep 17 00:00:00 2001
> > From: Russell Strong <russell@strong.id.au>
> > Date: Sat, 21 Nov 2020 18:12:43 +1000
> > Subject: [PATCH] DSCP in IPv4 routing v2
> > 
> > This patch allows the use of DSCP values in routing  
> 
> Thanks. There are some problems with this patch though.
> 
> About the email:
>   * Why did you duplicate email headers in the body?
>   * For the subject, please put the "v2" in the "[PATCH ... ]" part.
>   * You're modifying many files, but haven't Cc-ed any of their authors
>     or maintainers.
>   * The patch content is corrupted.

I'm still quite new to this.  I used git format-patch then inserted
into claws.....  I have since read the doc on email clients and
switched off autowrapping :)

I was wondering if one patch would be acceptable, or should it be broken
up?  If broken up. It would not make sense to apply 1/2 of them.

> 
> > Use of TOS macros are replaced with DSCP macros
> > where the change does not change the user space API
> > with one exception:
> > 
> > net/ipv4/fib_rules.c has been changed to accept a
> > wider range of values ( dscp values ).  Previously
> > this would have returned an error.  
> 
> Have you really verified that replacing each of these RT_TOS calls had
> no unwanted side effect?
> 
> RT_TOS didn't clear the second lowest bit, while the new IP_DSCP does.
> Therefore, there's no guarantee that such a blanket replacement isn't
> going to change existing behaviours. Replacements have to be done
> step by step and accompanied by an explanation of why they're safe.

Original TOS did not use this bit until it was added in RFC1349 as "lowcost".
The DSCP change (RFC2474) marked these as currently unused, but worse than that,
with the introduction of ECN, both of those now "unused" bits are for ECN.
Other parts of the kernel are using those bits for ECN, so bit 1 probably
shouldn't be used in routing anymore as congestion could create unexpected
routing behaviour, i.e. fib_rules


> 
> BTW, I think there are some problems with RT_TOS that need to be fixed
> separately first.
> 
> For example some of the ip6_make_flowinfo() calls can probably
> erroneously mark some packets with ECT(0). Instead of masking the
> problem in this patch, I think it'd be better to have an explicit fix
> that'd mask the ECN bits in ip6_make_flowinfo() and drop the buggy
> RT_TOS() in the callers.
> 
> Another example is inet_rtm_getroute(). It calls
> ip_route_output_key_hash_rcu() without masking the tos field first.

Should rtm->tos be checked for validity in inet_rtm_valid_getroute_req? Seems
like it was missed.  That would make the mask unnecessary, but...  It's like
wack a mole.


> Therefore it can return a different route than what the routing code
> would actually use. Like for the ip6_make_flowinfo() case, it might
> be better to stop relying on the callers to mask ECN bits and do that
> in ip_route_output_key_hash_rcu() instead.

In this context one of the ECN bits is not an ECN bit, as can be seen by

#define RT_FL_TOS(oldflp4) \
        ((oldflp4)->flowi4_tos & (IP_DSCP_MASK | RTO_ONLINK))

It's all a bit messy and spread about.  Reducing the distributed nature of
the masking would be good.



> I'll verify that these two problems can actually happen in practice
> and will send patches if necessary.

Thanks


> 
> > iproute2 already supports setting dscp values through
> > ip route add dsfield <dscp value> lookup ......
> > 
> > Signed-off-by: Russell Strong <russell@strong.id.au>
> > ---
> >  .../ethernet/mellanox/mlx5/core/en/tc_tun.c   |  2 +-
> >  drivers/net/geneve.c                          |  4 ++--
> >  drivers/net/ipvlan/ipvlan_core.c              |  2 +-
> >  drivers/net/ppp/pptp.c                        |  2 +-
> >  drivers/net/vrf.c                             |  2 +-
> >  drivers/net/vxlan.c                           |  4 ++--
> >  include/net/ip.h                              |  2 +-
> >  include/net/route.h                           |  6 ++----
> >  include/uapi/linux/ip.h                       |  2 ++
> >  net/bridge/br_netfilter_hooks.c               |  2 +-
> >  net/core/filter.c                             |  4 ++--
> >  net/core/lwt_bpf.c                            |  2 +-
> >  net/ipv4/fib_frontend.c                       |  2 +-
> >  net/ipv4/fib_rules.c                          |  2 +-
> >  net/ipv4/icmp.c                               |  6 +++---
> >  net/ipv4/ip_gre.c                             |  2 +-
> >  net/ipv4/ip_output.c                          |  2 +-
> >  net/ipv4/ip_tunnel.c                          |  6 +++---
> >  net/ipv4/ipmr.c                               |  6 +++---
> >  net/ipv4/netfilter.c                          |  2 +-
> >  net/ipv4/netfilter/ipt_rpfilter.c             |  2 +-
> >  net/ipv4/netfilter/nf_dup_ipv4.c              |  2 +-
> >  net/ipv4/route.c                              | 20 +++++++++----------
> >  net/ipv6/ip6_output.c                         |  2 +-
> >  net/ipv6/ip6_tunnel.c                         |  4 ++--
> >  net/ipv6/sit.c                                |  4 ++--
> >  net/xfrm/xfrm_policy.c                        |  2 +-
> >  27 files changed, 49 insertions(+), 49 deletions(-)  
>
Guillaume Nault Nov. 24, 2020, 3:22 p.m. UTC | #3
On Tue, Nov 24, 2020 at 12:41:49PM +1000, Russell Strong wrote:
> On Mon, 23 Nov 2020 23:55:05 +0100 Guillaume Nault <gnault@redhat.com> wrote:
> > On Sat, Nov 21, 2020 at 06:24:46PM +1000, Russell Strong wrote:
> 
> I was wondering if one patch would be acceptable, or should it be broken
> up?  If broken up. It would not make sense to apply 1/2 of them.

A patch series would be applied in its entirety or not applied at all.
However, it's not acceptable to temporarily bring regressions in one
patch and fix it later in the series. The tree has to remain
bisectable.

Anyway, I believe there's no need to replace all the TOS macros in the
same patch series. DSCP doesn't have to be enabled everywhere at once.
Small, targeted, patch series are much easier to review.

> > RT_TOS didn't clear the second lowest bit, while the new IP_DSCP does.
> > Therefore, there's no guarantee that such a blanket replacement isn't
> > going to change existing behaviours. Replacements have to be done
> > step by step and accompanied by an explanation of why they're safe.
> 
> Original TOS did not use this bit until it was added in RFC1349 as "lowcost".
> The DSCP change (RFC2474) marked these as currently unused, but worse than that,
> with the introduction of ECN, both of those now "unused" bits are for ECN.
> Other parts of the kernel are using those bits for ECN, so bit 1 probably
> shouldn't be used in routing anymore as congestion could create unexpected
> routing behaviour, i.e. fib_rules

The IETF meaning and history of these bits are well understood. But we
can't write patches based on assumptions like "bit 1 probably shouldn't
be used". The actual code is what matters. That's why, again, changes
have to be done incrementally and in a reviewable manner.

> > For example some of the ip6_make_flowinfo() calls can probably
> > erroneously mark some packets with ECT(0). Instead of masking the
> > problem in this patch, I think it'd be better to have an explicit fix
> > that'd mask the ECN bits in ip6_make_flowinfo() and drop the buggy
> > RT_TOS() in the callers.
> > 
> > Another example is inet_rtm_getroute(). It calls
> > ip_route_output_key_hash_rcu() without masking the tos field first.
> 
> Should rtm->tos be checked for validity in inet_rtm_valid_getroute_req? Seems
> like it was missed.

Well, I don't think so. inet_rtm_valid_getroute_req() is supposed to
return an error if a parameter is wrong. Verifying ->tos should have
been done since day 1, yes. However, in practice, we've been accepting
any value for years. That's the kind of user space behaviour that we
can't really change. The only solution I can see is to mask the ECN
bits silently. That way, users can still pass whatever they like (we
won't break any script), but the result will be right (that is,
consistent with what routing does).

> > Therefore it can return a different route than what the routing code
> > would actually use. Like for the ip6_make_flowinfo() case, it might
> > be better to stop relying on the callers to mask ECN bits and do that
> > in ip_route_output_key_hash_rcu() instead.
> 
> In this context one of the ECN bits is not an ECN bit, as can be seen by
> 
> #define RT_FL_TOS(oldflp4) \
>         ((oldflp4)->flowi4_tos & (IP_DSCP_MASK | RTO_ONLINK))

The RTO_ONLINK flag would have to be passed in a different way. Not a
trivial task (many places to audit), but that looks feasible.

> It's all a bit messy and spread about.  Reducing the distributed nature of
> the masking would be good.

Yes, that's why I'd like to stop sprinkling RT_TOS everywhere and mask
the bits in central places when possible. Once the RT_TOS situation
improves, adding DSCP support will be much easier.

> > I'll verify that these two problems can actually happen in practice
> > and will send patches if necessary.
> 
> Thanks
>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c index
90930e54b6f2..b0c766216a2c 100644 ---
a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c +++
b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c @@ -364,7 +364,7
@@ int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, 
 	ttl = tun_key->ttl;
 
-	fl6.flowlabel = ip6_make_flowinfo(RT_TOS(tun_key->tos),
tun_key->label);
+	fl6.flowlabel = ip6_make_flowinfo(IP_DSCP(tun_key->tos),
tun_key->label); fl6.daddr = tun_key->u.ipv6.dst;
 	fl6.saddr = tun_key->u.ipv6.src;
 
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a3c8ce6deb93..1c20acc649ef 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -797,7 +797,7 @@  static struct rtable *geneve_get_v4_rt(struct
sk_buff *skb, tos = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
 		use_cache = false;
 	}
-	fl4->flowi4_tos = RT_TOS(tos);
+	fl4->flowi4_tos = IP_DSCP(tos);
 
 	dst_cache = (struct dst_cache *)&info->dst_cache;
 	if (use_cache) {
@@ -851,7 +851,7 @@  static struct dst_entry *geneve_get_v6_dst(struct
sk_buff *skb, use_cache = false;
 	}
 
-	fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
+	fl6->flowlabel = ip6_make_flowinfo(IP_DSCP(prio),
 					   info->key.label);
 	dst_cache = (struct dst_cache *)&info->dst_cache;
 	if (use_cache) {
diff --git a/drivers/net/ipvlan/ipvlan_core.c
b/drivers/net/ipvlan/ipvlan_core.c index 8801d093135c..d50e4163d0e0
100644 --- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -421,7 +421,7 @@  static int ipvlan_process_v4_outbound(struct
sk_buff *skb) int err, ret = NET_XMIT_DROP;
 	struct flowi4 fl4 = {
 		.flowi4_oif = dev->ifindex,
-		.flowi4_tos = RT_TOS(ip4h->tos),
+		.flowi4_tos = IP_DSCP(ip4h->tos),
 		.flowi4_flags = FLOWI_FLAG_ANYSRC,
 		.flowi4_mark = skb->mark,
 		.daddr = ip4h->daddr,
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index ee5058445d06..3f29a1690955 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -155,7 +155,7 @@  static int pptp_xmit(struct ppp_channel *chan,
struct sk_buff *skb) opt->dst_addr.sin_addr.s_addr,
 				   opt->src_addr.sin_addr.s_addr,
 				   0, 0, IPPROTO_GRE,
-				   RT_TOS(0), sk->sk_bound_dev_if);
+				   IP_DSCP(0), sk->sk_bound_dev_if);
 	if (IS_ERR(rt))
 		goto tx_error;
 
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index f2793ffde191..09f4058a2c52 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -534,7 +534,7 @@  static netdev_tx_t vrf_process_v4_outbound(struct
sk_buff *skb, /* needed to match OIF rule */
 	fl4.flowi4_oif = vrf_dev->ifindex;
 	fl4.flowi4_iif = LOOPBACK_IFINDEX;
-	fl4.flowi4_tos = RT_TOS(ip4h->tos);
+	fl4.flowi4_tos = IP_DSCP(ip4h->tos);
 	fl4.flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_SKIP_NH_OIF;
 	fl4.flowi4_proto = ip4h->protocol;
 	fl4.daddr = ip4h->daddr;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 236fcc55a5fd..59c4e7f466ab 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2412,7 +2412,7 @@  static struct rtable *vxlan_get_route(struct
vxlan_dev *vxlan, struct net_device 
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.flowi4_oif = oif;
-	fl4.flowi4_tos = RT_TOS(tos);
+	fl4.flowi4_tos = IP_DSCP(tos);
 	fl4.flowi4_mark = skb->mark;
 	fl4.flowi4_proto = IPPROTO_UDP;
 	fl4.daddr = daddr;
@@ -2469,7 +2469,7 @@  static struct dst_entry *vxlan6_get_route(struct
vxlan_dev *vxlan, fl6.flowi6_oif = oif;
 	fl6.daddr = *daddr;
 	fl6.saddr = *saddr;
-	fl6.flowlabel = ip6_make_flowinfo(RT_TOS(tos), label);
+	fl6.flowlabel = ip6_make_flowinfo(IP_DSCP(tos), label);
 	fl6.flowi6_mark = skb->mark;
 	fl6.flowi6_proto = IPPROTO_UDP;
 	fl6.fl6_dport = dport;
diff --git a/include/net/ip.h b/include/net/ip.h
index e20874059f82..9df0734c7e29 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -241,7 +241,7 @@  static inline struct sk_buff *ip_finish_skb(struct
sock *sk, struct flowi4 *fl4) 
 static inline __u8 get_rttos(struct ipcm_cookie* ipc, struct inet_sock
*inet) {
-	return (ipc->tos != -1) ? RT_TOS(ipc->tos) : RT_TOS(inet->tos);
+	return (ipc->tos != -1) ? IP_DSCP(ipc->tos) :
IP_DSCP(inet->tos); }
 
 static inline __u8 get_rtconn_flags(struct ipcm_cookie* ipc, struct
sock* sk) diff --git a/include/net/route.h b/include/net/route.h
index ff021cab657e..123d151ef47c 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -40,8 +40,8 @@ 
 
 #define RTO_ONLINK	0x01
 
-#define RT_CONN_FLAGS(sk)   (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk,
SOCK_LOCALROUTE)) -#define RT_CONN_FLAGS_TOS(sk,tos)   (RT_TOS(tos) |
sock_flag(sk, SOCK_LOCALROUTE)) +#define RT_CONN_FLAGS(sk)
(IP_DSCP(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE)) +#define
RT_CONN_FLAGS_TOS(sk,tos)   (IP_DSCP(tos) | sock_flag(sk,
SOCK_LOCALROUTE)) struct fib_nh;
 struct fib_info;
@@ -255,8 +255,6 @@  static inline void ip_rt_put(struct rtable *rt)
 	dst_release(&rt->dst);
 }
 
-#define IPTOS_RT_MASK	(IPTOS_TOS_MASK & ~3)
-
 extern const __u8 ip_tos2prio[16];
 
 static inline char rt_tos2priority(u8 tos)
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index e42d13b55cf3..2519e779e9ad 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -38,6 +38,8 @@ 
 #define IPTOS_PREC_PRIORITY             0x20
 #define IPTOS_PREC_ROUTINE              0x00
 
+#define IP_DSCP_MASK		0xfc
+#define IP_DSCP(dscp)		((dscp)&IP_DSCP_MASK)
 
 /* IP options */
 #define IPOPT_COPY		0x80
diff --git a/net/bridge/br_netfilter_hooks.c
b/net/bridge/br_netfilter_hooks.c index 04c3f9a82650..fea45a94125e
100644 --- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -379,7 +379,7 @@  static int br_nf_pre_routing_finish(struct net
*net, struct sock *sk, struct sk_ goto free_skb;
 
 			rt = ip_route_output(net, iph->daddr, 0,
-					     RT_TOS(iph->tos), 0);
+					     IP_DSCP(iph->tos), 0);
 			if (!IS_ERR(rt)) {
 				/* - Bridged-and-DNAT'ed traffic
doesn't
 				 *   require ip_forwarding. */
diff --git a/net/core/filter.c b/net/core/filter.c
index 2ca5eecebacf..83c3011326dd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2345,7 +2345,7 @@  static int __bpf_redirect_neigh_v4(struct sk_buff
*skb, struct net_device *dev, struct flowi4 fl4 = {
 			.flowi4_flags = FLOWI_FLAG_ANYSRC,
 			.flowi4_mark  = skb->mark,
-			.flowi4_tos   = RT_TOS(ip4h->tos),
+			.flowi4_tos   = IP_DSCP(ip4h->tos),
 			.flowi4_oif   = dev->ifindex,
 			.flowi4_proto = ip4h->protocol,
 			.daddr	      = ip4h->daddr,
@@ -5309,7 +5309,7 @@  static int bpf_ipv4_fib_lookup(struct net *net,
struct bpf_fib_lookup *params, fl4.flowi4_iif = params->ifindex;
 		fl4.flowi4_oif = 0;
 	}
-	fl4.flowi4_tos = params->tos & IPTOS_RT_MASK;
+	fl4.flowi4_tos = params->tos & IP_DSCP_MASK;
 	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
 	fl4.flowi4_flags = 0;
 
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 7d3438215f32..0757a36030b3 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -206,7 +206,7 @@  static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
 		fl4.flowi4_oif = oif;
 		fl4.flowi4_mark = skb->mark;
 		fl4.flowi4_uid = sock_net_uid(net, sk);
-		fl4.flowi4_tos = RT_TOS(iph->tos);
+		fl4.flowi4_tos = IP_DSCP(iph->tos);
 		fl4.flowi4_flags = FLOWI_FLAG_ANYSRC;
 		fl4.flowi4_proto = iph->protocol;
 		fl4.daddr = iph->daddr;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 86a23e4a6a50..0f07f0f0bc17 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -292,7 +292,7 @@  __be32 fib_compute_spec_dst(struct sk_buff *skb)
 			.flowi4_iif = LOOPBACK_IFINDEX,
 			.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
 			.daddr = ip_hdr(skb)->saddr,
-			.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
+			.flowi4_tos = IP_DSCP(ip_hdr(skb)->tos),
 			.flowi4_scope = scope,
 			.flowi4_mark = vmark ? skb->mark : 0,
 		};
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index ce54a30c2ef1..1e75bb3b2f25 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -229,7 +229,7 @@  static int fib4_rule_configure(struct fib_rule
*rule, struct sk_buff *skb, int err = -EINVAL;
 	struct fib4_rule *rule4 = (struct fib4_rule *) rule;
 
-	if (frh->tos & ~IPTOS_TOS_MASK) {
+	if (frh->tos & ~IP_DSCP_MASK) {
 		NL_SET_ERR_MSG(extack, "Invalid tos");
 		goto errout;
 	}
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 005faea415a4..3f6f7c64902f 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -444,7 +444,7 @@  static void icmp_reply(struct icmp_bxm *icmp_param,
struct sk_buff *skb) fl4.saddr = saddr;
 	fl4.flowi4_mark = mark;
 	fl4.flowi4_uid = sock_net_uid(net, NULL);
-	fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
+	fl4.flowi4_tos = IP_DSCP(ip_hdr(skb)->tos);
 	fl4.flowi4_proto = IPPROTO_ICMP;
 	fl4.flowi4_oif = l3mdev_master_ifindex(skb->dev);
 	security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
@@ -496,7 +496,7 @@  static struct rtable *icmp_route_lookup(struct net
*net, fl4->saddr = saddr;
 	fl4->flowi4_mark = mark;
 	fl4->flowi4_uid = sock_net_uid(net, NULL);
-	fl4->flowi4_tos = RT_TOS(tos);
+	fl4->flowi4_tos = IP_DSCP(tos);
 	fl4->flowi4_proto = IPPROTO_ICMP;
 	fl4->fl4_icmp_type = type;
 	fl4->fl4_icmp_code = code;
@@ -544,7 +544,7 @@  static struct rtable *icmp_route_lookup(struct net
*net, orefdst = skb_in->_skb_refdst; /* save old refdst */
 		skb_dst_set(skb_in, NULL);
 		err = ip_route_input(skb_in, fl4_dec.daddr,
fl4_dec.saddr,
-				     RT_TOS(tos), rt2->dst.dev);
+				     IP_DSCP(tos), rt2->dst.dev);
 
 		dst_release(&rt2->dst);
 		rt2 = skb_rtable(skb_in);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a68bf4c6fe9b..6bf61a994c19 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -882,7 +882,7 @@  static int ipgre_open(struct net_device *dev)
 					 t->parms.iph.daddr,
 					 t->parms.iph.saddr,
 					 t->parms.o_key,
-					 RT_TOS(t->parms.iph.tos),
+					 IP_DSCP(t->parms.iph.tos),
 					 t->parms.link);
 		if (IS_ERR(rt))
 			return -EADDRNOTAVAIL;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 879b76ae4435..6a459283ef82 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1694,7 +1694,7 @@  void ip_send_unicast_reply(struct sock *sk,
struct sk_buff *skb, 
 	flowi4_init_output(&fl4, oif,
 			   IP4_REPLY_MARK(net, skb->mark) ?:
sk->sk_mark,
-			   RT_TOS(arg->tos),
+			   IP_DSCP(arg->tos),
 			   RT_SCOPE_UNIVERSE, ip_hdr(skb)->protocol,
 			   ip_reply_arg_flowi_flags(arg),
 			   daddr, saddr,
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index ee65c9225178..2ca0a3f6c29c 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -294,7 +294,7 @@  static int ip_tunnel_bind_dev(struct net_device
*dev) 
 		ip_tunnel_init_flow(&fl4, iph->protocol, iph->daddr,
 				    iph->saddr, tunnel->parms.o_key,
-				    RT_TOS(iph->tos),
tunnel->parms.link,
+				    IP_DSCP(iph->tos),
tunnel->parms.link, tunnel->fwmark, 0);
 		rt = ip_route_output_key(tunnel->net, &fl4);
 
@@ -565,7 +565,7 @@  void ip_md_tunnel_xmit(struct sk_buff *skb, struct
net_device *dev, tos = ipv6_get_dsfield((const struct ipv6hdr
*)inner_iph); }
 	ip_tunnel_init_flow(&fl4, proto, key->u.ipv4.dst,
key->u.ipv4.src,
-			    tunnel_id_to_key32(key->tun_id),
RT_TOS(tos),
+			    tunnel_id_to_key32(key->tun_id),
IP_DSCP(tos), 0, skb->mark, skb_get_hash(skb));
 	if (tunnel->encap.type != TUNNEL_ENCAP_NONE)
 		goto tx_error;
@@ -722,7 +722,7 @@  void ip_tunnel_xmit(struct sk_buff *skb, struct
net_device *dev, }
 
 	ip_tunnel_init_flow(&fl4, protocol, dst, tnl_params->saddr,
-			    tunnel->parms.o_key, RT_TOS(tos),
tunnel->parms.link,
+			    tunnel->parms.o_key, IP_DSCP(tos),
tunnel->parms.link, tunnel->fwmark, skb_get_hash(skb));
 
 	if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 939792a38814..7806b5c04970 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1840,7 +1840,7 @@  static void ipmr_queue_xmit(struct net *net,
struct mr_table *mrt, vif->remote, vif->local,
 					   0, 0,
 					   IPPROTO_IPIP,
-					   RT_TOS(iph->tos),
vif->link);
+					   IP_DSCP(iph->tos),
vif->link); if (IS_ERR(rt))
 			goto out_free;
 		encap = sizeof(struct iphdr);
@@ -1848,7 +1848,7 @@  static void ipmr_queue_xmit(struct net *net,
struct mr_table *mrt, rt = ip_route_output_ports(net, &fl4, NULL,
iph->daddr, 0, 0, 0,
 					   IPPROTO_IPIP,
-					   RT_TOS(iph->tos),
vif->link);
+					   IP_DSCP(iph->tos),
vif->link); if (IS_ERR(rt))
 			goto out_free;
 	}
@@ -2048,7 +2048,7 @@  static struct mr_table *ipmr_rt_fib_lookup(struct
net *net, struct sk_buff *skb) struct flowi4 fl4 = {
 		.daddr = iph->daddr,
 		.saddr = iph->saddr,
-		.flowi4_tos = RT_TOS(iph->tos),
+		.flowi4_tos = IP_DSCP(iph->tos),
 		.flowi4_oif = (rt_is_output_route(rt) ?
 			       skb->dev->ifindex : 0),
 		.flowi4_iif = (rt_is_output_route(rt) ?
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index 7c841037c533..aa9f5322a489 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -42,7 +42,7 @@  int ip_route_me_harder(struct net *net, struct sock
*sk, struct sk_buff *skb, un */
 	fl4.daddr = iph->daddr;
 	fl4.saddr = saddr;
-	fl4.flowi4_tos = RT_TOS(iph->tos);
+	fl4.flowi4_tos = IP_DSCP(iph->tos);
 	fl4.flowi4_oif = sk ? sk->sk_bound_dev_if : 0;
 	if (!fl4.flowi4_oif)
 		fl4.flowi4_oif = l3mdev_master_ifindex(dev);
diff --git a/net/ipv4/netfilter/ipt_rpfilter.c
b/net/ipv4/netfilter/ipt_rpfilter.c index cc23f1ce239c..5e952661a5ea
100644 --- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -76,7 +76,7 @@  static bool rpfilter_mt(const struct sk_buff *skb,
struct xt_action_param *par) flow.daddr = iph->saddr;
 	flow.saddr = rpfilter_get_saddr(iph->daddr);
 	flow.flowi4_mark = info->flags & XT_RPFILTER_VALID_MARK ?
skb->mark : 0;
-	flow.flowi4_tos = RT_TOS(iph->tos);
+	flow.flowi4_tos = IP_DSCP(iph->tos);
 	flow.flowi4_scope = RT_SCOPE_UNIVERSE;
 	flow.flowi4_oif = l3mdev_master_ifindex_rcu(xt_in(par));
 
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c
b/net/ipv4/netfilter/nf_dup_ipv4.c index 6cc5743c553a..d2613828a0ec
100644 --- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -32,7 +32,7 @@  static bool nf_dup_ipv4_route(struct net *net, struct
sk_buff *skb, fl4.flowi4_oif = oif;
 
 	fl4.daddr = gw->s_addr;
-	fl4.flowi4_tos = RT_TOS(iph->tos);
+	fl4.flowi4_tos = IP_DSCP(iph->tos);
 	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
 	fl4.flowi4_flags = FLOWI_FLAG_KNOWN_NH;
 	rt = ip_route_output_key(net, &fl4);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c962f0d96d8d..1ae7f5d668c9 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -113,7 +113,7 @@ 
 #include "fib_lookup.h"
 
 #define RT_FL_TOS(oldflp4) \
-	((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
+	((oldflp4)->flowi4_tos & (IP_DSCP_MASK | RTO_ONLINK))
 
 #define RT_GC_TIMEOUT (300*HZ)
 
@@ -549,7 +549,7 @@  static void build_skb_flow_key(struct flowi4 *fl4,
const struct sk_buff *skb, const struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph = ip_hdr(skb);
 	int oif = skb->dev->ifindex;
-	u8 tos = RT_TOS(iph->tos);
+	u8 tos = IP_DSCP(iph->tos);
 	u8 prot = iph->protocol;
 	u32 mark = skb->mark;
 
@@ -825,7 +825,7 @@  static void ip_do_redirect(struct dst_entry *dst,
struct sock *sk, struct sk_buf const struct iphdr *iph = (const struct
iphdr *) skb->data; struct net *net = dev_net(skb->dev);
 	int oif = skb->dev->ifindex;
-	u8 tos = RT_TOS(iph->tos);
+	u8 tos = IP_DSCP(iph->tos);
 	u8 prot = iph->protocol;
 	u32 mark = skb->mark;
 
@@ -1073,7 +1073,7 @@  void ipv4_update_pmtu(struct sk_buff *skb, struct
net *net, u32 mtu, u32 mark = IP4_REPLY_MARK(net, skb->mark);
 
 	__build_flow_key(net, &fl4, NULL, iph, oif,
-			 RT_TOS(iph->tos), protocol, mark, 0);
+			 IP_DSCP(iph->tos), protocol, mark, 0);
 	rt = __ip_route_output_key(net, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_rt_update_pmtu(rt, &fl4, mtu);
@@ -1162,7 +1162,7 @@  void ipv4_redirect(struct sk_buff *skb, struct
net *net, struct rtable *rt;
 
 	__build_flow_key(net, &fl4, NULL, iph, oif,
-			 RT_TOS(iph->tos), protocol, 0, 0);
+			 IP_DSCP(iph->tos), protocol, 0, 0);
 	rt = __ip_route_output_key(net, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_do_redirect(rt, skb, &fl4, false);
@@ -1274,7 +1274,7 @@  void ip_rt_get_source(u8 *addr, struct sk_buff
*skb, struct rtable *rt) struct flowi4 fl4 = {
 			.daddr = iph->daddr,
 			.saddr = iph->saddr,
-			.flowi4_tos = RT_TOS(iph->tos),
+			.flowi4_tos = IP_DSCP(iph->tos),
 			.flowi4_oif = rt->dst.dev->ifindex,
 			.flowi4_iif = skb->dev->ifindex,
 			.flowi4_mark = skb->mark,
@@ -2055,7 +2055,7 @@  int ip_route_use_hint(struct sk_buff *skb, __be32
daddr, __be32 saddr, if (rt->rt_type != RTN_LOCAL)
 		goto skip_validate_source;
 
-	tos &= IPTOS_RT_MASK;
+	tos &= IP_DSCP_MASK;
 	err = fib_validate_source(skb, saddr, daddr, tos, 0, dev,
in_dev, &tag); if (err < 0)
 		goto martian_source;
@@ -2298,7 +2298,7 @@  int ip_route_input_noref(struct sk_buff *skb,
__be32 daddr, __be32 saddr, struct fib_result res;
 	int err;
 
-	tos &= IPTOS_RT_MASK;
+	tos &= IP_DSCP_MASK;
 	rcu_read_lock();
 	err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res);
 	rcu_read_unlock();
@@ -2499,7 +2499,7 @@  struct rtable *ip_route_output_key_hash(struct
net *net, struct flowi4 *fl4, struct rtable *rth;
 
 	fl4->flowi4_iif = LOOPBACK_IFINDEX;
-	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
+	fl4->flowi4_tos = tos & IP_DSCP_MASK;
 	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
 			 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
 
@@ -2808,7 +2808,7 @@  struct rtable *ip_route_output_tunnel(struct
sk_buff *skb, fl4.daddr = info->key.u.ipv4.dst;
 	fl4.saddr = info->key.u.ipv4.src;
 	tos = info->key.tos;
-	fl4.flowi4_tos = RT_TOS(tos);
+	fl4.flowi4_tos = IP_DSCP(tos);
 
 	rt = ip_route_output_key(net, &fl4);
 	if (IS_ERR(rt)) {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 749ad72386b2..1cd6f7e7bc13 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1243,7 +1243,7 @@  struct dst_entry *ip6_dst_lookup_tunnel(struct
sk_buff *skb, fl6.daddr = info->key.u.ipv6.dst;
 	fl6.saddr = info->key.u.ipv6.src;
 	prio = info->key.tos;
-	fl6.flowlabel = ip6_make_flowinfo(RT_TOS(prio),
+	fl6.flowlabel = ip6_make_flowinfo(IP_DSCP(prio),
 					  info->key.label);
 
 	dst = ipv6_stub->ipv6_dst_lookup_flow(net, sock->sk, &fl6,
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index a7950baa05e5..ef1c880da186 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -612,7 +612,7 @@  ip4ip6_err(struct sk_buff *skb, struct
inet6_skb_parm *opt, 
 	/* Try to guess incoming interface */
 	rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL,
eiph->saddr,
-				   0, 0, 0, IPPROTO_IPIP,
RT_TOS(eiph->tos), 0);
+				   0, 0, 0, IPPROTO_IPIP,
IP_DSCP(eiph->tos), 0); if (IS_ERR(rt))
 		goto out;
 
@@ -623,7 +623,7 @@  ip4ip6_err(struct sk_buff *skb, struct
inet6_skb_parm *opt, if (rt->rt_flags & RTCF_LOCAL) {
 		rt = ip_route_output_ports(dev_net(skb->dev), &fl4,
NULL, eiph->daddr, eiph->saddr, 0, 0,
-					   IPPROTO_IPIP,
RT_TOS(eiph->tos), 0);
+					   IPPROTO_IPIP,
IP_DSCP(eiph->tos), 0); if (IS_ERR(rt) || rt->dst.dev->type !=
ARPHRD_TUNNEL6) { if (!IS_ERR(rt))
 				ip_rt_put(rt);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 2da0ee703779..5149ed121e6b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -937,7 +937,7 @@  static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff
*skb, }
 
 	flowi4_init_output(&fl4, tunnel->parms.link, tunnel->fwmark,
-			   RT_TOS(tos), RT_SCOPE_UNIVERSE,
IPPROTO_IPV6,
+			   IP_DSCP(tos), RT_SCOPE_UNIVERSE,
IPPROTO_IPV6, 0, dst, tiph->saddr, 0, 0,
 			   sock_net_uid(tunnel->net, NULL));
 
@@ -1112,7 +1112,7 @@  static void ipip6_tunnel_bind_dev(struct
net_device *dev) iph->daddr, iph->saddr,
 							  0, 0,
 							  IPPROTO_IPV6,
-
RT_TOS(iph->tos),
+
IP_DSCP(iph->tos), tunnel->parms.link);
 
 		if (!IS_ERR(rt)) {
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index d622c2548d22..0425cc597a98 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2450,7 +2450,7 @@  xfrm_tmpl_resolve(struct xfrm_policy **pols, int
npols, const struct flowi *fl, static int xfrm_get_tos(const struct
flowi *fl, int family) {
 	if (family == AF_INET)
-		return IPTOS_RT_MASK & fl->u.ip4.flowi4_tos;
+		return IP_DSCP_MASK & fl->u.ip4.flowi4_tos;
 
 	return 0;
 }