diff mbox series

[net-next,v3,2/2] ipv4: use dst hint for ipv4 list receive

Message ID f242674de1892d14ed602047c3817cc7212a618d.1574165644.git.pabeni@redhat.com
State Changes Requested
Delegated to: David Miller
Headers show
Series net: introduce and use route hint | expand

Commit Message

Paolo Abeni Nov. 19, 2019, 2:38 p.m. UTC
This is alike the previous change, with some additional ipv4 specific
quirk. Even when using the route hint we still have to do perform
additional per packet checks about source address validity: a new
helper is added to wrap them.

To keep the code as simple as possible, use  hints for local destination
only.

UDP flood performances vs recvmmsg() receiver:

vanilla		patched		delta
Kpps		Kpps		%
1683		1871		+11

In the worst case scenario - each packet has a different
destination address - the performance delta is within noise
range.

v2 -> v3:
 - really fix build (sic) and hint usage check
 - use fib4_has_custom_rules() helpers (David A.)
 - add ip_extract_route_hint() helper (Edward C.)
 - use prev skb as hint instead of copying data (Willem)

v1 -> v2:
 - fix build issue with !CONFIG_IP_MULTIPLE_TABLES

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/ip_fib.h    | 10 ++++++++++
 include/net/route.h     |  4 ++++
 net/ipv4/fib_frontend.c | 10 ----------
 net/ipv4/ip_input.c     | 35 +++++++++++++++++++++++++++++++----
 net/ipv4/route.c        | 37 +++++++++++++++++++++++++++++++++++++
 5 files changed, 82 insertions(+), 14 deletions(-)

Comments

David Ahern Nov. 19, 2019, 4 p.m. UTC | #1
On 11/19/19 7:38 AM, Paolo Abeni wrote:
> @@ -535,11 +550,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
>  	}
>  }
>  
> +static struct sk_buff *ip_extract_route_hint(struct net *net,
> +					     struct sk_buff *skb, int rt_type)
> +{
> +	if (fib4_has_custom_rules(net) || rt_type != RTN_LOCAL)

Why the local only limitation for v4 but not v6? Really, why limit this
to LOCAL at all? same destination with same tos and no custom rules
means even for forwarding the lookup should be the same and you can
re-use the dst.
Paolo Abeni Nov. 19, 2019, 4:20 p.m. UTC | #2
On Tue, 2019-11-19 at 09:00 -0700, David Ahern wrote:
> On 11/19/19 7:38 AM, Paolo Abeni wrote:
> > @@ -535,11 +550,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
> >  	}
> >  }
> >  
> > +static struct sk_buff *ip_extract_route_hint(struct net *net,
> > +					     struct sk_buff *skb, int rt_type)
> > +{
> > +	if (fib4_has_custom_rules(net) || rt_type != RTN_LOCAL)
> 
> Why the local only limitation for v4 but not v6? Really, why limit this
> to LOCAL at all? 

The goal here was to simplify as much as possible the ipv4
ip_route_use_hint() helper, as its complexity raised some eyebrown.

Yes, hints can be used also for forwarding. I'm unsure how much will
help, given the daddr contraint. If there is agreement I can re-add it.

Thanks,

Paolo
Paolo Abeni Nov. 19, 2019, 5:33 p.m. UTC | #3
On Tue, 2019-11-19 at 17:20 +0100, Paolo Abeni wrote:
> On Tue, 2019-11-19 at 09:00 -0700, David Ahern wrote:
> > On 11/19/19 7:38 AM, Paolo Abeni wrote:
> > > @@ -535,11 +550,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
> > >  	}
> > >  }
> > >  
> > > +static struct sk_buff *ip_extract_route_hint(struct net *net,
> > > +					     struct sk_buff *skb, int rt_type)
> > > +{
> > > +	if (fib4_has_custom_rules(net) || rt_type != RTN_LOCAL)
> > 
> > Why the local only limitation for v4 but not v6? Really, why limit this
> > to LOCAL at all? 
> 
> The goal here was to simplify as much as possible the ipv4
> ip_route_use_hint() helper, as its complexity raised some eyebrown.
> 
> Yes, hints can be used also for forwarding. I'm unsure how much will
> help, given the daddr contraint. If there is agreement I can re-add it.

Sorry, I forgot to ask: would you be ok enabling the route hint for
!RTN_BROADCAST, as in the previous iteration? Covering RTN_BROADCAST
will add quite a bit of complexity to ip_route_use_hint(), likely with
no relevant use-case.

Thanks!

Paolo
David Ahern Nov. 19, 2019, 5:45 p.m. UTC | #4
On 11/19/19 10:33 AM, Paolo Abeni wrote:
> On Tue, 2019-11-19 at 17:20 +0100, Paolo Abeni wrote:
>> On Tue, 2019-11-19 at 09:00 -0700, David Ahern wrote:
>>> On 11/19/19 7:38 AM, Paolo Abeni wrote:
>>>> @@ -535,11 +550,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
>>>>  	}
>>>>  }
>>>>  
>>>> +static struct sk_buff *ip_extract_route_hint(struct net *net,
>>>> +					     struct sk_buff *skb, int rt_type)
>>>> +{
>>>> +	if (fib4_has_custom_rules(net) || rt_type != RTN_LOCAL)
>>>
>>> Why the local only limitation for v4 but not v6? Really, why limit this
>>> to LOCAL at all? 
>>
>> The goal here was to simplify as much as possible the ipv4
>> ip_route_use_hint() helper, as its complexity raised some eyebrown.
>>
>> Yes, hints can be used also for forwarding. I'm unsure how much will
>> help, given the daddr contraint. If there is agreement I can re-add it.
> 
> Sorry, I forgot to ask: would you be ok enabling the route hint for
> !RTN_BROADCAST, as in the previous iteration? Covering RTN_BROADCAST
> will add quite a bit of complexity to ip_route_use_hint(), likely with
> no relevant use-case.
> 

It is a trade-off of too many checks which just add overhead to the
packets that can not benefit from re-use. I was trying to understand why
local delivery was given preference.
diff mbox series

Patch

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 52b2406a5dfc..8e65e3e0a948 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -311,6 +311,11 @@  static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
 	return err;
 }
 
+static inline bool fib4_has_custom_rules(struct net *net)
+{
+	return false;
+}
+
 static inline bool fib4_rule_default(const struct fib_rule *rule)
 {
 	return true;
@@ -378,6 +383,11 @@  static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 	return err;
 }
 
+static inline bool fib4_has_custom_rules(struct net *net)
+{
+	return net->ipv4.fib_has_custom_rules;
+}
+
 bool fib4_rule_default(const struct fib_rule *rule);
 int fib4_rules_dump(struct net *net, struct notifier_block *nb,
 		    struct netlink_ext_ack *extack);
diff --git a/include/net/route.h b/include/net/route.h
index 6c516840380d..a9c60fc68e36 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -185,6 +185,10 @@  int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
 		       u8 tos, struct net_device *devin,
 		       struct fib_result *res);
 
+int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
+		      u8 tos, struct net_device *devin,
+		      const struct sk_buff *hint);
+
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
 {
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 71c78d223dfd..577db1d50a24 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -70,11 +70,6 @@  static int __net_init fib4_rules_init(struct net *net)
 	fib_free_table(main_table);
 	return -ENOMEM;
 }
-
-static bool fib4_has_custom_rules(struct net *net)
-{
-	return false;
-}
 #else
 
 struct fib_table *fib_new_table(struct net *net, u32 id)
@@ -131,11 +126,6 @@  struct fib_table *fib_get_table(struct net *net, u32 id)
 	}
 	return NULL;
 }
-
-static bool fib4_has_custom_rules(struct net *net)
-{
-	return net->ipv4.fib_has_custom_rules;
-}
 #endif /* CONFIG_IP_MULTIPLE_TABLES */
 
 static void fib_replace_table(struct net *net, struct fib_table *old,
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 24a95126e698..e992f90586f3 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -302,16 +302,31 @@  static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev)
 	return true;
 }
 
+static bool ip_can_use_hint(struct sk_buff *skb, const struct iphdr *iph,
+			    const struct sk_buff *hint)
+{
+	return hint && !skb_dst(skb) && ip_hdr(hint)->daddr == iph->daddr &&
+	       ip_hdr(hint)->tos == iph->tos;
+}
+
 INDIRECT_CALLABLE_DECLARE(int udp_v4_early_demux(struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(int tcp_v4_early_demux(struct sk_buff *));
 static int ip_rcv_finish_core(struct net *net, struct sock *sk,
-			      struct sk_buff *skb, struct net_device *dev)
+			      struct sk_buff *skb, struct net_device *dev,
+			      const struct sk_buff *hint)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	int (*edemux)(struct sk_buff *skb);
 	struct rtable *rt;
 	int err;
 
+	if (ip_can_use_hint(skb, iph, hint)) {
+		err = ip_route_use_hint(skb, iph->daddr, iph->saddr, iph->tos,
+					dev, hint);
+		if (unlikely(err))
+			goto drop_error;
+	}
+
 	if (net->ipv4.sysctl_ip_early_demux &&
 	    !skb_dst(skb) &&
 	    !skb->sk &&
@@ -408,7 +423,7 @@  static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return NET_RX_SUCCESS;
 
-	ret = ip_rcv_finish_core(net, sk, skb, dev);
+	ret = ip_rcv_finish_core(net, sk, skb, dev, NULL);
 	if (ret != NET_RX_DROP)
 		ret = dst_input(skb);
 	return ret;
@@ -535,11 +550,20 @@  static void ip_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
+static struct sk_buff *ip_extract_route_hint(struct net *net,
+					     struct sk_buff *skb, int rt_type)
+{
+	if (fib4_has_custom_rules(net) || rt_type != RTN_LOCAL)
+		return NULL;
+
+	return skb;
+}
+
 static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 			       struct list_head *head)
 {
+	struct sk_buff *skb, *next, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
-	struct sk_buff *skb, *next;
 	struct list_head sublist;
 
 	INIT_LIST_HEAD(&sublist);
@@ -554,11 +578,14 @@  static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip_rcv(skb);
 		if (!skb)
 			continue;
-		if (ip_rcv_finish_core(net, sk, skb, dev) == NET_RX_DROP)
+		if (ip_rcv_finish_core(net, sk, skb, dev, hint) == NET_RX_DROP)
 			continue;
 
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
+			hint = ip_extract_route_hint(net, skb,
+					       ((struct rtable *)dst)->rt_type);
+
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip_sublist_rcv_finish(&sublist);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dcc4fa10138d..7083cfa9f0a5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2019,10 +2019,47 @@  static int ip_mkroute_input(struct sk_buff *skb,
 	return __mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 }
 
+/* Implements all the saddr-related checks as ip_route_input_slow(),
+ * assuming daddr is valid and destination is local.
+ * Uses the provided hint instead of performing a route lookup.
+ */
+int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+		      u8 tos, struct net_device *dev,
+		      const struct sk_buff *hint)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dev);
+	struct net *net = dev_net(dev);
+	int err = -EINVAL;
+	u32 tag = 0;
+
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+		goto martian_source;
+
+	if (ipv4_is_zeronet(saddr))
+		goto martian_source;
+
+	if (ipv4_is_loopback(saddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+		goto martian_source;
+
+	tos &= IPTOS_RT_MASK;
+	err = fib_validate_source(skb, saddr, daddr, tos, 0, dev, in_dev, &tag);
+	if (err < 0)
+		goto martian_source;
+
+	skb_dst_copy(skb, hint);
+	return 0;
+
+martian_source:
+	ip_handle_martian_source(dev, in_dev, skb, daddr, saddr);
+	return err;
+}
+
 /*
  *	NOTE. We drop all the packets that has local source
  *	addresses, because every properly looped back packet
  *	must have correct destination already attached by output routine.
+ *	Changes in the enforced policies must be applied also to
+ *	ip_route_use_hint().
  *
  *	Such approach solves two big problems:
  *	1. Not simplex devices are handled properly.