diff mbox

[v2,net-next] ipv6: router reachability probing

Message ID d483b5eedcbfedc2002c458b15586679e7795399.1386766078.git.jbenc@redhat.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Benc Dec. 11, 2013, 12:48 p.m. UTC
RFC 4191 states in 3.5:

   When a host avoids using any non-reachable router X and instead sends
   a data packet to another router Y, and the host would have used
   router X if router X were reachable, then the host SHOULD probe each
   such router X's reachability by sending a single Neighbor
   Solicitation to that router's address.  A host MUST NOT probe a
   router's reachability in the absence of useful traffic that the host
   would have sent to the router if it were reachable.  In any case,
   these probes MUST be rate-limited to no more than one per minute per
   router.

Currently, when the neighbour corresponding to a router falls into
NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
should be probed with a single NS. The probe is ratelimited by the existing
code. To better distinguish meanings of the failure values, rename
RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
v2: put the neighbor into NUD_PROBE instead of accepting NAs in the failed
state
---
 include/net/neighbour.h |    1 +
 net/core/neighbour.c    |   15 +++++++++++++++
 net/ipv6/route.c        |   16 ++++++++++------
 3 files changed, 26 insertions(+), 6 deletions(-)

Comments

Hannes Frederic Sowa Dec. 11, 2013, 7:10 p.m. UTC | #1
On Wed, Dec 11, 2013 at 01:48:20PM +0100, Jiri Benc wrote:
> RFC 4191 states in 3.5:
> 
>    When a host avoids using any non-reachable router X and instead sends
>    a data packet to another router Y, and the host would have used
>    router X if router X were reachable, then the host SHOULD probe each
>    such router X's reachability by sending a single Neighbor
>    Solicitation to that router's address.  A host MUST NOT probe a
>    router's reachability in the absence of useful traffic that the host
>    would have sent to the router if it were reachable.  In any case,
>    these probes MUST be rate-limited to no more than one per minute per
>    router.
> 
> Currently, when the neighbour corresponding to a router falls into
> NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
> value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
> should be probed with a single NS. The probe is ratelimited by the existing
> code. To better distinguish meanings of the failure values, rename
> RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Looks good, thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 11, 2013, 9:03 p.m. UTC | #2
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 11 Dec 2013 20:10:23 +0100

> On Wed, Dec 11, 2013 at 01:48:20PM +0100, Jiri Benc wrote:
>> RFC 4191 states in 3.5:
>> 
>>    When a host avoids using any non-reachable router X and instead sends
>>    a data packet to another router Y, and the host would have used
>>    router X if router X were reachable, then the host SHOULD probe each
>>    such router X's reachability by sending a single Neighbor
>>    Solicitation to that router's address.  A host MUST NOT probe a
>>    router's reachability in the absence of useful traffic that the host
>>    would have sent to the router if it were reachable.  In any case,
>>    these probes MUST be rate-limited to no more than one per minute per
>>    router.
>> 
>> Currently, when the neighbour corresponding to a router falls into
>> NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
>> value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
>> should be probed with a single NS. The probe is ratelimited by the existing
>> code. To better distinguish meanings of the failure values, rename
>> RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.
>> 
>> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> 
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> 
> Looks good, thanks!

Applied, thanks guys.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek Dec. 11, 2013, 11:12 p.m. UTC | #3
On Wed, Dec 11, 2013 at 01:48:20PM +0100, Jiri Benc wrote:
> RFC 4191 states in 3.5:
> 
>    When a host avoids using any non-reachable router X and instead sends
>    a data packet to another router Y, and the host would have used
>    router X if router X were reachable, then the host SHOULD probe each
>    such router X's reachability by sending a single Neighbor
>    Solicitation to that router's address.  A host MUST NOT probe a
>    router's reachability in the absence of useful traffic that the host
>    would have sent to the router if it were reachable.  In any case,
>    these probes MUST be rate-limited to no more than one per minute per
>    router.
> 
> Currently, when the neighbour corresponding to a router falls into
> NUD_FAILED, it's never considered again.

Is it really the case in current mainline kernels? In my tests, this
behaviour in 3.0 kernel (SLES 11 SP3) was caused by the reference held
by struct dst_entry which caused that in neigh_periodic_work(),
n->refcnt was always bigger than one so that the neighbour entry was
never cleaned up.  But when I tested with 3.11.6 (OpenSuSE 13.1) where
neighbour is no longer cached in struct dst_entry, the neighbour was
cleaned up eventually and new lookup was performed.

I believe the patch would be useful anyway as it would speed up the
detection that the router is reachable again, I just want to make sure
my analysis wasn't completely wrong.

                                                        Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 12, 2013, 12:56 a.m. UTC | #4
From: Michal Kubecek <mkubecek@suse.cz>
Date: Thu, 12 Dec 2013 00:12:36 +0100

> Is it really the case in current mainline kernels? In my tests, this
> behaviour in 3.0 kernel (SLES 11 SP3) was caused by the reference held
> by struct dst_entry which caused that in neigh_periodic_work(),
> n->refcnt was always bigger than one so that the neighbour entry was
> never cleaned up.  But when I tested with 3.11.6 (OpenSuSE 13.1) where
> neighbour is no longer cached in struct dst_entry, the neighbour was
> cleaned up eventually and new lookup was performed.

The detachment of neighbours from route entires has had many consequences,
both desirable and undesirable.  This happens to be one of the former,
fortunately :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Dec. 12, 2013, 1:21 a.m. UTC | #5
On Thu, Dec 12, 2013 at 12:12:36AM +0100, Michal Kubecek wrote:
> On Wed, Dec 11, 2013 at 01:48:20PM +0100, Jiri Benc wrote:
> > RFC 4191 states in 3.5:
> > 
> >    When a host avoids using any non-reachable router X and instead sends
> >    a data packet to another router Y, and the host would have used
> >    router X if router X were reachable, then the host SHOULD probe each
> >    such router X's reachability by sending a single Neighbor
> >    Solicitation to that router's address.  A host MUST NOT probe a
> >    router's reachability in the absence of useful traffic that the host
> >    would have sent to the router if it were reachable.  In any case,
> >    these probes MUST be rate-limited to no more than one per minute per
> >    router.
> > 
> > Currently, when the neighbour corresponding to a router falls into
> > NUD_FAILED, it's never considered again.
> 
> Is it really the case in current mainline kernels? In my tests, this
> behaviour in 3.0 kernel (SLES 11 SP3) was caused by the reference held
> by struct dst_entry which caused that in neigh_periodic_work(),
> n->refcnt was always bigger than one so that the neighbour entry was
> never cleaned up.  But when I tested with 3.11.6 (OpenSuSE 13.1) where
> neighbour is no longer cached in struct dst_entry, the neighbour was
> cleaned up eventually and new lookup was performed.

IMHO the wording is a big too strong. The *particular* neighbour is
never considered again as long as it survives in the NUD_FAILED state
(depending on the state of the reference counter this could be between
base_reachable_time and infinity in case of bugs ;) ).

But probing should happen at least every 60 seconds (or
router_probe_interval), which did not happen before this patch.

> I believe the patch would be useful anyway as it would speed up the
> detection that the router is reachable again, I just want to make sure
> my analysis wasn't completely wrong.

Yeah, I think your analysis is correct. Also you can see that (at least) I did
not considered this worth for -stable but only for net-next. ;)

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Dec. 12, 2013, 1:26 a.m. UTC | #6
On Wed, Dec 11, 2013 at 07:56:51PM -0500, David Miller wrote:
> From: Michal Kubecek <mkubecek@suse.cz>
> Date: Thu, 12 Dec 2013 00:12:36 +0100
> 
> > Is it really the case in current mainline kernels? In my tests, this
> > behaviour in 3.0 kernel (SLES 11 SP3) was caused by the reference held
> > by struct dst_entry which caused that in neigh_periodic_work(),
> > n->refcnt was always bigger than one so that the neighbour entry was
> > never cleaned up.  But when I tested with 3.11.6 (OpenSuSE 13.1) where
> > neighbour is no longer cached in struct dst_entry, the neighbour was
> > cleaned up eventually and new lookup was performed.
> 
> The detachment of neighbours from route entires has had many consequences,
> both desirable and undesirable.  This happens to be one of the former,
> fortunately :-)

In the end it was worth it. It fixed some very annoying bugs.

But while I wrote the previouss answer I noticed that we should actually
move the state of the probes from neigh->update to the rt6_info structure
as we cannot guarantee it will survive the 60 seconds in failed state. :/

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek Dec. 12, 2013, 9:10 a.m. UTC | #7
On Thu, Dec 12, 2013 at 02:21:15AM +0100, Hannes Frederic Sowa wrote:
> 
> Yeah, I think your analysis is correct. Also you can see that (at least) I did
> not considered this worth for -stable but only for net-next. ;)

On the other hand, the fix could be worth backporting to 3.4 stable
(more general, pre-3.6 stable branches) as the problem is much more
severe there.

                                                     Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Dec. 12, 2013, 7:08 p.m. UTC | #8
On Thu, Dec 12, 2013 at 10:10:05AM +0100, Michal Kubecek wrote:
> On Thu, Dec 12, 2013 at 02:21:15AM +0100, Hannes Frederic Sowa wrote:
> > 
> > Yeah, I think your analysis is correct. Also you can see that (at least) I did
> > not considered this worth for -stable but only for net-next. ;)
> 
> On the other hand, the fix could be worth backporting to 3.4 stable
> (more general, pre-3.6 stable branches) as the problem is much more
> severe there.

Could you give it a try? I am not so sure if it really helps, maybe only in
cases where more than one router is available.

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek Dec. 13, 2013, 12:51 p.m. UTC | #9
On Thu, Dec 12, 2013 at 08:08:10PM +0100, Hannes Frederic Sowa wrote:
> On Thu, Dec 12, 2013 at 10:10:05AM +0100, Michal Kubecek wrote:
> > 
> > On the other hand, the fix could be worth backporting to 3.4 stable
> > (more general, pre-3.6 stable branches) as the problem is much more
> > severe there.
> 
> Could you give it a try? I am not so sure if it really helps, maybe only in
> cases where more than one router is available.

I'll send a backport next week so that there is enough time for a review
before the original patch gets into mainline.

                                                          Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 41b1ce6c96a8..4c09bd23b832 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -252,6 +252,7 @@  static inline struct neighbour *neigh_create(struct neigh_table *tbl,
 void neigh_destroy(struct neighbour *neigh);
 int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags);
+void __neigh_set_probe_once(struct neighbour *neigh);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8be82a7f0e0f..bf6f404c04aa 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1238,6 +1238,21 @@  out:
 }
 EXPORT_SYMBOL(neigh_update);
 
+/* Update the neigh to listen temporarily for probe responses, even if it is
+ * in a NUD_FAILED state. The caller has to hold neigh->lock for writing.
+ */
+void __neigh_set_probe_once(struct neighbour *neigh)
+{
+	neigh->updated = jiffies;
+	if (!(neigh->nud_state & NUD_FAILED))
+		return;
+	neigh->nud_state = NUD_PROBE;
+	atomic_set(&neigh->probes, NEIGH_VAR(neigh->parms, UCAST_PROBES));
+	neigh_add_timer(neigh,
+			jiffies + NEIGH_VAR(neigh->parms, RETRANS_TIME));
+}
+EXPORT_SYMBOL(__neigh_set_probe_once);
+
 struct neighbour *neigh_event_ns(struct neigh_table *tbl,
 				 u8 *lladdr, void *saddr,
 				 struct net_device *dev)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ddb9d41c8eea..a1a57523b158 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -66,8 +66,9 @@ 
 #endif
 
 enum rt6_nud_state {
-	RT6_NUD_FAIL_HARD = -2,
-	RT6_NUD_FAIL_SOFT = -1,
+	RT6_NUD_FAIL_HARD = -3,
+	RT6_NUD_FAIL_PROBE = -2,
+	RT6_NUD_FAIL_DO_RR = -1,
 	RT6_NUD_SUCCEED = 1
 };
 
@@ -521,7 +522,7 @@  static void rt6_probe(struct rt6_info *rt)
 		work = kmalloc(sizeof(*work), GFP_ATOMIC);
 
 		if (neigh && work)
-			neigh->updated = jiffies;
+			__neigh_set_probe_once(neigh);
 
 		if (neigh)
 			write_unlock(&neigh->lock);
@@ -577,11 +578,13 @@  static inline enum rt6_nud_state rt6_check_neigh(struct rt6_info *rt)
 #ifdef CONFIG_IPV6_ROUTER_PREF
 		else if (!(neigh->nud_state & NUD_FAILED))
 			ret = RT6_NUD_SUCCEED;
+		else
+			ret = RT6_NUD_FAIL_PROBE;
 #endif
 		read_unlock(&neigh->lock);
 	} else {
 		ret = IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) ?
-		      RT6_NUD_SUCCEED : RT6_NUD_FAIL_SOFT;
+		      RT6_NUD_SUCCEED : RT6_NUD_FAIL_DO_RR;
 	}
 	rcu_read_unlock_bh();
 
@@ -618,16 +621,17 @@  static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict,
 		goto out;
 
 	m = rt6_score_route(rt, oif, strict);
-	if (m == RT6_NUD_FAIL_SOFT) {
+	if (m == RT6_NUD_FAIL_DO_RR) {
 		match_do_rr = true;
 		m = 0; /* lowest valid score */
-	} else if (m < 0) {
+	} else if (m == RT6_NUD_FAIL_HARD) {
 		goto out;
 	}
 
 	if (strict & RT6_LOOKUP_F_REACHABLE)
 		rt6_probe(rt);
 
+	/* note that m can be RT6_NUD_FAIL_PROBE at this point */
 	if (m > *mpri) {
 		*do_rr = match_do_rr;
 		*mpri = m;