diff mbox

xfrm: SAD entries do not expire correctly after suspend-resume

Message ID 20091108211249.2ecdfd38@penta.localdomain
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Yury Polyanskiy Nov. 9, 2009, 2:12 a.m. UTC
This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine 
in the suspended state. This leads to the connectivity problems because 
after resuming local machine thinks that the SAD entry is still valid, while 
it has already been expired on the remote server.

  The cause of this is very simple: the timeouts in the net/xfrm are bound to 
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.

  I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.

  This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).

  This patch is against 2.6.31.4. Please CC me.

Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
---
 include/net/xfrm.h    |    5 ++++-
 net/xfrm/xfrm_state.c |   34 +++++++++++++++++++---------------
 2 files changed, 23 insertions(+), 16 deletions(-)

Comments

David Miller Nov. 9, 2009, 4:58 a.m. UTC | #1
From: Yury Polyanskiy <ypolyans@princeton.edu>
Date: Sun, 8 Nov 2009 21:12:49 -0500

> 
>   This fixes the following bug in the current implementation of
> net/xfrm: SAD entries timeouts do not count the time spent by the machine 
> in the suspended state. This leads to the connectivity problems because 
> after resuming local machine thinks that the SAD entry is still valid, while 
> it has already been expired on the remote server.
 ...
> Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>

Applied to net-next-2.6, thanks for following up on this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Nov. 9, 2009, 3:39 p.m. UTC | #2
Yury Polyanskiy <ypolyans@princeton.edu> wrote:
> 
>  This fixes the following bug in the current implementation of
> net/xfrm: SAD entries timeouts do not count the time spent by the machine 
> in the suspended state. This leads to the connectivity problems because 
> after resuming local machine thinks that the SAD entry is still valid, while 
> it has already been expired on the remote server.
> 
>  The cause of this is very simple: the timeouts in the net/xfrm are bound to 
> the old mod_timer() timers. This patch reassigns them to the
> CLOCK_REALTIME hrtimer.
> 
>  I have been using this version of the patch for a few months on my
> machines without any problems. Also run a few stress tests w/o any
> issues.
> 
>  This version of the patch uses tasklet_hrtimer by Peter Zijlstra
> (commit 9ba5f0).
> 
>  This patch is against 2.6.31.4. Please CC me.
> 
> Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>

Thanks for the patch.

However, I have some reservations as to whether this is the ideal
situation.  Unless I'm mistaken, this patch may cause IPsec SAs
to expire if the system clock was out of sync prior to IPsec startup
and is subsequently resynced by ntpdate or similar.

For example, it's quite common for clocks to be out-of-sync by
10 hours in Australia due to time zone issues with BIOS clocks.
So potentially ntpdate could move the clock forward by 10 hours
or more on bootup thus causing IPsec SAs to expire prematurely
with this patch.

This shouldn't really be a problem in itself except that there
are some dodgy IPsec gateways out there that refuse to reestablish
IPsec SAs if the interval between two successive connections is
too small.  This could render the SA inoperable for hours.

So the upshot of all this is that we definitely want the effect
of this patch for suspend/resume, but it would be great if we can
avoid it for settimeofday(2).

Cheers,
Yury Polyanskiy Nov. 9, 2009, 6:31 p.m. UTC | #3
On Mon, 9 Nov 2009 10:39:10 -0500
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> However, I have some reservations as to whether this is the ideal
> situation.  Unless I'm mistaken, this patch may cause IPsec SAs
> to expire if the system clock was out of sync prior to IPsec startup
> and is subsequently resynced by ntpdate or similar.
> 
> For example, it's quite common for clocks to be out-of-sync by
> 10 hours in Australia due to time zone issues with BIOS clocks.
> So potentially ntpdate could move the clock forward by 10 hours
> or more on bootup thus causing IPsec SAs to expire prematurely
> with this patch.
> 
> This shouldn't really be a problem in itself except that there
> are some dodgy IPsec gateways out there that refuse to reestablish
> IPsec SAs if the interval between two successive connections is
> too small.  This could render the SA inoperable for hours.

But why would it be inoperable for hours? 

I think that the following will happen:
* racoon will recreate SAD entry in the larval state, wait 30s and drop
  it (since dodgy-gw filtered out all keyexchange packets)
* The next time there is a connect() with a match in the SPD, racoon
  will again try to recreate the SAD entry. If there dodgy-gw still
  filters out, the larval SAD entry dies after 30s.

So the inoperability will only last as long as dodgy-gw filters
keyexchanges. 

In any case, running ntpdate before racoon fixes the problem.

> 
> So the upshot of all this is that we definitely want the effect
> of this patch for suspend/resume, but it would be great if we can
> avoid it for settimeofday(2).

I think the natural solution is to have CLOCK_BOOTBASED hrtimers. I.e.
something in the spirit of monotonic_to_bootbased() and getboottime().
I understand that doing +=total_sleep_time is against the core idea of
hires timers, but perhaps there is a nicer way.

Best,
Y
Herbert Xu Nov. 9, 2009, 7:23 p.m. UTC | #4
On Mon, Nov 09, 2009 at 01:31:53PM -0500, Yury Polyanskiy wrote:
>
> But why would it be inoperable for hours? 

As I said the other end is buggy and won't allow you to reconnect
when you've just connected successfully.  Eventially it'll timeout
and let you back in but it could take hours.

> In any case, running ntpdate before racoon fixes the problem.

Not if the only trusted time source is over IPsec.

Cheers,
diff mbox

Patch

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 9e3a3f4..ef28810 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -19,6 +19,9 @@ 
 #include <net/route.h>
 #include <net/ipv6.h>
 #include <net/ip6_fib.h>
+
+#include <linux/interrupt.h>
+
 #ifdef CONFIG_XFRM_STATISTICS
 #include <net/snmp.h>
 #endif
@@ -199,7 +202,7 @@  struct xfrm_state
 	struct xfrm_stats	stats;
 
 	struct xfrm_lifetime_cur curlft;
-	struct timer_list	timer;
+	struct tasklet_hrtimer	mtimer;
 
 	/* Last used time */
 	unsigned long		lastused;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f2f7c63..45fff9c 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -21,6 +21,9 @@ 
 #include <linux/cache.h>
 #include <linux/audit.h>
 #include <asm/uaccess.h>
+#include <linux/ktime.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
 
 #include "xfrm_hash.h"
 
@@ -352,7 +355,7 @@  static void xfrm_put_mode(struct xfrm_mode *mode)
 
 static void xfrm_state_gc_destroy(struct xfrm_state *x)
 {
-	del_timer_sync(&x->timer);
+	tasklet_hrtimer_cancel(&x->mtimer);
 	del_timer_sync(&x->rtimer);
 	kfree(x->aalg);
 	kfree(x->ealg);
@@ -398,9 +401,10 @@  static inline unsigned long make_jiffies(long secs)
 		return secs*HZ;
 }
 
-static void xfrm_timer_handler(unsigned long data)
-{
-	struct xfrm_state *x = (struct xfrm_state*)data;
+static enum hrtimer_restart xfrm_timer_handler(struct hrtimer * me)
+{	
+	struct tasklet_hrtimer *thr = container_of(me, struct tasklet_hrtimer, timer);
+	struct xfrm_state *x = container_of(thr, struct xfrm_state, mtimer);
 	struct net *net = xs_net(x);
 	unsigned long now = get_seconds();
 	long next = LONG_MAX;
@@ -451,8 +455,9 @@  static void xfrm_timer_handler(unsigned long data)
 	if (warn)
 		km_state_expired(x, 0, 0);
 resched:
-	if (next != LONG_MAX)
-		mod_timer(&x->timer, jiffies + make_jiffies(next));
+	if (next != LONG_MAX){
+		tasklet_hrtimer_start(&x->mtimer, ktime_set(next, 0), HRTIMER_MODE_REL);
+	}
 
 	goto out;
 
@@ -474,6 +479,7 @@  expired:
 
 out:
 	spin_unlock(&x->lock);
+	return HRTIMER_NORESTART;
 }
 
 static void xfrm_replay_timer_handler(unsigned long data);
@@ -492,7 +498,7 @@  struct xfrm_state *xfrm_state_alloc(struct net *net)
 		INIT_HLIST_NODE(&x->bydst);
 		INIT_HLIST_NODE(&x->bysrc);
 		INIT_HLIST_NODE(&x->byspi);
-		setup_timer(&x->timer, xfrm_timer_handler, (unsigned long)x);
+		tasklet_hrtimer_init(&x->mtimer, xfrm_timer_handler, CLOCK_REALTIME, HRTIMER_MODE_ABS);
 		setup_timer(&x->rtimer, xfrm_replay_timer_handler,
 				(unsigned long)x);
 		x->curlft.add_time = get_seconds();
@@ -843,8 +849,7 @@  found:
 				hlist_add_head(&x->byspi, net->xfrm.state_byspi+h);
 			}
 			x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
-			x->timer.expires = jiffies + net->xfrm.sysctl_acq_expires*HZ;
-			add_timer(&x->timer);
+			tasklet_hrtimer_start(&x->mtimer, ktime_set(net->xfrm.sysctl_acq_expires, 0), HRTIMER_MODE_REL);
 			net->xfrm.state_num++;
 			xfrm_hash_grow_check(net, x->bydst.next != NULL);
 		} else {
@@ -921,7 +926,7 @@  static void __xfrm_state_insert(struct xfrm_state *x)
 		hlist_add_head(&x->byspi, net->xfrm.state_byspi+h);
 	}
 
-	mod_timer(&x->timer, jiffies + HZ);
+	tasklet_hrtimer_start(&x->mtimer, ktime_set(1, 0), HRTIMER_MODE_REL);
 	if (x->replay_maxage)
 		mod_timer(&x->rtimer, jiffies + x->replay_maxage);
 
@@ -1019,8 +1024,7 @@  static struct xfrm_state *__find_acq_core(struct net *net, unsigned short family
 		x->props.reqid = reqid;
 		x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
 		xfrm_state_hold(x);
-		x->timer.expires = jiffies + net->xfrm.sysctl_acq_expires*HZ;
-		add_timer(&x->timer);
+		tasklet_hrtimer_start(&x->mtimer, ktime_set(net->xfrm.sysctl_acq_expires, 0), HRTIMER_MODE_REL);
 		list_add(&x->km.all, &net->xfrm.state_all);
 		hlist_add_head(&x->bydst, net->xfrm.state_bydst+h);
 		h = xfrm_src_hash(net, daddr, saddr, family);
@@ -1299,8 +1303,8 @@  out:
 			memcpy(&x1->sel, &x->sel, sizeof(x1->sel));
 		memcpy(&x1->lft, &x->lft, sizeof(x1->lft));
 		x1->km.dying = 0;
-
-		mod_timer(&x1->timer, jiffies + HZ);
+		
+		tasklet_hrtimer_start(&x1->mtimer, ktime_set(1, 0), HRTIMER_MODE_REL);
 		if (x1->curlft.use_time)
 			xfrm_state_check_expire(x1);
 
@@ -1325,7 +1329,7 @@  int xfrm_state_check_expire(struct xfrm_state *x)
 	if (x->curlft.bytes >= x->lft.hard_byte_limit ||
 	    x->curlft.packets >= x->lft.hard_packet_limit) {
 		x->km.state = XFRM_STATE_EXPIRED;
-		mod_timer(&x->timer, jiffies);
+		tasklet_hrtimer_start(&x->mtimer, ktime_set(0,0), HRTIMER_MODE_REL);
 		return -EINVAL;
 	}