Patchwork [RFT,2/4] Add mod_timer_noact

login
register
mail settings
Submitter stephen hemminger
Date Feb. 18, 2009, 5:19 a.m.
Message ID <20090218052747.437271195@vyatta.com>
Download mbox | patch
Permalink /patch/23327/
State RFC
Delegated to: David Miller
Headers show

Comments

stephen hemminger - Feb. 18, 2009, 5:19 a.m.
Introduce mod_timer_noact() which for example is to replace the calls to
del_timer()/add_timer() in __nf_ct_refresh_acct(). It works like mod_timer()
but doesn't activate or modify the timeout of an inactive timer which is the
behaviour we want in order to be able to use timers as a means of
synchronization in nf_conntrack.

A later patch will modify __nf_ct_refresh_acct() to use mod_timer_noact()
which will then save one spin_lock_irqsave() / spin_lock_irqrestore() pair per
conntrack timer update. This will also get rid of the race we currently have
without adding more locking in nf_conntrack.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>

---
 include/linux/timer.h |    8 ++++++--
 kernel/relay.c        |    2 +-
 kernel/timer.c        |   40 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 42 insertions(+), 8 deletions(-)
Ingo Molnar - Feb. 18, 2009, 9:20 a.m.
* Stephen Hemminger <shemminger@vyatta.com> wrote:

> Introduce mod_timer_noact() which for example is to replace the calls to
> del_timer()/add_timer() in __nf_ct_refresh_acct(). It works like mod_timer()
> but doesn't activate or modify the timeout of an inactive timer which is the
> behaviour we want in order to be able to use timers as a means of
> synchronization in nf_conntrack.
> 
> A later patch will modify __nf_ct_refresh_acct() to use mod_timer_noact()
> which will then save one spin_lock_irqsave() / spin_lock_irqrestore() pair per
> conntrack timer update. This will also get rid of the race we currently have
> without adding more locking in nf_conntrack.
> 
> Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
> 
> ---
>  include/linux/timer.h |    8 ++++++--
>  kernel/relay.c        |    2 +-
>  kernel/timer.c        |   40 +++++++++++++++++++++++++++++++++++-----
>  3 files changed, 42 insertions(+), 8 deletions(-)
> 
> --- a/include/linux/timer.h	2009-02-17 10:55:33.427785986 -0800
> +++ b/include/linux/timer.h	2009-02-17 11:04:10.291844534 -0800
> @@ -25,6 +25,9 @@ struct timer_list {
>  
>  extern struct tvec_base boot_tvec_bases;
>  
> +#define TIMER_ACT	1
> +#define TIMER_NOACT	0

Ugly flaggery.

> -extern int __mod_timer(struct timer_list *timer, unsigned long expires);
> +extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate);

This is not really acceptable, it slows down every single 
add_timer() and mod_timer() call in the kernel with a flag that 
has one specific value in all but your case. There's more than 
2000 such callsites in the kernel.

Why dont you use something like this instead:

	if (del_timer(timer))
		add_timer(timer);

?

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 18, 2009, 9:30 a.m.
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 18 Feb 2009 10:20:41 +0100

> Why dont you use something like this instead:
> 
> 	if (del_timer(timer))
> 		add_timer(timer);
> 
> ?

Why don't you read his commit message?

At least show him that much respect if you're going to
be against his patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - Feb. 18, 2009, 10:07 a.m.
Ingo Molnar wrote:
>> -extern int __mod_timer(struct timer_list *timer, unsigned long expires);
>> +extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate);
> 
> This is not really acceptable, it slows down every single 
> add_timer() and mod_timer() call in the kernel with a flag that 
> has one specific value in all but your case. There's more than 
> 2000 such callsites in the kernel.
> 
> Why dont you use something like this instead:
> 
> 	if (del_timer(timer))
> 		add_timer(timer);

We need to avoid having a timer that was deleted by one CPU
getting re-added by another, but want to avoid taking the
conntrack lock for every timer update. The timer-internal
locking is enough for this as long as we have a mod_timer
variant that forwards a timer, but doesn't activate it in
case it isn't active already.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - Feb. 18, 2009, 10:29 a.m.
Stephen Hemminger wrote:

> +/***
> + * mod_timer_noact - modify a timer's timeout
> + * @timer: the timer to be modified
> + *
> + * mod_timer_noact works like mod_timer except that it doesn't activate an
> + * inactive timer, instead it returns without updating timer->expires.
> + *
> + * The function returns whether it has modified a pending timer or not.
> + * (ie. mod_timer_noact() of an inactive timer returns 0, mod_timer_noact() of
> + * an active timer returns 1.)
> + */
> +int mod_timer_noact(struct timer_list *timer, unsigned long expires)
> +{
> +	BUG_ON(!timer->function);
> +
> +	/*
> +	 * This is a common optimization triggered by the
> +	 * networking code - if the timer is re-modified
> +	 * to be the same thing then just return:
> +	 */
> +	if (timer->expires == expires && timer_pending(timer))
> +		return 1;

This doesn't seem right, since it uses TIMER_NOACT below, there's
no point in checking for timer_pending() I think.

> +
> +	return __mod_timer(timer, expires, TIMER_NOACT);
> +}
> +
> +EXPORT_SYMBOL(mod_timer_noact);
> +

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - Feb. 18, 2009, 11:01 a.m.
* David Miller <davem@davemloft.net> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed, 18 Feb 2009 10:20:41 +0100
> 
> > Why dont you use something like this instead:
> > 
> > 	if (del_timer(timer))
> > 		add_timer(timer);
> > 
> > ?
> 
> Why don't you read his commit message?

Uhm, of course i have read this piece of non-info:

| Introduce mod_timer_noact() which for example is to replace 
| the calls to del_timer()/add_timer() in 
| __nf_ct_refresh_acct(). It works like mod_timer() but doesn't 
| activate or modify the timeout of an inactive timer which is 
| the behaviour we want in order to be able to use timers as a 
| means of synchronization in nf_conntrack.

It does not mention the overhead to the regular timer interfaces 
at all, nor does it explain the reasons for this change 
adequately.

And that's why i'm asking, why is the sequence i suggested above 
inadequate? If del_timer(timer) returns 1 it means the timer was 
active - and we call add_timer() only in that case. I.e. we dont 
activate or modify the timeout of an inactive timer.

It can _only_ make a difference in the narrow special case of 
code using the timer list lock as serialization: but that is a 
pretty poor solution in this proposed form as it slows down the 
other 2000 users of timers for no good reason. The changelog was 
completely silent about that overhead aspect (which is small but 
real), and i refuse to believe that this effect was not 
realized.

In other words, the changelog is useless and even borderline 
deceptive. Not a good sign if you are trying to get a patch 
accepted to the kernel.

Furthermore, no performance figures were posted along with this 
modification - it only stated that these are "performance 
improvements". Especially in cases where a change slows down the 
common case the showing of a very substantial performance 
benefit is a must-have, before a patch is considered for 
upstream merging.

You might be aware of that and you might have planned to provide 
such info in the future, but the changelog and the submission 
does not show any realization of this necessity, so i'm asking 
for that here out of caution, to make sure it's done.

In fact, the submission incorrectly stated:

| This patch set is against Patrick's netfilter next tree since
| it is where it should end up.
|      
| git.kernel.org:/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git

This is wrong, the "netfilter next tree" is not where the "Add 
mod_timer_noact" change should end up, and you should ask your 
contributors to submit changes to other subsystems to their 
respective maintainer trees - the timer tree in this case.

> At least show him that much respect if you're going to be 
> against his patch.

Firstly, let me make clear what happened here.

Deep buried inside a networking patchset, Cc:-ed to the netdev 
and netfilter lists only, a core kernel change is embedded that 
in essence modifies 2000 callsites of the generic kernel. The 
patch was not Cc:-ed to lkml.

Secondly, all i'm doing here is reviewing patches of subsystems 
i maintain, so please stop attacking me for doing my job.

I noticed it because i read a lot of lists, but still, this was 
not done transparently at all. Please show minimal respect to 
Linux and post core kernel patches to lkml, and ask your 
sub-maintainers to do likewise. If there's someone here who has 
a moral basis for flaming here it's me, not you.

So, please, at minimum, follow the following well-established 
protocol of contribution:

 - Post timer patches to lkml (the mailing list mentioned in the 
   MAINTAINERS file), just like you expect networking patches to 
   be posted to netdev. It's basic courtesy and not doing so is 
   at minimum a double standard.

 - Declare negative performance impact to the common case very 
   prominently in the changelog, and include analysis about why 
   it's worth paying the price.

 - Include measurements that show clear positive performance
   impact at the new usage site - which offsets the negative 
   micro-costs that every other usage site pays.

 - Require your sub-contributors to write meaningful changelogs,
   that mention every substantial effect of a change, especially 
   when they change core kernel facilities. For example:

      Impact: add new API, slow down old APIs a tiny bit

   Would have alerted people straight away. I had to read the 
   actual patch to figure out this key information.

I'm also utterly puzzled by your apparent desire to flame me. 
This patch is wrong on so many levels that it's not even funny - 
and you as a long-time kernel contributor should have realized 
that straight away. Instead you forced me into wasting time on 
this rather long email (and you also forced the very unnecessary 
public embarrasment of a contributor), for what should have been 
an 'oops, right, will fix' routine matter.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Feb. 18, 2009, 11:39 a.m.
On 18-02-2009 12:01, Ingo Molnar wrote:
...
> that straight away. Instead you forced me into wasting time on 
> this rather long email (and you also forced the very unnecessary 
> public embarrasment of a contributor), for what should have been 
> an 'oops, right, will fix' routine matter.

No problem! But next time use this shorter routine, please...

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - Feb. 18, 2009, 12:33 p.m.
Ingo Molnar wrote:
> In other words, the changelog is useless and even borderline 
> deceptive. Not a good sign if you are trying to get a patch 
> accepted to the kernel.
> 
> Furthermore, no performance figures were posted along with this 
> modification - it only stated that these are "performance 
> improvements". Especially in cases where a change slows down the 
> common case the showing of a very substantial performance 
> benefit is a must-have, before a patch is considered for 
> upstream merging.

I think this is mainly a misunderstanding, Stephen posted these
patches as RFT so Rick and Eric could do benchmarks, they were
not intended for merging at this time.

> In fact, the submission incorrectly stated:
> 
> | This patch set is against Patrick's netfilter next tree since
> | it is where it should end up.
> |      
> | git.kernel.org:/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git
> 
> This is wrong, the "netfilter next tree" is not where the "Add 
> mod_timer_noact" change should end up, and you should ask your 
> contributors to submit changes to other subsystems to their 
> respective maintainer trees - the timer tree in this case.

Absolutely, I wouldn't have taken it, and Dave wouldn't have taken
it from me, so no cause for alarm :)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - Feb. 18, 2009, 12:37 p.m.
* Jarek Poplawski <jarkao2@gmail.com> wrote:

> On 18-02-2009 12:01, Ingo Molnar wrote:
> ...
> > that straight away. Instead you forced me into wasting time on 
> > this rather long email (and you also forced the very unnecessary 
> > public embarrasment of a contributor), for what should have been 
> > an 'oops, right, will fix' routine matter.
> 
> No problem! But next time use this shorter routine, please...

Correct, the "oops, right, will fix" should have come as a reply 
to my mail, obviously - i did not submit the patch after all. 
Instead i got this accusatory mail from davem which certainly 
did not help bring the issue forward ...

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 18, 2009, 9:39 p.m.
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 18 Feb 2009 12:01:44 +0100

> * David Miller <davem@davemloft.net> wrote:
> 
> | Introduce mod_timer_noact() which for example is to replace 
> | the calls to del_timer()/add_timer() in 
> | __nf_ct_refresh_acct(). It works like mod_timer() but doesn't 
> | activate or modify the timeout of an inactive timer which is 
> | the behaviour we want in order to be able to use timers as a 
> | means of synchronization in nf_conntrack.
> 
> It does not mention the overhead to the regular timer interfaces 
> at all, nor does it explain the reasons for this change 
> adequately.

You (conveniently) skipped this part of his commit message, so
I guess this is the part you didn't read very carefully:

	A later patch will modify __nf_ct_refresh_acct() to use
	mod_timer_noact() which will then save one spin_lock_irqsave()
	/ spin_lock_irqrestore() pair per conntrack timer update. This
	will also get rid of the race we currently have without adding
	more locking in nf_conntrack.

The whole point is to avoid two spin_lock_irqsave() sequences, thus
taking the timer locks twice.

So Ingo, when you say in response:

	Why don't you use?

		if (del_timer())
			add_timer();

you really look foolish and, in fact, disrespectful to Stephen.

This was my objection to your email, it proved that you didn't
really read his changelog message.  He explained perfectly well
what the final goal was of his changes.

And you have this knee-jerk reaction quite often.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - Feb. 18, 2009, 9:51 p.m.
* David Miller <davem@davemloft.net> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed, 18 Feb 2009 12:01:44 +0100
> 
> > * David Miller <davem@davemloft.net> wrote:
> > 
> > | Introduce mod_timer_noact() which for example is to replace 
> > | the calls to del_timer()/add_timer() in 
> > | __nf_ct_refresh_acct(). It works like mod_timer() but doesn't 
> > | activate or modify the timeout of an inactive timer which is 
> > | the behaviour we want in order to be able to use timers as a 
> > | means of synchronization in nf_conntrack.
> > 
> > It does not mention the overhead to the regular timer interfaces 
> > at all, nor does it explain the reasons for this change 
> > adequately.
> 
> You (conveniently) skipped this part of his commit message, so
> I guess this is the part you didn't read very carefully:
> 
> 	A later patch will modify __nf_ct_refresh_acct() to use
> 	mod_timer_noact() which will then save one spin_lock_irqsave()
> 	/ spin_lock_irqrestore() pair per conntrack timer update. This
> 	will also get rid of the race we currently have without adding
> 	more locking in nf_conntrack.
> 
> The whole point is to avoid two spin_lock_irqsave() sequences, thus
> taking the timer locks twice.
> 
> So Ingo, when you say in response:
> 
> 	Why don't you use?
> 
> 		if (del_timer())
> 			add_timer();
> 
> you really look foolish and, in fact, disrespectful to Stephen.
> 
> This was my objection to your email, it proved that you didn't
> really read his changelog message.  He explained perfectly well
> what the final goal was of his changes.
> 
> And you have this knee-jerk reaction quite often.

You accusing me of knee-jerk reaction is the joke of the century 
;-)

Anyway, it's all handled, you just need to read the rest of the 
thread.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 18, 2009, 10:04 p.m.
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 18 Feb 2009 22:51:40 +0100

> Anyway, it's all handled, you just need to read the rest of the 
> thread.

I did read the entire thread before replying, my objection
to your original posting still standed.

And as others have pointed out you also failed to recognize
the context of the patch posting.  It was part of a sequence
of patches for people to test some experimental netfilter
performance optimizations.  "RFT" was prefixed to every patch
subject line, if any more indication was necessary.

Yet you object that the patches are against the networking
and netfilter trees.

Again, your reactions were knee-jerk, by every definition of the
term.

I know how you work Ingo, you want to be fast and efficient.
But often, your "fast and efficient" is "careless", and this
wastes everyone elses time and in the final analysis makes
you "slow".
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra - Feb. 18, 2009, 10:42 p.m.
On Wed, 2009-02-18 at 14:04 -0800, David Miller wrote:

> And as others have pointed out you also failed to recognize
> the context of the patch posting.  It was part of a sequence
> of patches for people to test some experimental netfilter
> performance optimizations.  "RFT" was prefixed to every patch
> subject line, if any more indication was necessary.

Be that as it may, its a maintainer seeing a patch against his
subsystem, reviewing it (albeit early -- we should all want to get
around to reviewing that early) and asking for some clarification.

The fact is, Steve's changelog was very unclear to people not intimately
familiar with the problem space. Asking some clarification just isn't
weird in any way.

> Yet you object that the patches are against the networking
> and netfilter trees.
> 
> Again, your reactions were knee-jerk, by every definition of the
> term.
> 
> I know how you work Ingo, you want to be fast and efficient.
> But often, your "fast and efficient" is "careless", and this
> wastes everyone elses time and in the final analysis makes
> you "slow".

Can we please leave it at this, the technical issue seems to be delt
with. You and Ingo seems to have a gift to rub each other the wrong way,
it would be grand if you could both try to be a little forgiving and
just focus on the code/technical issues which makes Linux to what it is,
technically excellent ;-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 18, 2009, 10:47 p.m.
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 18 Feb 2009 23:42:27 +0100

> Can we please leave it at this, the technical issue seems to be delt
> with. You and Ingo seems to have a gift to rub each other the wrong way,
> it would be grand if you could both try to be a little forgiving and
> just focus on the code/technical issues which makes Linux to what it is,
> technically excellent ;-)

Like it or not, open source development is as much about people
and their personalitites as it is about technical issues.

So every timeone someone says to concentrate on the technical
issues and get past the personalities, they really are missing
the point, and at best are being naive.

The Linux kernel has been shaped by overtly emotional discourse and
personal interaction as it has been by any technical achievement.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger - Feb. 18, 2009, 10:56 p.m.
On Wed, 18 Feb 2009 14:47:41 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, 18 Feb 2009 23:42:27 +0100
> 
> > Can we please leave it at this, the technical issue seems to be delt
> > with. You and Ingo seems to have a gift to rub each other the wrong way,
> > it would be grand if you could both try to be a little forgiving and
> > just focus on the code/technical issues which makes Linux to what it is,
> > technically excellent ;-)
> 
> Like it or not, open source development is as much about people
> and their personalitites as it is about technical issues.
> 
> So every timeone someone says to concentrate on the technical
> issues and get past the personalities, they really are missing
> the point, and at best are being naive.
> 
> The Linux kernel has been shaped by overtly emotional discourse and
> personal interaction as it has been by any technical achievement.

Everyone, please read and internalize what Matt had to say.
He is right, the community needs to learn how to review:
   
http://mdzlog.wordpress.com/2008/12/24/constructive-criticism/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

--- a/include/linux/timer.h	2009-02-17 10:55:33.427785986 -0800
+++ b/include/linux/timer.h	2009-02-17 11:04:10.291844534 -0800
@@ -25,6 +25,9 @@  struct timer_list {
 
 extern struct tvec_base boot_tvec_bases;
 
+#define TIMER_ACT	1
+#define TIMER_NOACT	0
+
 #define TIMER_INITIALIZER(_function, _expires, _data) {		\
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
 		.function = (_function),			\
@@ -86,8 +89,9 @@  static inline int timer_pending(const st
 
 extern void add_timer_on(struct timer_list *timer, int cpu);
 extern int del_timer(struct timer_list * timer);
-extern int __mod_timer(struct timer_list *timer, unsigned long expires);
+extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate);
 extern int mod_timer(struct timer_list *timer, unsigned long expires);
+extern int mod_timer_noact(struct timer_list *timer, unsigned long expires);
 
 /*
  * The jiffies value which is added to now, when there is no timer
@@ -163,7 +167,7 @@  static inline void timer_stats_timer_cle
 static inline void add_timer(struct timer_list *timer)
 {
 	BUG_ON(timer_pending(timer));
-	__mod_timer(timer, timer->expires);
+	__mod_timer(timer, timer->expires, TIMER_ACT);
 }
 
 #ifdef CONFIG_SMP
--- a/kernel/timer.c	2009-02-17 10:55:33.403580297 -0800
+++ b/kernel/timer.c	2009-02-17 11:04:10.291844534 -0800
@@ -589,7 +589,7 @@  static struct tvec_base *lock_timer_base
 	}
 }
 
-int __mod_timer(struct timer_list *timer, unsigned long expires)
+int __mod_timer(struct timer_list *timer, unsigned long expires, int activate)
 {
 	struct tvec_base *base, *new_base;
 	unsigned long flags;
@@ -603,7 +603,8 @@  int __mod_timer(struct timer_list *timer
 	if (timer_pending(timer)) {
 		detach_timer(timer, 0);
 		ret = 1;
-	}
+	} else if (activate == TIMER_NOACT)
+		goto out_unlock;
 
 	debug_timer_activate(timer);
 
@@ -629,8 +630,9 @@  int __mod_timer(struct timer_list *timer
 
 	timer->expires = expires;
 	internal_add_timer(base, timer);
-	spin_unlock_irqrestore(&base->lock, flags);
 
+out_unlock:
+	spin_unlock_irqrestore(&base->lock, flags);
 	return ret;
 }
 
@@ -699,11 +701,39 @@  int mod_timer(struct timer_list *timer, 
 	if (timer->expires == expires && timer_pending(timer))
 		return 1;
 
-	return __mod_timer(timer, expires);
+	return __mod_timer(timer, expires, TIMER_ACT);
 }
 
 EXPORT_SYMBOL(mod_timer);
 
+/***
+ * mod_timer_noact - modify a timer's timeout
+ * @timer: the timer to be modified
+ *
+ * mod_timer_noact works like mod_timer except that it doesn't activate an
+ * inactive timer, instead it returns without updating timer->expires.
+ *
+ * The function returns whether it has modified a pending timer or not.
+ * (ie. mod_timer_noact() of an inactive timer returns 0, mod_timer_noact() of
+ * an active timer returns 1.)
+ */
+int mod_timer_noact(struct timer_list *timer, unsigned long expires)
+{
+	BUG_ON(!timer->function);
+
+	/*
+	 * This is a common optimization triggered by the
+	 * networking code - if the timer is re-modified
+	 * to be the same thing then just return:
+	 */
+	if (timer->expires == expires && timer_pending(timer))
+		return 1;
+
+	return __mod_timer(timer, expires, TIMER_NOACT);
+}
+
+EXPORT_SYMBOL(mod_timer_noact);
+
 /**
  * del_timer - deactive a timer.
  * @timer: the timer to be deactivated
@@ -1268,7 +1298,7 @@  signed long __sched schedule_timeout(sig
 	expire = timeout + jiffies;
 
 	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
-	__mod_timer(&timer, expire);
+	__mod_timer(&timer, expire, TIMER_ACT);
 	schedule();
 	del_singleshot_timer_sync(&timer);
 
--- a/kernel/relay.c	2009-02-17 10:55:33.416279439 -0800
+++ b/kernel/relay.c	2009-02-17 11:04:10.291844534 -0800
@@ -750,7 +750,7 @@  size_t relay_switch_subbuf(struct rchan_
 			 * from the scheduler (trying to re-grab
 			 * rq->lock), so defer it.
 			 */
-			__mod_timer(&buf->timer, jiffies + 1);
+			__mod_timer(&buf->timer, jiffies + 1, TIMER_NOACT);
 	}
 
 	old = buf->data;