Message ID | 20090218052747.437271195@vyatta.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
* Stephen Hemminger <shemminger@vyatta.com> wrote: > Introduce mod_timer_noact() which for example is to replace the calls to > del_timer()/add_timer() in __nf_ct_refresh_acct(). It works like mod_timer() > but doesn't activate or modify the timeout of an inactive timer which is the > behaviour we want in order to be able to use timers as a means of > synchronization in nf_conntrack. > > A later patch will modify __nf_ct_refresh_acct() to use mod_timer_noact() > which will then save one spin_lock_irqsave() / spin_lock_irqrestore() pair per > conntrack timer update. This will also get rid of the race we currently have > without adding more locking in nf_conntrack. > > Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> > > --- > include/linux/timer.h | 8 ++++++-- > kernel/relay.c | 2 +- > kernel/timer.c | 40 +++++++++++++++++++++++++++++++++++----- > 3 files changed, 42 insertions(+), 8 deletions(-) > > --- a/include/linux/timer.h 2009-02-17 10:55:33.427785986 -0800 > +++ b/include/linux/timer.h 2009-02-17 11:04:10.291844534 -0800 > @@ -25,6 +25,9 @@ struct timer_list { > > extern struct tvec_base boot_tvec_bases; > > +#define TIMER_ACT 1 > +#define TIMER_NOACT 0 Ugly flaggery. > -extern int __mod_timer(struct timer_list *timer, unsigned long expires); > +extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate); This is not really acceptable, it slows down every single add_timer() and mod_timer() call in the kernel with a flag that has one specific value in all but your case. There's more than 2000 such callsites in the kernel. Why dont you use something like this instead: if (del_timer(timer)) add_timer(timer); ? Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Ingo Molnar <mingo@elte.hu> Date: Wed, 18 Feb 2009 10:20:41 +0100 > Why dont you use something like this instead: > > if (del_timer(timer)) > add_timer(timer); > > ? Why don't you read his commit message? At least show him that much respect if you're going to be against his patch. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ingo Molnar wrote: >> -extern int __mod_timer(struct timer_list *timer, unsigned long expires); >> +extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate); > > This is not really acceptable, it slows down every single > add_timer() and mod_timer() call in the kernel with a flag that > has one specific value in all but your case. There's more than > 2000 such callsites in the kernel. > > Why dont you use something like this instead: > > if (del_timer(timer)) > add_timer(timer); We need to avoid having a timer that was deleted by one CPU getting re-added by another, but want to avoid taking the conntrack lock for every timer update. The timer-internal locking is enough for this as long as we have a mod_timer variant that forwards a timer, but doesn't activate it in case it isn't active already. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Stephen Hemminger wrote: > +/*** > + * mod_timer_noact - modify a timer's timeout > + * @timer: the timer to be modified > + * > + * mod_timer_noact works like mod_timer except that it doesn't activate an > + * inactive timer, instead it returns without updating timer->expires. > + * > + * The function returns whether it has modified a pending timer or not. > + * (ie. mod_timer_noact() of an inactive timer returns 0, mod_timer_noact() of > + * an active timer returns 1.) > + */ > +int mod_timer_noact(struct timer_list *timer, unsigned long expires) > +{ > + BUG_ON(!timer->function); > + > + /* > + * This is a common optimization triggered by the > + * networking code - if the timer is re-modified > + * to be the same thing then just return: > + */ > + if (timer->expires == expires && timer_pending(timer)) > + return 1; This doesn't seem right, since it uses TIMER_NOACT below, there's no point in checking for timer_pending() I think. > + > + return __mod_timer(timer, expires, TIMER_NOACT); > +} > + > +EXPORT_SYMBOL(mod_timer_noact); > + -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* David Miller <davem@davemloft.net> wrote: > From: Ingo Molnar <mingo@elte.hu> > Date: Wed, 18 Feb 2009 10:20:41 +0100 > > > Why dont you use something like this instead: > > > > if (del_timer(timer)) > > add_timer(timer); > > > > ? > > Why don't you read his commit message? Uhm, of course i have read this piece of non-info: | Introduce mod_timer_noact() which for example is to replace | the calls to del_timer()/add_timer() in | __nf_ct_refresh_acct(). It works like mod_timer() but doesn't | activate or modify the timeout of an inactive timer which is | the behaviour we want in order to be able to use timers as a | means of synchronization in nf_conntrack. It does not mention the overhead to the regular timer interfaces at all, nor does it explain the reasons for this change adequately. And that's why i'm asking, why is the sequence i suggested above inadequate? If del_timer(timer) returns 1 it means the timer was active - and we call add_timer() only in that case. I.e. we dont activate or modify the timeout of an inactive timer. It can _only_ make a difference in the narrow special case of code using the timer list lock as serialization: but that is a pretty poor solution in this proposed form as it slows down the other 2000 users of timers for no good reason. The changelog was completely silent about that overhead aspect (which is small but real), and i refuse to believe that this effect was not realized. In other words, the changelog is useless and even borderline deceptive. Not a good sign if you are trying to get a patch accepted to the kernel. Furthermore, no performance figures were posted along with this modification - it only stated that these are "performance improvements". Especially in cases where a change slows down the common case the showing of a very substantial performance benefit is a must-have, before a patch is considered for upstream merging. You might be aware of that and you might have planned to provide such info in the future, but the changelog and the submission does not show any realization of this necessity, so i'm asking for that here out of caution, to make sure it's done. In fact, the submission incorrectly stated: | This patch set is against Patrick's netfilter next tree since | it is where it should end up. | | git.kernel.org:/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git This is wrong, the "netfilter next tree" is not where the "Add mod_timer_noact" change should end up, and you should ask your contributors to submit changes to other subsystems to their respective maintainer trees - the timer tree in this case. > At least show him that much respect if you're going to be > against his patch. Firstly, let me make clear what happened here. Deep buried inside a networking patchset, Cc:-ed to the netdev and netfilter lists only, a core kernel change is embedded that in essence modifies 2000 callsites of the generic kernel. The patch was not Cc:-ed to lkml. Secondly, all i'm doing here is reviewing patches of subsystems i maintain, so please stop attacking me for doing my job. I noticed it because i read a lot of lists, but still, this was not done transparently at all. Please show minimal respect to Linux and post core kernel patches to lkml, and ask your sub-maintainers to do likewise. If there's someone here who has a moral basis for flaming here it's me, not you. So, please, at minimum, follow the following well-established protocol of contribution: - Post timer patches to lkml (the mailing list mentioned in the MAINTAINERS file), just like you expect networking patches to be posted to netdev. It's basic courtesy and not doing so is at minimum a double standard. - Declare negative performance impact to the common case very prominently in the changelog, and include analysis about why it's worth paying the price. - Include measurements that show clear positive performance impact at the new usage site - which offsets the negative micro-costs that every other usage site pays. - Require your sub-contributors to write meaningful changelogs, that mention every substantial effect of a change, especially when they change core kernel facilities. For example: Impact: add new API, slow down old APIs a tiny bit Would have alerted people straight away. I had to read the actual patch to figure out this key information. I'm also utterly puzzled by your apparent desire to flame me. This patch is wrong on so many levels that it's not even funny - and you as a long-time kernel contributor should have realized that straight away. Instead you forced me into wasting time on this rather long email (and you also forced the very unnecessary public embarrasment of a contributor), for what should have been an 'oops, right, will fix' routine matter. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18-02-2009 12:01, Ingo Molnar wrote: ... > that straight away. Instead you forced me into wasting time on > this rather long email (and you also forced the very unnecessary > public embarrasment of a contributor), for what should have been > an 'oops, right, will fix' routine matter. No problem! But next time use this shorter routine, please... Thanks, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ingo Molnar wrote: > In other words, the changelog is useless and even borderline > deceptive. Not a good sign if you are trying to get a patch > accepted to the kernel. > > Furthermore, no performance figures were posted along with this > modification - it only stated that these are "performance > improvements". Especially in cases where a change slows down the > common case the showing of a very substantial performance > benefit is a must-have, before a patch is considered for > upstream merging. I think this is mainly a misunderstanding, Stephen posted these patches as RFT so Rick and Eric could do benchmarks, they were not intended for merging at this time. > In fact, the submission incorrectly stated: > > | This patch set is against Patrick's netfilter next tree since > | it is where it should end up. > | > | git.kernel.org:/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git > > This is wrong, the "netfilter next tree" is not where the "Add > mod_timer_noact" change should end up, and you should ask your > contributors to submit changes to other subsystems to their > respective maintainer trees - the timer tree in this case. Absolutely, I wouldn't have taken it, and Dave wouldn't have taken it from me, so no cause for alarm :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* Jarek Poplawski <jarkao2@gmail.com> wrote: > On 18-02-2009 12:01, Ingo Molnar wrote: > ... > > that straight away. Instead you forced me into wasting time on > > this rather long email (and you also forced the very unnecessary > > public embarrasment of a contributor), for what should have been > > an 'oops, right, will fix' routine matter. > > No problem! But next time use this shorter routine, please... Correct, the "oops, right, will fix" should have come as a reply to my mail, obviously - i did not submit the patch after all. Instead i got this accusatory mail from davem which certainly did not help bring the issue forward ... Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Ingo Molnar <mingo@elte.hu> Date: Wed, 18 Feb 2009 12:01:44 +0100 > * David Miller <davem@davemloft.net> wrote: > > | Introduce mod_timer_noact() which for example is to replace > | the calls to del_timer()/add_timer() in > | __nf_ct_refresh_acct(). It works like mod_timer() but doesn't > | activate or modify the timeout of an inactive timer which is > | the behaviour we want in order to be able to use timers as a > | means of synchronization in nf_conntrack. > > It does not mention the overhead to the regular timer interfaces > at all, nor does it explain the reasons for this change > adequately. You (conveniently) skipped this part of his commit message, so I guess this is the part you didn't read very carefully: A later patch will modify __nf_ct_refresh_acct() to use mod_timer_noact() which will then save one spin_lock_irqsave() / spin_lock_irqrestore() pair per conntrack timer update. This will also get rid of the race we currently have without adding more locking in nf_conntrack. The whole point is to avoid two spin_lock_irqsave() sequences, thus taking the timer locks twice. So Ingo, when you say in response: Why don't you use? if (del_timer()) add_timer(); you really look foolish and, in fact, disrespectful to Stephen. This was my objection to your email, it proved that you didn't really read his changelog message. He explained perfectly well what the final goal was of his changes. And you have this knee-jerk reaction quite often. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* David Miller <davem@davemloft.net> wrote: > From: Ingo Molnar <mingo@elte.hu> > Date: Wed, 18 Feb 2009 12:01:44 +0100 > > > * David Miller <davem@davemloft.net> wrote: > > > > | Introduce mod_timer_noact() which for example is to replace > > | the calls to del_timer()/add_timer() in > > | __nf_ct_refresh_acct(). It works like mod_timer() but doesn't > > | activate or modify the timeout of an inactive timer which is > > | the behaviour we want in order to be able to use timers as a > > | means of synchronization in nf_conntrack. > > > > It does not mention the overhead to the regular timer interfaces > > at all, nor does it explain the reasons for this change > > adequately. > > You (conveniently) skipped this part of his commit message, so > I guess this is the part you didn't read very carefully: > > A later patch will modify __nf_ct_refresh_acct() to use > mod_timer_noact() which will then save one spin_lock_irqsave() > / spin_lock_irqrestore() pair per conntrack timer update. This > will also get rid of the race we currently have without adding > more locking in nf_conntrack. > > The whole point is to avoid two spin_lock_irqsave() sequences, thus > taking the timer locks twice. > > So Ingo, when you say in response: > > Why don't you use? > > if (del_timer()) > add_timer(); > > you really look foolish and, in fact, disrespectful to Stephen. > > This was my objection to your email, it proved that you didn't > really read his changelog message. He explained perfectly well > what the final goal was of his changes. > > And you have this knee-jerk reaction quite often. You accusing me of knee-jerk reaction is the joke of the century ;-) Anyway, it's all handled, you just need to read the rest of the thread. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Ingo Molnar <mingo@elte.hu> Date: Wed, 18 Feb 2009 22:51:40 +0100 > Anyway, it's all handled, you just need to read the rest of the > thread. I did read the entire thread before replying, my objection to your original posting still standed. And as others have pointed out you also failed to recognize the context of the patch posting. It was part of a sequence of patches for people to test some experimental netfilter performance optimizations. "RFT" was prefixed to every patch subject line, if any more indication was necessary. Yet you object that the patches are against the networking and netfilter trees. Again, your reactions were knee-jerk, by every definition of the term. I know how you work Ingo, you want to be fast and efficient. But often, your "fast and efficient" is "careless", and this wastes everyone elses time and in the final analysis makes you "slow". -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2009-02-18 at 14:04 -0800, David Miller wrote: > And as others have pointed out you also failed to recognize > the context of the patch posting. It was part of a sequence > of patches for people to test some experimental netfilter > performance optimizations. "RFT" was prefixed to every patch > subject line, if any more indication was necessary. Be that as it may, its a maintainer seeing a patch against his subsystem, reviewing it (albeit early -- we should all want to get around to reviewing that early) and asking for some clarification. The fact is, Steve's changelog was very unclear to people not intimately familiar with the problem space. Asking some clarification just isn't weird in any way. > Yet you object that the patches are against the networking > and netfilter trees. > > Again, your reactions were knee-jerk, by every definition of the > term. > > I know how you work Ingo, you want to be fast and efficient. > But often, your "fast and efficient" is "careless", and this > wastes everyone elses time and in the final analysis makes > you "slow". Can we please leave it at this, the technical issue seems to be delt with. You and Ingo seems to have a gift to rub each other the wrong way, it would be grand if you could both try to be a little forgiving and just focus on the code/technical issues which makes Linux to what it is, technically excellent ;-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Peter Zijlstra <peterz@infradead.org> Date: Wed, 18 Feb 2009 23:42:27 +0100 > Can we please leave it at this, the technical issue seems to be delt > with. You and Ingo seems to have a gift to rub each other the wrong way, > it would be grand if you could both try to be a little forgiving and > just focus on the code/technical issues which makes Linux to what it is, > technically excellent ;-) Like it or not, open source development is as much about people and their personalitites as it is about technical issues. So every timeone someone says to concentrate on the technical issues and get past the personalities, they really are missing the point, and at best are being naive. The Linux kernel has been shaped by overtly emotional discourse and personal interaction as it has been by any technical achievement. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 18 Feb 2009 14:47:41 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: Peter Zijlstra <peterz@infradead.org> > Date: Wed, 18 Feb 2009 23:42:27 +0100 > > > Can we please leave it at this, the technical issue seems to be delt > > with. You and Ingo seems to have a gift to rub each other the wrong way, > > it would be grand if you could both try to be a little forgiving and > > just focus on the code/technical issues which makes Linux to what it is, > > technically excellent ;-) > > Like it or not, open source development is as much about people > and their personalitites as it is about technical issues. > > So every timeone someone says to concentrate on the technical > issues and get past the personalities, they really are missing > the point, and at best are being naive. > > The Linux kernel has been shaped by overtly emotional discourse and > personal interaction as it has been by any technical achievement. Everyone, please read and internalize what Matt had to say. He is right, the community needs to learn how to review: http://mdzlog.wordpress.com/2008/12/24/constructive-criticism/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/include/linux/timer.h 2009-02-17 10:55:33.427785986 -0800 +++ b/include/linux/timer.h 2009-02-17 11:04:10.291844534 -0800 @@ -25,6 +25,9 @@ struct timer_list { extern struct tvec_base boot_tvec_bases; +#define TIMER_ACT 1 +#define TIMER_NOACT 0 + #define TIMER_INITIALIZER(_function, _expires, _data) { \ .entry = { .prev = TIMER_ENTRY_STATIC }, \ .function = (_function), \ @@ -86,8 +89,9 @@ static inline int timer_pending(const st extern void add_timer_on(struct timer_list *timer, int cpu); extern int del_timer(struct timer_list * timer); -extern int __mod_timer(struct timer_list *timer, unsigned long expires); +extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate); extern int mod_timer(struct timer_list *timer, unsigned long expires); +extern int mod_timer_noact(struct timer_list *timer, unsigned long expires); /* * The jiffies value which is added to now, when there is no timer @@ -163,7 +167,7 @@ static inline void timer_stats_timer_cle static inline void add_timer(struct timer_list *timer) { BUG_ON(timer_pending(timer)); - __mod_timer(timer, timer->expires); + __mod_timer(timer, timer->expires, TIMER_ACT); } #ifdef CONFIG_SMP --- a/kernel/timer.c 2009-02-17 10:55:33.403580297 -0800 +++ b/kernel/timer.c 2009-02-17 11:04:10.291844534 -0800 @@ -589,7 +589,7 @@ static struct tvec_base *lock_timer_base } } -int __mod_timer(struct timer_list *timer, unsigned long expires) +int __mod_timer(struct timer_list *timer, unsigned long expires, int activate) { struct tvec_base *base, *new_base; unsigned long flags; @@ -603,7 +603,8 @@ int __mod_timer(struct timer_list *timer if (timer_pending(timer)) { detach_timer(timer, 0); ret = 1; - } + } else if (activate == TIMER_NOACT) + goto out_unlock; debug_timer_activate(timer); @@ -629,8 +630,9 @@ int __mod_timer(struct timer_list *timer timer->expires = expires; internal_add_timer(base, timer); - spin_unlock_irqrestore(&base->lock, flags); +out_unlock: + spin_unlock_irqrestore(&base->lock, flags); return ret; } @@ -699,11 +701,39 @@ int mod_timer(struct timer_list *timer, if (timer->expires == expires && timer_pending(timer)) return 1; - return __mod_timer(timer, expires); + return __mod_timer(timer, expires, TIMER_ACT); } EXPORT_SYMBOL(mod_timer); +/*** + * mod_timer_noact - modify a timer's timeout + * @timer: the timer to be modified + * + * mod_timer_noact works like mod_timer except that it doesn't activate an + * inactive timer, instead it returns without updating timer->expires. + * + * The function returns whether it has modified a pending timer or not. + * (ie. mod_timer_noact() of an inactive timer returns 0, mod_timer_noact() of + * an active timer returns 1.) + */ +int mod_timer_noact(struct timer_list *timer, unsigned long expires) +{ + BUG_ON(!timer->function); + + /* + * This is a common optimization triggered by the + * networking code - if the timer is re-modified + * to be the same thing then just return: + */ + if (timer->expires == expires && timer_pending(timer)) + return 1; + + return __mod_timer(timer, expires, TIMER_NOACT); +} + +EXPORT_SYMBOL(mod_timer_noact); + /** * del_timer - deactive a timer. * @timer: the timer to be deactivated @@ -1268,7 +1298,7 @@ signed long __sched schedule_timeout(sig expire = timeout + jiffies; setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); - __mod_timer(&timer, expire); + __mod_timer(&timer, expire, TIMER_ACT); schedule(); del_singleshot_timer_sync(&timer); --- a/kernel/relay.c 2009-02-17 10:55:33.416279439 -0800 +++ b/kernel/relay.c 2009-02-17 11:04:10.291844534 -0800 @@ -750,7 +750,7 @@ size_t relay_switch_subbuf(struct rchan_ * from the scheduler (trying to re-grab * rq->lock), so defer it. */ - __mod_timer(&buf->timer, jiffies + 1); + __mod_timer(&buf->timer, jiffies + 1, TIMER_NOACT); } old = buf->data;
Introduce mod_timer_noact() which for example is to replace the calls to del_timer()/add_timer() in __nf_ct_refresh_acct(). It works like mod_timer() but doesn't activate or modify the timeout of an inactive timer which is the behaviour we want in order to be able to use timers as a means of synchronization in nf_conntrack. A later patch will modify __nf_ct_refresh_acct() to use mod_timer_noact() which will then save one spin_lock_irqsave() / spin_lock_irqrestore() pair per conntrack timer update. This will also get rid of the race we currently have without adding more locking in nf_conntrack. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> --- include/linux/timer.h | 8 ++++++-- kernel/relay.c | 2 +- kernel/timer.c | 40 +++++++++++++++++++++++++++++++++++----- 3 files changed, 42 insertions(+), 8 deletions(-)