Message ID | 20091021211906.GA11401@ami.dom.local |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 2009-10-21 at 23:19 +0200, Jarek Poplawski wrote: > On Wed, Oct 21, 2009 at 08:46:40PM +0200, Tilman Schmidt wrote: > ... > > I have tested your patch and I can confirm that it fixes the messages. > > I have not noticed any ill effects. > > OK. So, in any case, here is this next variant/proposal. > > Thanks, > Jarek P. > ------------------------> > net: Adjust softirq raising in __napi_schedule > > This patch changes __raise_softirq_irqoff() to raise_softirq_irqoff() > in __napi_schedule() to enable proper softirq scheduling from process > context. The main intent is to let use netif_rx() universally, and > make netif_rx_ni() redundant. > > Currently using netif_rx() instead of netif_rx_ni() triggers: > "NOHZ: local_softirq_pending 08" warnings, but additional cost of one > "if" on the fast path doesn't seem to justify maintaining it > separately. > > This patch is based on the analysis, suggestions and the original > patch for mac80211 by: Michael Buesch <mb@bu3sch.de> > > Another patch calling netif_rx variants conditionally was done by: > Oliver Hartkopp <oliver@hartkopp.net> > > Reported-by: Michael Buesch <mb@bu3sch.de> > Reported-by: Oliver Hartkopp <oliver@hartkopp.net> > Reported-by: Tilman Schmidt <tilman@imap.cc> > Diagnosed-by: Michael Buesch <mb@bu3sch.de> > Tested-by: Tilman Schmidt <tilman@imap.cc> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > --- > > net/core/dev.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 28b0b9e..7fc4009 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) > > local_irq_save(flags); > list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > + raise_softirq_irqoff(NET_RX_SOFTIRQ); This still doesn't make any sense. There may or may not be a lot of code that assumes that everything else is run with other tasklets disabled, and that it cannot be interrupted by a tasklet and thus create a race. Can you prove that is not the case, across the entire networking layer? johannes
Johannes Berg schrieb: > This still doesn't make any sense. > > There may or may not be a lot of code that assumes that everything else > is run with other tasklets disabled, and that it cannot be interrupted > by a tasklet and thus create a race. > > Can you prove that is not the case, across the entire networking layer? So what's the solution you propose?
On Thu, Oct 22, 2009 at 06:25:30AM +0900, Johannes Berg wrote: > On Wed, 2009-10-21 at 23:19 +0200, Jarek Poplawski wrote: ... > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) > > > > local_irq_save(flags); > > list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > + raise_softirq_irqoff(NET_RX_SOFTIRQ); > > This still doesn't make any sense. > > There may or may not be a lot of code that assumes that everything else > is run with other tasklets disabled, and that it cannot be interrupted > by a tasklet and thus create a race. > > Can you prove that is not the case, across the entire networking layer? I'm not sure I can understand your question. This patch is mainly to avoid using netif_rx()/netif_rx_ni() pair as a test of proper process context handling; IMHO there're better tools for this (lockdep, WARN_ON's). Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote: > > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > + raise_softirq_irqoff(NET_RX_SOFTIRQ); > > > > This still doesn't make any sense. > > > > There may or may not be a lot of code that assumes that everything else > > is run with other tasklets disabled, and that it cannot be interrupted > > by a tasklet and thus create a race. > > > > Can you prove that is not the case, across the entire networking layer? > > I'm not sure I can understand your question. This patch is mainly to > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process > context handling; IMHO there're better tools for this (lockdep, > WARN_ON's). And how exactly does that matter to the patch at hand?! I'm saying that it seems to me, as indicated by the API (and without proof otherwise that's how it is) the networking layer needs to have packets handed to it with softirqs disabled. Therefore, this patch is not needed. While it may not be _wrong_, it'll definitely introduce a performance regression. This really should be obvious. You're fixing the warning at the source of the warning, rather than the source of the problem. johannes
From: Jarek Poplawski <jarkao2@gmail.com> Date: Wed, 21 Oct 2009 23:39:47 +0200 > I'm not sure I can understand your question. This patch is mainly to > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process > context handling; IMHO there're better tools for this (lockdep, > WARN_ON's). Semantically I think your patch is correct, but I wonder about cost. Something that is a simply per-cpu inline "or" operation is now a function call and potentially mispredicted branch inside of raise_softirq_irqoff(). And netif_rx() is indeed a fast path for tunnels and other users so this does matter. I like having people call things in the correct context the function was built for, and thus we can avoiryd completely useless operations and tests as we can now in netif_rx(). Makaing things general purpose costs something, and it costs too much here for this critical routine, sorry. I was just having a talk with Nick Piggin about these kinds of issues today, too few people care about these ever encrouching tiny pieces of bloat that slow the kernel down gradually over time, and I simply won't stand for it when I notice it :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 22, 2009 at 04:29:39AM -0700, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Wed, 21 Oct 2009 23:39:47 +0200 > > > I'm not sure I can understand your question. This patch is mainly to > > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process > > context handling; IMHO there're better tools for this (lockdep, > > WARN_ON's). > > Semantically I think your patch is correct, but I wonder about cost. > > Something that is a simply per-cpu inline "or" operation is now a > function call and potentially mispredicted branch inside of > raise_softirq_irqoff(). > > And netif_rx() is indeed a fast path for tunnels and other users so > this does matter. > > I like having people call things in the correct context the function > was built for, and thus we can avoiryd completely useless operations and > tests as we can now in netif_rx(). I like it too, but in this particular case I'm not sure netif_rx() functionality requires this kind of separation; it looks to me quite similarly to e.g. tasklet_schedule(), the same for process or softirq contexts. > > Makaing things general purpose costs something, and it costs too much > here for this critical routine, sorry. > > I was just having a talk with Nick Piggin about these kinds of issues > today, too few people care about these ever encrouching tiny pieces > of bloat that slow the kernel down gradually over time, and I simply > won't stand for it when I notice it :-) I'm not sure we're saving in the right place. As a matter of fact, whenever I look into kernel/ code I can't see this kind of optimization. There is quite a lot of WARN_ON's and if's. These NOHZ warnings simply show somebody's else debugging triggers far from places where it all started and is quite accidental, while this particular "bug" should've been printed immediately long time ago, if we really cared. Since I understand it's a question of taste, and it's not anything critical, I'm quite OK with staying with the old way (except old bugs, I hope ;-). Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Johannes Berg schrieb: > On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote: > >> I'm not sure I can understand your question. This patch is mainly to >> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process >> context handling; IMHO there're better tools for this (lockdep, >> WARN_ON's). > > I'm saying that it seems to me, as indicated by the API (and without > proof otherwise that's how it is) the networking layer needs to have > packets handed to it with softirqs disabled. Strange. Then what are the two separate functions netif_rx() and netif_rx_ni() for? > This really should be obvious. You're fixing the warning at the source > of the warning, rather than the source of the problem. Good idea. So please do tell us where the source of the problem is. Thanks, Tilman - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK4cALQ3+did9BuFsRAnW8AKCP4ey+gT2RZBYpzx91PaXd11A/PwCgh35g fhEbJs++1BRIQ3encV8fIm4= =SSaA -----END PGP SIGNATURE----- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2009-10-23 at 16:39 +0200, Tilman Schmidt wrote: > Strange. Then what are the two separate functions netif_rx() and > netif_rx_ni() for? netif_rx_ni() disables preemption. > > This really should be obvious. You're fixing the warning at the source > > of the warning, rather than the source of the problem. > > Good idea. So please do tell us where the source of the problem is. You use netif_rx_ni() instead of netif_rx() at whatever place that causes this problem. johannes
On Fri, Oct 23, 2009 at 04:46:31PM +0200, Johannes Berg wrote: > On Fri, 2009-10-23 at 16:39 +0200, Tilman Schmidt wrote: > > > Strange. Then what are the two separate functions netif_rx() and > > netif_rx_ni() for? > > netif_rx_ni() disables preemption. You wrote earlier: > [...] the networking layer needs to have > packets handed to it with softirqs disabled. How disabling preemption can fix something which needs softirqs disabled? Could you be more precise? > > > This really should be obvious. You're fixing the warning at the source > > > of the warning, rather than the source of the problem. > > > > Good idea. So please do tell us where the source of the problem is. > > You use netif_rx_ni() instead of netif_rx() at whatever place that > causes this problem. This isn't a very precise description either. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2009-10-26 at 07:41 +0000, Jarek Poplawski wrote: > > netif_rx_ni() disables preemption. > > You wrote earlier: > > > [...] the networking layer needs to have > > packets handed to it with softirqs disabled. > > How disabling preemption can fix something which needs softirqs > disabled? Could you be more precise? No, I wrote that I didn't know. I suppose now that I looked at it I do know, and only disabling preemption is required. johannes
On Mon, Oct 26, 2009 at 08:44:14AM +0100, Johannes Berg wrote: > On Mon, 2009-10-26 at 07:41 +0000, Jarek Poplawski wrote: > > > > netif_rx_ni() disables preemption. > > > > You wrote earlier: > > > > > [...] the networking layer needs to have > > > packets handed to it with softirqs disabled. > > > > How disabling preemption can fix something which needs softirqs > > disabled? Could you be more precise? > > No, I wrote that I didn't know. I suppose now that I looked at it I do > know, and only disabling preemption is required. But netif_rx has preemption disabled most of the time (by hardirqs disabling). So maybe disabling preemption isn't the main reason here either? Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2009-10-26 at 07:54 +0000, Jarek Poplawski wrote: > > No, I wrote that I didn't know. I suppose now that I looked at it I do > > know, and only disabling preemption is required. > > But netif_rx has preemption disabled most of the time (by hardirqs > disabling). So maybe disabling preemption isn't the main reason here > either? Not for netpoll though, which may or may not be relevant (if I were to venture a guess I'd say it isn't and it disables preemption to be able to do the softirq thing) However, I lost track now of why we're discussing this. Basically it boils down to using netif_rx() when in (soft)irq, and netif_rx_ni() when in process context. That could just be an optimisation, but it's a very valid one. johannes
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Am 26.10.2009 09:58 schrieb Johannes Berg: > On Mon, 2009-10-26 at 07:54 +0000, Jarek Poplawski wrote: > >>> No, I wrote that I didn't know. I suppose now that I looked at it I do >>> know, and only disabling preemption is required. >> But netif_rx has preemption disabled most of the time (by hardirqs >> disabling). So maybe disabling preemption isn't the main reason here >> either? > > Not for netpoll though, which may or may not be relevant (if I were to > venture a guess I'd say it isn't and it disables preemption to be able > to do the softirq thing) > > However, I lost track now of why we're discussing this. The starting point were several reports of the kernel message: NOHZ: local_softirq_pending 08 Originally most if not all of them came from wireless networking, but I muddied the waters by adding to the mix a case involving ISDN. You stated that all the solutions proposed so far were wrong, so we're naturally turning to you for guidance on what the right solution might be. > Basically it boils down to using netif_rx() when in (soft)irq, and > netif_rx_ni() when in process context. That could just be an > optimisation, but it's a very valid one. Hmmm. That seems to contradict your earlier statement to me that simply replacing a call to netif_rx() by one to netif_rx_ni() when not in interrupt context isn't a valid solution either. What am I missing? Thanks, Tilman - -- Tilman Schmidt E-Mail: tilman@imap.cc Bonn, Germany Diese Nachricht besteht zu 100% aus wiederverwerteten Bits. Ungeöffnet mindestens haltbar bis: (siehe Rückseite) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.4 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFK5WIXQ3+did9BuFsRAsj1AJ0e4VJ7Nsp69ROXCiT4oM/Q6lIpSwCfZvXS 4nV4tWTIzgk4IRlCt0CCF3Y= =r15I -----END PGP SIGNATURE----- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2009-10-26 at 10:47 +0200, Tilman Schmidt wrote: > > However, I lost track now of why we're discussing this. > > The starting point were several reports of the kernel message: > > NOHZ: local_softirq_pending 08 > > Originally most if not all of them came from wireless networking, > but I muddied the waters by adding to the mix a case involving ISDN. > You stated that all the solutions proposed so far were wrong, so > we're naturally turning to you for guidance on what the right > solution might be. Right. Sorry about getting lost here. > > Basically it boils down to using netif_rx() when in (soft)irq, and > > netif_rx_ni() when in process context. That could just be an > > optimisation, but it's a very valid one. > > Hmmm. That seems to contradict your earlier statement to me that > simply replacing a call to netif_rx() by one to netif_rx_ni() > when not in interrupt context isn't a valid solution either. > What am I missing? Well, I think you misunderstood me. It would be correct to do this, if and only if the code that calls it doesn't need the extra guarantee. Any code (say ISDN code) that calls netif_rx() is clearly assuming to always be running in (soft)irq context, otherwise it couldn't call netif_rx() unconditionally. Agree so far? So now if you change the ISDN code to call netif_rx_ni(), you've changed the assumption that the ISDN code makes -- that it is running in (soft)irq context. Therefore, you need to verify that this is actually a correct change, which is what I tried to say. johannes
Am 26.10.2009 09:56 schrieb Johannes Berg: > On Mon, 2009-10-26 at 10:47 +0200, Tilman Schmidt wrote: > >>> Basically it boils down to using netif_rx() when in (soft)irq, and >>> netif_rx_ni() when in process context. That could just be an >>> optimisation, but it's a very valid one. >> Hmmm. That seems to contradict your earlier statement to me that >> simply replacing a call to netif_rx() by one to netif_rx_ni() >> when not in interrupt context isn't a valid solution either. >> What am I missing? > > Well, I think you misunderstood me. It would be correct to do this, if > and only if the code that calls it doesn't need the extra guarantee. I see. Thanks for the clarification. > Any code (say ISDN code) that calls netif_rx() is clearly assuming to > always be running in (soft)irq context, otherwise it couldn't call > netif_rx() unconditionally. Agree so far? Well, in fact I'm not sure. :-) All I know is that in the ISDN case, no such assumption is explicitly stated anywhere. (The code in question is called from the rcvcallb_skb() callback method which the hardware driver calls when data has been received, and the description of that method in Documentation/isdn/INTERFACE does not say anything about the context in which it may be called.) The relevant code in drivers/isdn/i4l/isdn_ppp.c is rather old, perhaps even older than softirqs and the netif_rx() / netif_rx_ni() split. (Bear in mind that we are talking about the old ISDN4Linux subsystem which initially didn't even make it into the 2.6 series because it was considered obsolete.) It seems quite possible to me that just no one ever thought about that question. > So now if you change the ISDN code to call netif_rx_ni(), you've changed > the assumption that the ISDN code makes -- that it is running in > (soft)irq context. Therefore, you need to verify that this is actually a > correct change, which is what I tried to say. Understood. However, the fact that the local_softirq_pending message is appearing would seem to indicate that this assumption was wrong to begin with, wouldn't it? Thanks, Tilman
On Tue, 2009-10-27 at 01:52 +0100, Tilman Schmidt wrote: > > Any code (say ISDN code) that calls netif_rx() is clearly assuming to > > always be running in (soft)irq context, otherwise it couldn't call > > netif_rx() unconditionally. Agree so far? > > Well, in fact I'm not sure. :-) All I know is that in the ISDN case, no > such assumption is explicitly stated anywhere. (The code in question is > called from the rcvcallb_skb() callback method which the hardware driver > calls when data has been received, and the description of that method in > Documentation/isdn/INTERFACE does not say anything about the context in > which it may be called.) The relevant code in drivers/isdn/i4l/isdn_ppp.c > is rather old, perhaps even older than softirqs and the netif_rx() / > netif_rx_ni() split. (Bear in mind that we are talking about the old > ISDN4Linux subsystem which initially didn't even make it into the 2.6 > series because it was considered obsolete.) It seems quite possible to me > that just no one ever thought about that question. Heh :) > > So now if you change the ISDN code to call netif_rx_ni(), you've changed > > the assumption that the ISDN code makes -- that it is running in > > (soft)irq context. Therefore, you need to verify that this is actually a > > correct change, which is what I tried to say. > > Understood. However, the fact that the local_softirq_pending message is > appearing would seem to indicate that this assumption was wrong to > begin with, wouldn't it? I thought it only recently started appearing with a new driver or something, but I may have misunderstood you. Anyway, I think that sums up the issue from my POV. johannes
diff --git a/net/core/dev.c b/net/core/dev.c index 28b0b9e..7fc4009 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2728,7 +2728,7 @@ void __napi_schedule(struct napi_struct *n) local_irq_save(flags); list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); + raise_softirq_irqoff(NET_RX_SOFTIRQ); local_irq_restore(flags); } EXPORT_SYMBOL(__napi_schedule);