Message ID | 20081101053220.GA13348@xi.wantstofly.org |
---|---|
State | Accepted, archived |
Delegated to: | Jeff Garzik |
> If wait_event_timeout() would return zero, mv643xx_eth would conclude
> that the SMI access timed out, but this is not necessarily true --
> wait_event_timeout() can also return zero in the case where the SMI
> completion interrupt did happen in time but where it took longer than
> the requested timeout for the process performing the SMI access to be
> scheduled again. This would lead to occasional SMI access timeouts
> when the system would be under heavy load.

Would it make more sense to fix this in the wait_event_timeout() code
itself a la bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load")?

 - R.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 31, 2008 at 10:47:47PM -0700, Roland Dreier wrote:
> > If wait_event_timeout() would return zero, mv643xx_eth would conclude
> > that the SMI access timed out, but this is not necessarily true --
> > wait_event_timeout() can also return zero in the case where the SMI
> > completion interrupt did happen in time but where it took longer than
> > the requested timeout for the process performing the SMI access to be
> > scheduled again. This would lead to occasional SMI access timeouts
> > when the system would be under heavy load.
>
> Would it make more sense to fix this in the wait_event_timeout() code
> itself a la bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
> failure under heavy load")?

Well, wait_event_timeout() does (or did, before that commit) exactly
what its docbook comment says it does:

 * The function returns 0 if the @timeout elapsed, and the remaining
 * jiffies if the condition evaluated to true before the timeout elapsed.

Making it return 1 jiffy seems a bit hacky. Why not go all the way and
just make it return 0 or 1 in all cases and audit all the callers (and
update the docbook)?
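The return-value question above can be made concrete with a small userspace simulation. This is a hypothetical model, not kernel code. `struct sim`, `sim_schedule_timeout()` and the other names are invented for illustration, and time is a fake jiffies counter. The wait loop is only assumed to be shaped like the `wait_event_timeout()` of that era: check the condition, sleep, and give up once the remaining time reaches zero. The model shows how the condition can already be true when the wait reports 0. It then compares the two alternatives discussed in this thread: reporting at least 1 when the condition holds (the "return 1 jiffy" approach), and returning a plain 0 or 1.

```c
#include <stdbool.h>

/* Hypothetical fake-clock model; none of these names are kernel APIs. */
struct sim {
	long now;         /* fake jiffies counter */
	long done_at;     /* jiffy at which the completion interrupt fires */
	long sched_delay; /* jiffies between a wake-up and the waiter running */
};

/* The wait condition: has the completion interrupt happened yet? */
static bool sim_cond(const struct sim *s)
{
	return s->now >= s->done_at;
}

/* Stand-in for schedule_timeout(): sleep until the interrupt or the
 * deadline, whichever comes first, then resume sched_delay jiffies
 * later. Returns the time left before the deadline at the moment the
 * waiter actually resumes, clamped at zero. Only called while the
 * condition is false, so done_at > now here. */
static long sim_schedule_timeout(struct sim *s, long timeout)
{
	long expire = s->now + timeout;
	long wake = s->done_at < expire ? s->done_at : expire;

	s->now = wake + s->sched_delay;
	return expire > s->now ? expire - s->now : 0;
}

/* Assumed shape of the wait loop: stop as soon as the condition is true
 * or the remaining time reaches 0, without re-checking the condition
 * after the final sleep. */
static long sim_wait_event_timeout(struct sim *s, long timeout)
{
	long ret = timeout;

	for (;;) {
		if (sim_cond(s))
			break;
		ret = sim_schedule_timeout(s, ret);
		if (!ret)
			break;
	}
	return ret;
}

/* Alternative 1: report at least 1 if the condition holds on return. */
static long sim_wait_event_timeout_min1(struct sim *s, long timeout)
{
	long ret = sim_wait_event_timeout(s, timeout);

	return (ret == 0 && sim_cond(s)) ? 1 : ret;
}

/* Alternative 2: return only whether the condition holds (0 or 1). */
static int sim_wait_event_bool(struct sim *s, long timeout)
{
	sim_wait_event_timeout(s, timeout);
	return sim_cond(s) ? 1 : 0;
}
```

Take an interrupt at jiffy 5, a 10-jiffy timeout and an 8-jiffy scheduling delay. The plain loop returns 0 even though `sim_cond()` is true, while both alternatives report success. If the interrupt never arrives in time, all three still return 0.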
Lennert Buytenhek wrote:
> The mv643xx_eth mii bus implementation uses wait_event_timeout() to
> wait for SMI completion interrupts.
>
> If wait_event_timeout() would return zero, mv643xx_eth would conclude
> that the SMI access timed out, but this is not necessarily true --
> wait_event_timeout() can also return zero in the case where the SMI
> completion interrupt did happen in time but where it took longer than
> the requested timeout for the process performing the SMI access to be
> scheduled again. This would lead to occasional SMI access timeouts
> when the system would be under heavy load.
>
> The fix is to ignore the return value of wait_event_timeout(), and
> to re-check the SMI done bit after wait_event_timeout() returns to
> determine whether or not the SMI access timed out.
>
> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
> ---
> The commit that introduced this was added in the .28 dev cycle, so
> this fix is for .28 only. Thanks!
>
> diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
> index d25a302..0e94ed3 100644
> --- a/drivers/net/mv643xx_eth.c
> +++ b/drivers/net/mv643xx_eth.c
> @@ -1065,9 +1065,12 @@ static int smi_wait_ready(struct mv643xx_eth_shared_private *msp)
>  		return 0;
>  	}
>
> -	if (!wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
> -				msecs_to_jiffies(100)))
> -		return -ETIMEDOUT;
> +	if (!smi_is_done(msp)) {
> +		wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
> +				   msecs_to_jiffies(100));
> +		if (!smi_is_done(msp))
> +			return -ETIMEDOUT;
> +	}

applied
```diff
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index d25a302..0e94ed3 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1065,9 +1065,12 @@ static int smi_wait_ready(struct mv643xx_eth_shared_private *msp)
 		return 0;
 	}
 
-	if (!wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
-				msecs_to_jiffies(100)))
-		return -ETIMEDOUT;
+	if (!smi_is_done(msp)) {
+		wait_event_timeout(msp->smi_busy_wait, smi_is_done(msp),
+				   msecs_to_jiffies(100));
+		if (!smi_is_done(msp))
+			return -ETIMEDOUT;
+	}
 
 	return 0;
 }
```
The mv643xx_eth mii bus implementation uses wait_event_timeout() to
wait for SMI completion interrupts.

If wait_event_timeout() would return zero, mv643xx_eth would conclude
that the SMI access timed out, but this is not necessarily true --
wait_event_timeout() can also return zero in the case where the SMI
completion interrupt did happen in time but where it took longer than
the requested timeout for the process performing the SMI access to be
scheduled again. This would lead to occasional SMI access timeouts
when the system would be under heavy load.

The fix is to ignore the return value of wait_event_timeout(), and
to re-check the SMI done bit after wait_event_timeout() returns to
determine whether or not the SMI access timed out.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
---
The commit that introduced this was added in the .28 dev cycle, so
this fix is for .28 only. Thanks!
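The same "don't trust the timed wait's return value, re-check the condition" pattern can be tried outside the kernel. The sketch below is a hypothetical userspace analogue built on POSIX threads, not driver code. The names `struct smi_sim`, `smi_sim_complete()`, `smi_sim_is_done()` and `smi_sim_wait_ready()` are invented. A mutex-protected flag stands in for the SMI done bit, and `pthread_cond_timedwait()` stands in for `wait_event_timeout()`. As in the patch, the timed wait only bounds how long to sleep. Whether the access timed out is decided by re-checking the flag afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the driver state: "done" plays the SMI done
 * bit, and the condition variable plays the smi_busy_wait wait queue. */
struct smi_sim {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Interrupt-handler analogue: set the done flag and wake any waiter. */
static void smi_sim_complete(struct smi_sim *s)
{
	pthread_mutex_lock(&s->lock);
	s->done = true;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

/* Analogue of smi_is_done(): read the flag under the lock. */
static bool smi_sim_is_done(struct smi_sim *s)
{
	bool done;

	pthread_mutex_lock(&s->lock);
	done = s->done;
	pthread_mutex_unlock(&s->lock);
	return done;
}

/* Same structure as the patched smi_wait_ready(). */
static int smi_sim_wait_ready(struct smi_sim *s, long timeout_ms)
{
	struct timespec deadline;

	if (smi_sim_is_done(s))
		return 0;

	/* Absolute deadline on CLOCK_REALTIME, the default clock used by
	 * pthread_cond_timedwait(). */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->lock);
	while (!s->done) {
		/* ETIMEDOUT only ends the sleep; it is not the verdict. The
		 * loop also absorbs spurious wake-ups. */
		if (pthread_cond_timedwait(&s->cond, &s->lock, &deadline) == ETIMEDOUT)
			break;
	}
	pthread_mutex_unlock(&s->lock);

	/* The verdict comes from re-checking the flag, as in the patch. */
	if (!smi_sim_is_done(s))
		return -ETIMEDOUT;
	return 0;
}
```

Suppose the waiter is woken in time but only runs after the deadline has passed. It still finds the flag set on the final check and returns 0. Only a flag that is still clear after the deadline produces `-ETIMEDOUT`.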