Message ID | 1470102616-18346-1-git-send-email-stewart@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 5b8b58063029f02da573120ef4dc9079822e3cda |
Headers | show |
Stewart Smith <stewart@linux.vnet.ibm.com> writes: > According to the OPAL docs: > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt > OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this > indicates either a transient or permanent error. > > Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a > permanent error particularly well, in that you could end up in a busy > loop. > > This was not too hard to trigger on an AMI BMC based OpenPOWER machine > doing a continuous "ipmitool mc reset cold" to the BMC, the result of > that being that we'd get stuck in an infinite loop in opal_get_rtc_time. > > We now retry a few times before returning the error higher up the stack. Looks like this has always been broken, so: Fixes: 16b1d26e77b1 ("rtc/tpo: Driver to support rtc and wakeup on PowerNV platform") > Cc: stable@vger.kernel.org And therefore that should be: Cc: stable@vger.kernel.org # v3.19+ > Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> > --- > drivers/rtc/rtc-opal.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c > index 9c18d6fd8107..fab19e3e2fba 100644 > --- a/drivers/rtc/rtc-opal.c > +++ b/drivers/rtc/rtc-opal.c > @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms) > static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) > { > long rc = OPAL_BUSY; > + int retries = 10; > u32 y_m_d; > u64 h_m_s_ms; > __be32 __y_m_d; > @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) > rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms); > if (rc == OPAL_BUSY_EVENT) > opal_poll_events(NULL); > - else > + else if (retries-- && (rc == OPAL_HARDWARE > + || rc == OPAL_INTERNAL_ERROR)) > msleep(10); > + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) > + break; > } This is a pretty gross API at this point. That's basically a score of 2 on Rusty's API usability index ("Read the implementation and you'll get it right" - http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html). The docs don't mention OPAL_INTERNAL_ERROR being transient, nor do they mention OPAL_BUSY. Can we at least do a wrapper function in opal.h for drivers to use that handles some or all of these cases? cheers
On Tue, 2016-08-02 at 01:50:16 UTC, Stewart Smith wrote: > According to the OPAL docs: > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt > OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this > indicates either a transient or permanent error. > > Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a > permanent error particularly well, in that you could end up in a busy > loop. > > This was not too hard to trigger on an AMI BMC based OpenPOWER machine > doing a continuous "ipmitool mc reset cold" to the BMC, the result of > that being that we'd get stuck in an infinite loop in opal_get_rtc_time. > > We now retry a few times before returning the error higher up the stack. > > Cc: stable@vger.kernel.org > Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/5b8b58063029f02da573120ef4dc90 cheers
Hi, On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote: > According to the OPAL docs: > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt > https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt > OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this > indicates either a transient or permanent error. > > Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a > permanent error particularly well, in that you could end up in a busy > loop. > > This was not too hard to trigger on an AMI BMC based OpenPOWER machine > doing a continuous "ipmitool mc reset cold" to the BMC, the result of > that being that we'd get stuck in an infinite loop in opal_get_rtc_time. > > We now retry a few times before returning the error higher up the stack. > > Cc: stable@vger.kernel.org > Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> > --- > drivers/rtc/rtc-opal.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > Just a note to let you know that this patch should have gone through my tree but it was not sent to linux-rtc or me. I guess what happened is that Michael cleaned up the Linux PPC patchwork queue.
Alexandre Belloni <alexandre.belloni@bootlin.com> writes: > Hi, > > On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote: >> According to the OPAL docs: >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt >> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this >> indicates either a transient or permanent error. >> >> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a >> permanent error particularly well, in that you could end up in a busy >> loop. >> >> This was not too hard to trigger on an AMI BMC based OpenPOWER machine >> doing a continuous "ipmitool mc reset cold" to the BMC, the result of >> that being that we'd get stuck in an infinite loop in opal_get_rtc_time. >> >> We now retry a few times before returning the error higher up the stack. >> >> Cc: stable@vger.kernel.org >> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> >> --- >> drivers/rtc/rtc-opal.c | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> > > Just a note to let you know that this patch should have gone through my > tree but it was not sent to linux-rtc or me. Sorry, I saw it had been languishing for a long time and assumed you'd missed it. Happy to revert/rework it if you're not happy with it. > I guess what happened is that Michael cleaned up the Linux PPC patchwork > queue. Yeah I did. In future I'll ping you if there's something that seems to have fallen through the cracks. cheers
Alexandre Belloni <alexandre.belloni@bootlin.com> writes: > On 02/08/2016 at 11:50:16 +1000, Stewart Smith wrote: >> According to the OPAL docs: >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt >> https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt >> OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this >> indicates either a transient or permanent error. >> >> Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a >> permanent error particularly well, in that you could end up in a busy >> loop. >> >> This was not too hard to trigger on an AMI BMC based OpenPOWER machine >> doing a continuous "ipmitool mc reset cold" to the BMC, the result of >> that being that we'd get stuck in an infinite loop in opal_get_rtc_time. >> >> We now retry a few times before returning the error higher up the stack. >> >> Cc: stable@vger.kernel.org >> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> >> --- >> drivers/rtc/rtc-opal.c | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> > > Just a note to let you know that this patch should have gone through my > tree but it was not sent to linux-rtc or me. > > I guess what happened is that Michael cleaned up the Linux PPC patchwork > queue. Apologies for not sending there. My (18 month ago self) bad.
On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote: > > Just a note to let you know that this patch should have gone through my > > tree but it was not sent to linux-rtc or me. > > Sorry, I saw it had been languishing for a long time and assumed you'd > missed it. > > Happy to revert/rework it if you're not happy with it. > No, that's fine. It's just that the commit title stands out when using git log --oneline and that triggered my OCD ;) > > I guess what happened is that Michael cleaned up the Linux PPC patchwork > > queue. > > Yeah I did. > > In future I'll ping you if there's something that seems to have fallen > through the cracks. > Great thanks!
Alexandre Belloni <alexandre.belloni@bootlin.com> writes: > On 06/02/2018 at 16:22:47 +1100, Michael Ellerman wrote: >> > Just a note to let you know that this patch should have gone through my >> > tree but it was not sent to linux-rtc or me. >> >> Sorry, I saw it had been languishing for a long time and assumed you'd >> missed it. >> >> Happy to revert/rework it if you're not happy with it. >> > > No, that's fine. It's just that the commit title stands out when using > git log --oneline and that triggered my OCD ;) Oh no! Now I'm *really* sorry, that's the worst! :) cheers
diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c index 9c18d6fd8107..fab19e3e2fba 100644 --- a/drivers/rtc/rtc-opal.c +++ b/drivers/rtc/rtc-opal.c @@ -58,6 +58,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms) static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) { long rc = OPAL_BUSY; + int retries = 10; u32 y_m_d; u64 h_m_s_ms; __be32 __y_m_d; @@ -67,8 +68,11 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms); if (rc == OPAL_BUSY_EVENT) opal_poll_events(NULL); - else + else if (retries-- && (rc == OPAL_HARDWARE + || rc == OPAL_INTERNAL_ERROR)) msleep(10); + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) + break; } if (rc != OPAL_SUCCESS) @@ -84,6 +88,7 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm) { long rc = OPAL_BUSY; + int retries = 10; u32 y_m_d = 0; u64 h_m_s_ms = 0; @@ -92,8 +97,11 @@ static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm) rc = opal_rtc_write(y_m_d, h_m_s_ms); if (rc == OPAL_BUSY_EVENT) opal_poll_events(NULL); - else + else if (retries-- && (rc == OPAL_HARDWARE + || rc == OPAL_INTERNAL_ERROR)) msleep(10); + else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT) + break; } return rc == OPAL_SUCCESS ? 0 : -EIO;
According to the OPAL docs: https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-read-3.txt https://github.com/open-power/skiboot/blob/skiboot-5.2.5/doc/opal-api/opal-rtc-write-4.txt OPAL_HARDWARE may be returned from OPAL_RTC_READ or OPAL_RTC_WRITE and this indicates either a transient or permanent error. Prior to this patch, Linux was not dealing with OPAL_HARDWARE being a permanent error particularly well, in that you could end up in a busy loop. This was not too hard to trigger on an AMI BMC based OpenPOWER machine doing a continuous "ipmitool mc reset cold" to the BMC, the result of that being that we'd get stuck in an infinite loop in opal_get_rtc_time. We now retry a few times before returning the error higher up the stack. Cc: stable@vger.kernel.org Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> --- drivers/rtc/rtc-opal.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)