diff mbox

slw: improve error message for SLW timer stuck

Message ID 1473299410-17557-1-git-send-email-stewart@linux.vnet.ibm.com
State Superseded
Headers show

Commit Message

Stewart Smith Sept. 8, 2016, 1:50 a.m. UTC
We still register dump, but only to in memory console buffer by default.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
 hw/slw.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

Comments

Shilpasri G Bhat Sept. 8, 2016, 5:59 a.m. UTC | #1
Hi Stewart,

On 09/08/2016 07:20 AM, Stewart Smith wrote:
> +				 * operations (for example). It may also
> +				 * increase jitter, as more time is spent
> +				 * inside OPAL every time a timer is requested
> +				 * and we time out after 1ms.

This error sets 'slw_has_timer' to false, which disables queuing of new timers
and thus we wont be in the timeout loop after this error. So this error will not
bring in any new jitter.

Thanks and Regards.
Shilpa
Stewart Smith Sept. 9, 2016, 4:09 a.m. UTC | #2
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> writes:
> Hi Stewart,
>
> On 09/08/2016 07:20 AM, Stewart Smith wrote:
>> +				 * operations (for example). It may also
>> +				 * increase jitter, as more time is spent
>> +				 * inside OPAL every time a timer is requested
>> +				 * and we time out after 1ms.
>
> This error sets 'slw_has_timer' to false, which disables queuing of new timers
> and thus we wont be in the timeout loop after this error. So this error will not
> bring in any new jitter.

Good point! Maybe I should have payed attention to the code a bit more
when I wrote the text. I blame early morning meetings and insufficient
coffee.

Fixed in V2.
diff mbox

Patch

diff --git a/hw/slw.c b/hw/slw.c
index 74b9cd5..640b287 100644
--- a/hw/slw.c
+++ b/hw/slw.c
@@ -1197,13 +1197,13 @@  static void slw_dump_timer_ffdc(void)
 	 * root-cause. OPAL/skiboot may be stuck on some operation that
 	 * requires SLW timer state machine (e.g. core powersaving)
 	 */
-	prlog(PR_ERR, "SLW: Register state:\n");
+	prlog(PR_DEBUG, "SLW: Register state:\n");
 
 	for (i = 0; i < ARRAY_SIZE(dump_regs); i++) {
 		uint32_t reg = dump_regs[i];
 		rc = xscom_read(slw_timer_chip, reg, &val);
 		if (rc) {
-			prlog(PR_ERR, "SLW: XSCOM error %lld reading"
+			prlog(PR_DEBUG, "SLW: XSCOM error %lld reading"
 			      " reg 0x%x\n", rc, reg);
 			break;
 		}
@@ -1250,7 +1250,26 @@  void slw_update_timer_expiry(uint64_t new_target)
 			if (!(gen & 1))
 				break;
 			if (tb_compare(now + msecs_to_tb(1), mftb()) == TB_ABEFOREB) {
-				prerror("SLW: Stuck with odd generation !\n");
+				/**
+				 * @fwts-label SLWTimerStuck
+				 * @fwts-advice The SLeep/Winkle Engine (SLW)
+				 * failed to increment the generation number
+				 * within our timeout period (it *should* have
+				 * done so within ~10us, not >1ms. OPAL uses
+				 * the SLW timer to schedule some operations,
+				 * but can fall back to the (much less frequent
+				 * OPAL poller, which although does not affect
+				 * functionality, runs *much* less frequently.
+				 * This could have the effect of slow I2C
+				 * operations (for example). It may also
+				 * increase jitter, as more time is spent
+				 * inside OPAL every time a timer is requested
+				 * and we time out after 1ms. This error may
+				 * also occur if the machine is connected
+				 * to via soft FSI.
+				 */
+				prerror("SLW: timer stuck, falling back to OPAL pollers. Possible increased jitter and slower I2C.\n");
+				prlog(PR_DEBUG, "SLW: Stuck with odd generation !\n");
 				slw_has_timer = false;
 				slw_dump_timer_ffdc();
 				return;