From: Stewart Smith
To: skiboot@lists.ozlabs.org
Date: Tue, 30 Apr 2019 14:39:32 +1000
Message-Id: <20190430043932.29777-1-stewart@linux.ibm.com>
Subject: [Skiboot] [PATCH] doc: Add (most) nvram debugging options

Signed-off-by: Stewart Smith
---
 doc/console-log.rst                    |  17 ++++
 doc/device-tree/ibm,opal/power-mgt.rst |   2 +
 doc/index.rst                          |   1 +
 doc/opal-api/opal-cec-reboot-6-116.rst |  11 +++
 doc/pci.rst                            | 119 ++++++++++++++++++++++++-
 doc/power-management.rst               |  17 ++++
 6 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 doc/power-management.rst

diff --git a/doc/console-log.rst b/doc/console-log.rst
index ca9ec3ff04ad..c758e9a57482 100644
--- a/doc/console-log.rst
+++ b/doc/console-log.rst
@@ -74,3 +74,20 @@ still only PR_NOTICE through drivers.
 People who write something like 0x1f will
 get a very quiet boot indeed.
 
+Debugging
+---------
+
+You can change the log level of what goes to the in-memory buffer and what
+goes to the driver (i.e. serial port / IPMI Serial over LAN) at boot time
+by setting NVRAM variables: ::
+
+    nvram -p ibm,skiboot --update-config log-level-driver=7
+    nvram -p ibm,skiboot --update-config log-level-memory=7
+
+You can also use the named levels emerg, alert, crit, err, warning,
+notice, printf, info, debug, trace or insane. For example: ::
+
+    nvram -p ibm,skiboot --update-config log-level-driver=insane
+
+
+You can also write to the debug_descriptor to change it at runtime.
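The log-level values above can be either a number or a level name. The following is a minimal C sketch showing how such a value could be interpreted. The name-to-number mapping is an assumption based on skiboot's PR_* log levels as we understand them (emerg = 0 through insane = 9, with printf treated as an alias of notice). parse_log_level() is a hypothetical helper for illustration, not skiboot's actual NVRAM parser.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed mapping, modelled on skiboot's PR_* levels; "printf" is
 * treated as an alias for "notice". Not taken from skiboot source. */
struct log_level_name {
	const char *name;
	int level;
};

static const struct log_level_name log_level_names[] = {
	{ "emerg", 0 },   { "alert", 1 },  { "crit", 2 },   { "err", 3 },
	{ "warning", 4 }, { "notice", 5 }, { "printf", 5 }, { "info", 6 },
	{ "debug", 7 },   { "trace", 8 },  { "insane", 9 },
};

/* Hypothetical helper: convert a log-level-driver / log-level-memory
 * value (a level name or a number) into a level, or -1 if invalid.
 * This sketch rejects numbers outside 0..9. */
int parse_log_level(const char *s)
{
	size_t i;
	char *end;
	long v;

	/* Try the named levels first. */
	for (i = 0; i < sizeof(log_level_names) / sizeof(log_level_names[0]); i++) {
		if (strcmp(s, log_level_names[i].name) == 0)
			return log_level_names[i].level;
	}

	/* Otherwise accept a plain number (decimal, or 0x-prefixed hex). */
	v = strtol(s, &end, 0);
	if (end == s || *end != '\0' || v < 0 || v > 9)
		return -1;
	return (int)v;
}
```

Under these assumptions, parse_log_level("insane") returns 9 and parse_log_level("7") returns 7. Any unrecognised string returns -1.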
diff --git a/doc/device-tree/ibm,opal/power-mgt.rst b/doc/device-tree/ibm,opal/power-mgt.rst
index b326a24b8700..8d9439d7db16 100644
--- a/doc/device-tree/ibm,opal/power-mgt.rst
+++ b/doc/device-tree/ibm,opal/power-mgt.rst
@@ -1,3 +1,5 @@
+.. _power-mgt-devtree:
+
 ibm,opal/power-mgt device tree entries
 ======================================
 
diff --git a/doc/index.rst b/doc/index.rst
index b7a868c96e85..79a5accf2434 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -46,6 +46,7 @@ Developer Guide and Internals
    xscom-node-bindings
    xive
    imc
+   power-management
 
 
 OPAL ABI
diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
index 516d4fc01f9e..e9e53ce24a95 100644
--- a/doc/opal-api/opal-cec-reboot-6-116.rst
+++ b/doc/opal-api/opal-cec-reboot-6-116.rst
@@ -66,3 +66,14 @@ OPAL_REBOOT_FULL_IPL = 2
 Unsupported Reboot type
 For unsupported reboot type, this function will return with
 OPAL_UNSUPPORTED and no reboot will be triggered.
+
+Debugging
+^^^^^^^^^
+
+This is **not** ABI and may change or be removed at any time.
+
+You can control whether the software checkstop trigger is used with an NVRAM
+variable: ::
+
+    nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
+    nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
diff --git a/doc/pci.rst b/doc/pci.rst
index f72fc1480b53..d18d35d8f301 100644
--- a/doc/pci.rst
+++ b/doc/pci.rst
@@ -1,7 +1,124 @@
 PCI
 ===
 
-**WARNING**: This documentation **urgently needs updating** and is *woefully* incomplete.
+Debugging
+---------
+
+There are a couple of NVRAM options for enabling extra debug functionality
+to help debug PCI issues. These are not ABI and may be changed or removed at
+**any** time.
+
+Verbose EEH
+^^^^^^^^^^^
+
+::
+
+    nvram -p ibm,skiboot --update-config pci-eeh-verbose=true
+
+Disable EEH MMIO
+^^^^^^^^^^^^^^^^
+::
+
+    nvram -p ibm,skiboot --update-config pci-eeh-mmio=disabled
+
+Check for RX errors after link training
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some PHB4 PHYs can get stuck in a bad state where they are constantly
+retraining the link. This happens transparently to skiboot and Linux
+but will cause PCIe to be slow. Resetting the PHB4 clears the
+problem.
+
+We can detect this case by looking at the RX error count when we
+check for link stability. skiboot does this by making the link
+optimal check also look at RX errors. If errors are occurring, we
+retrain the link irrespective of the chip revision or card.
+
+Normally when this problem occurs, the RX error count is maxed out at
+255. When there is no problem, the count is 0. We chose 8 as the maximum
+RX error count to give us some margin for a few errors. There is also
+a knob that can be used to set the error threshold at which we
+retrain the link, for example: ::
+
+    nvram -p ibm,skiboot --update-config phb-rx-err-max=8
+
+Retrain link if degraded
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+On P9 Scale Out (Nimbus) DD2.0 and Scale In (Cumulus) DD1.0 (and
+below) the PCIe PHY can lock up, causing training issues. This can cause
+a degradation in speed or width in ~5% of training cases (depending on
+the card). This is fixed in later chip revisions. This issue can also
+cause PCIe links to not train at all, but this case is already
+handled.
+
+There is code in skiboot that checks if the PCIe link has trained optimally
+and, if not, does a full PHB reset (to fix the PHY lockup) and retrains.
+
+One complication is that some devices are known to train degraded unless
+device-specific configuration is performed. Because of this, we only
+retrain when the device is in a whitelist. All devices in the current
+whitelist have been tested on a P9DSU/Boston, ZZ and Witherspoon.
+
+We always gather information on the link and print it in the logs even
+if the card is not in the whitelist.
+
+For testing purposes, there's an NVRAM option to retry all PCIe cards and all
+P9 chips when a degraded link is detected. The option is
+'pci-retry-all=true', which can be set using: ::
+
+    nvram -p ibm,skiboot --update-config pci-retry-all=true
+
+This option may increase the boot time if used on a badly behaving
+card.
+
+Maximum link speed
+^^^^^^^^^^^^^^^^^^
+
+This was useful during bring-up on P9 DD1.
+
+::
+
+    nvram -p ibm,skiboot --update-config pcie-max-link-speed=4
+
+Ric Mata Mode
+^^^^^^^^^^^^^
+
+This mode (for PHB4) will trace the training process closely. This activates
+as soon as PERST is deasserted and produces human-readable output of
+the process.
+
+It will also add the PCIe Link Training and Status State Machine (LTSSM) tracing
+and details on speed and link width.
+
+Output looks something like this: ::
+
+    [ 1.096995141,3] PHB#0000[0:0]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
+    [ 1.102849137,3] PHB#0000[0:0]: TRACE:0x0000102101000000 11ms presence GEN1:x16:polling
+    [ 1.104341838,3] PHB#0000[0:0]: TRACE:0x0000182101000000 14ms training GEN1:x16:polling
+    [ 1.104357444,3] PHB#0000[0:0]: TRACE:0x00001c5101000000 14ms training GEN1:x16:recovery
+    [ 1.104580394,3] PHB#0000[0:0]: TRACE:0x00001c5103000000 14ms training GEN3:x16:recovery
+    [ 1.123259359,3] PHB#0000[0:0]: TRACE:0x00001c5104000000 51ms training GEN4:x16:recovery
+    [ 1.141737656,3] PHB#0000[0:0]: TRACE:0x0000144104000000 87ms presence GEN4:x16:L0
+    [ 1.141752318,3] PHB#0000[0:0]: TRACE:0x0000154904000000 87ms trained GEN4:x16:L0
+    [ 1.141757964,3] PHB#0000[0:0]: TRACE: Link trained.
+    [ 1.096834019,3] PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
+    [ 1.105578525,3] PHB#0001[0:1]: TRACE:0x0000102101000000 17ms presence GEN1:x16:polling
+    [ 1.112763075,3] PHB#0001[0:1]: TRACE:0x0000183101000000 31ms training GEN1:x16:config
+    [ 1.112778956,3] PHB#0001[0:1]: TRACE:0x00001c5081000000 31ms training GEN1:x08:recovery
+    [ 1.113002083,3] PHB#0001[0:1]: TRACE:0x00001c5083000000 31ms training GEN3:x08:recovery
+    [ 1.114833873,3] PHB#0001[0:1]: TRACE:0x0000144083000000 35ms presence GEN3:x08:L0
+    [ 1.114848832,3] PHB#0001[0:1]: TRACE:0x0000154883000000 35ms trained GEN3:x08:L0
+    [ 1.114854650,3] PHB#0001[0:1]: TRACE: Link trained.
+
+Enabled via NVRAM: ::
+
+    nvram -p ibm,skiboot --update-config pci-tracing=true
+
+Named after the person the output of this mode is typically sent to.
+
+
+**WARNING**: The documentation below **urgently needs updating** and is *woefully* incomplete.
 
 IODA PE Setup Sequences
 -----------------------
diff --git a/doc/power-management.rst b/doc/power-management.rst
new file mode 100644
index 000000000000..76491a71464d
--- /dev/null
+++ b/doc/power-management.rst
@@ -0,0 +1,17 @@
+Power Management
+================
+
+See :ref:`power-mgt-devtree` for the device tree structure describing power management facilities.
+
+Debugging
+---------
+
+There are a few debug knobs that can be set via NVRAM settings. These are
+**not** ABI and may be changed or removed at *any* time.
+
+Disabling specific stop states
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+On boot, specific stop states can be disabled by setting a mask. For example,
+to disable all stop states except stop0, stop1 and stop2, use ~0xE0000000: ::
+
+    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
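Here is a C sketch of the mask arithmetic in the stop-state example. It assumes that the most significant bit of the 32-bit opal-stop-state-disable-mask corresponds to stop0, the next bit to stop1, and so on. This assumption is inferred from the example, where 0xE0000000 covers stop0 to stop2. STOP_LEVEL_BIT() and stop_disable_mask_except() are hypothetical names used only for illustration; they are not skiboot functions.

```c
#include <stdint.h>

/* Assumption inferred from the ~0xE0000000 example: bit (31 - n) of the
 * mask (MSB first) corresponds to stop level n. */
#define STOP_LEVEL_BIT(n) (UINT32_C(1) << (31 - (n)))

/* Hypothetical helper: build an opal-stop-state-disable-mask value that
 * disables every stop level except the ones listed in keep[]. */
uint32_t stop_disable_mask_except(const unsigned int *keep, unsigned int count)
{
	uint32_t keep_bits = 0;
	unsigned int i;

	/* Collect the bits of the stop levels that should stay enabled;
	 * out-of-range levels are ignored to avoid an invalid shift. */
	for (i = 0; i < count; i++) {
		if (keep[i] < 32)
			keep_bits |= STOP_LEVEL_BIT(keep[i]);
	}

	/* Every bit that is not kept is set, i.e. that level is disabled. */
	return ~keep_bits;
}
```

Passing the levels {0, 1, 2} gives ~0xE0000000, which is 0x1FFFFFFF, the value used in the nvram command above.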