{"id":2233125,"url":"http://patchwork.ozlabs.org/api/1.2/patches/2233125/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/patch/20260505173029.2718246-12-terry.bowman@amd.com/","project":{"id":28,"url":"http://patchwork.ozlabs.org/api/1.2/projects/28/?format=json","name":"Linux PCI development","link_name":"linux-pci","list_id":"linux-pci.vger.kernel.org","list_email":"linux-pci@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20260505173029.2718246-12-terry.bowman@amd.com>","list_archive_url":null,"date":"2026-05-05T17:30:29","name":"[v17,11/11] Documentation: cxl: Document CXL protocol error handling","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"70b5168643eaf26fc8578f2251192827f5873749","submitter":{"id":82124,"url":"http://patchwork.ozlabs.org/api/1.2/people/82124/?format=json","name":"Bowman, Terry","email":"Terry.Bowman@amd.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linux-pci/patch/20260505173029.2718246-12-terry.bowman@amd.com/mbox/","series":[{"id":502875,"url":"http://patchwork.ozlabs.org/api/1.2/series/502875/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/list/?series=502875","date":"2026-05-05T17:30:19","name":"Enable CXL PCIe Port Protocol Error handling and logging","version":17,"mbox":"http://patchwork.ozlabs.org/series/502875/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2233125/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2233125/checks/","tags":{},"related":[],"headers":{"Return-Path":"\n <linux-pci+bounces-53775-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=amd.com 
header.i=@amd.com header.a=rsa-sha256\n header.s=selector1 header.b=NhoD/pRt;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c09:e001:a7::12fc:5321; helo=sto.lore.kernel.org;\n envelope-from=linux-pci+bounces-53775-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com\n header.b=\"NhoD/pRt\"","smtp.subspace.kernel.org;\n arc=fail smtp.client-ip=40.93.195.23","smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=amd.com","smtp.subspace.kernel.org;\n spf=fail smtp.mailfrom=amd.com"],"Received":["from sto.lore.kernel.org (sto.lore.kernel.org\n [IPv6:2600:3c09:e001:a7::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g95Fr1HPQz1yJ0\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 06 May 2026 03:33:20 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sto.lore.kernel.org (Postfix) with ESMTP id AD84F301FF33\n\tfor <incoming@patchwork.ozlabs.org>; Tue,  5 May 2026 17:33:11 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 381E14A2E26;\n\tTue,  5 May 2026 17:33:08 +0000 (UTC)","from SN4PR2101CU001.outbound.protection.outlook.com\n (mail-southcentralusazon11012023.outbound.protection.outlook.com\n [40.93.195.23])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 185C838E5F9;\n\tTue,  5 May 2026 17:33:05 +0000 (UTC)","from DS7PR03CA0099.namprd03.prod.outlook.com (2603:10b6:5:3b7::14)\n by 
CH2PR12MB4120.namprd12.prod.outlook.com (2603:10b6:610:7b::13) with\n Microsoft SMTP Server (version=TLS1_2,\n cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9870.25; Tue, 5 May\n 2026 17:32:56 +0000","from DM2PEPF00003FC4.namprd04.prod.outlook.com\n (2603:10b6:5:3b7:cafe::db) by DS7PR03CA0099.outlook.office365.com\n (2603:10b6:5:3b7::14) with Microsoft SMTP Server (version=TLS1_3,\n cipher=TLS_AES_256_GCM_SHA384) id 15.20.9870.25 via Frontend Transport; Tue,\n 5 May 2026 17:32:56 +0000","from satlexmb07.amd.com (165.204.84.17) by\n DM2PEPF00003FC4.mail.protection.outlook.com (10.167.23.22) with Microsoft\n SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id\n 15.20.9891.9 via Frontend Transport; Tue, 5 May 2026 17:32:55 +0000","from ethanolx7ea3host.amd.com (10.180.168.240) by satlexmb07.amd.com\n (10.181.42.216) with Microsoft SMTP Server (version=TLS1_2,\n cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.17; Tue, 5 May\n 2026 12:32:54 -0500"],"ARC-Seal":["i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1778002388; cv=fail;\n b=TYwind9COIQqia4uojFaQT4MJ7t7Ks32NnGdttZPNtZspkJfjqdRjUE5T9BlmFYt3L0g5OMViC5uh6gtS8l6cH8zCRFpsKcfqE5hEyYaHNKpVysGSmLnrH9tnkHV+jJsw9WqxnEXztGVjGEiB6J8Nv8E8dicJCRB8e3Cmx7QiCQ=","i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;\n b=RkGm0h7fXhDN1R+EcXbhB1DFSYWwCFleGLf3NCR/r49yrdgvxRgiyDJ8M9on9ZrlBPp3uuDS5a258jXuzRJbzxeBjWD85jD9TRSA/W1m1iteXk0PBRMlRiaM63RVY8POPwJo/f0GhtSR4mBOpg+lBlnYl3huNsOPtKDfGQ5gFo54ClOcKsKqOxvSnfK3EU3f3pf9hsTPnJtojzljQ1tvQUJ+drgxoNi7bExaeIYzNMvRKtMBFa6lWpLVihQwr70DLFnYzWlp+FNTjJs1qbUCMocyA84PhgdUwcK1LMzOFdCR7J+SUkk11ieQ0K/zOvhG4dfec9UmOciJMKOvrEO/fA=="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1778002388; c=relaxed/simple;\n\tbh=kKUGFKbhKHjNOknXaHeutQh+mJSrXYf2VvDkSAcS2MA=;\n\th=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References:\n\t MIME-Version:Content-Type;\n 
b=SXhPoscKZonnPgJGaQXsZFd3mvlByUK8UAffEMzfDNc9qgdtH0H9nZJ8U0fZBDExfNfN4qjEUj5pueRftoZKkOYDMAImidB+Aqx0WqcFKh8HLloAk3y6XQvLcDz1gL9QyiuSsQXOFcThWkp1Kpv8bhYmovfdTv82PuERqCz1K/E=","i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;\n s=arcselector10001;\n h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;\n bh=87h5VvFDLPcUf0akGajedE6q2tJmRieN7ll/0XADI78=;\n b=DHFPvxW20Ct+wZts3zA/R1TO5SVKJy/cjrOpEH7yZYuF9RPtoSXHruRbBJbTl12+VOZsk4zaCgvGKHK4UPinIujHmvC50Jp630XbEhZjsaJREKS4BNi78Q1Xh7ED5oKtNwgIpKcwo65fMLtzo4oU75Nyt7ru9racmWFrvDWhl5ItO+BdJAk1N7eXhTdKSE3FknXQaOKXVYozTH6U7mhd2ihtRtcZSDRwNSJzGOsJWyzEetPQtPyNxpLvvH4tUj9/EJC/02JTr2Hf8R2a1L2kv1UxE4rdrcEYixKdQdmw+FydBoscwQWJLgEGSOGVbfrElP51GphxfdZyhO14AR5BWA=="],"ARC-Authentication-Results":["i=2; smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=amd.com;\n spf=fail smtp.mailfrom=amd.com;\n dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com\n header.b=NhoD/pRt; arc=fail smtp.client-ip=40.93.195.23","i=1; mx.microsoft.com 1; spf=pass (sender ip is\n 165.204.84.17) smtp.rcpttodomain=stgolabs.net smtp.mailfrom=amd.com;\n dmarc=pass (p=quarantine sp=quarantine pct=100) action=none\n header.from=amd.com; dkim=none (message not signed); arc=none (0)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1;\n h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;\n bh=87h5VvFDLPcUf0akGajedE6q2tJmRieN7ll/0XADI78=;\n b=NhoD/pRtoswUEySheIqU3xics7mdY//YNzCdnyBZkGTqV7kOldeVdGIvtW4fZJj9D3aYfj89sxkiMdbDm7P+pI2uoapA1Q/fx50JAAlfInI2iXp1gkAci27stdqw4QhSKWhMc6OIjwdv85hLEXXFoOgh3reE3W8uVaC7bQqiF8o=","X-MS-Exchange-Authentication-Results":"spf=pass (sender IP is 165.204.84.17)\n smtp.mailfrom=amd.com; dkim=none (message not signed)\n header.d=none;dmarc=pass action=none header.from=amd.com;","Received-SPF":"Pass 
(protection.outlook.com: domain of amd.com designates\n 165.204.84.17 as permitted sender) receiver=protection.outlook.com;\n client-ip=165.204.84.17; helo=satlexmb07.amd.com; pr=C","From":"Terry Bowman <terry.bowman@amd.com>","To":"<dave@stgolabs.net>, <jic23@kernel.org>, <dave.jiang@intel.com>,\n\t<alison.schofield@intel.com>, <djbw@kernel.org>, <bhelgaas@google.com>,\n\t<shiju.jose@huawei.com>, <ming.li@zohomail.com>,\n\t<Smita.KoralahalliChannabasappa@amd.com>, <rrichter@amd.com>,\n\t<dan.carpenter@linaro.org>, <PradeepVineshReddy.Kodamati@amd.com>,\n\t<lukas@wunner.de>, <Benjamin.Cheatham@amd.com>,\n\t<sathyanarayanan.kuppuswamy@linux.intel.com>, <vishal.l.verma@intel.com>,\n\t<alucerop@amd.com>, <ira.weiny@intel.com>, <corbet@lwn.net>,\n\t<rafael@kernel.org>, <xueshuai@linux.alibaba.com>,\n\t<linux-cxl@vger.kernel.org>","CC":"<linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,\n\t<linux-acpi@vger.kernel.org>, <linux-doc@vger.kernel.org>,\n\t<terry.bowman@amd.com>","Subject":"[PATCH v17 11/11] Documentation: cxl: Document CXL protocol error\n handling","Date":"Tue, 5 May 2026 12:30:29 -0500","Message-ID":"<20260505173029.2718246-12-terry.bowman@amd.com>","X-Mailer":"git-send-email 2.34.1","In-Reply-To":"<20260505173029.2718246-1-terry.bowman@amd.com>","References":"<20260505173029.2718246-1-terry.bowman@amd.com>","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Transfer-Encoding":"7bit","Content-Type":"text/plain","X-ClientProxiedBy":"satlexmb08.amd.com (10.181.42.217) To satlexmb07.amd.com\n 
(10.181.42.216)","X-EOPAttributedMessage":"0","X-MS-PublicTrafficType":"Email","X-MS-TrafficTypeDiagnostic":"DM2PEPF00003FC4:EE_|CH2PR12MB4120:EE_","X-MS-Office365-Filtering-Correlation-Id":"d3d82984-1d3b-414f-8ecf-08deaacc5803","X-MS-Exchange-SenderADCheck":"1","X-MS-Exchange-AntiSpam-Relay":"0","X-Microsoft-Antispam":"\n\tBCL:0;ARA:13230040|7416014|82310400026|376014|36860700016|1800799024|921020|22082099003|56012099003|18002099003;","X-Microsoft-Antispam-Message-Info":"\n\tOvWS6REqSZC9SwIJlNEaxlRHQosTOd/USdpr+XT7PeeMyT64InyRJi+ulxU7XXO2W6+wVF5lGhVB5vEvhKd+yJoWIrB+OCqDwq/mtihDIBudFTttYIcbDs5GpdlhXFmz3YBmw1a8d3wQRWS46tHKMhqVijbFL+yJjuP+fp+syj1xEla5uEwKHOppFXNvfe/ccP1zqP++qtAwAh26oJY0yHlsaVNLzaClSWY/D3SwNC8en2UhhZeuSvEKWBE0kYkvKKAiSxZg7SjVp7O72YgkGZYbuSH5VEfcA0JhEknxBuFU+dbCkiUECpXewaKLnB8IAgW1w7qesNoU//soB4d5pOoCnaMgVcK1QrI9SHhmuzLtlw/scluFRKB1hJk6w1H3IBBsKr2hgzigy6ElUuHXSb8ivapy1I8Pa72bsFBHEoAAh8VOAmq0GWwyGQLXMxUVxg/dQcy6xiGlfUAnDu6OQ7YIqfXs9oLEXaAnwjD8jGrFH6qTQwBR/0+2J2xjAr/oyEoakiIXc3wNGlrtZtbZsv//p2rEClQ7u+MSVyfZIk6naMc227CW+0fwBrUC+Y/RUMcUJlTyGttpbT629ApImQjrZ+vBosYvFM2jmR3ZRK4wucmlvJdk4slaMsT2Ri+ostNsm1BszspqIsHC0sP0fuRHkq3iIKFKFCOoTf8pBiWiNTsuBm7i5G7hyF6Rgg4ECcSMZ74V49xBQO5qFCcugKy/j5yKbRBhvyUWuzG4fGwVlbS6YMbXdaOD3+uiR3vIs2LGNSYGpHc3DAIuOmMyig==","X-Forefront-Antispam-Report":"\n\tCIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:satlexmb07.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(7416014)(82310400026)(376014)(36860700016)(1800799024)(921020)(22082099003)(56012099003)(18002099003);DIR:OUT;SFP:1101;","X-MS-Exchange-AntiSpam-MessageData-ChunkCount":"1","X-MS-Exchange-AntiSpam-MessageData-0":"\n\tjC1kONZPDnXQrh4e8gSJbIrC/OP4Jp9yby901hqj79y7ATlfCb68iSUf1cRS6pzVI0NXfyYGyL9ngkqAPg+y426x4nPB/UtwEl24sstVf/xR7nMwx+/Zn/EjniW6NstKMVGBBqcCMIY5InE9bJ+f2RWxMe5+X9vQdMjgxo+TaVcUQM2L0Hez0/vTX7aHiU1SQLmWVKZyiyTd193NOLVsHMNArrwFxYHbkpZJ4RpSEyLeiYwzyzmgsm2EcZwhD6+AKMDgmZwQldseN2Uvode2gsAihsrUUa47lNwVt0fMMtq9Gv3DB+3v5VGeCNfYt44Sapkz1LZySuuVU8xazHzG
Yw3yoy/lZ4jMV0PrL4cmFOhrNVH4GZt4eUzsc9MgbtkVy8M8+f2an14Guv9P2FJSxCtns97RhvmLMliatbn8LX378CBug5bMFtWc0ZA+utyy","X-OriginatorOrg":"amd.com","X-MS-Exchange-CrossTenant-OriginalArrivalTime":"05 May 2026 17:32:55.9082\n (UTC)","X-MS-Exchange-CrossTenant-Network-Message-Id":"\n d3d82984-1d3b-414f-8ecf-08deaacc5803","X-MS-Exchange-CrossTenant-Id":"3dd8961f-e488-4e60-8e11-a82d994e183d","X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp":"\n TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[satlexmb07.amd.com]","X-MS-Exchange-CrossTenant-AuthSource":"\n\tDM2PEPF00003FC4.namprd04.prod.outlook.com","X-MS-Exchange-CrossTenant-AuthAs":"Anonymous","X-MS-Exchange-CrossTenant-FromEntityHeader":"HybridOnPrem","X-MS-Exchange-Transport-CrossTenantHeadersStamped":"CH2PR12MB4120"},"content":"Add Documentation/driver-api/cxl/linux/protocol-error-handling.rst\ndescribing the end-to-end CXL protocol error path: AER ingress, the\nAER-CXL kfifo handoff, the cxl_core consumer worker, RCD/RCH special\ncases, severity policy, trace events, and a source code map.\n\nThis documents the architecture introduced by the preceding patches in\nthis series.\n\nThis was generated by claude-opus-4.7.\n\nAssisted-by: Claude:claude-opus-4.7\nSigned-off-by: Terry Bowman <terry.bowman@amd.com>\n---\n Documentation/driver-api/cxl/index.rst        |   1 +\n .../cxl/linux/protocol-error-handling.rst     | 440 ++++++++++++++++++\n 2 files changed, 441 insertions(+)\n create mode 100644 Documentation/driver-api/cxl/linux/protocol-error-handling.rst","diff":"diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst\nindex 3dfae1d310ca..6861b2e5726a 100644\n--- a/Documentation/driver-api/cxl/index.rst\n+++ b/Documentation/driver-api/cxl/index.rst\n@@ -42,6 +42,7 @@ that have impacts on each other.  
The docs here break up configurations steps.\n    linux/dax-driver\n    linux/memory-hotplug\n    linux/access-coordinates\n+   linux/protocol-error-handling\n \n .. toctree::\n    :maxdepth: 2\ndiff --git a/Documentation/driver-api/cxl/linux/protocol-error-handling.rst b/Documentation/driver-api/cxl/linux/protocol-error-handling.rst\nnew file mode 100644\nindex 000000000000..4d6f33f0ed31\n--- /dev/null\n+++ b/Documentation/driver-api/cxl/linux/protocol-error-handling.rst\n@@ -0,0 +1,440 @@\n+.. SPDX-License-Identifier: GPL-2.0\n+\n+==============================\n+CXL Protocol Error Handling\n+==============================\n+\n+This document describes how the kernel detects, classifies, dispatches,\n+logs, and recovers from CXL protocol errors signaled through the PCIe\n+Advanced Error Reporting (AER) interface. It covers both Virtual\n+Hierarchy (VH) topologies (Root Ports, Upstream/Downstream Switch\n+Ports, and Endpoints) and Restricted CXL Host (RCH) topologies\n+(Root Complex Event Collectors driving Restricted CXL Devices).\n+\n+It is intended for kernel developers maintaining or extending\n+``drivers/pci/pcie/aer*.c``, ``drivers/cxl/core/ras.c``, and the\n+related plumbing in ``include/linux/aer.h``.\n+\n+\n+Background\n+==========\n+\n+A CXL device reports protocol-layer failures (CXL.cachemem RAS) as\n+PCIe AER **Internal Errors**: ``PCI_ERR_COR_INTERNAL`` for correctable\n+events and ``PCI_ERR_UNC_INTN`` for uncorrectable events. From the AER\n+core's point of view these look like ordinary PCIe AER messages, but\n+their semantics are CXL-specific: the actual fault information lives\n+in CXL RAS capability registers, not in the PCIe AER status registers.\n+\n+Historically, native CXL.cachemem RAS handling was implemented only\n+for CXL Endpoints and for RCH Downstream Ports. 
CXL Root Ports,\n+Upstream Switch Ports, and Downstream Switch Ports were not covered.\n+This left the kernel unable to log or react to protocol errors\n+signaled by switch components.\n+\n+The unified CXL protocol error path closes that gap by routing every\n+CXL Internal Error through a single producer/consumer pipeline shared\n+by all CXL device types.\n+\n+\n+Architecture overview\n+=====================\n+\n+CXL protocol error handling is implemented as a distinct error plane\n+layered on top of the existing PCIe AER infrastructure. The two planes\n+are kept separate:\n+\n+* The **PCIe AER plane** continues to handle native PCIe errors\n+  (Receiver overflows, malformed TLPs, completion timeouts, and so\n+  on). This is unchanged.\n+\n+* The **CXL protocol error plane** owns CXL Internal Errors. The AER\n+  core forwards them to ``cxl_core`` via a dedicated kfifo; ``cxl_core``\n+  then dispatches to CE/UE handlers and drives the recovery and\n+  panic policy.\n+\n+The boundary between the two planes is ``is_cxl_error()`` in\n+``drivers/pci/pcie/aer_cxl_vh.c``, which inspects ``info->is_cxl``\n+(set from ``pcie_is_cxl()``) together with the PCIe device type and\n+the AER status word. When ``is_cxl_error()`` returns true the event\n+is enqueued into the AER-CXL kfifo; otherwise the event flows through\n+``pci_aer_handle_error()`` as before.\n+\n+The pipeline has three layers:\n+\n+1. **Producer** (``aer_cxl_vh.c``, ``aer_cxl_rch.c``) - runs in AER\n+   IRQ/threaded context, classifies, clears the AER CE status, and\n+   enqueues ``struct cxl_proto_err_work_data``.\n+2. **Queue** - the AER-CXL kfifo plus a backing ``struct work_struct``.\n+3. 
**Consumer** (``cxl_core/ras.c``) - workqueue-context worker that\n+   resolves the CXL Port topology and dispatches to CE/UE handlers.\n+\n+\n+Topologies\n+==========\n+\n+Two topologies are supported, and both feed the same kfifo.\n+\n+Virtual Hierarchy (VH)\n+----------------------\n+\n+A standard CXL VH consists of a CXL Root Port (RP), an optional CXL\n+Upstream Switch Port (USP), one or more CXL Downstream Switch Ports\n+(DSPs), and CXL Endpoints (EPs) attached to the DSPs. Each component\n+is a regular PCIe device with a CXL DVSEC and a CXL RAS capability,\n+and it raises Internal Errors directly to the AER subsystem via the\n+RP's MSI/MSI-X interrupt.\n+\n+The VH producer is ``cxl_forward_error()`` in\n+``drivers/pci/pcie/aer_cxl_vh.c``.\n+\n+Restricted CXL Host (RCH)\n+-------------------------\n+\n+In the RCH topology, a Root Complex Event Collector (RCEC) aggregates\n+errors from one or more Restricted CXL Devices (RCDs) attached as\n+Root Complex Integrated Endpoints. The RCEC delivers the AER\n+interrupt; the AER driver iterates the RCDs beneath it.\n+\n+The RCH producer is ``cxl_rch_handle_error_iter()`` in\n+``drivers/pci/pcie/aer_cxl_rch.c``. For each RCD it finds, it calls\n+``cxl_forward_error()`` (the same producer helper used by the VH\n+path), so RCH events end up in the same AER-CXL kfifo as VH events.\n+\n+\n+End-to-end flow\n+===============\n+\n+The diagram below shows the full path from an AER interrupt through\n+producer classification, kfifo handoff, and consumer dispatch.\n+\n+.. 
code-block:: text\n+\n+   +-------------------------------------------------------------------------+\n+   |                  CXL Internal Error Packet Flow                         |\n+   |    From PCIe AER Interrupt to CXL Protocol Error Handling and Logging   |\n+   +-------------------------------------------------------------------------+\n+\n+      CXL device (RP / USP / DSP / EP / RCD) raises AER Internal Error\n+      (correctable PCI_ERR_COR_INTERNAL or uncorrectable PCI_ERR_UNC_INTN)\n+                      |\n+                      v\n+      +-------------------------------------------------------------+\n+      |    PCIe Root Port AER MSI/MSI-X interrupt fires             |\n+      +-------------------------------------------------------------+\n+                      |\n+      ============= drivers/pci/pcie/aer.c (AER core) =============\n+                      |\n+                      v\n+           +---------------------------------+\n+           |  aer_irq()  /  aer_isr()        |  (top + threaded handler)\n+           +---------------------------------+\n+                      |\n+                      v\n+           +---------------------------------+\n+           |  aer_isr_one_error()            |\n+           |  aer_isr_one_error_type()       |\n+           +---------------------------------+\n+                      |\n+                      v\n+          +------------------------------------------+\n+          |  aer_get_device_error_info()             |\n+          |  - reads PCI_ERR_COR_STATUS              |\n+          |  - reads PCI_ERR_UNCOR_STATUS  (*if RP/  |\n+          |    RCEC/DSP, or non-fatal severity)      |\n+          |  - sets info->is_cxl = pcie_is_cxl(dev)  |\n+          +------------------------------------------+\n+                      |\n+                      v\n+           +---------------------------------+\n+           |  handle_error_source(dev, info) |\n+           +---------------------------------+\n+             
 |                          |\n+              |  is_cxl_error()          +--->  pci_aer_handle_error()\n+              |  (CXL device + Internal)        (native PCIe AER path,\n+              v                                  not covered here)\n+      +-------------------------------------------------------------+\n+      | Topology dispatch within AER core:                          |\n+      |                                                             |\n+      |   - VH topology  (RP / USP / DSP / EP)                      |\n+      |     -> drivers/pci/pcie/aer_cxl_vh.c                        |\n+      |                                                             |\n+      |   - RCH topology (RCEC iterates RCDs under it)              |\n+      |     -> drivers/pci/pcie/aer_cxl_rch.c                       |\n+      +-------------------------------------------------------------+\n+           |                                            |\n+           | VH path                            RCH path (RCEC AER)\n+           v                                            v\n+      ============= aer_cxl_vh.c (VH      ============= aer_cxl_rch.c (RCH\n+                    producer) =============              producer) ==========\n+           |                                            |\n+           v                                            v\n+      +-----------------------------+         +-------------------------------+\n+      | cxl_forward_error(pdev,info)|         | cxl_rch_handle_error_iter()   |\n+      |  - if AER_CORRECTABLE:      |         |  - iterate each RCD pdev      |\n+      |     clear PCI_ERR_COR_STATUS|         |    beneath the RCEC           |\n+      |  - pci_dev_get(pdev)        |         |  - call cxl_forward_error()   |\n+      |  - build cxl_proto_err_     |         |    for each RCD               |\n+      |    work_data                |         |    (same producer helper as   |\n+      |    { pdev, severity }       |         |     the VH path 
uses)         |\n+      |  - kfifo_in_spinlocked(...) |         +-------------------------------+\n+      |  - schedule_work(...)       |                       |\n+      +-----------------------------+                       |\n+              |                                             |\n+              +-----------------+---------------------------+\n+                                |\n+                                v\n+                    +--------------------------+\n+                    |     AER-CXL kfifo        |\n+                    |     (work_struct)        |\n+                    +--------------------------+\n+                                |\n+                                v\n+      ============= drivers/cxl/core/ras.c (consumer worker) =======\n+                                |\n+                                v\n+      +-------------------------------------------------------------+\n+      | cxl_proto_err_work_fn() (workqueue handler)                 |\n+      |   for_each_cxl_proto_err(&wd, __cxl_proto_err_work_fn)      |\n+      +-------------------------------------------------------------+\n+                      |\n+                      v\n+      +-------------------------------------------------------------+\n+      | __cxl_proto_err_work_fn(wd)                                 |\n+      |   port = find_cxl_port_by_dev(&pdev->dev, &dport)           |\n+      |   cxl_handle_proto_error(pdev, port, dport, severity)       |\n+      |   pci_dev_put(pdev)                                         |\n+      +-------------------------------------------------------------+\n+                      |\n+                      v\n+      +-------------------------------------------------------------+\n+      | cxl_handle_proto_error()                                    |\n+      +-------------------------------------------------------------+\n+           |                                            |\n+      pci_pcie_type ==                          
pci_pcie_type !=\n+      PCI_EXP_TYPE_RC_END                       PCI_EXP_TYPE_RC_END\n+      (RCD Endpoint)                            (VH: RP/USP/DSP/EP)\n+           |                                            |\n+           v                                            |\n+      +-------------------------------------+           |\n+      | cxl_handle_rdport_errors(pdev)      |           |\n+      |   - process RCH Downstream Port's   |           |\n+      |     RAS register block first        |           |\n+      |   - cxl_handle_cor_ras() for CE     |           |\n+      |   - cxl_handle_ras() for UE         |           |\n+      |     (log only; does NOT panic)      |           |\n+      +-------------------------------------+           |\n+           |                                            |\n+           +--------------------+-----------------------+\n+                                |\n+                                v\n+                   +-----------------------------+\n+                   | severity == AER_CORRECTABLE |\n+                   +-----------------------------+\n+                         |                  |\n+                         yes                no\n+                         v                  v\n+            +----------------------+   +-------------------------+\n+            | cxl_handle_cor_ras() |   | cxl_do_recovery()       |\n+            |  - emit cxl_aer_     |   | (described below)       |\n+            |    correctable_      |   +-------------------------+\n+            |    error trace       |\n+            | pcie_clear_device_   |\n+            |   status()           |\n+            +----------------------+\n+\n+                    +-------------------------------+\n+                    | cxl_do_recovery()             |\n+                    |  if pci_dev_is_disconnected:  |\n+                    |    panic(\"CXL cachemem err.\") |\n+                    |                               |\n+                    |  ue = 
cxl_handle_ras()        |\n+                    |    -> emit                    |\n+                    |       cxl_aer_uncorrectable_  |\n+                    |       error trace event       |\n+                    |                               |\n+                    |  if (ue):                     |\n+                    |    panic(\"CXL cachemem err.\") |\n+                    |                               |\n+                    |  pcie_clear_device_status()   |\n+                    |  pci_aer_clear_nonfatal_status|\n+                    |  pci_aer_clear_fatal_status   |\n+                    +-------------------------------+\n+\n+\n+Severity policy\n+===============\n+\n+The kernel's response to a CXL protocol error depends on the AER\n+severity reported by the device and on the result of inspecting the\n+CXL RAS registers.\n+\n+Correctable Error (CE)\n+----------------------\n+\n+* The AER driver clears ``PCI_ERR_COR_STATUS`` in the producer\n+  (``cxl_forward_error()``) before enqueue, so the device is\n+  acknowledged even if the consumer drops the event.\n+* The consumer's ``cxl_handle_cor_ras()`` reads and clears the CXL\n+  RAS correctable status and emits a ``cxl_aer_correctable_error``\n+  trace event.\n+* No recovery action is taken.\n+\n+Uncorrectable Error (UE), non-fatal\n+-----------------------------------\n+\n+* The producer enqueues the event without clearing the AER UCE\n+  status.\n+* The consumer enters ``cxl_do_recovery()``.\n+* ``cxl_handle_ras()`` reads the CXL RAS uncorrectable status and\n+  emits a ``cxl_aer_uncorrectable_error`` trace event.\n+* If ``cxl_handle_ras()`` returns true (a CXL RAS UE bit was set),\n+  the kernel panics with ``\"CXL cachemem error.\"``. 
CXL.cachemem\n+  traffic cannot be safely recovered in software once corruption is\n+  observed; continuing risks silent data loss across all devices in\n+  an interleaved HDM region.\n+* If ``cxl_handle_ras()`` returns false (no CXL RAS bit set, i.e.\n+  the AER UCE was a PCIe-side issue rather than a CXL.cachemem\n+  issue), the AER UCE status is cleared and execution continues.\n+\n+Uncorrectable Error (UE), fatal\n+-------------------------------\n+\n+Fatal severity follows the same recovery path as non-fatal in\n+``cxl_do_recovery()``, with one important caveat: the AER core only\n+reads ``PCI_ERR_UNCOR_STATUS`` for Root Ports, RCECs, Downstream\n+Ports, or non-fatal severities (see ``aer_get_device_error_info()``\n+in ``drivers/pci/pcie/aer.c``). For a fatal UE signaled by an\n+upstream component, PCI config reads to the source device are\n+expected to fail, so ``UNCOR_STATUS`` is never retrieved and\n+``info->status`` stays zero.\n+\n+The practical consequence: a fatal UE on an Upstream Switch Port or\n+Endpoint is **not** classified as a CXL error by ``is_cxl_error()``.\n+It falls through to ``pci_aer_handle_error()`` and is processed by\n+the standard AER recovery flow. Only the AER trace events emitted by\n+the AER core (``aer_event``) appear; the CXL-specific\n+``cxl_aer_uncorrectable_error`` event is not emitted on this path.\n+\n+Disconnect during recovery\n+--------------------------\n+\n+``cxl_do_recovery()`` checks ``pci_dev_is_disconnected(pdev)`` before\n+touching the RAS registers. A device disconnecting during an\n+uncorrectable error event is itself unrecoverable, particularly when\n+the device backs an interleaved HDM region; in that case the kernel\n+panics directly rather than returning ``~0u`` from the readl() and\n+masking the cause.\n+\n+\n+RCD/RCH special cases\n+=====================\n+\n+RCD Endpoint flow\n+-----------------\n+\n+When ``cxl_handle_proto_error()`` sees ``pci_pcie_type(pdev) ==\n+PCI_EXP_TYPE_RC_END`` (i.e. 
an RCD Endpoint), it calls\n+``cxl_handle_rdport_errors()`` first. This processes the RAS state\n+of the RCH Downstream Port that hosts the RCD before falling through\n+to the common CE/UE dispatch on the RCD Endpoint itself.\n+\n+The RCH Downstream Port's RAS UE is **logged only**: it emits the\n+trace event but does not panic. The panic decision is taken on the\n+RCD Endpoint's own RAS in ``cxl_do_recovery()``.\n+\n+This split mirrors the structure of an RCH topology: the RCH dport\n+is functionally a CXL infrastructure component (similar to a switch\n+port), while the RCD itself is the actual CXL.cachemem source whose\n+corruption drives the recovery decision.\n+\n+RCH ingress aggregation\n+-----------------------\n+\n+RCH errors do not arrive on a per-RCD interrupt. The RCEC is the AER\n+source, and the AER driver drives ``cxl_rch_handle_error_iter()`` to\n+walk each RCD beneath it and forward an event per RCD through the\n+shared kfifo. From the consumer's point of view, RCH-originated\n+events are indistinguishable from VH events.\n+\n+\n+Trace events\n+============\n+\n+Two unified trace events are emitted from ``cxl_handle_cor_ras()``\n+and ``cxl_handle_ras()`` and are used by every CXL device type and\n+both topologies:\n+\n+* ``cxl_aer_correctable_error`` - emitted when a CXL RAS CE bit is\n+  set; carries the human-readable status string.\n+* ``cxl_aer_uncorrectable_error`` - emitted when a CXL RAS UE bit is\n+  set; carries both the current status and the first-error pointer.\n+\n+Common fields:\n+\n+* ``device=<PCI BDF>`` - the source device (always a PCI BDF, even\n+  for RCH paths where the trace was historically a memdev name).\n+* ``host=<bridge>`` - the parent host bridge or PCI host BDF.\n+* ``serial=<u64>`` - the device serial from ``pci_get_dsn()``.\n+\n+The ``device`` field replaces the older ``memdev`` field that earlier\n+revisions emitted on Endpoint events. 
Userspace consumers\n+(rasdaemon's ``ras-cxl-handler.c``) need a corresponding update to\n+read the new field name.\n+\n+\n+Source code map\n+===============\n+\n+============================================  ==============================\n+File                                          Role\n+============================================  ==============================\n+``drivers/pci/pcie/aer.c``                    AER core; receives the IRQ,\n+                                              builds ``aer_err_info``,\n+                                              dispatches to either the CXL\n+                                              path (``is_cxl_error()``) or\n+                                              ``pci_aer_handle_error()``.\n+``drivers/pci/pcie/aer_cxl_vh.c``             VH producer; provides\n+                                              ``is_cxl_error()``,\n+                                              ``cxl_forward_error()``, the\n+                                              AER-CXL kfifo, and the\n+                                              consumer registration\n+                                              helpers.\n+``drivers/pci/pcie/aer_cxl_rch.c``            RCH producer; iterates RCDs\n+                                              under an RCEC and forwards\n+                                              each via\n+                                              ``cxl_forward_error()``.\n+``drivers/cxl/core/ras.c``                    Consumer; defines\n+                                              ``cxl_proto_err_work_fn()``,\n+                                              ``cxl_handle_proto_error()``,\n+                                              ``cxl_handle_rdport_errors()``,\n+                                              ``cxl_do_recovery()``,\n+                                              ``cxl_handle_cor_ras()`` and\n+                                              ``cxl_handle_ras()``.\n+``include/linux/aer.h``           
            Public declarations:\n+                                              ``struct cxl_proto_err_work_data``,\n+                                              ``cxl_proto_err_fn_t``,\n+                                              ``cxl_register_proto_err_work()``\n+                                              and ``for_each_cxl_proto_err()``.\n+============================================  ==============================\n+\n+\n+Limitations and future work\n+===========================\n+\n+* **USP/EP fatal UCE is not classified as CXL.** As described under\n+  `Severity policy`_, the AER core never retrieves\n+  ``PCI_ERR_UNCOR_STATUS`` in this scenario, so ``is_cxl_error()``\n+  cannot tag the event as CXL. The event is handled by the AER path\n+  only. Resolving this requires either an AER-core change to attempt\n+  a config read with link-validity gating, or a separate CXL-side\n+  notification mechanism for upstream-signaled fatal events.\n+* **User-defined status masks** are not yet supported. All CE and UE\n+  status bits are reported as they appear in the RAS register.\n+* **Port traversing in cxl_do_recovery()** is not yet implemented; a\n+  CXL UE today is reported and acted on at the source device only,\n+  not propagated to ancestor ports.\n+* The RCH producer (``aer_cxl_rch.c``) currently lives under\n+  ``drivers/pci/pcie/`` for historical reasons. Moving it to\n+  ``drivers/cxl/core/ras_rch.c`` is on the roadmap.\n+\n","prefixes":["v17","11/11"]}