[{"id":3686701,"web_url":"http://patchwork.ozlabs.org/comment/3686701/","msgid":"<20260505210753.B9B8FC2BCB4@smtp.kernel.org>","list_archive_url":null,"date":"2026-05-05T21:07:53","subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","submitter":{"id":93311,"url":"http://patchwork.ozlabs.org/api/people/93311/","name":null,"email":"sashiko-bot@kernel.org"},"content":"Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:\n- [Critical] ABI breakage for user-space RAS monitoring tools.\n- [Critical] Massive kernel stack out-of-bounds read (Information Leak) in trace_cxl_aer_uncorrectable_error.\n- [High] Unconditional PCI configuration space reads via pci_get_dsn() on potentially frozen links.\n- [High] Out-of-bounds MMIO read in header_log_copy() causing potential system crash.\n--\n\ncommit 7ac043ff49acf52a4133043e3ef651a4e11ba73f\nAuthor: Dan Williams <djbw@kernel.org>\n\ncxl/ras: Unify Endpoint and Port AER trace events\n\nThis commit unifies the trace events for CXL Endpoint and Port devices on\nboth the CPER and native AER paths. It generalizes the correctable and\nuncorrectable error events to use a common device and serial number format.\nIt also renames the memdev field to device to support all CXL device types.\n\nLink: https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/\n\n>     **WARNING: ABI BREAK**\n>     Rename the trace event field \"memdev\" to \"device\" so all CXL device types\n>     (Ports and Endpoints) can be reported under a common field name. Note this\n>     is an ABI break for userspace tools that key off the old \"memdev\" field.\n>     Specifically, rasdaemon's ras-cxl-handler.c looks up \"memdev\" and bails on\n>     NULL, so an unmodified rasdaemon will drop every CXL CE/UCE event once this\n>     kernel ships. 
A rasdaemon update is needed in a separate series.\n\nDoes renaming this trace event field violate the stable ABI policy for\ntrace events?\n\nSince unmodified system monitoring utilities like rasdaemon will drop\nthese memory errors after this change, should a backward-compatible approach\nbe used, such as retaining the memdev field or adding a new event without\ndeleting the old one?\n\n> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c\n> index 006c6ffc2f567..d7081caaf5d3f 100644\n> --- a/drivers/cxl/core/ras.c\n> +++ b/drivers/cxl/core/ras.c\n> @@ -28,20 +28,24 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,\n>  \telse\n>  \t\tfe = status;\n>  \n> -\ttrace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,\n> -\t\t\t\t\t       ras_cap.header_log);\n> +\ttrace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,\n> +\t\t\t\t\t  ras_cap.header_log,\n> +\t\t\t\t\t  pci_get_dsn(pdev));\n>  }\n\nCan passing ras_cap.header_log here cause an out-of-bounds stack read?\n\nThe ras_cap.header_log field is an array of 16 u32 elements (64 bytes).\nHowever, inside the trace_cxl_aer_uncorrectable_error() fast assign block,\nit copies CXL_HEADERLOG_SIZE (512 bytes):\n\n\tTP_fast_assign(\n\t\t...\n\t\tmemcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n\t)\n\nWill this read 448 bytes past the end of the header_log array and leak\nadjacent uninitialized kernel stack data into the trace ring buffer, making\nit accessible to user-space?\n\nAlso, can calling pci_get_dsn() here cause issues if the PCIe link is in a\nfrozen state?\n\nSince pci_get_dsn() performs live PCI config space reads, executing it on a\ndisconnected or frozen device could result in system hangs, extended timeouts,\nor Unsupported Request completions.\n\nFurthermore, since pci_get_dsn() is passed as an argument to the trace\nwrapper, won't the config space read execute unconditionally on every\nerror even if tracing is disabled?\n\n[ ... 
]\n\n> @@ -247,7 +252,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n>  \t}\n>  \n>  \theader_log_copy(ras_base, hl);\n> -\ttrace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);\n> +\ttrace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);\n>  \twritel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);\n\nCan the call to header_log_copy() just above this trace event cause an\nout-of-bounds MMIO read?\n\nThe header_log_copy() function iterates 128 times to read 512 bytes from\nthe MMIO address:\n\nstatic void header_log_copy(void __iomem *ras_base, u32 *log)\n{\n\t...\n\tint i, log_u32_size = CXL_HEADERLOG_SIZE / sizeof(u32);\n\n\taddr = ras_base + CXL_RAS_HEADER_LOG_OFFSET;\n\tlog_addr = log;\n\n\tfor (i = 0; i < log_u32_size; i++) {\n\t\t*log_addr = readl(addr);\n\t\t...\n}\n\nHowever, during component probe, the CXL RAS capability block is mapped\nwith a length of CXL_RAS_CAPABILITY_LENGTH, which is only 88 bytes.\n\nCould attempting to read 512 bytes from an 88-byte MMIO mapping send PCIe\nmemory reads to unintended device offsets, potentially triggering\nMaster Aborts, Unsupported Requests, or unhandled page faults?\n\n[ ... 
]\n\n> @@ -299,7 +305,8 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,\n>  \t\t * chance the situation is recoverable dump the status of the RAS\n>  \t\t * capability registers and bounce the active state of the memdev.\n>  \t\t */\n> -\t\tue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);\n> +\t\tue = cxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t\t    cxlmd->endpoint->regs.ras);\n>  \t}\n\nIn cxl_error_detected(), if the PCIe link is in a frozen state, will the\ncall to pci_get_dsn() fail?\n\nBecause pci_get_dsn() performs live PCI config space reads, calling it\nwhen the channel state is pci_channel_io_frozen could cause system hangs\nor Unsupported Request completions.\n\nWill this vulnerability persist even if subsequent patches in the series\n(like \"cxl: Update Endpoint AER uncorrectable handler\") refactor the error\nflow into a workqueue, since the pci_get_dsn() call remains on the\nfatal error path?","headers":{"Return-Path":"\n <linux-pci+bounces-53791-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256\n header.s=k20201202 header.b=q1k08Uy6;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-pci+bounces-53791-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=\"q1k08Uy6\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=10.30.226.201"],"Received":["from sea.lore.kernel.org (sea.lore.kernel.org\n [IPv6:2600:3c0a:e001:db::12fc:5321])\n\t(using 
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g9B1W4QB7z1yJV\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 06 May 2026 07:07:59 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id A2DED30262F6\n\tfor <incoming@patchwork.ozlabs.org>; Tue,  5 May 2026 21:07:55 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 729D739EF0B;\n\tTue,  5 May 2026 21:07:54 +0000 (UTC)","from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org\n [10.30.226.201])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F935291864\n\tfor <linux-pci@vger.kernel.org>; Tue,  5 May 2026 21:07:54 +0000 (UTC)","by smtp.kernel.org (Postfix) with ESMTPSA id B9B8FC2BCB4;\n\tTue,  5 May 2026 21:07:53 +0000 (UTC)"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1778015274; cv=none;\n b=h9S1qWVzbQ5u0mI95g4erYRJ3c0dL9GEnzFLRXihVmJi9bAGNQ1o2/AjXNOuHCEIItBkB0k9HdlbbzALy0TQCRfc0V3UAtRZXenpxGL1fc9RIW80D7R9AUVK2WcOWJfHUUW1hBDzWmIltJnhqpSPMXWyKLCvpu5kEXbao4zKDvE=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1778015274; c=relaxed/simple;\n\tbh=Mu5QqxyqVGswjgV1q0Kxf4rbENVMtQfqfy3dr4PFfUc=;\n\th=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date:\n\t Message-Id;\n b=fgh4ehp5lMX17VgUbl1gnmcNyjJAzHurkr05SckEpSmIq5gob6a3BoCVhCrnUxAVRzp/vzMdz6EAIzLrQ1uK4TBkjvbuj4s8XxfkNyYy6Wc4GPDh37xxkSzu8ZwZk94o2BpVsEXHfhzYU9R4XQy2UPp68/ksxr7sG3P76rXZGgg=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n 
header.b=q1k08Uy6; arc=none smtp.client-ip=10.30.226.201","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;\n\ts=k20201202; t=1778015273;\n\tbh=Mu5QqxyqVGswjgV1q0Kxf4rbENVMtQfqfy3dr4PFfUc=;\n\th=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date:From;\n\tb=q1k08Uy6PnaulxxSgI5Jjsrq8tiK52wYGSuO51FV7x8VJ6q7+ZRet4n6n8pDDf8lI\n\t nMUiqalG5eKsB68qj9ZI8Fgg4GZjJ595WihDA9PCUobobPDL6bv/h4PEdE6bxm+PoE\n\t wczCUvClfvWJ7mvGBJhRSvUhLZ3srpFgpYlhvUN99kwoDIvImKXVjHgt8cojaBx5q+\n\t lw31pjS3/Tao34So19feGKblT7F+PhKrzCosicSVYzilwrHzs6Deu2SpMZtZg+wjyc\n\t YDkjjw8h5RCzSuCiokS5M3LpW2jl7FC6444rWp0k7TuQYn3wmd+QvnKWyIk2jOqbFI\n\t heEnMxjj+Cffw==","From":"sashiko-bot@kernel.org","Subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","Reply-To":"sashiko@lists.linux.dev","To":"\"Terry Bowman\" <terry.bowman@amd.com>","Cc":"linux-pci@vger.kernel.org","In-Reply-To":"<20260505173029.2718246-3-terry.bowman@amd.com>","References":"<20260505173029.2718246-3-terry.bowman@amd.com>","Content-Type":"text/plain; charset=utf-8","Content-Transfer-Encoding":"quoted-printable","Date":"Tue, 05 May 2026 21:07:53 +0000","Message-Id":"<20260505210753.B9B8FC2BCB4@smtp.kernel.org>","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>"}},{"id":3686729,"web_url":"http://patchwork.ozlabs.org/comment/3686729/","msgid":"<618748f8-ad14-42ea-8694-0c1bc5b95c54@intel.com>","list_archive_url":null,"date":"2026-05-05T21:46:55","subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","submitter":{"id":13225,"url":"http://patchwork.ozlabs.org/api/people/13225/","name":"Dave Jiang","email":"dave.jiang@intel.com"},"content":"On 5/5/26 10:30 AM, Terry Bowman wrote:\n> From: Dan Williams <djbw@kernel.org>\n> \n> CXL protocol error logging uses two parallel sets of trace 
events. The\n> cxl_port_aer_correctable_error() and cxl_port_aer_uncorrectable_error()\n> events are used by CPER for CXL Port devices. The cxl_aer_correctable_error()\n> and cxl_aer_uncorrectable_error() events are used for CXL Endpoints. Update\n> the trace routines to use the latter for all CXL devices on both the CPER\n> and native AER paths.\n> \n> Generalize cxl_aer_correctable_error()/cxl_aer_uncorrectable_error to\n> take a struct device * and a u64 serial argument supplied by the caller.\n> cxl_handle_ras() and cxl_handle_cor_ras() gain the new u64 serial parameter,\n> sourced from pci_get_dsn().\n> \n> The CPER path keeps its existing Port-vs-Endpoint dispatch and passes the\n> new arguments to the unified trace events. The CPER path will be folded\n> together in a following patch.\n> \n> Remove the now-unused cxl_port_aer_correctable_error() and\n> cxl_port_aer_uncorrectable_error().\n> \n> **WARNING: ABI BREAK**\n> Rename the trace event field \"memdev\" to \"device\" so all CXL device types\n> (Ports and Endpoints) can be reported under a common field name. Note this\n> is an ABI break for userspace tools that key off the old \"memdev\" field.\n> Specifically, rasdaemon's ras-cxl-handler.c looks up \"memdev\" and bails on\n> NULL, so an unmodified rasdaemon will drop every CXL CE/UCE event once this\n> kernel ships. A rasdaemon update is needed in a separate series.\n> \n> The need for the field rename was discussed in v16 review [1].\n> \n> Also, for CXL Upstream Switch Port (USP) and Endpoint (EP) fatal UCE,\n> the cxl_aer_uncorrectable_error trace event is not emitted. The AER core\n> only retrieves PCI_ERR_UNCOR_STATUS for Root Ports, RCECs, and Downstream\n> Ports, or for non-fatal severities. PCI config reads to the source device\n> are expected to fail otherwise, so the AER core never reads the status\n> word, is_cxl_error() does not classify the event as CXL, and the AER path\n> handles it instead. 
In this case the AER handler consumes the event and\n> logs it as an AER error without calling the CXL RAS handlers or trace\n> logging.\n> \n> Before this patch, Endpoint and Port devices emitted different events:\n> \n>   # Endpoint (cxl_aer_*):\n>   cxl_aer_correctable_error: memdev=mem0 host=0000:0c:00.0 serial=0: status: 'CRC Threshold Hit'\n>   cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0c:00.0 serial=0: status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n> \n>   # Port (cxl_port_aer_*, no serial field):\n>   cxl_port_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c status='CRC Threshold Hit'\n>   cxl_port_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n> \n> After this patch, all CXL devices emit the unified cxl_aer_* events\n> with the same field layout:\n> \n>   cxl_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c serial=0 status: 'CRC Threshold Hit'\n>   cxl_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c serial=0 status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n> \n> [1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/\n> \n> Co-developed-by: Terry Bowman <terry.bowman@amd.com>\n> Signed-off-by: Terry Bowman <terry.bowman@amd.com>\n> Signed-off-by: Dan Williams <djbw@kernel.org>\n\nReviewed-by: Dave Jiang <dave.jiang@intel.com>\n\n\n> \n> ---\n> \n> Changes in v16->v17:\n> - Replace cxlds->serial with pci_get_dsn()\n> - Change 'memdev' to 'device' (Dan)\n> - Updated Commit message\n> \n> Changes in v15->v16:\n> - Add Dan's review-by\n> - Incorporate Dan's comment into commit message:\n> \"Add the serial number at the end to preserve compatibility with\n> libtraceevent parsing of the parameters.\"\n> \n> Changes in v14->v15:\n> - Update commit message.\n> - Moved 
cxl_handle_ras/cxl_handle_cor_ras() changes to future patch (terry)\n> \n> Changes in v13->v14:\n> - Update commit headline (Bjorn)\n> \n> Changes in v12->v13:\n> - Added Dave Jiang's review-by\n> \n> Changes in v11 -> v12:\n> - Correct parameters to call trace_cxl_aer_correctable_error()\n> - Add reviewed-by for Jonathan and Shiju\n> \n> Changes in v10->v11:\n> - Updated CE and UCE trace routines to maintain consistent TP_Struct ABI\n> and unchanged TP_printk() logging.\n> ---\n>  drivers/cxl/core/core.h    | 11 ++++--\n>  drivers/cxl/core/ras.c     | 39 +++++++++++--------\n>  drivers/cxl/core/ras_rch.c |  6 ++-\n>  drivers/cxl/core/trace.h   | 76 ++++++++------------------------------\n>  4 files changed, 49 insertions(+), 83 deletions(-)\n> \n> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h\n> index 82ca3a476708..132ac9c1ebf4 100644\n> --- a/drivers/cxl/core/core.h\n> +++ b/drivers/cxl/core/core.h\n> @@ -183,8 +183,9 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)\n>  #ifdef CONFIG_CXL_RAS\n>  int cxl_ras_init(void);\n>  void cxl_ras_exit(void);\n> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);\n> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);\n> +bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);\n> +void cxl_handle_cor_ras(struct device *dev, u64 serial,\n> +\t\t\tvoid __iomem *ras_base);\n>  void cxl_dport_map_rch_aer(struct cxl_dport *dport);\n>  void cxl_disable_rch_root_ints(struct cxl_dport *dport);\n>  void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);\n> @@ -195,11 +196,13 @@ static inline int cxl_ras_init(void)\n>  \treturn 0;\n>  }\n>  static inline void cxl_ras_exit(void) { }\n> -static inline bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n> +static inline bool cxl_handle_ras(struct device *dev, u64 serial,\n> +\t\t\t\t  void __iomem *ras_base)\n>  {\n>  \treturn false;\n>  }\n> -static inline void 
cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }\n> +static inline void cxl_handle_cor_ras(struct device *dev, u64 serial,\n> +\t\t\t\t      void __iomem *ras_base) { }\n>  static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }\n>  static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }\n>  static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }\n> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c\n> index 006c6ffc2f56..d7081caaf5d3 100644\n> --- a/drivers/cxl/core/ras.c\n> +++ b/drivers/cxl/core/ras.c\n> @@ -13,7 +13,7 @@ static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,\n>  {\n>  \tu32 status = ras_cap.cor_status & ~ras_cap.cor_mask;\n>  \n> -\ttrace_cxl_port_aer_correctable_error(&pdev->dev, status);\n> +\ttrace_cxl_aer_correctable_error(&pdev->dev, status, pci_get_dsn(pdev));\n>  }\n>  \n>  static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,\n> @@ -28,20 +28,24 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,\n>  \telse\n>  \t\tfe = status;\n>  \n> -\ttrace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,\n> -\t\t\t\t\t       ras_cap.header_log);\n> +\ttrace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,\n> +\t\t\t\t\t  ras_cap.header_log,\n> +\t\t\t\t\t  pci_get_dsn(pdev));\n>  }\n>  \n> -static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,\n> +static void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev,\n> +\t\t\t\t\t struct cxl_memdev *cxlmd,\n>  \t\t\t\t\t struct cxl_ras_capability_regs ras_cap)\n>  {\n>  \tu32 status = ras_cap.cor_status & ~ras_cap.cor_mask;\n>  \n> -\ttrace_cxl_aer_correctable_error(cxlmd, status);\n> +\ttrace_cxl_aer_correctable_error(&cxlmd->dev, status,\n> +\t\t\t\t\tpci_get_dsn(pdev));\n>  }\n>  \n>  static void\n> -cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,\n> +cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev,\n> +\t\t\t       struct cxl_memdev 
*cxlmd,\n>  \t\t\t       struct cxl_ras_capability_regs ras_cap)\n>  {\n>  \tu32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;\n> @@ -53,8 +57,9 @@ cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,\n>  \telse\n>  \t\tfe = status;\n>  \n> -\ttrace_cxl_aer_uncorrectable_error(cxlmd, status, fe,\n> -\t\t\t\t\t  ras_cap.header_log);\n> +\ttrace_cxl_aer_uncorrectable_error(&cxlmd->dev, status, fe,\n> +\t\t\t\t\t  ras_cap.header_log,\n> +\t\t\t\t\t  pci_get_dsn(pdev));\n>  }\n>  \n>  static int match_memdev_by_parent(struct device *dev, const void *uport)\n> @@ -101,9 +106,9 @@ void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)\n>  \n>  \tcxlmd = to_cxl_memdev(mem_dev);\n>  \tif (data->severity == AER_CORRECTABLE)\n> -\t\tcxl_cper_trace_corr_prot_err(cxlmd, data->ras_cap);\n> +\t\tcxl_cper_trace_corr_prot_err(pdev, cxlmd, data->ras_cap);\n>  \telse\n> -\t\tcxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);\n> +\t\tcxl_cper_trace_uncorr_prot_err(pdev, cxlmd, data->ras_cap);\n>  }\n>  EXPORT_SYMBOL_GPL(cxl_cper_handle_prot_err);\n>  \n> @@ -183,7 +188,7 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)\n>  }\n>  EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, \"CXL\");\n>  \n> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)\n> +void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n>  {\n>  \tvoid __iomem *addr;\n>  \tu32 status;\n> @@ -195,7 +200,7 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)\n>  \tstatus = readl(addr);\n>  \tif (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {\n>  \t\twritel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);\n> -\t\ttrace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);\n> +\t\ttrace_cxl_aer_correctable_error(dev, status, serial);\n>  \t}\n>  }\n>  \n> @@ -220,7 +225,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)\n>   * Log the state of the RAS status registers and prepare them to log the\n>   * next error 
status. Return 1 if reset needed.\n>   */\n> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n> +bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n>  {\n>  \tu32 hl[CXL_HEADERLOG_SIZE_U32];\n>  \tvoid __iomem *addr;\n> @@ -247,7 +252,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n>  \t}\n>  \n>  \theader_log_copy(ras_base, hl);\n> -\ttrace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);\n> +\ttrace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);\n>  \twritel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);\n>  \n>  \treturn true;\n> @@ -270,7 +275,8 @@ void cxl_cor_error_detected(struct pci_dev *pdev)\n>  \t\tif (cxlds->rcd)\n>  \t\t\tcxl_handle_rdport_errors(cxlds);\n>  \n> -\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);\n> +\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t\t   cxlmd->endpoint->regs.ras);\n>  \t}\n>  }\n>  EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, \"CXL\");\n> @@ -299,7 +305,8 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,\n>  \t\t * chance the situation is recoverable dump the status of the RAS\n>  \t\t * capability registers and bounce the active state of the memdev.\n>  \t\t */\n> -\t\tue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);\n> +\t\tue = cxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t\t    cxlmd->endpoint->regs.ras);\n>  \t}\n>  \n>  \tswitch (state) {\n> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c\n> index 0a8b3b9b6388..61835fbafc0f 100644\n> --- a/drivers/cxl/core/ras_rch.c\n> +++ b/drivers/cxl/core/ras_rch.c\n> @@ -115,7 +115,9 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)\n>  \n>  \tpci_print_aer(pdev, severity, &aer_regs);\n>  \tif (severity == AER_CORRECTABLE)\n> -\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);\n> +\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> 
+\t\t\t\t   dport->regs.ras);\n>  \telse\n> -\t\tcxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);\n> +\t\tcxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t       dport->regs.ras);\n>  }\n> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h\n> index a972e4ef1936..6f3957b3c3af 100644\n> --- a/drivers/cxl/core/trace.h\n> +++ b/drivers/cxl/core/trace.h\n> @@ -48,49 +48,22 @@\n>  \t{ CXL_RAS_UC_IDE_RX_ERR, \"IDE Rx Error\" }\t\t\t  \\\n>  )\n>  \n> -TRACE_EVENT(cxl_port_aer_uncorrectable_error,\n> -\tTP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),\n> -\tTP_ARGS(dev, status, fe, hl),\n> +TRACE_EVENT(cxl_aer_uncorrectable_error,\n> +\tTP_PROTO(const struct device *dev, u32 status, u32 fe, u32 *hl,\n> +\t\t u64 serial),\n> +\tTP_ARGS(dev, status, fe, hl, serial),\n>  \tTP_STRUCT__entry(\n>  \t\t__string(device, dev_name(dev))\n>  \t\t__string(host, dev_name(dev->parent))\n> -\t\t__field(u32, status)\n> -\t\t__field(u32, first_error)\n> -\t\t__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)\n> -\t),\n> -\tTP_fast_assign(\n> -\t\t__assign_str(device);\n> -\t\t__assign_str(host);\n> -\t\t__entry->status = status;\n> -\t\t__entry->first_error = fe;\n> -\t\t/*\n> -\t\t * Embed the 512B headerlog data for user app retrieval and\n> -\t\t * parsing, but no need to print this in the trace buffer.\n> -\t\t */\n> -\t\tmemcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n> -\t),\n> -\tTP_printk(\"device=%s host=%s status: '%s' first_error: '%s'\",\n> -\t\t  __get_str(device), __get_str(host),\n> -\t\t  show_uc_errs(__entry->status),\n> -\t\t  show_uc_errs(__entry->first_error)\n> -\t)\n> -);\n> -\n> -TRACE_EVENT(cxl_aer_uncorrectable_error,\n> -\tTP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),\n> -\tTP_ARGS(cxlmd, status, fe, hl),\n> -\tTP_STRUCT__entry(\n> -\t\t__string(memdev, dev_name(&cxlmd->dev))\n> -\t\t__string(host, dev_name(cxlmd->dev.parent))\n>  \t\t__field(u64, serial)\n>  \t\t__field(u32, status)\n>  
\t\t__field(u32, first_error)\n>  \t\t__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)\n>  \t),\n>  \tTP_fast_assign(\n> -\t\t__assign_str(memdev);\n> +\t\t__assign_str(device);\n>  \t\t__assign_str(host);\n> -\t\t__entry->serial = cxlmd->cxlds->serial;\n> +\t\t__entry->serial = serial;\n>  \t\t__entry->status = status;\n>  \t\t__entry->first_error = fe;\n>  \t\t/*\n> @@ -99,8 +72,8 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n>  \t\t */\n>  \t\tmemcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n>  \t),\n> -\tTP_printk(\"memdev=%s host=%s serial=%lld: status: '%s' first_error: '%s'\",\n> -\t\t  __get_str(memdev), __get_str(host), __entry->serial,\n> +\tTP_printk(\"device=%s host=%s serial=%lld status: '%s' first_error: '%s'\",\n> +\t\t  __get_str(device), __get_str(host), __entry->serial,\n>  \t\t  show_uc_errs(__entry->status),\n>  \t\t  show_uc_errs(__entry->first_error)\n>  \t)\n> @@ -124,42 +97,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n>  \t{ CXL_RAS_CE_PHYS_LAYER_ERR, \"Received Error From Physical Layer\" }\t\\\n>  )\n>  \n> -TRACE_EVENT(cxl_port_aer_correctable_error,\n> -\tTP_PROTO(struct device *dev, u32 status),\n> -\tTP_ARGS(dev, status),\n> +TRACE_EVENT(cxl_aer_correctable_error,\n> +\tTP_PROTO(const struct device *dev, u32 status, u64 serial),\n> +\tTP_ARGS(dev, status, serial),\n>  \tTP_STRUCT__entry(\n>  \t\t__string(device, dev_name(dev))\n>  \t\t__string(host, dev_name(dev->parent))\n> -\t\t__field(u32, status)\n> -\t),\n> -\tTP_fast_assign(\n> -\t\t__assign_str(device);\n> -\t\t__assign_str(host);\n> -\t\t__entry->status = status;\n> -\t),\n> -\tTP_printk(\"device=%s host=%s status='%s'\",\n> -\t\t  __get_str(device), __get_str(host),\n> -\t\t  show_ce_errs(__entry->status)\n> -\t)\n> -);\n> -\n> -TRACE_EVENT(cxl_aer_correctable_error,\n> -\tTP_PROTO(const struct cxl_memdev *cxlmd, u32 status),\n> -\tTP_ARGS(cxlmd, status),\n> -\tTP_STRUCT__entry(\n> -\t\t__string(memdev, dev_name(&cxlmd->dev))\n> -\t\t__string(host, 
dev_name(cxlmd->dev.parent))\n>  \t\t__field(u64, serial)\n>  \t\t__field(u32, status)\n>  \t),\n>  \tTP_fast_assign(\n> -\t\t__assign_str(memdev);\n> +\t\t__assign_str(device);\n>  \t\t__assign_str(host);\n> -\t\t__entry->serial = cxlmd->cxlds->serial;\n> +\t\t__entry->serial = serial;\n>  \t\t__entry->status = status;\n>  \t),\n> -\tTP_printk(\"memdev=%s host=%s serial=%lld: status: '%s'\",\n> -\t\t  __get_str(memdev), __get_str(host), __entry->serial,\n> +\tTP_printk(\"device=%s host=%s serial=%lld status: '%s'\",\n> +\t\t  __get_str(device), __get_str(host), __entry->serial,\n>  \t\t  show_ce_errs(__entry->status)\n>  \t)\n>  );","headers":{"Return-Path":"\n <linux-pci+bounces-53800-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256\n header.s=Intel header.b=dq0M4+7N;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c15:e001:75::12fc:5321; helo=sin.lore.kernel.org;\n envelope-from=linux-pci+bounces-53800-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com\n header.b=\"dq0M4+7N\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=192.198.163.9","smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=intel.com","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=intel.com"],"Received":["from sin.lore.kernel.org (sin.lore.kernel.org\n [IPv6:2600:3c15:e001:75::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g9Bx90KX6z1yJq\n\tfor 
<incoming@patchwork.ozlabs.org>; Wed, 06 May 2026 07:49:17 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sin.lore.kernel.org (Postfix) with ESMTP id C0CF4303C98E\n\tfor <incoming@patchwork.ozlabs.org>; Tue,  5 May 2026 21:47:13 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 0D3684968FC;\n\tTue,  5 May 2026 21:47:02 +0000 (UTC)","from mgamail.intel.com (mgamail.intel.com [192.198.163.9])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 8066F481ABB;\n\tTue,  5 May 2026 21:46:59 +0000 (UTC)","from fmviesa001.fm.intel.com ([10.60.135.141])\n  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;\n 05 May 2026 14:46:58 -0700","from spandruv-mobl5.amr.corp.intel.com (HELO [10.125.110.85])\n ([10.125.110.85])\n  by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;\n 05 May 2026 14:46:57 -0700"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1778017621; cv=none;\n b=OVWjl/vQzhOr8aQOzcC+f9CQl+DfL3TRxgBHw/XRhwgy06QuxG5kAGTm+UzY6YmJxue0t7DrMwvg/5ZcC94fcTIAsz9mFglHk4l0jOBu2R8hj1pGzo9f2I4rXVLmaOkf8tTqLN0CzDTvyyR3PpFVyS2Cddtjpv2ZD5AThI3gT4M=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1778017621; c=relaxed/simple;\n\tbh=BHSv3jAwjSZo+2AWxxFqINgE4/fx+LvuaSNvCy3HofQ=;\n\th=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:\n\t In-Reply-To:Content-Type;\n b=fS/h94xy3tsiQO3ua4Bl/gmvzaqIfGj6ws6CrnjB9oCcCC6oXGe3juJqwZl/amI/1iYlBLn4/Tzo++c4chpS8A82QZY2NW6sE3KXwOHqdwiTSzhlsbdaUoFQJxWUoiu+Zj7SfK9iVd755WcfnYiUMvB0CHQNyRGAG377FdlU2zw=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=intel.com;\n spf=pass smtp.mailfrom=intel.com;\n dkim=pass (2048-bit key) header.d=intel.com 
header.i=@intel.com\n header.b=dq0M4+7N; arc=none smtp.client-ip=192.198.163.9","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple;\n  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;\n  t=1778017619; x=1809553619;\n  h=message-id:date:mime-version:subject:to:cc:references:\n   from:in-reply-to:content-transfer-encoding;\n  bh=BHSv3jAwjSZo+2AWxxFqINgE4/fx+LvuaSNvCy3HofQ=;\n  b=dq0M4+7Nk7kWu3egbaPd2BXkGzlhaZBL0wBwpI1Tn+WDVbvG777OdzMb\n   Rfqo7kp9j34WCb+CI9pZ9GbzOGnY2EM3NSkmBicEY/4qldyIaHi/spaCz\n   dhu3Rfexc18XS6Zri8aLVFJqhnxHS/at+vbEVdBT//9k10EabmEidrINF\n   99WL5DcDHe3wMnHFQXslR0sU4QYDFI0ierWu6uQyulKHSTCATmW+fSjzA\n   Wk9YELqTO2SkhPft52bit6zclEt4YaVhtuKf/fuyYe3Yu+uWjh+OxPsc1\n   O9xNqM+wcTsJVX9sOUpb4umnw5YMz1/7nzXndwvdg7w84bOFWg2LGc+pc\n   Q==;","X-CSE-ConnectionGUID":["s/ERNurzRvOCeRY1Ki9ZWQ==","uaxHixZCSjGCywU16kTVmg=="],"X-CSE-MsgGUID":["oIPhTgVZQnauwgFwYzqlyQ==","cAR89AN2QQ+jA+jP/g45PA=="],"X-IronPort-AV":["E=McAfee;i=\"6800,10657,11777\"; a=\"89602157\"","E=Sophos;i=\"6.23,218,1770624000\";\n   d=\"scan'208\";a=\"89602157\"","E=Sophos;i=\"6.23,218,1770624000\";\n   d=\"scan'208\";a=\"259599581\""],"X-ExtLoop1":"1","Message-ID":"<618748f8-ad14-42ea-8694-0c1bc5b95c54@intel.com>","Date":"Tue, 5 May 2026 14:46:55 -0700","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","To":"Terry Bowman <terry.bowman@amd.com>, dave@stgolabs.net, jic23@kernel.org,\n alison.schofield@intel.com, djbw@kernel.org, bhelgaas@google.com,\n ming.li@zohomail.com, Smita.KoralahalliChannabasappa@amd.com,\n rrichter@amd.com, PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,\n Benjamin.Cheatham@amd.com, sathyanarayanan.kuppuswamy@linux.intel.com,\n 
vishal.l.verma@intel.com, alucerop@amd.com, ira.weiny@intel.com,\n corbet@lwn.net, rafael@kernel.org, xueshuai@linux.alibaba.com,\n linux-cxl@vger.kernel.org","Cc":"linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,\n linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org","References":"<20260505173029.2718246-1-terry.bowman@amd.com>\n <20260505173029.2718246-3-terry.bowman@amd.com>","Content-Language":"en-US","From":"Dave Jiang <dave.jiang@intel.com>","In-Reply-To":"<20260505173029.2718246-3-terry.bowman@amd.com>","Content-Type":"text/plain; charset=UTF-8","Content-Transfer-Encoding":"7bit"}},{"id":3687977,"web_url":"http://patchwork.ozlabs.org/comment/3687977/","msgid":"<20260507190836.70197e24@jic23-huawei>","list_archive_url":null,"date":"2026-05-07T18:08:36","subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","submitter":{"id":10151,"url":"http://patchwork.ozlabs.org/api/people/10151/","name":"Jonathan Cameron","email":"jic23@kernel.org"},"content":"On Tue, 5 May 2026 12:30:20 -0500\nTerry Bowman <terry.bowman@amd.com> wrote:\n\n> From: Dan Williams <djbw@kernel.org>\n\n+CC Mauro - rasdaemon related - see below.\n\n> \n> CXL protocol error logging uses two parallel sets of trace events. The\n> cxl_port_aer_correctable_error() and cxl_port_aer_uncorrectable_error()\n> events are used by CPER for CXL Port devices. The cxl_aer_correctable_error()\n> and cxl_aer_uncorrectable_error() events are used for CXL Endpoints. Update\n> the trace routines to use the latter for all CXL devices on both the CPER\n> and native AER paths.\n> \n> Generalize cxl_aer_correctable_error()/cxl_aer_uncorrectable_error to\n> take a struct device * and a u64 serial argument supplied by the caller.\n> cxl_handle_ras() and cxl_handle_cor_ras() gain the new u64 serial parameter,\n> sourced from pci_get_dsn().\n> \n> The CPER path keeps its existing Port-vs-Endpoint dispatch and passes the\n> new arguments to the unified trace events. 
The CPER path will be folded\n> together in a following patch.\n> \n> Remove the now-unused cxl_port_aer_correctable_error() and\n> cxl_port_aer_uncorrectable_error().\n> \n> **WARNING: ABI BREAK**\n> Rename the trace event field \"memdev\" to \"device\" so all CXL device types\n> (Ports and Endpoints) can be reported under a common field name. Note this\n> is an ABI break for userspace tools that key off the old \"memdev\" field.\n> Specifically, rasdaemon's ras-cxl-handler.c looks up \"memdev\" and bails on\n> NULL, so an unmodified rasdaemon will drop every CXL CE/UCE event once this\n> kernel ships. A rasdaemon update is needed in a separate series.\n> \n> The need for the field rename was discussed in v16 review [1].\n\nThis concerns me (sorry I wasn't paying attention to the v16 thread).\nIt is a userspace regression against code that is out in the wild and typically\nnot updated in sync with the kernel.\n\nIf you are suggesting breaking ras-daemon at the very least +CC the maintainer.\n\nTo get to a unified tracepoint add a new one that does what you want, but\nmaintain the existing ones as well.  Userspace can then migrate and maybe\nin 5+ years time we can delete the non unified ones.\n\nNo actually comments on the code, just left it all here for Mauro,\n\nThanks,\n\nJonathan\n\n> \n> Also, for CXL Upstream Switch Port (USP) and Endpoint (EP) fatal UCE,\n> the cxl_aer_uncorrectable_error trace event is not emitted. The AER core\n> only retrieves PCI_ERR_UNCOR_STATUS for Root Ports, RCECs, and Downstream\n> Ports, or for non-fatal severities. PCI config reads to the source device\n> are expected to fail otherwise, so the AER core never reads the status\n> word, is_cxl_error() does not classify the event as CXL, and the AER path\n> handles it instead. 
In this case the AER handler consumes the event and\n> logs it as an AER error without calling the CXL RAS handlers or trace\n> logging.\n> \n> Before this patch, Endpoint and Port devices emitted different events:\n> \n>   # Endpoint (cxl_aer_*):\n>   cxl_aer_correctable_error: memdev=mem0 host=0000:0c:00.0 serial=0: status: 'CRC Threshold Hit'\n>   cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0c:00.0 serial=0: status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n> \n>   # Port (cxl_port_aer_*, no serial field):\n>   cxl_port_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c status='CRC Threshold Hit'\n>   cxl_port_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n> \n> After this patch, all CXL devices emit the unified cxl_aer_* events\n> with the same field layout:\n> \n>   cxl_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c serial=0 status: 'CRC Threshold Hit'\n>   cxl_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c serial=0 status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n> \n> [1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/\n> \n> Co-developed-by: Terry Bowman <terry.bowman@amd.com>\n> Signed-off-by: Terry Bowman <terry.bowman@amd.com>\n> Signed-off-by: Dan Williams <djbw@kernel.org>\n> \n> ---\n> \n> Changes in v16->v17:\n> - Replace cxlds->serial with pci_get_dsn()\n> - Change 'memdev' to 'device' (Dan)\n> - Updated Commit message\n> \n> Changes in v15->v16:\n> - Add Dan's review-by\n> - Incorporate Dan's comment into commit message:\n> \"Add the serial number at the end to preserve compatibility with\n> libtraceevent parsing of the parameters.\"\n> \n> Changes in v14->v15:\n> - Update commit message.\n> - Moved cxl_handle_ras/cxl_handle_cor_ras() changes to future patch (terry)\n> \n> Changes in 
v13->v14:\n> - Update commit headline (Bjorn)\n> \n> Changes in v12->v13:\n> - Added Dave Jiang's review-by\n> \n> Changes in v11 -> v12:\n> - Correct parameters to call trace_cxl_aer_correctable_error()\n> - Add reviewed-by for Jonathan and Shiju\n> \n> Changes in v10->v11:\n> - Updated CE and UCE trace routines to maintain consistent TP_Struct ABI\n> and unchanged TP_printk() logging.\n> ---\n>  drivers/cxl/core/core.h    | 11 ++++--\n>  drivers/cxl/core/ras.c     | 39 +++++++++++--------\n>  drivers/cxl/core/ras_rch.c |  6 ++-\n>  drivers/cxl/core/trace.h   | 76 ++++++++------------------------------\n>  4 files changed, 49 insertions(+), 83 deletions(-)\n> \n> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h\n> index 82ca3a476708..132ac9c1ebf4 100644\n> --- a/drivers/cxl/core/core.h\n> +++ b/drivers/cxl/core/core.h\n> @@ -183,8 +183,9 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)\n>  #ifdef CONFIG_CXL_RAS\n>  int cxl_ras_init(void);\n>  void cxl_ras_exit(void);\n> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);\n> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);\n> +bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);\n> +void cxl_handle_cor_ras(struct device *dev, u64 serial,\n> +\t\t\tvoid __iomem *ras_base);\n>  void cxl_dport_map_rch_aer(struct cxl_dport *dport);\n>  void cxl_disable_rch_root_ints(struct cxl_dport *dport);\n>  void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);\n> @@ -195,11 +196,13 @@ static inline int cxl_ras_init(void)\n>  \treturn 0;\n>  }\n>  static inline void cxl_ras_exit(void) { }\n> -static inline bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n> +static inline bool cxl_handle_ras(struct device *dev, u64 serial,\n> +\t\t\t\t  void __iomem *ras_base)\n>  {\n>  \treturn false;\n>  }\n> -static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }\n> +static inline void 
cxl_handle_cor_ras(struct device *dev, u64 serial,\n> +\t\t\t\t      void __iomem *ras_base) { }\n>  static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }\n>  static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }\n>  static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }\n> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c\n> index 006c6ffc2f56..d7081caaf5d3 100644\n> --- a/drivers/cxl/core/ras.c\n> +++ b/drivers/cxl/core/ras.c\n> @@ -13,7 +13,7 @@ static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,\n>  {\n>  \tu32 status = ras_cap.cor_status & ~ras_cap.cor_mask;\n>  \n> -\ttrace_cxl_port_aer_correctable_error(&pdev->dev, status);\n> +\ttrace_cxl_aer_correctable_error(&pdev->dev, status, pci_get_dsn(pdev));\n>  }\n>  \n>  static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,\n> @@ -28,20 +28,24 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,\n>  \telse\n>  \t\tfe = status;\n>  \n> -\ttrace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,\n> -\t\t\t\t\t       ras_cap.header_log);\n> +\ttrace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,\n> +\t\t\t\t\t  ras_cap.header_log,\n> +\t\t\t\t\t  pci_get_dsn(pdev));\n>  }\n>  \n> -static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,\n> +static void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev,\n> +\t\t\t\t\t struct cxl_memdev *cxlmd,\n>  \t\t\t\t\t struct cxl_ras_capability_regs ras_cap)\n>  {\n>  \tu32 status = ras_cap.cor_status & ~ras_cap.cor_mask;\n>  \n> -\ttrace_cxl_aer_correctable_error(cxlmd, status);\n> +\ttrace_cxl_aer_correctable_error(&cxlmd->dev, status,\n> +\t\t\t\t\tpci_get_dsn(pdev));\n>  }\n>  \n>  static void\n> -cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,\n> +cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev,\n> +\t\t\t       struct cxl_memdev *cxlmd,\n>  \t\t\t       struct cxl_ras_capability_regs ras_cap)\n>  {\n>  \tu32 status = 
ras_cap.uncor_status & ~ras_cap.uncor_mask;\n> @@ -53,8 +57,9 @@ cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,\n>  \telse\n>  \t\tfe = status;\n>  \n> -\ttrace_cxl_aer_uncorrectable_error(cxlmd, status, fe,\n> -\t\t\t\t\t  ras_cap.header_log);\n> +\ttrace_cxl_aer_uncorrectable_error(&cxlmd->dev, status, fe,\n> +\t\t\t\t\t  ras_cap.header_log,\n> +\t\t\t\t\t  pci_get_dsn(pdev));\n>  }\n>  \n>  static int match_memdev_by_parent(struct device *dev, const void *uport)\n> @@ -101,9 +106,9 @@ void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)\n>  \n>  \tcxlmd = to_cxl_memdev(mem_dev);\n>  \tif (data->severity == AER_CORRECTABLE)\n> -\t\tcxl_cper_trace_corr_prot_err(cxlmd, data->ras_cap);\n> +\t\tcxl_cper_trace_corr_prot_err(pdev, cxlmd, data->ras_cap);\n>  \telse\n> -\t\tcxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);\n> +\t\tcxl_cper_trace_uncorr_prot_err(pdev, cxlmd, data->ras_cap);\n>  }\n>  EXPORT_SYMBOL_GPL(cxl_cper_handle_prot_err);\n>  \n> @@ -183,7 +188,7 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)\n>  }\n>  EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, \"CXL\");\n>  \n> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)\n> +void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n>  {\n>  \tvoid __iomem *addr;\n>  \tu32 status;\n> @@ -195,7 +200,7 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)\n>  \tstatus = readl(addr);\n>  \tif (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {\n>  \t\twritel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);\n> -\t\ttrace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);\n> +\t\ttrace_cxl_aer_correctable_error(dev, status, serial);\n>  \t}\n>  }\n>  \n> @@ -220,7 +225,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)\n>   * Log the state of the RAS status registers and prepare them to log the\n>   * next error status. 
Return 1 if reset needed.\n>   */\n> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n> +bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n>  {\n>  \tu32 hl[CXL_HEADERLOG_SIZE_U32];\n>  \tvoid __iomem *addr;\n> @@ -247,7 +252,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n>  \t}\n>  \n>  \theader_log_copy(ras_base, hl);\n> -\ttrace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);\n> +\ttrace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);\n>  \twritel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);\n>  \n>  \treturn true;\n> @@ -270,7 +275,8 @@ void cxl_cor_error_detected(struct pci_dev *pdev)\n>  \t\tif (cxlds->rcd)\n>  \t\t\tcxl_handle_rdport_errors(cxlds);\n>  \n> -\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);\n> +\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t\t   cxlmd->endpoint->regs.ras);\n>  \t}\n>  }\n>  EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, \"CXL\");\n> @@ -299,7 +305,8 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,\n>  \t\t * chance the situation is recoverable dump the status of the RAS\n>  \t\t * capability registers and bounce the active state of the memdev.\n>  \t\t */\n> -\t\tue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);\n> +\t\tue = cxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t\t    cxlmd->endpoint->regs.ras);\n>  \t}\n>  \n>  \tswitch (state) {\n> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c\n> index 0a8b3b9b6388..61835fbafc0f 100644\n> --- a/drivers/cxl/core/ras_rch.c\n> +++ b/drivers/cxl/core/ras_rch.c\n> @@ -115,7 +115,9 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)\n>  \n>  \tpci_print_aer(pdev, severity, &aer_regs);\n>  \tif (severity == AER_CORRECTABLE)\n> -\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);\n> +\t\tcxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t\t   
dport->regs.ras);\n>  \telse\n> -\t\tcxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);\n> +\t\tcxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n> +\t\t\t       dport->regs.ras);\n>  }\n> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h\n> index a972e4ef1936..6f3957b3c3af 100644\n> --- a/drivers/cxl/core/trace.h\n> +++ b/drivers/cxl/core/trace.h\n> @@ -48,49 +48,22 @@\n>  \t{ CXL_RAS_UC_IDE_RX_ERR, \"IDE Rx Error\" }\t\t\t  \\\n>  )\n>  \n> -TRACE_EVENT(cxl_port_aer_uncorrectable_error,\n> -\tTP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),\n> -\tTP_ARGS(dev, status, fe, hl),\n> +TRACE_EVENT(cxl_aer_uncorrectable_error,\n> +\tTP_PROTO(const struct device *dev, u32 status, u32 fe, u32 *hl,\n> +\t\t u64 serial),\n> +\tTP_ARGS(dev, status, fe, hl, serial),\n>  \tTP_STRUCT__entry(\n>  \t\t__string(device, dev_name(dev))\n>  \t\t__string(host, dev_name(dev->parent))\n> -\t\t__field(u32, status)\n> -\t\t__field(u32, first_error)\n> -\t\t__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)\n> -\t),\n> -\tTP_fast_assign(\n> -\t\t__assign_str(device);\n> -\t\t__assign_str(host);\n> -\t\t__entry->status = status;\n> -\t\t__entry->first_error = fe;\n> -\t\t/*\n> -\t\t * Embed the 512B headerlog data for user app retrieval and\n> -\t\t * parsing, but no need to print this in the trace buffer.\n> -\t\t */\n> -\t\tmemcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n> -\t),\n> -\tTP_printk(\"device=%s host=%s status: '%s' first_error: '%s'\",\n> -\t\t  __get_str(device), __get_str(host),\n> -\t\t  show_uc_errs(__entry->status),\n> -\t\t  show_uc_errs(__entry->first_error)\n> -\t)\n> -);\n> -\n> -TRACE_EVENT(cxl_aer_uncorrectable_error,\n> -\tTP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),\n> -\tTP_ARGS(cxlmd, status, fe, hl),\n> -\tTP_STRUCT__entry(\n> -\t\t__string(memdev, dev_name(&cxlmd->dev))\n> -\t\t__string(host, dev_name(cxlmd->dev.parent))\n>  \t\t__field(u64, serial)\n>  \t\t__field(u32, status)\n>  \t\t__field(u32, 
first_error)\n>  \t\t__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)\n>  \t),\n>  \tTP_fast_assign(\n> -\t\t__assign_str(memdev);\n> +\t\t__assign_str(device);\n>  \t\t__assign_str(host);\n> -\t\t__entry->serial = cxlmd->cxlds->serial;\n> +\t\t__entry->serial = serial;\n>  \t\t__entry->status = status;\n>  \t\t__entry->first_error = fe;\n>  \t\t/*\n> @@ -99,8 +72,8 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n>  \t\t */\n>  \t\tmemcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n>  \t),\n> -\tTP_printk(\"memdev=%s host=%s serial=%lld: status: '%s' first_error: '%s'\",\n> -\t\t  __get_str(memdev), __get_str(host), __entry->serial,\n> +\tTP_printk(\"device=%s host=%s serial=%lld status: '%s' first_error: '%s'\",\n> +\t\t  __get_str(device), __get_str(host), __entry->serial,\n>  \t\t  show_uc_errs(__entry->status),\n>  \t\t  show_uc_errs(__entry->first_error)\n>  \t)\n> @@ -124,42 +97,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n>  \t{ CXL_RAS_CE_PHYS_LAYER_ERR, \"Received Error From Physical Layer\" }\t\\\n>  )\n>  \n> -TRACE_EVENT(cxl_port_aer_correctable_error,\n> -\tTP_PROTO(struct device *dev, u32 status),\n> -\tTP_ARGS(dev, status),\n> +TRACE_EVENT(cxl_aer_correctable_error,\n> +\tTP_PROTO(const struct device *dev, u32 status, u64 serial),\n> +\tTP_ARGS(dev, status, serial),\n>  \tTP_STRUCT__entry(\n>  \t\t__string(device, dev_name(dev))\n>  \t\t__string(host, dev_name(dev->parent))\n> -\t\t__field(u32, status)\n> -\t),\n> -\tTP_fast_assign(\n> -\t\t__assign_str(device);\n> -\t\t__assign_str(host);\n> -\t\t__entry->status = status;\n> -\t),\n> -\tTP_printk(\"device=%s host=%s status='%s'\",\n> -\t\t  __get_str(device), __get_str(host),\n> -\t\t  show_ce_errs(__entry->status)\n> -\t)\n> -);\n> -\n> -TRACE_EVENT(cxl_aer_correctable_error,\n> -\tTP_PROTO(const struct cxl_memdev *cxlmd, u32 status),\n> -\tTP_ARGS(cxlmd, status),\n> -\tTP_STRUCT__entry(\n> -\t\t__string(memdev, dev_name(&cxlmd->dev))\n> -\t\t__string(host, dev_name(cxlmd->dev.parent))\n>  
\t\t__field(u64, serial)\n>  \t\t__field(u32, status)\n>  \t),\n>  \tTP_fast_assign(\n> -\t\t__assign_str(memdev);\n> +\t\t__assign_str(device);\n>  \t\t__assign_str(host);\n> -\t\t__entry->serial = cxlmd->cxlds->serial;\n> +\t\t__entry->serial = serial;\n>  \t\t__entry->status = status;\n>  \t),\n> -\tTP_printk(\"memdev=%s host=%s serial=%lld: status: '%s'\",\n> -\t\t  __get_str(memdev), __get_str(host), __entry->serial,\n> +\tTP_printk(\"device=%s host=%s serial=%lld status: '%s'\",\n> +\t\t  __get_str(device), __get_str(host), __entry->serial,\n>  \t\t  show_ce_errs(__entry->status)\n>  \t)\n>  );","headers":{"Return-Path":"\n <linux-pci+bounces-54111-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256\n header.s=k20201202 header.b=Yx1oTv3g;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c15:e001:75::12fc:5321; helo=sin.lore.kernel.org;\n envelope-from=linux-pci+bounces-54111-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=\"Yx1oTv3g\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=10.30.226.201"],"Received":["from sin.lore.kernel.org (sin.lore.kernel.org\n [IPv6:2600:3c15:e001:75::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4gBKyF3Yqjz1y04\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 08 May 2026 04:09:09 +1000 (AEST)","from smtp.subspace.kernel.org 
(conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sin.lore.kernel.org (Postfix) with ESMTP id D73A0300EC4C\n\tfor <incoming@patchwork.ozlabs.org>; Thu,  7 May 2026 18:08:52 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 6B39045BD7F;\n\tThu,  7 May 2026 18:08:51 +0000 (UTC)","from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org\n [10.30.226.201])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 295B64534AC;\n\tThu,  7 May 2026 18:08:50 +0000 (UTC)","by smtp.kernel.org (Postfix) with ESMTPSA id 2FFF6C2BCB2;\n\tThu,  7 May 2026 18:08:40 +0000 (UTC)"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1778177331; cv=none;\n b=MSBbPgFIJzJRXnwSFzlQrhdUXQGGsvtHPbvkyJzMGhEnlHlhrmaOiVO2SiRwvwvPLoRz+uQIqjqSV4mukK4cBwGxpeMVSBEPW+zSyYRoE2ptMlHu8Nd3jc/mTBW7aM6mr0YjjlugpSi2ItFGcnPAsnxQCJCdR4J1aXA3reve1XE=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1778177331; c=relaxed/simple;\n\tbh=UsDR5qi5bqk2X9YkW3TzvwcnSyAyyGEZbjSHDw1G98U=;\n\th=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References:\n\t MIME-Version:Content-Type;\n b=UVCu8DiUo1769vKsg/6bBG+UOQGVSiVUAElFpBNwAcEMT+8+iEj2B73Ju+dU3AcsxEqGrMnlG0nI8yLs/e3gVLiwZ/1Nz2sMadnmda0SlpGf8sbLcGyhq9i5RI8B/X67RjzfgIVeAnBJ9DbydXK++rN57npqoeW9Z3JeFzFAhYo=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=Yx1oTv3g; arc=none smtp.client-ip=10.30.226.201","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;\n\ts=k20201202; t=1778177330;\n\tbh=UsDR5qi5bqk2X9YkW3TzvwcnSyAyyGEZbjSHDw1G98U=;\n\th=Date:From:To:Cc:Subject:In-Reply-To:References:From;\n\tb=Yx1oTv3gPz/fefXxL08tGjK2871zFbV1PkN1h51ha3YC9p3GTdPOQNEu8iYICHruI\n\t 
l6cQmc/bzSt4Rz0m6+296qQRg202CazP9R+Zhj+PXs/xoPIKhZC5pzpMUAEMQDrk6A\n\t 9o1al2JdgJCi1MycbBqzNtsp0JBMzXvShy9ml1oYk8P9juBIj1Yeox9EqgUT+NVYwF\n\t /MNdNw1kj5t35YxPavh3kToP7BHmR/lb/SyAOMdba8EII4BrXeYEyGmjuco7uu8/9S\n\t wFsWxsFeMMu44TuKxv822e6gQY2HKzcE9VaiiJHIJuYSw2M4D25m5tsiKzPJG41Qas\n\t vT2L5IBU/cenw==","Date":"Thu, 7 May 2026 19:08:36 +0100","From":"Jonathan Cameron <jic23@kernel.org>","To":"Terry Bowman <terry.bowman@amd.com>","Cc":"<dave@stgolabs.net>, <dave.jiang@intel.com>,\n <alison.schofield@intel.com>, <djbw@kernel.org>, <bhelgaas@google.com>,\n <shiju.jose@huawei.com>, <ming.li@zohomail.com>,\n <Smita.KoralahalliChannabasappa@amd.com>, <rrichter@amd.com>,\n <dan.carpenter@linaro.org>, <PradeepVineshReddy.Kodamati@amd.com>,\n <lukas@wunner.de>, <Benjamin.Cheatham@amd.com>,\n <sathyanarayanan.kuppuswamy@linux.intel.com>, <vishal.l.verma@intel.com>,\n <alucerop@amd.com>, <ira.weiny@intel.com>, <corbet@lwn.net>,\n <rafael@kernel.org>, <xueshuai@linux.alibaba.com>,\n <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,\n <linux-pci@vger.kernel.org>, <linux-acpi@vger.kernel.org>,\n <linux-doc@vger.kernel.org>, Mauro Carvalho Chehab <mchehab@kernel.org>","Subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","Message-ID":"<20260507190836.70197e24@jic23-huawei>","In-Reply-To":"<20260505173029.2718246-3-terry.bowman@amd.com>","References":"<20260505173029.2718246-1-terry.bowman@amd.com>\n\t<20260505173029.2718246-3-terry.bowman@amd.com>","X-Mailer":"Claws Mail 4.4.0 (GTK 3.24.52; x86_64-pc-linux-gnu)","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; 
charset=US-ASCII","Content-Transfer-Encoding":"7bit"}},{"id":3687990,"web_url":"http://patchwork.ozlabs.org/comment/3687990/","msgid":"<8913c666-a343-4717-8ab2-0b8546d1bdfb@amd.com>","list_archive_url":null,"date":"2026-05-07T18:33:45","subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","submitter":{"id":82124,"url":"http://patchwork.ozlabs.org/api/people/82124/","name":"Bowman, Terry","email":"Terry.Bowman@amd.com"},"content":"On 5/7/2026 1:08 PM, Jonathan Cameron wrote:\n> [Some people who received this message don't often get email from jic23@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]\n> \n> On Tue, 5 May 2026 12:30:20 -0500\n> Terry Bowman <terry.bowman@amd.com> wrote:\n> \n>> From: Dan Williams <djbw@kernel.org>\n> \n> +CC Mauro - rasdaemon related - see below.\n> \n>>\n>> CXL protocol error logging uses two parallel sets of trace events. The\n>> cxl_port_aer_correctable_error() and cxl_port_aer_uncorrectable_error()\n>> events are used by CPER for CXL Port devices. The cxl_aer_correctable_error()\n>> and cxl_aer_uncorrectable_error() events are used for CXL Endpoints. Update\n>> the trace routines to use the latter for all CXL devices on both the CPER\n>> and native AER paths.\n>>\n>> Generalize cxl_aer_correctable_error()/cxl_aer_uncorrectable_error to\n>> take a struct device * and a u64 serial argument supplied by the caller.\n>> cxl_handle_ras() and cxl_handle_cor_ras() gain the new u64 serial parameter,\n>> sourced from pci_get_dsn().\n>>\n>> The CPER path keeps its existing Port-vs-Endpoint dispatch and passes the\n>> new arguments to the unified trace events. 
The CPER path will be folded\n>> together in a following patch.\n>>\n>> Remove the now-unused cxl_port_aer_correctable_error() and\n>> cxl_port_aer_uncorrectable_error().\n>>\n>> **WARNING: ABI BREAK**\n>> Rename the trace event field \"memdev\" to \"device\" so all CXL device types\n>> (Ports and Endpoints) can be reported under a common field name. Note this\n>> is an ABI break for userspace tools that key off the old \"memdev\" field.\n>> Specifically, rasdaemon's ras-cxl-handler.c looks up \"memdev\" and bails on\n>> NULL, so an unmodified rasdaemon will drop every CXL CE/UCE event once this\n>> kernel ships. A rasdaemon update is needed in a separate series.\n>>\n>> The need for the field rename was discussed in v16 review [1].\n> \n> This concerns me (sorry I wasn't paying attention to the v16 thread).\n> It is a userspace regression against code that is out in the wild and typically\n> not updated in sync with the kernel.\n> \n> If you are suggesting breaking ras-daemon at the very least +CC the maintainer.\n> \n> To get to a unified tracepoint add a new one that does what you want, but\n> maintain the existing ones as well.  Userspace can then migrate and maybe\n> in 5+ years time we can delete the non unified ones.\n> \n> No actually comments on the code, just left it all here for Mauro,\n> \n> Thanks,\n> \n> Jonathan\n> \n\nDan was clear about using a single set of CE and UE handlers for all CXL RAS \nprotocol errors. While I understand there may be concerns, please direct any \nobjections to Dan and clarify what changes are required to avoid this \nrepeatedly going back and forth.\n\n[1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/\n\n- Terry\n\n\n>>\n>> Also, for CXL Upstream Switch Port (USP) and Endpoint (EP) fatal UCE,\n>> the cxl_aer_uncorrectable_error trace event is not emitted. The AER core\n>> only retrieves PCI_ERR_UNCOR_STATUS for Root Ports, RCECs, and Downstream\n>> Ports, or for non-fatal severities. 
PCI config reads to the source device\n>> are expected to fail otherwise, so the AER core never reads the status\n>> word, is_cxl_error() does not classify the event as CXL, and the AER path\n>> handles it instead. In this case the AER handler consumes the event and\n>> logs it as an AER error without calling the CXL RAS handlers or trace\n>> logging.\n>>\n>> Before this patch, Endpoint and Port devices emitted different events:\n>>\n>>   # Endpoint (cxl_aer_*):\n>>   cxl_aer_correctable_error: memdev=mem0 host=0000:0c:00.0 serial=0: status: 'CRC Threshold Hit'\n>>   cxl_aer_uncorrectable_error: memdev=mem0 host=0000:0c:00.0 serial=0: status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n>>\n>>   # Port (cxl_port_aer_*, no serial field):\n>>   cxl_port_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c status='CRC Threshold Hit'\n>>   cxl_port_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n>>\n>> After this patch, all CXL devices emit the unified cxl_aer_* events\n>> with the same field layout:\n>>\n>>   cxl_aer_correctable_error: device=0000:0c:00.0 host=pci0000:0c serial=0 status: 'CRC Threshold Hit'\n>>   cxl_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c serial=0 status: 'Cache Data ECC Error | Memory Data ECC Error' first_error: 'Cache Data ECC Error'\n>>\n>> [1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/\n>>\n>> Co-developed-by: Terry Bowman <terry.bowman@amd.com>\n>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>\n>> Signed-off-by: Dan Williams <djbw@kernel.org>\n>>\n>> ---\n>>\n>> Changes in v16->v17:\n>> - Replace cxlds->serial with pci_get_dsn()\n>> - Change 'memdev' to 'device' (Dan)\n>> - Updated Commit message\n>>\n>> Changes in v15->v16:\n>> - Add Dan's review-by\n>> - Incorporate Dan's comment into commit message:\n>> \"Add the serial number 
at the end to preserve compatibility with\n>> libtraceevent parsing of the parameters.\"\n>>\n>> Changes in v14->v15:\n>> - Update commit message.\n>> - Moved cxl_handle_ras/cxl_handle_cor_ras() changes to future patch (terry)\n>>\n>> Changes in v13->v14:\n>> - Update commit headline (Bjorn)\n>>\n>> Changes in v12->v13:\n>> - Added Dave Jiang's review-by\n>>\n>> Changes in v11 -> v12:\n>> - Correct parameters to call trace_cxl_aer_correctable_error()\n>> - Add reviewed-by for Jonathan and Shiju\n>>\n>> Changes in v10->v11:\n>> - Updated CE and UCE trace routines to maintain consistent TP_Struct ABI\n>> and unchanged TP_printk() logging.\n>> ---\n>>  drivers/cxl/core/core.h    | 11 ++++--\n>>  drivers/cxl/core/ras.c     | 39 +++++++++++--------\n>>  drivers/cxl/core/ras_rch.c |  6 ++-\n>>  drivers/cxl/core/trace.h   | 76 ++++++++------------------------------\n>>  4 files changed, 49 insertions(+), 83 deletions(-)\n>>\n>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h\n>> index 82ca3a476708..132ac9c1ebf4 100644\n>> --- a/drivers/cxl/core/core.h\n>> +++ b/drivers/cxl/core/core.h\n>> @@ -183,8 +183,9 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)\n>>  #ifdef CONFIG_CXL_RAS\n>>  int cxl_ras_init(void);\n>>  void cxl_ras_exit(void);\n>> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base);\n>> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base);\n>> +bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);\n>> +void cxl_handle_cor_ras(struct device *dev, u64 serial,\n>> +                     void __iomem *ras_base);\n>>  void cxl_dport_map_rch_aer(struct cxl_dport *dport);\n>>  void cxl_disable_rch_root_ints(struct cxl_dport *dport);\n>>  void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);\n>> @@ -195,11 +196,13 @@ static inline int cxl_ras_init(void)\n>>       return 0;\n>>  }\n>>  static inline void cxl_ras_exit(void) { }\n>> -static inline bool cxl_handle_ras(struct 
device *dev, void __iomem *ras_base)\n>> +static inline bool cxl_handle_ras(struct device *dev, u64 serial,\n>> +                               void __iomem *ras_base)\n>>  {\n>>       return false;\n>>  }\n>> -static inline void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { }\n>> +static inline void cxl_handle_cor_ras(struct device *dev, u64 serial,\n>> +                                   void __iomem *ras_base) { }\n>>  static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }\n>>  static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }\n>>  static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }\n>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c\n>> index 006c6ffc2f56..d7081caaf5d3 100644\n>> --- a/drivers/cxl/core/ras.c\n>> +++ b/drivers/cxl/core/ras.c\n>> @@ -13,7 +13,7 @@ static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,\n>>  {\n>>       u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;\n>>\n>> -     trace_cxl_port_aer_correctable_error(&pdev->dev, status);\n>> +     trace_cxl_aer_correctable_error(&pdev->dev, status, pci_get_dsn(pdev));\n>>  }\n>>\n>>  static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,\n>> @@ -28,20 +28,24 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,\n>>       else\n>>               fe = status;\n>>\n>> -     trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,\n>> -                                            ras_cap.header_log);\n>> +     trace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,\n>> +                                       ras_cap.header_log,\n>> +                                       pci_get_dsn(pdev));\n>>  }\n>>\n>> -static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd,\n>> +static void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev,\n>> +                                      struct cxl_memdev *cxlmd,\n>>                                        
struct cxl_ras_capability_regs ras_cap)\n>>  {\n>>       u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;\n>>\n>> -     trace_cxl_aer_correctable_error(cxlmd, status);\n>> +     trace_cxl_aer_correctable_error(&cxlmd->dev, status,\n>> +                                     pci_get_dsn(pdev));\n>>  }\n>>\n>>  static void\n>> -cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,\n>> +cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev,\n>> +                            struct cxl_memdev *cxlmd,\n>>                              struct cxl_ras_capability_regs ras_cap)\n>>  {\n>>       u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;\n>> @@ -53,8 +57,9 @@ cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd,\n>>       else\n>>               fe = status;\n>>\n>> -     trace_cxl_aer_uncorrectable_error(cxlmd, status, fe,\n>> -                                       ras_cap.header_log);\n>> +     trace_cxl_aer_uncorrectable_error(&cxlmd->dev, status, fe,\n>> +                                       ras_cap.header_log,\n>> +                                       pci_get_dsn(pdev));\n>>  }\n>>\n>>  static int match_memdev_by_parent(struct device *dev, const void *uport)\n>> @@ -101,9 +106,9 @@ void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)\n>>\n>>       cxlmd = to_cxl_memdev(mem_dev);\n>>       if (data->severity == AER_CORRECTABLE)\n>> -             cxl_cper_trace_corr_prot_err(cxlmd, data->ras_cap);\n>> +             cxl_cper_trace_corr_prot_err(pdev, cxlmd, data->ras_cap);\n>>       else\n>> -             cxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);\n>> +             cxl_cper_trace_uncorr_prot_err(pdev, cxlmd, data->ras_cap);\n>>  }\n>>  EXPORT_SYMBOL_GPL(cxl_cper_handle_prot_err);\n>>\n>> @@ -183,7 +188,7 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)\n>>  }\n>>  EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, \"CXL\");\n>>\n>> -void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)\n>> +void 
cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n>>  {\n>>       void __iomem *addr;\n>>       u32 status;\n>> @@ -195,7 +200,7 @@ void cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base)\n>>       status = readl(addr);\n>>       if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {\n>>               writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);\n>> -             trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);\n>> +             trace_cxl_aer_correctable_error(dev, status, serial);\n>>       }\n>>  }\n>>\n>> @@ -220,7 +225,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)\n>>   * Log the state of the RAS status registers and prepare them to log the\n>>   * next error status. Return 1 if reset needed.\n>>   */\n>> -bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n>> +bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n>>  {\n>>       u32 hl[CXL_HEADERLOG_SIZE_U32];\n>>       void __iomem *addr;\n>> @@ -247,7 +252,7 @@ bool cxl_handle_ras(struct device *dev, void __iomem *ras_base)\n>>       }\n>>\n>>       header_log_copy(ras_base, hl);\n>> -     trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);\n>> +     trace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);\n>>       writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);\n>>\n>>       return true;\n>> @@ -270,7 +275,8 @@ void cxl_cor_error_detected(struct pci_dev *pdev)\n>>               if (cxlds->rcd)\n>>                       cxl_handle_rdport_errors(cxlds);\n>>\n>> -             cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);\n>> +             cxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n>> +                                cxlmd->endpoint->regs.ras);\n>>       }\n>>  }\n>>  EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, \"CXL\");\n>> @@ -299,7 +305,8 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,\n>>                * chance 
the situation is recoverable dump the status of the RAS\n>>                * capability registers and bounce the active state of the memdev.\n>>                */\n>> -             ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlmd->endpoint->regs.ras);\n>> +             ue = cxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n>> +                                 cxlmd->endpoint->regs.ras);\n>>       }\n>>\n>>       switch (state) {\n>> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c\n>> index 0a8b3b9b6388..61835fbafc0f 100644\n>> --- a/drivers/cxl/core/ras_rch.c\n>> +++ b/drivers/cxl/core/ras_rch.c\n>> @@ -115,7 +115,9 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)\n>>\n>>       pci_print_aer(pdev, severity, &aer_regs);\n>>       if (severity == AER_CORRECTABLE)\n>> -             cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);\n>> +             cxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n>> +                                dport->regs.ras);\n>>       else\n>> -             cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);\n>> +             cxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),\n>> +                            dport->regs.ras);\n>>  }\n>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h\n>> index a972e4ef1936..6f3957b3c3af 100644\n>> --- a/drivers/cxl/core/trace.h\n>> +++ b/drivers/cxl/core/trace.h\n>> @@ -48,49 +48,22 @@\n>>       { CXL_RAS_UC_IDE_RX_ERR, \"IDE Rx Error\" }                         \\\n>>  )\n>>\n>> -TRACE_EVENT(cxl_port_aer_uncorrectable_error,\n>> -     TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),\n>> -     TP_ARGS(dev, status, fe, hl),\n>> +TRACE_EVENT(cxl_aer_uncorrectable_error,\n>> +     TP_PROTO(const struct device *dev, u32 status, u32 fe, u32 *hl,\n>> +              u64 serial),\n>> +     TP_ARGS(dev, status, fe, hl, serial),\n>>       TP_STRUCT__entry(\n>>               __string(device, dev_name(dev))\n>>               __string(host, 
dev_name(dev->parent))\n>> -             __field(u32, status)\n>> -             __field(u32, first_error)\n>> -             __array(u32, header_log, CXL_HEADERLOG_SIZE_U32)\n>> -     ),\n>> -     TP_fast_assign(\n>> -             __assign_str(device);\n>> -             __assign_str(host);\n>> -             __entry->status = status;\n>> -             __entry->first_error = fe;\n>> -             /*\n>> -              * Embed the 512B headerlog data for user app retrieval and\n>> -              * parsing, but no need to print this in the trace buffer.\n>> -              */\n>> -             memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n>> -     ),\n>> -     TP_printk(\"device=%s host=%s status: '%s' first_error: '%s'\",\n>> -               __get_str(device), __get_str(host),\n>> -               show_uc_errs(__entry->status),\n>> -               show_uc_errs(__entry->first_error)\n>> -     )\n>> -);\n>> -\n>> -TRACE_EVENT(cxl_aer_uncorrectable_error,\n>> -     TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),\n>> -     TP_ARGS(cxlmd, status, fe, hl),\n>> -     TP_STRUCT__entry(\n>> -             __string(memdev, dev_name(&cxlmd->dev))\n>> -             __string(host, dev_name(cxlmd->dev.parent))\n>>               __field(u64, serial)\n>>               __field(u32, status)\n>>               __field(u32, first_error)\n>>               __array(u32, header_log, CXL_HEADERLOG_SIZE_U32)\n>>       ),\n>>       TP_fast_assign(\n>> -             __assign_str(memdev);\n>> +             __assign_str(device);\n>>               __assign_str(host);\n>> -             __entry->serial = cxlmd->cxlds->serial;\n>> +             __entry->serial = serial;\n>>               __entry->status = status;\n>>               __entry->first_error = fe;\n>>               /*\n>> @@ -99,8 +72,8 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n>>                */\n>>               memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n>>       ),\n>> -     TP_printk(\"memdev=%s 
host=%s serial=%lld: status: '%s' first_error: '%s'\",\n>> -               __get_str(memdev), __get_str(host), __entry->serial,\n>> +     TP_printk(\"device=%s host=%s serial=%lld status: '%s' first_error: '%s'\",\n>> +               __get_str(device), __get_str(host), __entry->serial,\n>>                 show_uc_errs(__entry->status),\n>>                 show_uc_errs(__entry->first_error)\n>>       )\n>> @@ -124,42 +97,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n>>       { CXL_RAS_CE_PHYS_LAYER_ERR, \"Received Error From Physical Layer\" }     \\\n>>  )\n>>\n>> -TRACE_EVENT(cxl_port_aer_correctable_error,\n>> -     TP_PROTO(struct device *dev, u32 status),\n>> -     TP_ARGS(dev, status),\n>> +TRACE_EVENT(cxl_aer_correctable_error,\n>> +     TP_PROTO(const struct device *dev, u32 status, u64 serial),\n>> +     TP_ARGS(dev, status, serial),\n>>       TP_STRUCT__entry(\n>>               __string(device, dev_name(dev))\n>>               __string(host, dev_name(dev->parent))\n>> -             __field(u32, status)\n>> -     ),\n>> -     TP_fast_assign(\n>> -             __assign_str(device);\n>> -             __assign_str(host);\n>> -             __entry->status = status;\n>> -     ),\n>> -     TP_printk(\"device=%s host=%s status='%s'\",\n>> -               __get_str(device), __get_str(host),\n>> -               show_ce_errs(__entry->status)\n>> -     )\n>> -);\n>> -\n>> -TRACE_EVENT(cxl_aer_correctable_error,\n>> -     TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),\n>> -     TP_ARGS(cxlmd, status),\n>> -     TP_STRUCT__entry(\n>> -             __string(memdev, dev_name(&cxlmd->dev))\n>> -             __string(host, dev_name(cxlmd->dev.parent))\n>>               __field(u64, serial)\n>>               __field(u32, status)\n>>       ),\n>>       TP_fast_assign(\n>> -             __assign_str(memdev);\n>> +             __assign_str(device);\n>>               __assign_str(host);\n>> -             __entry->serial = cxlmd->cxlds->serial;\n>> +             
__entry->serial = serial;\n>>               __entry->status = status;\n>>       ),\n>> -     TP_printk(\"memdev=%s host=%s serial=%lld: status: '%s'\",\n>> -               __get_str(memdev), __get_str(host), __entry->serial,\n>> +     TP_printk(\"device=%s host=%s serial=%lld status: '%s'\",\n>> +               __get_str(device), __get_str(host), __entry->serial,\n>>                 show_ce_errs(__entry->status)\n>>       )\n>>  );\n>","headers":{"Return-Path":"\n <linux-pci+bounces-54118-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=amd.com header.i=@amd.com header.a=rsa-sha256\n header.s=selector1 header.b=Q6MN/bij;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-pci+bounces-54118-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com\n header.b=\"Q6MN/bij\"","smtp.subspace.kernel.org;\n arc=fail smtp.client-ip=52.101.201.5","smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=amd.com","smtp.subspace.kernel.org;\n spf=fail smtp.mailfrom=amd.com","dkim=none (message not signed)\n header.d=none;dmarc=none action=none header.from=amd.com;"],"Received":["from sea.lore.kernel.org (sea.lore.kernel.org\n [IPv6:2600:3c0a:e001:db::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4gBLW756BHz1yKd\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 08 May 2026 04:34:11 
+1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id B7D94302AE20\n\tfor <incoming@patchwork.ozlabs.org>; Thu,  7 May 2026 18:33:56 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 82F243B27D0;\n\tThu,  7 May 2026 18:33:55 +0000 (UTC)","from PH7PR06CU001.outbound.protection.outlook.com\n (mail-westus3azon11010005.outbound.protection.outlook.com [52.101.201.5])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 879912BCF4C;\n\tThu,  7 May 2026 18:33:53 +0000 (UTC)","from CH8PR12MB9766.namprd12.prod.outlook.com (2603:10b6:610:2b6::10)\n by BY5PR12MB4033.namprd12.prod.outlook.com (2603:10b6:a03:213::22) with\n Microsoft SMTP Server (version=TLS1_2,\n cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9891.16; Thu, 7 May\n 2026 18:33:49 +0000","from CH8PR12MB9766.namprd12.prod.outlook.com\n ([fe80::be0f:431f:5f27:96d9]) by CH8PR12MB9766.namprd12.prod.outlook.com\n ([fe80::be0f:431f:5f27:96d9%3]) with mapi id 15.20.9891.008; Thu, 7 May 2026\n 18:33:49 +0000"],"ARC-Seal":["i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1778178835; cv=fail;\n b=XhUhzTkoxBuGEhvZ0Fs+Fhk0DE1wFnYC5Q1WFIPONGoSOANygw4wfOxAdLEgeMMM0Ze8F/14zLPVuhKKcmVJSh0vns813wkcLYLQcsA+LJrhOW8ZFOe1gucsapACs6pCCvoN179U0mqvRz9a3HN+tbVa1VvjsaAtbSp/+YWXoM8=","i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;\n b=SWGjKV8xqqQJMv+E7l3TzHr1gYGdu9aBMjWfYnsmItMVDUpqdOv1nCxcmFgCSBCMR7blbPmRjvs9ALhJjJ/t+wd+JKkwozJlccMu60/P9tPyUVSZjkgay0GVFhzLzb64/4VikwjfSuwUkolrOWhR4lz77HjBin9/1eBZ+3H+2eiuyEWpyfSXafdGVgSRtr81MeTOnkPJVQ9aYWrPJ+hFLDfIpy+3Uf4jlVVknJlsp/98E9A50FD7d3pBzStgJ33ip+5vm8uDhwhzH1B039DFRjo2OBoDqNTqD/NvdOJezlGJsIciwXpBSHKo6jbS3yXnjKtUsSglNE9IaZfVii+5iw=="],"ARC-Message-Signature":["i=2; a=rsa-sha256; 
d=subspace.kernel.org;\n\ts=arc-20240116; t=1778178835; c=relaxed/simple;\n\tbh=BOw+KOdP+/IooSnAxxNUL/2tkQN8oJLCr2Wcp5JGMeY=;\n\th=Message-ID:Date:Subject:To:Cc:References:From:In-Reply-To:\n\t Content-Type:MIME-Version;\n b=s3Escx0PJefZxs4l1vE2Yh4DyX3U38YgRPRaH+CGe42wMBufmDdP0PPK5juHt27IlA0e72n5uC3d0I8GhgngqmhO/ZQ9zv9hmdbbE/QDcmoswQSNrw3Mga8Hu//FiY6rxvGbyWcGm4n2bBceNSN70lwx9zKhqt1zXyQBdt2pRVM=","i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;\n s=arcselector10001;\n h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;\n bh=b6zEJ3lJbeZs5gj1g8CVwzED4BGUBnr9Tspd8AEHJOY=;\n b=n0kCWKoKaY4jDQTPPcxlw1LqBZ8Vqchet+Wr7EPLNJHSTDVfwtg4RVou3wI4cySLjQFgKyQ5hWroGMPUxAvEK2LxG65iaz1XO/HoYHlfw1JBhebZpWn+5Ha7vc1s9XXgFrejvI513xpg89a/6FPKQBFrtsfXd0l+TKfQaYH1wNc/TZLazmMceyxsdn0n+MoMg39TAkwLkR8aNwt9EyxmJEgArvZS2SmdIntx1SANgAn5+6ChwXApqCWPPqMj7mlN2YBRKDYMk1fFobwgp4un5YjAcLeybJMDC+D5QmPh8rWt2y6nfHTuriwAco2vsp4e88sFA0XEbxHIGz5Dv7QFGQ=="],"ARC-Authentication-Results":["i=2; smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=amd.com;\n spf=fail smtp.mailfrom=amd.com;\n dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com\n header.b=Q6MN/bij; arc=fail smtp.client-ip=52.101.201.5","i=1; mx.microsoft.com 1; spf=pass\n smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass\n header.d=amd.com; arc=none"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1;\n h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;\n bh=b6zEJ3lJbeZs5gj1g8CVwzED4BGUBnr9Tspd8AEHJOY=;\n b=Q6MN/bijgex95l30Ov+omzuj8Mt3bUOLDCrEovzXYPPj2PljPqEe2SJRlkF4vGYpoIR/k3yQI5R27/zso70+23knJtHF2qxeOQs/v4HRWAK5tc8Mzuk40IoRKMYQgrSkxzyKNerPUtLPWrvfjzR87q3zYoWZA1yx07yuSNHNbmc=","Message-ID":"<8913c666-a343-4717-8ab2-0b8546d1bdfb@amd.com>","Date":"Thu, 7 May 2026 13:33:45 -0500","User-Agent":"Mozilla 
Thunderbird","Subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","To":"Jonathan Cameron <jic23@kernel.org>","Cc":"dave@stgolabs.net, dave.jiang@intel.com, alison.schofield@intel.com,\n djbw@kernel.org, bhelgaas@google.com, shiju.jose@huawei.com,\n ming.li@zohomail.com, Smita.KoralahalliChannabasappa@amd.com,\n rrichter@amd.com, dan.carpenter@linaro.org,\n PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,\n Benjamin.Cheatham@amd.com, sathyanarayanan.kuppuswamy@linux.intel.com,\n vishal.l.verma@intel.com, alucerop@amd.com, ira.weiny@intel.com,\n corbet@lwn.net, rafael@kernel.org, xueshuai@linux.alibaba.com,\n linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,\n linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,\n linux-doc@vger.kernel.org, Mauro Carvalho Chehab <mchehab@kernel.org>","References":"<20260505173029.2718246-1-terry.bowman@amd.com>\n <20260505173029.2718246-3-terry.bowman@amd.com>\n <20260507190836.70197e24@jic23-huawei>","Content-Language":"en-US","From":"\"Bowman, Terry\" <terry.bowman@amd.com>","In-Reply-To":"<20260507190836.70197e24@jic23-huawei>","Content-Type":"text/plain; charset=UTF-8","Content-Transfer-Encoding":"7bit","X-ClientProxiedBy":"CH0PR08CA0014.namprd08.prod.outlook.com\n (2603:10b6:610:33::19) To CH8PR12MB9766.namprd12.prod.outlook.com\n 
(2603:10b6:610:2b6::10)","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","X-MS-PublicTrafficType":"Email","X-MS-TrafficTypeDiagnostic":"CH8PR12MB9766:EE_|BY5PR12MB4033:EE_","X-MS-Office365-Filtering-Correlation-Id":"02863f24-379f-4244-97a7-08deac672e2a","X-MS-Exchange-SenderADCheck":"1","X-MS-Exchange-AntiSpam-Relay":"0","X-Microsoft-Antispam":"\n\tBCL:0;ARA:13230040|1800799024|376014|7416014|366016|13003099007|3023799003|18002099003|22082099003|56012099003;","X-Microsoft-Antispam-Message-Info":"\n\ticc3+kVOxrvUN87eatPXNN6MXRtyYonkUD3QMjmhBmys41AXs5N7kK39pKqXIFZPJjyiH/BCxHBYisilYfiT+sZzx9h5IrHXGTxjXzuxhgjzLfnr33S+qeKPryohrvKJlPJaglIGMl8/LwpgIOyHxdAS7reiHfbgVR5rg6QJwtsRZY5lsiKr3LOgFpK2esid1AZa/9hvjJVyy6Ks9JaR+0uUUyIo6MYv7pW3sTdy43e9pbPfdzkr04Ot7dPkOj2fL6sw9CVHln5Ne39XBuj2WiRyRXi26WeDp9vVCKJw/f7EDqrrnfTdXViZV3VqLp9+ZtSe7/J9AsTa4HRDu1oFUhFeMwAO0CJKmmJzZ164zcrRJHnNnGVQa+WurVQV+QvTgz9EBYXcy2eosB3N86H+sU7ZOE7is5u8c+yY7LV+Klg6ocATFs5kQPKBP5KJ1c+oS4un4d21Bu3hFpsQM1xi4iVdQIUrjbPBRetevroWpukBV/I/g8ed6zpJJ0QoYSCuixWjN45ouwn8emr7G8BCmlI3K3TyLGGf8FCZPV+4M4D0Eols/o3aHv4OpsfrHxnVWMkEbaLZziKwO1egGGCzPKCtruYIe1Cw1RWjddLP+ozaQJMOdeVsjZ5Bq8y0p/rFtp52QHyMpCpgYKd8ddxYIcLnh10YvNeMdsKJuIPcGMdsb42TrlXRP77WVrAepgdK5kkMLn/ncCYRYMIz8y4O0Q==","X-Forefront-Antispam-Report":"\n\tCIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CH8PR12MB9766.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(1800799024)(376014)(7416014)(366016)(13003099007)(3023799003)(18002099003)(22082099003)(56012099003);DIR:OUT;SFP:1101;","X-MS-Exchange-AntiSpam-MessageData-ChunkCount":"1","X-MS-Exchange-AntiSpam-MessageData-0":"=?utf-8?q?7HBTgpHkt0flIL6ndxMa6fAmOspV?=\n\t=?utf-8?q?Zq0tYobOQdbcYXVuykNc4DXz+tE3xtTrieCuwN8JAVUxy4VlNkXDwUdNczGKWGzA1?=\n\t=?utf-8?q?i/zTI4ql8r1RdRfjCZES79hCG2UQ3KjbkVDHyJff
3QmoYZiSyN8EKKdgzj+BcaL44?=\n\t=?utf-8?q?o0hFQ6Q6LhQxg/5Wv0A4w7qBkk40IPcgGs0r4Lzzk+ctnbbfplmlifFf0HPfA+3hJ?=\n\t=?utf-8?q?aG47gKzIt8LVjKy7y4E/xlXRu8bC5C0bB3gLrsGNMb01FBbTE1WoUs8BcRS5BoUAa?=\n\t=?utf-8?q?QGQY3ieEGVmyj2AHTpNz/W/YYnDFG3eCk2nzdm46jeoEGtaQKHBkfV6yu3eU6QuR6?=\n\t=?utf-8?q?qDKL9lOwQV97eT7P/YH/ebsmSNLggXrd4ch9/1UVxdwEFR6eBcj+r4zNgd8Yc1XLT?=\n\t=?utf-8?q?QmVp6al+RvZtaVpYe7NKZQ13a5fQC7fHSOD5daiiCnbnDznoLDdxga979xaxNOUvY?=\n\t=?utf-8?q?omo27gw8xhyvf0Wm/RSHp4cdR6vtcimwbyXl38upidqwVydL6/BQ8HKf5Op+KFusu?=\n\t=?utf-8?q?B0dw0a8ObvsldU5KgXXG1808ACnB+6isXFG+4WSziGHg8UvnXPvTm1LDhBiVyoxz+?=\n\t=?utf-8?q?uUUnLdYkXV6muHDKw48Xj4vwOg4CGVn7CnDuG1nnWli/OBmYqYwuhg37XLDppJC6V?=\n\t=?utf-8?q?xHs55xnZXBA2O/heJM7vwmfSAABV7MPlzBSFfVbjPYIjRqzN+staPzSsDm+WBuwmi?=\n\t=?utf-8?q?KThtlsyTvVVvIzYtQyaOtVZrB5YjiWOuLW4ev57jwu8p6MQjoOJEMKWnpphvB6VZJ?=\n\t=?utf-8?q?yQWhjXG1MlO1uV8km9txiawauOf7UdGO0nZvF2sY+dO0Ip6ur9KkBUJExcbiU4O9y?=\n\t=?utf-8?q?Omws6NeOQAZMWlfH5MYma8aVPgE28dXKnJv7kHeMTfX7xqqkU+wiPpn33ZW7AqOfK?=\n\t=?utf-8?q?83D3zCfBtDmN1qwwX4+cUN/2YxA7MVZyJb7pc2srl+sO6WDlKSZ/sbpNoz01tsjTW?=\n\t=?utf-8?q?Z+Qs1KYi2H1FjuAopzy2t/GLqFvHr/XRZcJiBsIm6saVb9yO0NlMSa87sn/GIeCRs?=\n\t=?utf-8?q?PRta1bxuLtuLgdYRtdwGL5VmI0XWKeLTgFetMpatua/s0ecjTvIWN90KWJpepaXC2?=\n\t=?utf-8?q?03O4tlSo+IfS3ZI6Q/QcJxi+Q6QIYiQfdxW+drshXw8NKGlsPNIR8w3Zehxzucc8D?=\n\t=?utf-8?q?oOs1FNo/eoIUoCgxPtl0qgX6jdqUSDXs3Ou6lQFn0tEcrKQYGW9IxL9Q7fzp9I4dZ?=\n\t=?utf-8?q?IDIBC+LS6BIb19g1p4ScOCA+8+AH5QeIUMlLfre+TK+OzWuOTuxr3EWkLMF/yls6E?=\n\t=?utf-8?q?i+nMjCpPMJK4eBJYPELnCyKRNosXH+x6mH5/8sYR6he5FkzgE9ObmVQHyiQ5P0rv1?=\n\t=?utf-8?q?3xud2I84BIqcoIJ2eS1yXOpe74VN2ocG9iF227H9d9PxWYchw+foz1bYyZWCUUdaQ?=\n\t=?utf-8?q?Qk5DYebqWq10zT3LHu/LjTMJn2WglX4JCGwIUWbuwXb9m1xjOYP14Wz2nb8Xy23hz?=\n\t=?utf-8?q?HEJBRZwb5/rj5VWfDfPezZpViQwO8VihbE0IXIRoerh9Td5INl70cRfdL6nYQc4Q5?=\n\t=?utf-8?q?Z29tJqwz3nDkSk+0+U8/Oi+F6biXGH0aetCZVeIeZxj1oWtDhF45Jib361t4LHZbd?=\n\t=?utf-8?q?cikX3neG//A0Grt3lq/rzYyCoU6hjx2EGDJ4Ik42O4q9JZMpZ2nOJHg1NAmByK2nM?=\n\t=?utf-8?q?ThLEmfk4ri?=","
X-OriginatorOrg":"amd.com","X-MS-Exchange-CrossTenant-Network-Message-Id":"\n 02863f24-379f-4244-97a7-08deac672e2a","X-MS-Exchange-CrossTenant-AuthSource":"CH8PR12MB9766.namprd12.prod.outlook.com","X-MS-Exchange-CrossTenant-AuthAs":"Internal","X-MS-Exchange-CrossTenant-OriginalArrivalTime":"07 May 2026 18:33:49.1121\n (UTC)","X-MS-Exchange-CrossTenant-FromEntityHeader":"Hosted","X-MS-Exchange-CrossTenant-Id":"3dd8961f-e488-4e60-8e11-a82d994e183d","X-MS-Exchange-CrossTenant-MailboxType":"HOSTED","X-MS-Exchange-CrossTenant-UserPrincipalName":"\n bJi0M/+1UZ3nhdx5TEcZazEMoiQJgZFemh3JOk3kYB7f5b659gYFCF8zPjrvWcC3wL2Whc+Ot2PiNbt0TFIZgQ==","X-MS-Exchange-Transport-CrossTenantHeadersStamped":"BY5PR12MB4033"}},{"id":3688451,"web_url":"http://patchwork.ozlabs.org/comment/3688451/","msgid":"<20260508150533.04e19cf9@jic23-huawei>","list_archive_url":null,"date":"2026-05-08T14:05:33","subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","submitter":{"id":10151,"url":"http://patchwork.ozlabs.org/api/people/10151/","name":"Jonathan Cameron","email":"jic23@kernel.org"},"content":"On Thu, 7 May 2026 13:33:45 -0500\n\"Bowman, Terry\" <terry.bowman@amd.com> wrote:\n\n> On 5/7/2026 1:08 PM, Jonathan Cameron wrote:\n> > [Some people who received this message don't often get email from jic23@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]\n> > \n> > On Tue, 5 May 2026 12:30:20 -0500\n> > Terry Bowman <terry.bowman@amd.com> wrote:\n> >   \n> >> From: Dan Williams <djbw@kernel.org>  \n> > \n> > +CC Mauro - rasdaemon related - see below.\n> >   \n> >>\n> >> CXL protocol error logging uses two parallel sets of trace events. The\n> >> cxl_port_aer_correctable_error() and cxl_port_aer_uncorrectable_error()\n> >> events are used by CPER for CXL Port devices. The cxl_aer_correctable_error()\n> >> and cxl_aer_uncorrectable_error() events are used for CXL Endpoints. 
Update\n> >> the trace routines to use the latter for all CXL devices on both the CPER\n> >> and native AER paths.\n> >>\n> >> Generalize cxl_aer_correctable_error()/cxl_aer_uncorrectable_error to\n> >> take a struct device * and a u64 serial argument supplied by the caller.\n> >> cxl_handle_ras() and cxl_handle_cor_ras() gain the new u64 serial parameter,\n> >> sourced from pci_get_dsn().\n> >>\n> >> The CPER path keeps its existing Port-vs-Endpoint dispatch and passes the\n> >> new arguments to the unified trace events. The CPER path will be folded\n> >> together in a following patch.\n> >>\n> >> Remove the now-unused cxl_port_aer_correctable_error() and\n> >> cxl_port_aer_uncorrectable_error().\n> >>\n> >> **WARNING: ABI BREAK**\n> >> Rename the trace event field \"memdev\" to \"device\" so all CXL device types\n> >> (Ports and Endpoints) can be reported under a common field name. Note this\n> >> is an ABI break for userspace tools that key off the old \"memdev\" field.\n> >> Specifically, rasdaemon's ras-cxl-handler.c looks up \"memdev\" and bails on\n> >> NULL, so an unmodified rasdaemon will drop every CXL CE/UCE event once this\n> >> kernel ships. A rasdaemon update is needed in a separate series.\n> >>\n> >> The need for the field rename was discussed in v16 review [1].  \n> > \n> > This concerns me (sorry I wasn't paying attention to the v16 thread).\n> > It is a userspace regression against code that is out in the wild and typically\n> > not updated in sync with the kernel.\n> > \n> > If you are suggesting breaking ras-daemon at the very least +CC the maintainer.\n> > \n> > To get to a unified tracepoint add a new one that does what you want, but\n> > maintain the existing ones as well.  
Userspace can then migrate and maybe\n> > in 5+ years time we can delete the non unified ones.\n> > \n> > No actual comments on the code, just left it all here for Mauro,\n> > \n> > Thanks,\n> > \n> > Jonathan\n> >   \n> \n> Dan was clear about using a single set of CE and UE handlers for all CXL RAS \n> protocol errors. While I understand there may be concerns, please direct any \n> objections to Dan and clarify what changes are required to avoid this \n> repeatedly going back and forth.\n> \n> [1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/\n\nSure - Dan's on this thread so I'm sure he'll see it sooner or later.\n\nPerhaps I'm missing something that makes this less critical than it appears.\n\nYou can have a single set of handlers, but at the point of spitting the actual\ntracepoints out we need to keep spitting the old ones (+ possibly a new unified\none if you want to one day get rid of the separation.)  Bit fiddly but seems\nunlikely to be that bad.  e.g. put a wrapper where you currently have\ntrace_cxl_aer_uncorrectable_error() and have that emit the new and old (based\non device type) tracepoints.  Then when we eventually drop this after enough\nyears that we can be sure the new one is in use, the code cleanup is all in\none place.\n\nNote this issue isn't a \"maybe\" thing - we are talking userspace ABI breakage\nin an interface known to be in use in commonly used software that is not\ntypically updated in time with the kernel cadence.  
There have been arguments\nthat some trace points are not 'stable' but that definitely isn't the case for\nthe RAS ones which are the main interface from kernel to userspace tooling.\n\nIn theory you could make such a change and maybe get away with it (on basis\na regression only exists if anyone notices) but you'd need ALL the distros\non board that ship rasdaemon + if you really don't want to end up reverting\nyou'd have to work closely with the hyperscalers who might decide to throw a\n'regression + revert' request at the list which either means a scramble to\nput in place what I describe above, or this series being reverted.  Note there\nare downstream forks of rasdaemon to contend with as well.\n\nThat pain just isn't worth it.\n\nMauro, any idea if any distros scan for RAS tracepoints for compatibility breakage?\nThey probably should like they do for ioctls and similar but no idea if anyone\nactually does yet.  If they do we'd get the revert request pretty quickly...\nIf not we get to wait for someone to hit it in a functional test \n- thankfully RAS paths are definitely in those test sets but they tend to\nrun later and hence when a revert / fix is more painful.\n\nJonathan","headers":{"Return-Path":"\n <linux-pci+bounces-54252-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256\n header.s=k20201202 header.b=azCZp9bY;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-pci+bounces-54252-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org\n header.b=\"azCZp9bY\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=10.30.226.201"],"Received":["from sea.lore.kernel.org (sea.lore.kernel.org\n [IPv6:2600:3c0a:e001:db::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4gBrXG5wgbz1yCg\n\tfor <incoming@patchwork.ozlabs.org>; Sat, 09 May 2026 00:06:54 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id 379E8300DDCA\n\tfor <incoming@patchwork.ozlabs.org>; Fri,  8 May 2026 14:05:51 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 47C223DC4DA;\n\tFri,  8 May 2026 14:05:49 +0000 (UTC)","from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org\n [10.30.226.201])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 238DC3603EF;\n\tFri,  8 May 2026 14:05:48 +0000 (UTC)","by smtp.kernel.org (Postfix) with ESMTPSA id 1E422C2BCB0;\n\tFri,  8 May 2026 14:05:38 +0000 (UTC)"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1778249149; cv=none;\n b=NDwRVMskwTBNUavCBLp2cc5qvioLTCUzvQl5PbeEC8fHhyFavyED3GNGcvKupGZ0Qy5ngsYySvC6BC5ahwGu4uZOv+vZOW1lvUOrmj7faDd6l2jWIImQbcE2+BiqpGmJeQ+3lt1OXeqFLdMsmkxUpiKtxY+v+42Vt16vJSzSeUk=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1778249149; c=relaxed/simple;\n\tbh=fLSWRhFbte/FoGtd3CLl6kuuzR05q0bVJXMtb8/dZoQ=;\n\th=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References:\n\t MIME-Version:Content-Type;\n 
b=u4mwC2jKK5s9xksZqW/Rm/sHBdecUUOo7scYaJR/X92Sra8rJyxbl1RsA7HeSPqm3/iZ6uHjlkQUx/3R3fftGJ1s8isYjC0GWkWH0VhuepAcmt2LtRsJQ0LunbyMdEKLm2+5SUi+WY5oDTKtwvCiCiWyV7n+bn+7zeURUnYFIrs=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=azCZp9bY; arc=none smtp.client-ip=10.30.226.201","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;\n\ts=k20201202; t=1778249148;\n\tbh=fLSWRhFbte/FoGtd3CLl6kuuzR05q0bVJXMtb8/dZoQ=;\n\th=Date:From:To:Cc:Subject:In-Reply-To:References:From;\n\tb=azCZp9bY7pIL0ZVlrNrdlWPTXcJv4fzF9Uk04rSH3jltcVSM+ccR6e/ow/l5kyVo3\n\t y468mDk1Zo30ojuwGUsq38OEQiyh+PP92U6wEgTcMRkAdObaeSb/c9NOYHTCID/5xv\n\t uAZL1s7VndRqTfTfIdVIxI3IEUD0O12zPgaQIuzxM6QDfrOoyYv22skmC4Cdy2/Ht2\n\t V6IhZOMrWMkeAMxfeu9RHeyWqkrHKkRI4dnFjh69j8LK3CEYPUfX8Q5GOaCBFMhkX2\n\t f2Ojgg6uXgvvBXMqB6ckowC8LrYsS9GqML1aTWS/KBd+sMEu68RegCFAfO+t7FwxHC\n\t 1cI4uq/BeT/wQ==","Date":"Fri, 8 May 2026 15:05:33 +0100","From":"Jonathan Cameron <jic23@kernel.org>","To":"\"Bowman, Terry\" <terry.bowman@amd.com>","Cc":"dave@stgolabs.net, dave.jiang@intel.com, alison.schofield@intel.com,\n djbw@kernel.org, bhelgaas@google.com, shiju.jose@huawei.com,\n ming.li@zohomail.com, Smita.KoralahalliChannabasappa@amd.com,\n rrichter@amd.com, dan.carpenter@linaro.org,\n PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,\n Benjamin.Cheatham@amd.com, sathyanarayanan.kuppuswamy@linux.intel.com,\n vishal.l.verma@intel.com, alucerop@amd.com, ira.weiny@intel.com,\n corbet@lwn.net, rafael@kernel.org, xueshuai@linux.alibaba.com,\n linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,\n linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,\n linux-doc@vger.kernel.org, Mauro Carvalho Chehab <mchehab@kernel.org>","Subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n 
events","Message-ID":"<20260508150533.04e19cf9@jic23-huawei>","In-Reply-To":"<8913c666-a343-4717-8ab2-0b8546d1bdfb@amd.com>","References":"<20260505173029.2718246-1-terry.bowman@amd.com>\n\t<20260505173029.2718246-3-terry.bowman@amd.com>\n\t<20260507190836.70197e24@jic23-huawei>\n\t<8913c666-a343-4717-8ab2-0b8546d1bdfb@amd.com>","X-Mailer":"Claws Mail 4.4.0 (GTK 3.24.52; x86_64-pc-linux-gnu)","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit"}},{"id":3688805,"web_url":"http://patchwork.ozlabs.org/comment/3688805/","msgid":"<69feaebd471c3_1b86a100b@djbw-dev.notmuch>","list_archive_url":null,"date":"2026-05-09T03:49:17","subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","submitter":{"id":93110,"url":"http://patchwork.ozlabs.org/api/people/93110/","name":"Dan Williams (nvidia)","email":"djbw@kernel.org"},"content":"Jonathan Cameron wrote:\n> On Thu, 7 May 2026 13:33:45 -0500\n> \"Bowman, Terry\" <terry.bowman@amd.com> wrote:\n[..]\n> > > This concerns me (sorry I wasn't paying attention to the v16 thread).\n> > > It is a userspace regression against code that is out in the wild and typically\n> > > not updated in sync with the kernel.\n> > > \n> > > If you are suggesting breaking ras-daemon at the very least +CC the maintainer.\n\nSorry, that was not the intent, see below.\n\n> > > \n> > > To get to a unified tracepoint add a new one that does what you want, but\n> > > maintain the existing ones as well.  
Userspace can then migrate and maybe\n> > > in 5+ years time we can delete the non unified ones.\n> > > \n> > > No actually comments on the code, just left it all here for Mauro,\n> > > \n> > > Thanks,\n> > > \n> > > Jonathan\n> > >   \n> > \n> > Dan was clear about using a single set of CE and UE handlers for all CXL RAS \n> > protocol errors. While I understand there may be concerns, please direct any \n> > objections to Dan and clarify what changes are required to avoid this \n> > repeatedly going back and forth.\n> > \n> > [1] https://lore.kernel.org/linux-cxl/69cb2d5ba3111_178904100b7@dwillia2-mobl4.notmuch/\n> \n> Sure - Dan's on this thread so I'm sure he'll see it sooner or later.\n> \n> Perhaps I'm missing something that makes this less critical than it appears.\n\nNo, it is breakage and a thinko on my part on the advice to Terry on the\nbackwards compatibility rules for tracepoints. At the time I was only\ntracking data type and order of the payload. I.e. string at same\nposition. However, the name of the argument is ABI.\n\nSomething like this incremental fixup I think gets this back on track.\nIt keeps legacy ABI support for \"memdev\" field in the payload. It\nincrementally lets updated userspace understand \"port\" and \"dport\"\nevents. It stops us from growing a new set of events just to update the\narguments. 
It enhances the CPER events to now handle switch ports in\naddition to endpoint ports.\n\nThe bulk of the change is passing @port and @dport to the CXL trace\nevents instead of a plain @dev.\n\n-- >8 --\ndiff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h\nindex ff39985d363f..ed3a56966369 100644\n--- a/drivers/cxl/core/core.h\n+++ b/drivers/cxl/core/core.h\n@@ -183,9 +183,10 @@ static inline struct device *dport_to_host(struct cxl_dport *dport)\n #ifdef CONFIG_CXL_RAS\n int cxl_ras_init(void);\n void cxl_ras_exit(void);\n-bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base);\n-void cxl_handle_cor_ras(struct device *dev, u64 serial,\n-\t\t\tvoid __iomem *ras_base);\n+bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport, u64 serial,\n+\t\t    void __iomem *ras_base);\n+void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport,\n+\t\t\tu64 serial, void __iomem *ras_base);\n void cxl_dport_map_rch_aer(struct cxl_dport *dport);\n void cxl_disable_rch_root_ints(struct cxl_dport *dport);\n void cxl_handle_rdport_errors(struct pci_dev *pdev);\ndiff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h\nindex 6f3957b3c3af..3857d2fc279d 100644\n--- a/drivers/cxl/core/trace.h\n+++ b/drivers/cxl/core/trace.h\n@@ -49,20 +49,24 @@\n )\n \n TRACE_EVENT(cxl_aer_uncorrectable_error,\n-\tTP_PROTO(const struct device *dev, u32 status, u32 fe, u32 *hl,\n-\t\t u64 serial),\n-\tTP_ARGS(dev, status, fe, hl, serial),\n+\tTP_PROTO(struct cxl_port *port, struct cxl_dport *dport, u32 status,\n+\t\t u32 fe, u32 *hl, u64 serial),\n+\tTP_ARGS(port, dport, status, fe, hl, serial),\n \tTP_STRUCT__entry(\n-\t\t__string(device, dev_name(dev))\n-\t\t__string(host, dev_name(dev->parent))\n+\t\t__string(memdev, cxl_trace_memdev_name(port))\n+\t\t__string(host, cxl_trace_host_name(port))\n \t\t__field(u64, serial)\n \t\t__field(u32, status)\n \t\t__field(u32, first_error)\n \t\t__array(u32, header_log, 
CXL_HEADERLOG_SIZE_U32)\n+\t\t__string(port, cxl_trace_port_name(port))\n+\t\t__string(dport, cxl_trace_dport_name(dport))\n \t),\n \tTP_fast_assign(\n-\t\t__assign_str(device);\n+\t\t__assign_str(memdev);\n \t\t__assign_str(host);\n+\t\t__assign_str(port);\n+\t\t__assign_str(dport);\n \t\t__entry->serial = serial;\n \t\t__entry->status = status;\n \t\t__entry->first_error = fe;\n@@ -72,8 +76,9 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n \t\t */\n \t\tmemcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);\n \t),\n-\tTP_printk(\"device=%s host=%s serial=%lld status: '%s' first_error: '%s'\",\n-\t\t  __get_str(device), __get_str(host), __entry->serial,\n+\tTP_printk(\"memdev=%s port=%s dport=%s host=%s serial=%lld status: '%s' first_error: '%s'\",\n+\t\t  __get_str(memdev), __get_str(port), __get_str(dport),\n+\t\t  __get_str(host), __entry->serial,\n \t\t  show_uc_errs(__entry->status),\n \t\t  show_uc_errs(__entry->first_error)\n \t)\n@@ -98,22 +103,27 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,\n )\n \n TRACE_EVENT(cxl_aer_correctable_error,\n-\tTP_PROTO(const struct device *dev, u32 status, u64 serial),\n-\tTP_ARGS(dev, status, serial),\n+\tTP_PROTO(struct cxl_port *port, struct cxl_dport *dport, u32 status, u64 serial),\n+\tTP_ARGS(port, dport, status, serial),\n \tTP_STRUCT__entry(\n-\t\t__string(device, dev_name(dev))\n-\t\t__string(host, dev_name(dev->parent))\n+\t\t__string(memdev, cxl_trace_memdev_name(port))\n+\t\t__string(host, cxl_trace_host_name(port))\n \t\t__field(u64, serial)\n \t\t__field(u32, status)\n+\t\t__string(port, cxl_trace_port_name(port))\n+\t\t__string(dport, cxl_trace_dport_name(dport))\n \t),\n \tTP_fast_assign(\n-\t\t__assign_str(device);\n+\t\t__assign_str(memdev);\n+\t\t__assign_str(port);\n+\t\t__assign_str(dport);\n \t\t__assign_str(host);\n \t\t__entry->serial = serial;\n \t\t__entry->status = status;\n \t),\n-\tTP_printk(\"device=%s host=%s serial=%lld status: '%s'\",\n-\t\t  __get_str(device), __get_str(host), 
__entry->serial,\n+\tTP_printk(\"memdev=%s port=%s dport=%s host=%s serial=%lld status: '%s'\",\n+\t\t  __get_str(memdev), __get_str(port), __get_str(dport),\n+\t\t  __get_str(host), __entry->serial,\n \t\t  show_ce_errs(__entry->status)\n \t)\n );\ndiff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h\nindex 776c50d1db51..83e161d48405 100644\n--- a/drivers/cxl/cxlmem.h\n+++ b/drivers/cxl/cxlmem.h\n@@ -101,6 +101,12 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)\n \treturn is_cxl_memdev(port->uport_dev);\n }\n \n+/* trace-event helpers */\n+const char *cxl_trace_memdev_name(struct cxl_port *port);\n+const char *cxl_trace_host_name(struct cxl_port *port);\n+const char *cxl_trace_port_name(struct cxl_port *port);\n+const char *cxl_trace_dport_name(struct cxl_dport *dport);\n+\n struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,\n \t\t\t\t\t const struct cxl_memdev_attach *attach);\n struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,\ndiff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c\nindex b45e2b539b5f..33e78f155916 100644\n--- a/drivers/cxl/core/ras.c\n+++ b/drivers/cxl/core/ras.c\n@@ -8,16 +8,20 @@\n #include <cxlpci.h>\n #include \"trace.h\"\n \n-static void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, u64 serial,\n-\t\t\t\t\t struct cxl_ras_capability_regs *ras_cap)\n+static void\n+cxl_cper_trace_corr_prot_err(struct cxl_port *port, struct cxl_dport *dport,\n+\t\t\t     u64 serial,\n+\t\t\t     struct cxl_ras_capability_regs *ras_cap)\n {\n \tu32 status = ras_cap->cor_status & ~ras_cap->cor_mask;\n \n-\ttrace_cxl_aer_correctable_error(&pdev->dev, status, serial);\n+\ttrace_cxl_aer_correctable_error(port, dport, status, serial);\n }\n \n-static void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, u64 serial,\n-\t\t\t\t\t   struct cxl_ras_capability_regs *ras_cap)\n+static void\n+cxl_cper_trace_uncorr_prot_err(struct cxl_port *port, struct cxl_dport *dport,\n+\t\t\t       u64 
serial,\n+\t\t\t       struct cxl_ras_capability_regs *ras_cap)\n {\n \tu32 status = ras_cap->uncor_status & ~ras_cap->uncor_mask;\n \tu32 fe;\n@@ -28,10 +32,12 @@ static void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, u64 serial,\n \telse\n \t\tfe = status;\n \n-\ttrace_cxl_aer_uncorrectable_error(&pdev->dev, status, fe,\n+\ttrace_cxl_aer_uncorrectable_error(port, dport, status, fe,\n \t\t\t\t\t  ras_cap->header_log, serial);\n }\n \n+static struct cxl_port *find_cxl_port_by_dev(struct device *dev, struct cxl_dport **dport);\n+\n void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)\n {\n \tunsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,\n@@ -40,19 +46,26 @@ void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)\n \t\tpci_get_domain_bus_and_slot(data->prot_err.agent_addr.segment,\n \t\t\t\t\t    data->prot_err.agent_addr.bus,\n \t\t\t\t\t    devfn);\n+\tstruct cxl_dport *dport;\n \n \tif (!pdev)\n \t\treturn;\n \n-\tguard(device)(&pdev->dev);\n-\tif (!pdev->dev.driver)\n+\tstruct cxl_port *port __free(put_cxl_port) =\n+\t\tfind_cxl_port_by_dev(&pdev->dev, &dport);\n+\n+\tif (!port)\n+\t\treturn;\n+\n+\tguard(device)(&port->dev);\n+\tif (!port->dev.driver)\n \t\treturn;\n \n \tif (data->severity == AER_CORRECTABLE)\n-\t\tcxl_cper_trace_corr_prot_err(pdev, pci_get_dsn(pdev),\n+\t\tcxl_cper_trace_corr_prot_err(port, dport, pci_get_dsn(pdev),\n \t\t\t\t\t     &data->ras_cap);\n \telse\n-\t\tcxl_cper_trace_uncorr_prot_err(pdev, pci_get_dsn(pdev),\n+\t\tcxl_cper_trace_uncorr_prot_err(port, dport, pci_get_dsn(pdev),\n \t\t\t\t\t       &data->ras_cap);\n }\n EXPORT_SYMBOL_GPL(cxl_cper_handle_prot_err);\n@@ -222,13 +235,12 @@ static void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)\n \n static void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port, struct cxl_dport *dport)\n {\n-\tstruct device *dev = &pdev->dev;\n \tbool ue;\n \n \tif (pci_dev_is_disconnected(pdev))\n 
\t\tpanic(\"CXL cachemem error: device disconnected during UE recovery\");\n \n-\tue = cxl_handle_ras(dev, pci_get_dsn(pdev),\n+\tue = cxl_handle_ras(port, dport, pci_get_dsn(pdev),\n \t\t\t    to_ras_base(port, dport));\n \tif (ue)\n \t\tpanic(\"CXL cachemem error.\");\n@@ -238,7 +250,8 @@ static void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port, struct\n \tpci_aer_clear_fatal_status(pdev);\n }\n \n-void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n+void cxl_handle_cor_ras(struct cxl_port *port, struct cxl_dport *dport,\n+\t\t\tu64 serial, void __iomem *ras_base)\n {\n \tvoid __iomem *addr;\n \tu32 status;\n@@ -250,7 +263,7 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n \tstatus = readl(addr);\n \tif (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {\n \t\twritel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);\n-\t\ttrace_cxl_aer_correctable_error(dev, status, serial);\n+\t\ttrace_cxl_aer_correctable_error(port, dport, status, serial);\n \t}\n }\n \n@@ -275,7 +288,8 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)\n  * Log the state of the RAS status registers and prepare them to log the\n  * next error status. 
Return 1 if reset needed.\n  */\n-bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n+bool cxl_handle_ras(struct cxl_port *port, struct cxl_dport *dport, u64 serial,\n+\t\t    void __iomem *ras_base)\n {\n \tu32 hl[CXL_HEADERLOG_SIZE_U32];\n \tvoid __iomem *addr;\n@@ -302,7 +316,7 @@ bool cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ras_base)\n \t}\n \n \theader_log_copy(ras_base, hl);\n-\ttrace_cxl_aer_uncorrectable_error(dev, status, fe, hl, serial);\n+\ttrace_cxl_aer_uncorrectable_error(port, dport, status, fe, hl, serial);\n \twritel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);\n \n \treturn true;\n@@ -358,7 +372,7 @@ static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,\n \t\tcxl_handle_rdport_errors(pdev);\n \n \tif (severity == AER_CORRECTABLE) {\n-\t\tcxl_handle_cor_ras(&pdev->dev, pci_get_dsn(pdev),\n+\t\tcxl_handle_cor_ras(port, dport, pci_get_dsn(pdev),\n \t\t\t\t   to_ras_base(port, dport));\n \t\tpcie_clear_device_status(pdev);\n \t} else {\ndiff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c\nindex cbd02cabefbc..1bcd3c491aaa 100644\n--- a/drivers/cxl/core/ras_rch.c\n+++ b/drivers/cxl/core/ras_rch.c\n@@ -113,9 +113,8 @@ void cxl_handle_rdport_errors(struct pci_dev *pdev)\n \n \tpci_print_aer(pdev, severity, &aer_regs);\n \tif (severity == AER_CORRECTABLE)\n-\t\tcxl_handle_cor_ras(&pdev->dev, pci_get_dsn(pdev),\n+\t\tcxl_handle_cor_ras(port, dport, pci_get_dsn(pdev),\n \t\t\t\t   dport->regs.ras);\n \telse\n-\t\tcxl_handle_ras(&pdev->dev, pci_get_dsn(pdev),\n-\t\t\t       dport->regs.ras);\n+\t\tcxl_handle_ras(port, dport, pci_get_dsn(pdev), dport->regs.ras);\n }\ndiff --git a/drivers/cxl/core/trace.c b/drivers/cxl/core/trace.c\nindex 7f2a9dd0d0e3..df42d119c53d 100644\n--- a/drivers/cxl/core/trace.c\n+++ b/drivers/cxl/core/trace.c\n@@ -2,7 +2,42 @@\n /* Copyright(c) 2022 Intel Corporation. All rights reserved. 
*/\n \n #include <cxl.h>\n+#include <cxlmem.h>\n #include \"core.h\"\n \n+const char *cxl_trace_memdev_name(struct cxl_port *port)\n+{\n+\tif (is_cxl_endpoint(port)) {\n+\t\tstruct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);\n+\n+\t\treturn dev_name(&cxlmd->dev);\n+\t}\n+\n+\treturn \"\";\n+}\n+\n+const char *cxl_trace_host_name(struct cxl_port *port)\n+{\n+\tif (is_cxl_endpoint(port)) {\n+\t\tstruct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);\n+\n+\t\treturn dev_name(cxlmd->dev.parent);\n+\t}\n+\n+\treturn dev_name(port->uport_dev);\n+}\n+\n+const char *cxl_trace_port_name(struct cxl_port *port)\n+{\n+\treturn dev_name(&port->dev);\n+}\n+\n+const char *cxl_trace_dport_name(struct cxl_dport *dport)\n+{\n+\tif (dport)\n+\t\treturn dev_name(dport->dport_dev);\n+\treturn \"\";\n+}\n+\n #define CREATE_TRACE_POINTS\n #include \"trace.h\"","headers":{"Return-Path":"\n <linux-pci+bounces-54309-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256\n header.s=k20201202 header.b=uAGMGVdJ;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=172.234.253.10; helo=sea.lore.kernel.org;\n envelope-from=linux-pci+bounces-54309-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=\"uAGMGVdJ\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=10.30.226.201"],"Received":["from sea.lore.kernel.org (sea.lore.kernel.org [172.234.253.10])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No 
client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4gCBnQ35M8z1yK7\n\tfor <incoming@patchwork.ozlabs.org>; Sat, 09 May 2026 13:49:30 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id E28F2301A912\n\tfor <incoming@patchwork.ozlabs.org>; Sat,  9 May 2026 03:49:22 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 6ECC43815C3;\n\tSat,  9 May 2026 03:49:21 +0000 (UTC)","from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org\n [10.30.226.201])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 47FD337AA63;\n\tSat,  9 May 2026 03:49:21 +0000 (UTC)","by smtp.kernel.org (Postfix) with ESMTPSA id EEB59C2BCC4;\n\tSat,  9 May 2026 03:49:19 +0000 (UTC)","from phl-compute-05.internal (phl-compute-05.internal [10.202.2.45])\n\tby mailfauth.phl.internal (Postfix) with ESMTP id 0F57FF40081;\n\tFri,  8 May 2026 23:49:19 -0400 (EDT)","from phl-frontend-03 ([10.202.2.162])\n  by phl-compute-05.internal (MEProxy); Fri, 08 May 2026 23:49:19 -0400","by mail.messagingengine.com (Postfix) with ESMTPA; Fri,\n 8 May 2026 23:49:18 -0400 (EDT)"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1778298561; cv=none;\n b=F5/4fvO9piaXTjywoFBJ3GGTijr33JIanIV5t9A+4k1m5ksTIzx+Y+KSZmh5clC+Ob24ZB2Y0eLqFoo/lieKFt8EpbyAUfkoS0YaazztjsqLEnjdsK3fovuN9XzKJWP54yiq5PO1+On5ELIb25JmL6uS7aCE5lz82WtGH0uuoE4=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1778298561; c=relaxed/simple;\n\tbh=AjbX2/PuJjVkTJXhV0OHUmvecqEZOKJQGxfINAzjECs=;\n\th=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject:\n\t Mime-Version:Content-Type;\n 
b=CeYK1chWRCRtGkbpDwa1V0Gs7Nn1VH5QdtNTE7A9FP3vGGeZqG3H03geQyWP3GhJQcW7dhsFEaIQkC43fLTrMWrSE4xaZo916dvHtc5HN4vQPXxJz50sUHtI8XcqksDfPiHJYOxIc+pUEjUu09EidcVsJCAxDW8NL0pH2fdmaoE=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=uAGMGVdJ; arc=none smtp.client-ip=10.30.226.201","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;\n\ts=k20201202; t=1778298561;\n\tbh=AjbX2/PuJjVkTJXhV0OHUmvecqEZOKJQGxfINAzjECs=;\n\th=Date:From:To:Cc:In-Reply-To:References:Subject:From;\n\tb=uAGMGVdJpeo2DVc9ZdIfqD3HewoTBgjuZxyEOpftEU4lcm+MmDWyzGc7q+rAUxQxW\n\t A9xQYpG/PibsHOvB9wubPgWl+eR6qT4dLilOAcXFkwOlyrOFGNTUei5RCQovscmPP+\n\t 1vn4gl4Iq9otD6R/NgQm5+t9vMeUKWDbZSYgtN2si4FlYpZBiGBy+Aw0fjngMbV5yz\n\t tm58EFatJKGO4+S7QyjLrQwOdDh9Q3GxRzLiHVdwczUbjBOVERSM3+4AcYBnxut/4W\n\t rkA21Wrdf/uZuWquDD/e35aSw8IgQNFVFpOsqjI+oaMIZB4NccfHEWjf2wgy9Vr1C1\n\t edI3kT5q0CYlA==","X-ME-Sender":"<xms:vq7-aQ4_xvATOb1WMTDcHkp7RP6K3e-kOEUgp1toRM6LtBrzZai2Gg>\n    <xme:vq7-abjwQmIuY2mbo0Jc94K0i2Ra90ZVLPPvs7qOWZw0g3TuJH2BEAK25qo5-P2d2\n    IdqfCYv4STBDil3IdVZhBQq4kSJHFBZpcV8dlvSN3n07GPKtQCD7Ho>","X-ME-Received":"\n <xmr:vq7-aS42y6DI2R2GZFTfkfj6s-0UXl0MXOigJ8uciBXfQBK4lHHCvbsLBoNtreP2WUP1Z1rvgAfMYFU-UlkBvW8FSF-ee7aeOxU>","X-ME-Proxy-Cause":"\n gggruggvucftvghtrhhoucdtuddrgeefhedrtddtgdduuddvudelucetufdoteggodetrf\n    dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu\n    rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf\n    gurhepfffhvfevkfgjfhfugggtgfesthejredttddtjeenucfhrhhomhepfdffrghnucgh\n    ihhllhhirghmshculdhnvhhiughirgdmfdcuoegujhgsfieskhgvrhhnvghlrdhorhhgqe\n    enucggtffrrghtthgvrhhnpedtkeelkefgtdegudeiiedttdfgueegieekjeffueffueet\n    jeffkeevtefflefhheenucffohhmrghinhepkhgvrhhnvghlrdhorhhgnecuvehluhhsth\n    gvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepughjsgifodhmvghsmhht\n    phgruhhthhhpvghrshhonhgrlhhithihqddujeejvdeftdegheehqdeffeefleegtdegje\n    
dqughjsgifpeepkhgvrhhnvghlrdhorhhgsehfrghsthhmrghilhdrtghomhdpnhgspghr\n    tghpthhtohepvdekpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehjihgtvdefse\n    hkvghrnhgvlhdrohhrghdprhgtphhtthhopehtvghrrhihrdgsohifmhgrnhesrghmugdr\n    tghomhdprhgtphhtthhopegurghvvgesshhtghholhgrsghsrdhnvghtpdhrtghpthhtoh\n    epuggrvhgvrdhjihgrnhhgsehinhhtvghlrdgtohhmpdhrtghpthhtoheprghlihhsohhn\n    rdhstghhohhfihgvlhgusehinhhtvghlrdgtohhmpdhrtghpthhtohepughjsgifsehkvg\n    hrnhgvlhdrohhrghdprhgtphhtthhopegshhgvlhhgrggrshesghhoohhglhgvrdgtohhm\n    pdhrtghpthhtohepshhhihhjuhdrjhhoshgvsehhuhgrfigvihdrtghomhdprhgtphhtth\n    hopehmihhnghdrlhhiseiiohhhohhmrghilhdrtghomh","X-ME-Proxy":"<xmx:vq7-acBFo02_fLfFu834Ew_CWBormYOgwYwrdcrWu7zVsOSSYXNNng>\n    <xmx:v67-afSQBYfb4eYJxFgVJS1zCBYVCi7Be_lJtvvgw4Hm7mOCqUbR6Q>\n    <xmx:v67-aTYvf0pi_CHt8KFsja7M4ArCFxvp1kIqgg_8mq13hGagvP122w>\n    <xmx:v67-aUVMc7ALQiL2NnpSL1vkV1MrGzLGB52_nj1B8586V-uOqCuGzw>\n    <xmx:v67-aQVrv4zT3DqE58fljWuO-CN9wOH7oDngriJRNxa1aUhkLWkdxz9J>","Feedback-ID":"i67ae4b3e:Fastmail","Date":"Fri, 08 May 2026 20:49:17 -0700","From":"\"Dan Williams (nvidia)\" <djbw@kernel.org>","To":"Jonathan Cameron <jic23@kernel.org>,\n \"Bowman, Terry\" <terry.bowman@amd.com>","Cc":"dave@stgolabs.net,\n dave.jiang@intel.com,\n alison.schofield@intel.com,\n djbw@kernel.org,\n bhelgaas@google.com,\n shiju.jose@huawei.com,\n ming.li@zohomail.com,\n Smita.KoralahalliChannabasappa@amd.com,\n rrichter@amd.com,\n dan.carpenter@linaro.org,\n PradeepVineshReddy.Kodamati@amd.com,\n lukas@wunner.de,\n Benjamin.Cheatham@amd.com,\n sathyanarayanan.kuppuswamy@linux.intel.com,\n vishal.l.verma@intel.com,\n alucerop@amd.com,\n ira.weiny@intel.com,\n corbet@lwn.net,\n rafael@kernel.org,\n xueshuai@linux.alibaba.com,\n linux-cxl@vger.kernel.org,\n linux-kernel@vger.kernel.org,\n linux-pci@vger.kernel.org,\n linux-acpi@vger.kernel.org,\n linux-doc@vger.kernel.org,\n Mauro Carvalho Chehab 
<mchehab@kernel.org>","Message-ID":"<69feaebd471c3_1b86a100b@djbw-dev.notmuch>","In-Reply-To":"<20260508150533.04e19cf9@jic23-huawei>","References":"<20260505173029.2718246-1-terry.bowman@amd.com>\n <20260505173029.2718246-3-terry.bowman@amd.com>\n <20260507190836.70197e24@jic23-huawei>\n <8913c666-a343-4717-8ab2-0b8546d1bdfb@amd.com>\n <20260508150533.04e19cf9@jic23-huawei>","Subject":"Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace\n events","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","Mime-Version":"1.0","Content-Type":"text/plain;\n charset=utf-8","Content-Transfer-Encoding":"7bit"}}]