From patchwork Tue May 7 03:32:27 2024
X-Patchwork-Submitter: Chris Chiu
X-Patchwork-Id: 1932207
From: Chris Chiu
To: kernel-team@lists.ubuntu.com
Subject: [SRU][U/N/OEM-6.8][PATCH 1/1] drm/i915/huc: Allow for very slow HuC loading
Date: Tue, 7 May 2024 11:32:27 +0800
Message-Id: <20240507033227.1879383-2-chris.chiu@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240507033227.1879383-1-chris.chiu@canonical.com>
References: <20240507033227.1879383-1-chris.chiu@canonical.com>

From: John Harrison

BugLink: https://bugs.launchpad.net/bugs/2061049

A failure to load the HuC is occasionally observed, where the cause is
believed to be a low GT frequency leading to very long load times.

So a) increase the timeout so that the user still gets a working system
even in the case of a slow load, and b) report the frequency during the
load to see if that is the cause of the slowdown.

Also update the similar code on the GuC load to not use uncore->gt when
there is a local gt available. The two should match, but there is no
need for the unnecessary dereferencing.

Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
Link: https://patchwork.freedesktop.org/patch/msgid/20240102222202.310495-1-John.C.Harrison@Intel.com
(cherry picked from commit a797099562267ebb281acd59750f1a8dbba36eef)
Signed-off-by: Chris Chiu
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 10 ++--
 drivers/gpu/drm/i915/gt/uc/intel_huc.c    | 64 ++++++++++++++++++++---
 2 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
index 0f79cb658518..52332bb14339 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
@@ -184,7 +184,7 @@ static int guc_wait_ucode(struct intel_guc *guc)
	 * in the seconds range. However, there is a limit on how long an
	 * individual wait_for() can wait. So wrap it in a loop.
	 */
-	before_freq = intel_rps_read_actual_frequency(&uncore->gt->rps);
+	before_freq = intel_rps_read_actual_frequency(&gt->rps);
	before = ktime_get();
	for (count = 0; count < GUC_LOAD_RETRY_LIMIT; count++) {
		ret = wait_for(guc_load_done(uncore, &status, &success), 1000);
@@ -192,7 +192,7 @@ static int guc_wait_ucode(struct intel_guc *guc)
			break;

		guc_dbg(guc, "load still in progress, count = %d, freq = %dMHz, status = 0x%08X [0x%02X/%02X]\n",
-			count, intel_rps_read_actual_frequency(&uncore->gt->rps), status,
+			count, intel_rps_read_actual_frequency(&gt->rps), status,
			REG_FIELD_GET(GS_BOOTROM_MASK, status),
			REG_FIELD_GET(GS_UKERNEL_MASK, status));
	}
@@ -204,7 +204,7 @@ static int guc_wait_ucode(struct intel_guc *guc)
		u32 bootrom = REG_FIELD_GET(GS_BOOTROM_MASK, status);

		guc_info(guc, "load failed: status = 0x%08X, time = %lldms, freq = %dMHz, ret = %d\n",
-			 status, delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps), ret);
+			 status, delta_ms, intel_rps_read_actual_frequency(&gt->rps), ret);
		guc_info(guc, "load failed: status: Reset = %d, BootROM = 0x%02X, UKernel = 0x%02X, MIA = 0x%02X, Auth = 0x%02X\n",
			 REG_FIELD_GET(GS_MIA_IN_RESET, status),
			 bootrom, ukernel,
@@ -254,11 +254,11 @@ static int guc_wait_ucode(struct intel_guc *guc)
		guc_warn(guc, "excessive init time: %lldms! [status = 0x%08X, count = %d, ret = %d]\n",
			 delta_ms, status, count, ret);
		guc_warn(guc, "excessive init time: [freq = %dMHz, before = %dMHz, perf_limit_reasons = 0x%08X]\n",
-			 intel_rps_read_actual_frequency(&uncore->gt->rps), before_freq,
+			 intel_rps_read_actual_frequency(&gt->rps), before_freq,
			 intel_uncore_read(uncore, intel_gt_perf_limit_reasons_reg(gt)));
	} else {
		guc_dbg(guc, "init took %lldms, freq = %dMHz, before = %dMHz, status = 0x%08X, count = %d, ret = %d\n",
-			delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps),
+			delta_ms, intel_rps_read_actual_frequency(&gt->rps),
			before_freq, status, count, ret);
	}

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index ba9e07fc2b57..0945b177d5f9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -6,6 +6,7 @@
 #include <linux/types.h>

 #include "gt/intel_gt.h"
+#include "gt/intel_rps.h"
 #include "intel_guc_reg.h"
 #include "intel_huc.h"
 #include "intel_huc_print.h"
@@ -447,17 +448,68 @@ static const char *auth_mode_string(struct intel_huc *huc,
	return partial ? "clear media" : "all workloads";
 }

+/*
+ * Use a longer timeout for debug builds so that problems can be detected
+ * and analysed. But a shorter timeout for releases so that users don't
+ * wait forever to find out there is a problem. Note that the only reason
+ * an end user should hit the timeout is in case of extreme thermal throttling.
+ * And a system that is that hot during boot is probably dead anyway!
+ */
+#if defined(CONFIG_DRM_I915_DEBUG_GEM)
+#define HUC_LOAD_RETRY_LIMIT 20
+#else
+#define HUC_LOAD_RETRY_LIMIT 3
+#endif
+
 int intel_huc_wait_for_auth_complete(struct intel_huc *huc,
				     enum intel_huc_authentication_type type)
 {
	struct intel_gt *gt = huc_to_gt(huc);
-	int ret;
+	struct intel_uncore *uncore = gt->uncore;
+	ktime_t before, after, delta;
+	int ret, count;
+	u64 delta_ms;
+	u32 before_freq;

-	ret = __intel_wait_for_register(gt->uncore,
-					huc->status[type].reg,
-					huc->status[type].mask,
-					huc->status[type].value,
-					2, 50, NULL);
+	/*
+	 * The KMD requests maximum frequency during driver load, however thermal
+	 * throttling can force the frequency down to minimum (although the board
+	 * really should never get that hot in real life!). IFWI issues have been
+	 * seen to cause sporadic failures to grant the higher frequency. And at
+	 * minimum frequency, the authentication time can be in the seconds range.
+	 * Note that there is a limit on how long an individual wait_for() can wait.
+	 * So wrap it in a loop.
+	 */
+	before_freq = intel_rps_read_actual_frequency(&gt->rps);
+	before = ktime_get();
+	for (count = 0; count < HUC_LOAD_RETRY_LIMIT; count++) {
+		ret = __intel_wait_for_register(gt->uncore,
+						huc->status[type].reg,
+						huc->status[type].mask,
+						huc->status[type].value,
+						2, 1000, NULL);
+		if (!ret)
+			break;
+
+		huc_dbg(huc, "auth still in progress, count = %d, freq = %dMHz, status = 0x%08X\n",
+			count, intel_rps_read_actual_frequency(&gt->rps),
+			huc->status[type].reg.reg);
+	}
+	after = ktime_get();
+	delta = ktime_sub(after, before);
+	delta_ms = ktime_to_ms(delta);
+
+	if (delta_ms > 50) {
+		huc_warn(huc, "excessive auth time: %lldms! [status = 0x%08X, count = %d, ret = %d]\n",
+			 delta_ms, huc->status[type].reg.reg, count, ret);
+		huc_warn(huc, "excessive auth time: [freq = %dMHz, before = %dMHz, perf_limit_reasons = 0x%08X]\n",
+			 intel_rps_read_actual_frequency(&gt->rps), before_freq,
+			 intel_uncore_read(uncore, intel_gt_perf_limit_reasons_reg(gt)));
+	} else {
+		huc_dbg(huc, "auth took %lldms, freq = %dMHz, before = %dMHz, status = 0x%08X, count = %d, ret = %d\n",
+			delta_ms, intel_rps_read_actual_frequency(&gt->rps),
+			before_freq, huc->status[type].reg.reg, count, ret);
+	}

	/* mark the load process as complete even if the wait failed */
	delayed_huc_load_complete(huc);
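
For reference, the retry pattern both loaders now share can be sketched in
isolation. The following is a minimal standalone userspace analogue, not
i915 code: wait_once(), MAX_RETRIES, and the 50 ms reporting threshold are
hypothetical stand-ins for the bounded hardware wait (wait_for() /
__intel_wait_for_register()) and the limits the patch uses.

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define MAX_RETRIES 3	/* hypothetical stand-in for HUC_LOAD_RETRY_LIMIT */

/*
 * Hypothetical stand-in for one bounded wait. Each call can time out on
 * its own even though the hardware is still making progress; here it
 * "succeeds" on the second call to exercise the retry path.
 */
static bool wait_once(void)
{
	static int calls;
	return ++calls >= 2;
}

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

int main(void)
{
	long long before = now_ms(), delta_ms;
	int count, ret = -1;

	/* Retry the bounded wait, logging progress on every miss. */
	for (count = 0; count < MAX_RETRIES; count++) {
		if (wait_once()) {
			ret = 0;
			break;
		}
		printf("still in progress, count = %d\n", count);
	}

	delta_ms = now_ms() - before;
	if (delta_ms > 50)
		printf("excessive wait: %lldms! [count = %d, ret = %d]\n",
		       delta_ms, count, ret);
	else
		printf("wait took %lldms [count = %d, ret = %d]\n",
		       delta_ms, count, ret);
	return ret;
}

The point of the outer loop is that each individual wait has its own hard
upper bound, so a single call cannot express a multi-second budget; retrying
a bounded wait keeps each iteration's timeout small while still tolerating a
very slow load, and the per-iteration log line records the frequency at the
time of each miss.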