From patchwork Tue Jul 30 15:16:38 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 263412 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id A46362C00C4 for ; Wed, 31 Jul 2013 01:19:36 +1000 (EST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1V4Bhx-0001Xz-1V; Tue, 30 Jul 2013 15:19:29 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1V4Bgc-0001D3-Ql for kernel-team@lists.ubuntu.com; Tue, 30 Jul 2013 15:18:06 +0000 Received: from c-67-160-231-162.hsd1.ca.comcast.net ([67.160.231.162] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1V4Bgb-0007mc-U6; Tue, 30 Jul 2013 15:18:06 +0000 Received: from kamal by fourier with local (Exim 4.80) (envelope-from ) id 1V4BgZ-0007JL-JD; Tue, 30 Jul 2013 08:18:03 -0700 From: Kamal Mostafa To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, kernel-team@lists.ubuntu.com Subject: [PATCH 09/75] drm/i915: Fix incoherence with fence updates on Sandybridge+ Date: Tue, 30 Jul 2013 08:16:38 -0700 Message-Id: <1375197464-27962-10-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.8.1.2 In-Reply-To: <1375197464-27962-1-git-send-email-kamal@canonical.com> References: <1375197464-27962-1-git-send-email-kamal@canonical.com> X-Extended-Stable: 3.8 Cc: Daniel Vetter , Kamal Mostafa , Jon Bloomfield , Carsten Emde , Chris Wilson X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com 3.8.13.6 -stable review patch. If anyone has any objections, please let me know. ------------------ From: Chris Wilson commit d18b9619034230b6f945e215276425636ca401fe upstream. This hopefully fixes the root cause behind the workaround added in commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06 Author: Chris Wilson Date: Thu Apr 4 21:31:03 2013 +0100 drm/i915: Workaround incoherence between fences and LLC across multiple CPUs Thanks to further investigation by Jon Bloomfield, he realised that the 64-bit register might be broken up by the hardware into two 32-bit writes (a problem we have encountered elsewhere). This non-atomicity would then cause an issue where a second thread would see an intermediate register state (new high dword, old low dword), and this register would randomly be used in preference to its own thread register. This would cause the second thread to read from and write into a fairly random tiled location. Breaking the operation into 3 explicit 32-bit updates (first disable the fence, poke the upper bits, then poke the lower bits and enable) ensures that, given proper serialisation between the 32-bit register write and the memory transfer, that the fence value is always consistent. Armed with this knowledge, we can explain how the previous workaround work. The key to the corruption is that a second thread sees an erroneous fence register that conflicts and overrides its own. By serialising the fence update across all CPUs, we have a small window where no GTT access is occurring and so hide the potential corruption. This also leads to the conclusion that the earlier workaround was incomplete. v2: Be overly paranoid about the order in which fence updates become visible to the GPU to make really sure that we turn the fence off before doing the update, and then only switch the fence on afterwards. Signed-off-by: Jon Bloomfield Signed-off-by: Chris Wilson Cc: Daniel Vetter Cc: Carsten Emde Signed-off-by: Daniel Vetter Signed-off-by: Kamal Mostafa --- drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2aef3d2..979d892 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2529,7 +2529,6 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg, drm_i915_private_t *dev_priv = dev->dev_private; int fence_reg; int fence_pitch_shift; - uint64_t val; if (INTEL_INFO(dev)->gen >= 6) { fence_reg = FENCE_REG_SANDYBRIDGE_0; @@ -2539,8 +2538,23 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg, fence_pitch_shift = I965_FENCE_PITCH_SHIFT; } + fence_reg += reg * 8; + + /* To w/a incoherency with non-atomic 64-bit register updates, + * we split the 64-bit update into two 32-bit writes. In order + * for a partial fence not to be evaluated between writes, we + * precede the update with write to turn off the fence register, + * and only enable the fence as the last step. + * + * For extra levels of paranoia, we make sure each step lands + * before applying the next step. + */ + I915_WRITE(fence_reg, 0); + POSTING_READ(fence_reg); + if (obj) { u32 size = obj->gtt_space->size; + uint64_t val; val = (uint64_t)((obj->gtt_offset + size - 4096) & 0xfffff000) << 32; @@ -2549,12 +2563,16 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg, if (obj->tiling_mode == I915_TILING_Y) val |= 1 << I965_FENCE_TILING_Y_SHIFT; val |= I965_FENCE_REG_VALID; - } else - val = 0; - fence_reg += reg * 8; - I915_WRITE64(fence_reg, val); - POSTING_READ(fence_reg); + I915_WRITE(fence_reg + 4, val >> 32); + POSTING_READ(fence_reg + 4); + + I915_WRITE(fence_reg + 0, val); + POSTING_READ(fence_reg); + } else { + I915_WRITE(fence_reg + 4, 0); + POSTING_READ(fence_reg + 4); + } } static void i915_write_fence_reg(struct drm_device *dev, int reg,