diff mbox series

[SRU,Groovy] drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init

Message ID 20201120172153.2112-1-kamal@canonical.com
State New
Headers show
Series [SRU,Groovy] drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init | expand

Commit Message

Kamal Mostafa Nov. 20, 2020, 5:21 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

BugLink: https://bugs.launchpad.net/bugs/1903397

Currently we leave the cache_level of the initial fb obj
set to NONE. This means on eLLC machines the first pin_to_display()
will try to switch it to WT which requires a vma unbind+bind.
If that happens during the fbdev initialization rcu does not
seem operational which causes the unbind to get stuck. To
most appearances this looks like a dead machine on boot.

Avoid the unbind by already marking the object cache_level
as WT when creating it. We still do an excplicit ggtt pin
which will rewrite the PTEs anyway, so they will match whatever
cache level we set.

Cc: <stable@vger.kernel.org> # v5.7+
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
(cherry picked from commit d46b60a2e8d246f1f0faa38e52f4f5a73858c338)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 1664ffee760a5d98952318fdd9b198fae396d660)
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 drivers/gpu/drm/i915/display/intel_display.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Kelsey Skunberg Nov. 20, 2020, 5:49 p.m. UTC | #1
On 2020-11-20 09:21:53 , Kamal Mostafa wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1903397
> 
> Currently we leave the cache_level of the initial fb obj
> set to NONE. This means on eLLC machines the first pin_to_display()
> will try to switch it to WT which requires a vma unbind+bind.
> If that happens during the fbdev initialization rcu does not
> seem operational which causes the unbind to get stuck. To
> most appearances this looks like a dead machine on boot.
> 
> Avoid the unbind by already marking the object cache_level
> as WT when creating it. We still do an excplicit ggtt pin
> which will rewrite the PTEs anyway, so they will match whatever
> cache level we set.
> 
> Cc: <stable@vger.kernel.org> # v5.7+
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com
> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
> (cherry picked from commit d46b60a2e8d246f1f0faa38e52f4f5a73858c338)
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> (cherry picked from commit 1664ffee760a5d98952318fdd9b198fae396d660)
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---

Clean cherry pick. Applies and builds ok. LGTM. Thank you, Kamal!

Acked-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>

>  drivers/gpu/drm/i915/display/intel_display.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 26996e1839e2..4bd383ff0880 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -3431,6 +3431,14 @@ initial_plane_vma(struct drm_i915_private *i915,
>  	if (IS_ERR(obj))
>  		return NULL;
>  
> +	/*
> +	 * Mark it WT ahead of time to avoid changing the
> +	 * cache_level during fbdev initialization. The
> +	 * unbind there would get stuck waiting for rcu.
> +	 */
> +	i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?
> +					    I915_CACHE_WT : I915_CACHE_NONE);
> +
>  	switch (plane_config->tiling) {
>  	case I915_TILING_NONE:
>  		break;
> -- 
> 2.17.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Stefan Bader Nov. 23, 2020, 7:52 a.m. UTC | #2
On 20.11.20 18:21, Kamal Mostafa wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1903397
> 
> Currently we leave the cache_level of the initial fb obj
> set to NONE. This means on eLLC machines the first pin_to_display()
> will try to switch it to WT which requires a vma unbind+bind.
> If that happens during the fbdev initialization rcu does not
> seem operational which causes the unbind to get stuck. To
> most appearances this looks like a dead machine on boot.
> 
> Avoid the unbind by already marking the object cache_level
> as WT when creating it. We still do an excplicit ggtt pin
> which will rewrite the PTEs anyway, so they will match whatever
> cache level we set.
> 
> Cc: <stable@vger.kernel.org> # v5.7+
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com
> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
> (cherry picked from commit d46b60a2e8d246f1f0faa38e52f4f5a73858c338)
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> (cherry picked from commit 1664ffee760a5d98952318fdd9b198fae396d660)
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
> ---

Maybe the bug report (at least the justification) could be clarified to the fact
that this is something appearing to be a hang but in reality it is no video output.

-Stefan

>  drivers/gpu/drm/i915/display/intel_display.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 26996e1839e2..4bd383ff0880 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -3431,6 +3431,14 @@ initial_plane_vma(struct drm_i915_private *i915,
>  	if (IS_ERR(obj))
>  		return NULL;
>  
> +	/*
> +	 * Mark it WT ahead of time to avoid changing the
> +	 * cache_level during fbdev initialization. The
> +	 * unbind there would get stuck waiting for rcu.
> +	 */
> +	i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?
> +					    I915_CACHE_WT : I915_CACHE_NONE);
> +
>  	switch (plane_config->tiling) {
>  	case I915_TILING_NONE:
>  		break;
>
Kleber Souza Nov. 23, 2020, 10:11 a.m. UTC | #3
On 20.11.20 18:21, Kamal Mostafa wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1903397
> 
> Currently we leave the cache_level of the initial fb obj
> set to NONE. This means on eLLC machines the first pin_to_display()
> will try to switch it to WT which requires a vma unbind+bind.
> If that happens during the fbdev initialization rcu does not
> seem operational which causes the unbind to get stuck. To
> most appearances this looks like a dead machine on boot.
> 
> Avoid the unbind by already marking the object cache_level
> as WT when creating it. We still do an excplicit ggtt pin
> which will rewrite the PTEs anyway, so they will match whatever
> cache level we set.
> 
> Cc: <stable@vger.kernel.org> # v5.7+
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com
> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
> (cherry picked from commit d46b60a2e8d246f1f0faa38e52f4f5a73858c338)
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> (cherry picked from commit 1664ffee760a5d98952318fdd9b198fae396d660)
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---
>   drivers/gpu/drm/i915/display/intel_display.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 26996e1839e2..4bd383ff0880 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -3431,6 +3431,14 @@ initial_plane_vma(struct drm_i915_private *i915,
>   	if (IS_ERR(obj))
>   		return NULL;
>   
> +	/*
> +	 * Mark it WT ahead of time to avoid changing the
> +	 * cache_level during fbdev initialization. The
> +	 * unbind there would get stuck waiting for rcu.
> +	 */
> +	i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?
> +					    I915_CACHE_WT : I915_CACHE_NONE);
> +
>   	switch (plane_config->tiling) {
>   	case I915_TILING_NONE:
>   		break;
> 

Applied to groovy/linux.

Thanks,
Kleber
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 26996e1839e2..4bd383ff0880 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -3431,6 +3431,14 @@  initial_plane_vma(struct drm_i915_private *i915,
 	if (IS_ERR(obj))
 		return NULL;
 
+	/*
+	 * Mark it WT ahead of time to avoid changing the
+	 * cache_level during fbdev initialization. The
+	 * unbind there would get stuck waiting for rcu.
+	 */
+	i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?
+					    I915_CACHE_WT : I915_CACHE_NONE);
+
 	switch (plane_config->tiling) {
 	case I915_TILING_NONE:
 		break;