[SRU,Bionic,B-OEM,0/1] fix the hang problem for nvidia p1000 graphic card

Message ID 1536643489-6238-1-git-send-email-hui.wang@canonical.com

Message

Hui Wang Sept. 11, 2018, 5:24 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1791569

This patch is already in 4.18, so there is no need to send it to cosmic.

Applying the upstream patch as is would cause context conflicts, and
resolving them would require applying a large number of prerequisite
patches first, which could introduce regressions. So I modified the
patch: it differs slightly from the original but keeps the same logic,
and it applies to the bionic kernel.

[Impact]
We have 2 nvidia graphics cards, and the nouveau driver in the bionic kernel
does not work with both of them: one of the cards hangs during the boot
process. We compared the lspci output and the vbios version of the 2 cards, and
they are the same. According to nvidia's reply, the cards may differ in their
computational units
(https://devtalk.nvidia.com/default/topic/1038973/linux/2-same-quadro-p1000-cards-but-only-one-can-install-ubuntu-/).
kernel-4.18 fixed this problem, and bisecting identified this patch as the fix.

[Fix]
Backport an upstream patch to fix this problem. Without this patch, the number of TPCs
(texture processing clusters) is hardcoded to 5 for some nv graphics families, but
in practice the TPC number of many families is not 5. The P1000 graphics card belongs
to the gp107 family, where the number is 3 instead of 5.
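
To illustrate why a wrong hardcoded value can matter, here is a minimal
user-space sketch. It assumes the per-family value is used as the field width
(stride) when the per-GPC TPC enable bits are packed into one mask. That is an
assumption made for illustration, not a description of the driver code. The
helper name pack_tpc_mask, its parameters, and the GPC/TPC counts in the usage
note are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not nouveau code: pack one field of enable bits
 * per GPC into a 32-bit mask. Each GPC gets `stride` bits, and the low
 * tpc_per_gpc[i] bits of its field are set.
 */
static uint32_t pack_tpc_mask(const unsigned *tpc_per_gpc,
                              unsigned gpc_count, unsigned stride)
{
    uint32_t mask = 0;
    unsigned i;

    /* All fields must fit into the 32-bit result. */
    assert(stride > 0 && gpc_count * stride <= 32);

    for (i = 0; i < gpc_count; i++) {
        /* A GPC cannot have more TPCs than its field has bits. */
        assert(tpc_per_gpc[i] <= stride);
        mask |= ((1u << tpc_per_gpc[i]) - 1u) << (i * stride);
    }
    return mask;
}
```

With two GPCs of three TPCs each (illustrative numbers), a stride of 5 yields
0xe7 while a stride of 3 yields 0x3f, so the second GPC's bits land in
different positions depending on the per-family value.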


[Test Case]
Tested this patch with P1000, P2000, P620 and P500 graphics cards; all work well as before.

[Regression Potential]
Very low. This patch comes from upstream, and I have tested it with many nv graphics
cards; they all worked well as before.



Ben Skeggs (1):
  drm/nouveau/gr/gf100-: virtualise tpc_mask + apply fixes from traces

 drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c |  6 ++++++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h | 12 +++++++-----
 drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm200.c | 17 ++++++++++++++---
 drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm20b.c |  2 +-
 drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgp100.c | 17 +++++++++--------
 drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgp102.c |  2 ++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgp107.c |  2 ++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.h    |  2 ++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/gm200.c    |  1 +
 drivers/gpu/drm/nouveau/nvkm/engine/gr/gp100.c    |  2 ++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/gp102.c    |  2 ++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/gp107.c    |  2 ++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/gp10b.c    |  2 ++
 13 files changed, 52 insertions(+), 17 deletions(-)

Comments

Stefan Bader Oct. 1, 2018, 2:16 p.m. UTC | #1
Applied to bionic/master-next. Thanks.

-Stefan