npu2: Reset NVLinks on hot reset

Message ID 20180613062211.42581-1-aik@ozlabs.ru
State Accepted
Series
  • npu2: Reset NVLinks on hot reset

Commit Message

Alexey Kardashevskiy June 13, 2018, 6:22 a.m.
This effectively fences GPU RAM on GPU reset so the host system
does not have to crash every time we stop a KVM guest with a GPU
passed through.

Suggested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/npu2.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Balbir Singh June 15, 2018, 4:05 a.m. | #1
On Wed, Jun 13, 2018 at 4:22 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> This effectively fences GPU RAM on GPU reset so the host system
> does not have to crash every time we stop a KVM guest with a GPU
> passed through.
>
> Suggested-by: Balbir Singh <bsingharora@gmail.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/npu2.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/hw/npu2.c b/hw/npu2.c
> index 238fff4..3ed089f 100644
> --- a/hw/npu2.c
> +++ b/hw/npu2.c
> @@ -1092,6 +1092,20 @@ static int64_t npu2_get_power_state(struct pci_slot *slot __unused, uint8_t *val
>
>  static int64_t npu2_hreset(struct pci_slot *slot __unused)
>  {
> +       struct npu2 *p;
> +       int i;
> +       struct npu2_dev *ndev;
> +
> +       p = phb_to_npu2_nvlink(slot->phb);
> +       NPU2INF(p, "Hreset PHB state\n");
> +
> +       for (i = 0; i < p->total_devices; i++) {
> +               ndev = &p->devices[i];
> +               if (ndev) {
> +                       NPU2DEVINF(ndev, "Resetting device\n");
> +                       reset_ntl(ndev);
> +               }
> +       }
>         return OPAL_SUCCESS;
>  }

We may have some common code across hreset and sreset, which can be fixed up later.

Acked-by: Balbir Singh <bsingharora@gmail.com>
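Balbir's remark about code shared between the hot-reset and soft-reset paths could be addressed along the lines of the sketch below, which moves the per-device loop into one helper that both callbacks call. It is a self-contained mock, not skiboot code: `struct mock_npu2`, `struct mock_npu2_dev`, `mock_reset_ntl()`, `npu2_reset_all_ntl()`, `mock_hreset()`, `mock_sreset()` and `MOCK_OPAL_SUCCESS` are all hypothetical names. Whether the real sreset path should reset the NTL at all is not settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for skiboot's struct npu2 / struct npu2_dev. */
struct mock_npu2_dev {
	int resets;	/* number of times the reset stub ran for this device */
};

struct mock_npu2 {
	int total_devices;
	struct mock_npu2_dev *devices;	/* assumed: array of total_devices */
};

#define MOCK_OPAL_SUCCESS 0

/* Stub standing in for reset_ntl() in hw/npu2.c; it only counts calls. */
static void mock_reset_ntl(struct mock_npu2_dev *ndev)
{
	ndev->resets++;
}

/* Hypothetical shared helper: one NTL reset per device on this NPU.
 * No per-device NULL test, since &p->devices[i] cannot be NULL. */
static void npu2_reset_all_ntl(struct mock_npu2 *p)
{
	assert(p != NULL && p->devices != NULL);

	for (int i = 0; i < p->total_devices; i++)
		mock_reset_ntl(&p->devices[i]);
}

/* Both reset callbacks delegate to the helper instead of duplicating it. */
static int64_t mock_hreset(struct mock_npu2 *p)
{
	npu2_reset_all_ntl(p);
	return MOCK_OPAL_SUCCESS;
}

static int64_t mock_sreset(struct mock_npu2 *p)
{
	npu2_reset_all_ntl(p);
	return MOCK_OPAL_SUCCESS;
}
```

Keeping the loop in one place means a later change to the reset sequence only has to be made once.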

Stewart Smith June 19, 2018, 5:42 a.m. | #2
Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> This effectively fences GPU RAM on GPU reset so the host system
> does not have to crash every time we stop a KVM guest with a GPU
> passed through.
>
> Suggested-by: Balbir Singh <bsingharora@gmail.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/npu2.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Thanks, merged to master as of fca2b2b839a673a1e52fc6b19ee6d33b2dfbc003

Patch

diff --git a/hw/npu2.c b/hw/npu2.c
index 238fff4..3ed089f 100644
--- a/hw/npu2.c
+++ b/hw/npu2.c
@@ -1092,6 +1092,20 @@ static int64_t npu2_get_power_state(struct pci_slot *slot __unused, uint8_t *val
 
 static int64_t npu2_hreset(struct pci_slot *slot __unused)
 {
+	struct npu2 *p;
+	int i;
+	struct npu2_dev *ndev;
+
+	p = phb_to_npu2_nvlink(slot->phb);
+	NPU2INF(p, "Hreset PHB state\n");
+
+	for (i = 0; i < p->total_devices; i++) {
+		ndev = &p->devices[i];
+		if (ndev) {
+			NPU2DEVINF(ndev, "Resetting device\n");
+			reset_ntl(ndev);
+		}
+	}
 	return OPAL_SUCCESS;
 }
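The loop added to `npu2_hreset()` can be exercised outside skiboot with a small mock such as the one below. One detail it makes visible: `ndev = &p->devices[i]` is the address of an array element, so for a valid `devices` array the `if (ndev)` test is always true and never skips a device. If a guard is wanted, it would belong on `p` or `p->devices` before the loop. The mock types and `mock_reset_ntl()` are hypothetical stand-ins, and the sketch assumes `devices` points to at least `total_devices` elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields the loop reads are modelled. */
struct mock_npu2_dev {
	int resets;	/* incremented by the reset stub */
};

struct mock_npu2 {
	int total_devices;
	struct mock_npu2_dev *devices;	/* assumed: pointer to an array */
};

/* Stub for reset_ntl(): records that the device was reset. */
static void mock_reset_ntl(struct mock_npu2_dev *ndev)
{
	ndev->resets++;
}

/* Same loop shape as the patch. Returns how many devices the "if (ndev)"
 * test skipped. For a valid devices array this is always 0, because the
 * address of an array element is never a null pointer. */
static int mock_hreset_loop(struct mock_npu2 *p)
{
	int skipped = 0;

	assert(p != NULL && p->devices != NULL);
	for (int i = 0; i < p->total_devices; i++) {
		struct mock_npu2_dev *ndev = &p->devices[i];

		if (ndev)
			mock_reset_ntl(ndev);
		else
			skipped++;	/* unreachable for a valid array */
	}
	return skipped;
}
```

Every device receives exactly one reset, which matches the intent of the patch; the redundant test only costs readability, not correctness.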