npu2: Fix NPU<->GPU binding error message

Message ID 20180419044119.1970-1-oohall@gmail.com
State Rejected

Commit Message

Oliver O'Halloran April 19, 2018, 4:41 a.m. UTC
NVLinks with the same target appear as a single multi-function PCI
device rather than as individual devices. The error message assumes
that the link index is the same as the device number, which is no
longer true. Fix this by taking the device and function numbers from
the BDFN instead, e.g.:

Old and broken:

No PCI device for NPU2 device 0006:00:03.0 to bind to.
No PCI device for NPU2 device 0006:00:04.0 to bind to.
No PCI device for NPU2 device 0006:00:05.0 to bind to.

New and fixed:

No PCI device for NPU2 device 0006:00:01.00 to bind to.
No PCI device for NPU2 device 0006:00:01.01 to bind to.
No PCI device for NPU2 device 0006:00:01.02 to bind to.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 hw/npu2.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Reza Arbab April 19, 2018, 9:15 p.m. UTC | #1
On Thu, Apr 19, 2018 at 02:41:19PM +1000, Oliver O'Halloran wrote:
>--- a/hw/npu2.c
>+++ b/hw/npu2.c
>@@ -434,8 +434,9 @@ static void npu2_dev_bind_pci_dev(struct npu2_dev *dev)
> 		}
> 	}
>
>-	prlog(PR_INFO, "%s: No PCI device for NPU2 device %04x:00:%02x.0 to bind to. If you expect a GPU to be there, this is a problem.\n",
>-	      __func__, dev->npu->phb_nvlink.opal_id, dev->index);
>+	prlog(PR_INFO, "%s: No PCI device for NPU2 device %04x:00:%02x.%x to bind to. If you expect a GPU to be there, this is a problem.\n",
>+	      __func__, dev->npu->phb_nvlink.opal_id, (dev->bdfn >> 3) & 0x1f,
>+	      dev->bdfn & 0x3);
> }
>
> static struct lock pci_npu_phandle_lock = LOCK_UNLOCKED;

Oddly enough, I just sent a patch about the same line:
http://patchwork.ozlabs.org/patch/901343/

It uses NPU2DEVINF() to print the location, including slot label.

Oliver O'Halloran April 20, 2018, 12:24 a.m. UTC | #2
On Fri, Apr 20, 2018 at 7:15 AM, Reza Arbab <arbab@linux.ibm.com> wrote:
> On Thu, Apr 19, 2018 at 02:41:19PM +1000, Oliver O'Halloran wrote:
>>
>> --- a/hw/npu2.c
>> +++ b/hw/npu2.c
>> @@ -434,8 +434,9 @@ static void npu2_dev_bind_pci_dev(struct npu2_dev *dev)
>>                 }
>>         }
>>
>> -       prlog(PR_INFO, "%s: No PCI device for NPU2 device %04x:00:%02x.0 to bind to. If you expect a GPU to be there, this is a problem.\n",
>> -             __func__, dev->npu->phb_nvlink.opal_id, dev->index);
>> +       prlog(PR_INFO, "%s: No PCI device for NPU2 device %04x:00:%02x.%x to bind to. If you expect a GPU to be there, this is a problem.\n",
>> +             __func__, dev->npu->phb_nvlink.opal_id, (dev->bdfn >> 3) & 0x1f,
>> +             dev->bdfn & 0x3);
>> }
>>
>> static struct lock pci_npu_phandle_lock = LOCK_UNLOCKED;
>
>
> Oddly enough, I just sent a patch about the same line:
> http://patchwork.ozlabs.org/patch/901343/
>
> It uses NPU2DEVINF() to print the location, including slot label.

Oh cool, that's probably a better idea than open-coding the BDFN stuff here.


> --
> Reza Arbab
>
Patch

diff --git a/hw/npu2.c b/hw/npu2.c
index 06e06d4ff2aa..99bc67b30af9 100644
--- a/hw/npu2.c
+++ b/hw/npu2.c
@@ -434,8 +434,9 @@  static void npu2_dev_bind_pci_dev(struct npu2_dev *dev)
 		}
 	}
 
-	prlog(PR_INFO, "%s: No PCI device for NPU2 device %04x:00:%02x.0 to bind to. If you expect a GPU to be there, this is a problem.\n",
-	      __func__, dev->npu->phb_nvlink.opal_id, dev->index);
+	prlog(PR_INFO, "%s: No PCI device for NPU2 device %04x:00:%02x.%x to bind to. If you expect a GPU to be there, this is a problem.\n",
+	      __func__, dev->npu->phb_nvlink.opal_id, (dev->bdfn >> 3) & 0x1f,
+	      dev->bdfn & 0x3);
 }
 
 static struct lock pci_npu_phandle_lock = LOCK_UNLOCKED;
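
For reference, the device/function split that the patch open-codes can be
sketched as a few standalone helpers. This is only an illustration, assuming
the conventional PCI BDFN layout (bus in bits 15:8, device in bits 7:3,
function in bits 2:0). The helper names bdfn_bus(), bdfn_dev(), bdfn_fn() and
format_bdfn() are hypothetical and are not taken from skiboot. Note that the
function field is three bits wide in that layout, so the sketch masks with
0x7, whereas the patch masks with 0x3 and would only cover functions 0 to 3.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: bus[15:8], device[7:3], function[2:0]. */
static inline unsigned int bdfn_bus(uint16_t bdfn)
{
	return (bdfn >> 8) & 0xff;
}

static inline unsigned int bdfn_dev(uint16_t bdfn)
{
	return (bdfn >> 3) & 0x1f;
}

static inline unsigned int bdfn_fn(uint16_t bdfn)
{
	/* Three-bit function number, hence 0x7 rather than 0x3. */
	return bdfn & 0x7;
}

/*
 * Hypothetical helper: write "dddd:bb:dd.f" into buf, using the same
 * %04x / %02x / %x conversions as the prlog() call in the patch.
 * Returns the snprintf() result.
 */
static int format_bdfn(char *buf, size_t len, unsigned int domain,
		       uint16_t bdfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%x", domain,
			bdfn_bus(bdfn), bdfn_dev(bdfn), bdfn_fn(bdfn));
}
```

With these helpers, a BDFN of 0x000a decodes to device 1, function 2, so the
third link of a multi-function NPU device in domain 6 would be formatted with
device 01 and function 2 rather than as a separate device number.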