Message ID | 4FACB581.2050609@ozlabs.ru |
---|---|
State | New |
Headers | show |
On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: > Normally the pci_add_capability is called on devices to add new > capability. This is ok for emulated devices which capabilities list > is being built by QEMU. > > In the case of VFIO the capability may already exist and adding new > capability into the beginning of the linked list may create a loop. > > For example, the old code destroys the following config > of PCIe Intel E1000E: > > before adding PCI_CAP_ID_MSI (0x05): > 0x34: 0xC8 > 0xC8: 0x01 0xD0 > 0xD0: 0x05 0xE0 > 0xE0: 0x10 0x00 > > after: > 0x34: 0xD0 > 0xC8: 0x01 0xD0 > 0xD0: 0x05 0xC8 > 0xE0: 0x10 0x00 > > As result capabilities 0x01 and 0x05 point to each other. > > The proposed patch does not change capability pointers when > the same type capability is about to add. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > hw/pci.c | 10 ++++++---- > 1 files changed, 6 insertions(+), 4 deletions(-) > > diff --git a/hw/pci.c b/hw/pci.c > index aa0c0b8..1f7c924 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, > } > > config = pdev->config + offset; > - config[PCI_CAP_LIST_ID] = cap_id; > - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > - pdev->config[PCI_CAPABILITY_LIST] = offset; > - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > + if (config[PCI_CAP_LIST_ID] != cap_id) { This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either * replace the existing one or * drop out and not write the new one in. I'm not sure which way would be more natural. > + config[PCI_CAP_LIST_ID] = cap_id; > + config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > + pdev->config[PCI_CAPABILITY_LIST] = offset; > + pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > + } > memset(pdev->used + offset, 0xFF, size); > /* Make capability read-only by default */ > memset(pdev->wmask + offset, 0, size); > > > -- > Alexey
11.05.2012 20:52, Alexander Graf написал: > > On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: > >> Normally the pci_add_capability is called on devices to add new >> capability. This is ok for emulated devices which capabilities list >> is being built by QEMU. >> >> In the case of VFIO the capability may already exist and adding new >> capability into the beginning of the linked list may create a loop. >> >> For example, the old code destroys the following config >> of PCIe Intel E1000E: >> >> before adding PCI_CAP_ID_MSI (0x05): >> 0x34: 0xC8 >> 0xC8: 0x01 0xD0 >> 0xD0: 0x05 0xE0 >> 0xE0: 0x10 0x00 >> >> after: >> 0x34: 0xD0 >> 0xC8: 0x01 0xD0 >> 0xD0: 0x05 0xC8 >> 0xE0: 0x10 0x00 >> >> As result capabilities 0x01 and 0x05 point to each other. >> >> The proposed patch does not change capability pointers when >> the same type capability is about to add. >> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> hw/pci.c | 10 ++++++---- >> 1 files changed, 6 insertions(+), 4 deletions(-) >> >> diff --git a/hw/pci.c b/hw/pci.c >> index aa0c0b8..1f7c924 100644 >> --- a/hw/pci.c >> +++ b/hw/pci.c >> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >> } >> >> config = pdev->config + offset; >> - config[PCI_CAP_LIST_ID] = cap_id; >> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >> - pdev->config[PCI_CAPABILITY_LIST] = offset; >> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >> + if (config[PCI_CAP_LIST_ID] != cap_id) { > > This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either > * replace the existing one or > * drop out and not write the new one in. > > I'm not sure which way would be more natural. There is a third option - add another function, lets call it pci_fixup_capability() which would do whatever pci_add_capability() does but won't touch list pointers. When vfio, pci_add_capability() is called from the code which knows exactly that the capability exists and where it is and it calls pci_add_capability() based on this knowledge so doing additional loops just for imaginery scalability is a bit weird, no? >> + config[PCI_CAP_LIST_ID] = cap_id; >> + config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >> + pdev->config[PCI_CAPABILITY_LIST] = offset; >> + pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >> + } >> memset(pdev->used + offset, 0xFF, size); >> /* Make capability read-only by default */ >> memset(pdev->wmask + offset, 0, size);
On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: > 11.05.2012 20:52, Alexander Graf написал: >> >> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >> >>> Normally the pci_add_capability is called on devices to add new >>> capability. This is ok for emulated devices which capabilities list >>> is being built by QEMU. >>> >>> In the case of VFIO the capability may already exist and adding new >>> capability into the beginning of the linked list may create a loop. >>> >>> For example, the old code destroys the following config >>> of PCIe Intel E1000E: >>> >>> before adding PCI_CAP_ID_MSI (0x05): >>> 0x34: 0xC8 >>> 0xC8: 0x01 0xD0 >>> 0xD0: 0x05 0xE0 >>> 0xE0: 0x10 0x00 >>> >>> after: >>> 0x34: 0xD0 >>> 0xC8: 0x01 0xD0 >>> 0xD0: 0x05 0xC8 >>> 0xE0: 0x10 0x00 >>> >>> As result capabilities 0x01 and 0x05 point to each other. >>> >>> The proposed patch does not change capability pointers when >>> the same type capability is about to add. >>> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>> --- >>> hw/pci.c | 10 ++++++---- >>> 1 files changed, 6 insertions(+), 4 deletions(-) >>> >>> diff --git a/hw/pci.c b/hw/pci.c >>> index aa0c0b8..1f7c924 100644 >>> --- a/hw/pci.c >>> +++ b/hw/pci.c >>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>> } >>> >>> config = pdev->config + offset; >>> - config[PCI_CAP_LIST_ID] = cap_id; >>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >> >> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >> * replace the existing one or >> * drop out and not write the new one in. * hw_error :) >> >> I'm not sure which way would be more natural. > > There is a third option - add another function, lets call it > pci_fixup_capability() which would do whatever pci_add_capability() does > but won't touch list pointers. What good is a function that breaks internal consistency? > When vfio, pci_add_capability() is called from the code which knows > exactly that the capability exists and where it is and it calls > pci_add_capability() based on this knowledge so doing additional loops > just for imaginery scalability is a bit weird, no? Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. Alex
On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote: > Normally the pci_add_capability is called on devices to add new > capability. This is ok for emulated devices which capabilities list > is being built by QEMU. > > In the case of VFIO the capability may already exist and adding new > capability into the beginning of the linked list may create a loop. Hi, I don't quite understand how we get a loop, if 'offset' is supplied to 'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise, we are adding the capability in a new empty space. So, I see how we could get the capability in the list twice, but not how there is a loop. what am I missing? Thanks, -Jason > > For example, the old code destroys the following config > of PCIe Intel E1000E: > > before adding PCI_CAP_ID_MSI (0x05): > 0x34: 0xC8 > 0xC8: 0x01 0xD0 > 0xD0: 0x05 0xE0 > 0xE0: 0x10 0x00 > > after: > 0x34: 0xD0 > 0xC8: 0x01 0xD0 > 0xD0: 0x05 0xC8 > 0xE0: 0x10 0x00 > > As result capabilities 0x01 and 0x05 point to each other. > > The proposed patch does not change capability pointers when > the same type capability is about to add. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > hw/pci.c | 10 ++++++---- > 1 files changed, 6 insertions(+), 4 deletions(-) > > diff --git a/hw/pci.c b/hw/pci.c > index aa0c0b8..1f7c924 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, > } > > config = pdev->config + offset; > - config[PCI_CAP_LIST_ID] = cap_id; > - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > - pdev->config[PCI_CAPABILITY_LIST] = offset; > - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > + if (config[PCI_CAP_LIST_ID] != cap_id) { > + config[PCI_CAP_LIST_ID] = cap_id; > + config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > + pdev->config[PCI_CAPABILITY_LIST] = offset; > + pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > + } > memset(pdev->used + offset, 0xFF, size); > /* Make capability read-only by default */ > memset(pdev->wmask + offset, 0, size); > > > -- > Alexey >
12.05.2012 5:20, Jason Baron написал: > On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote: >> Normally the pci_add_capability is called on devices to add new >> capability. This is ok for emulated devices which capabilities list >> is being built by QEMU. >> >> In the case of VFIO the capability may already exist and adding new >> capability into the beginning of the linked list may create a loop. > > Hi, > > I don't quite understand how we get a loop, if 'offset' is supplied to > 'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise, > we are adding the capability in a new empty space. So, I see how we > could get the capability in the list twice, but not how there is a loop. > what am I missing? This happens only with VFIO. The capability already exists in the config space as it is fetched from the host kernel _before_ msi_init is called. Furthermore, msi_init() is called when VFIO sees this capability in the config space. We probably want to re-add all capabilities, do not know... > Thanks, > > -Jason > >> >> For example, the old code destroys the following config >> of PCIe Intel E1000E: >> >> before adding PCI_CAP_ID_MSI (0x05): >> 0x34: 0xC8 >> 0xC8: 0x01 0xD0 >> 0xD0: 0x05 0xE0 >> 0xE0: 0x10 0x00 >> >> after: >> 0x34: 0xD0 >> 0xC8: 0x01 0xD0 >> 0xD0: 0x05 0xC8 >> 0xE0: 0x10 0x00 >> >> As result capabilities 0x01 and 0x05 point to each other. >> >> The proposed patch does not change capability pointers when >> the same type capability is about to add. >> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> hw/pci.c | 10 ++++++---- >> 1 files changed, 6 insertions(+), 4 deletions(-) >> >> diff --git a/hw/pci.c b/hw/pci.c >> index aa0c0b8..1f7c924 100644 >> --- a/hw/pci.c >> +++ b/hw/pci.c >> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >> } >> >> config = pdev->config + offset; >> - config[PCI_CAP_LIST_ID] = cap_id; >> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >> - pdev->config[PCI_CAPABILITY_LIST] = offset; >> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >> + if (config[PCI_CAP_LIST_ID] != cap_id) { >> + config[PCI_CAP_LIST_ID] = cap_id; >> + config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >> + pdev->config[PCI_CAPABILITY_LIST] = offset; >> + pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >> + } >> memset(pdev->used + offset, 0xFF, size); >> /* Make capability read-only by default */ >> memset(pdev->wmask + offset, 0, size);
On Sat, 2012-05-12 at 10:27 +1000, Alexey Kardashevskiy wrote: > 12.05.2012 5:20, Jason Baron написал: > > On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote: > >> Normally the pci_add_capability is called on devices to add new > >> capability. This is ok for emulated devices which capabilities list > >> is being built by QEMU. > >> > >> In the case of VFIO the capability may already exist and adding new > >> capability into the beginning of the linked list may create a loop. > > > > Hi, > > > > I don't quite understand how we get a loop, if 'offset' is supplied to > > 'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise, > > we are adding the capability in a new empty space. So, I see how we > > could get the capability in the list twice, but not how there is a loop. > > what am I missing? > > > This happens only with VFIO. > > The capability already exists in the config space as it is fetched from > the host kernel _before_ msi_init is called. Furthermore, msi_init() is > called when VFIO sees this capability in the config space. > > We probably want to re-add all capabilities, do not know... Yep, I've had a msi[1] and msix[2] patches in my vfio tree for a long time, we really want to support this generically for all capabilities though. We either need to detect or allow the caller to specify that the config space is already programmed. Note that even if we don't create a loop, particularly finicky drivers may balk at just changing the order of the capabilities list. Thanks, Alex [1]https://github.com/awilliam/qemu-vfio/commit/a9f04351610ab69e22d90a76dc85be3269000a9f [2]https://github.com/awilliam/qemu-vfio/commit/b4de3d0436b0260fbc6fcd40787c1c92ffca2980 > >> > >> For example, the old code destroys the following config > >> of PCIe Intel E1000E: > >> > >> before adding PCI_CAP_ID_MSI (0x05): > >> 0x34: 0xC8 > >> 0xC8: 0x01 0xD0 > >> 0xD0: 0x05 0xE0 > >> 0xE0: 0x10 0x00 > >> > >> after: > >> 0x34: 0xD0 > >> 0xC8: 0x01 0xD0 > >> 0xD0: 0x05 0xC8 > >> 0xE0: 0x10 0x00 > >> > >> As result capabilities 0x01 and 0x05 point to each other. > >> > >> The proposed patch does not change capability pointers when > >> the same type capability is about to add. > >> > >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > >> --- > >> hw/pci.c | 10 ++++++---- > >> 1 files changed, 6 insertions(+), 4 deletions(-) > >> > >> diff --git a/hw/pci.c b/hw/pci.c > >> index aa0c0b8..1f7c924 100644 > >> --- a/hw/pci.c > >> +++ b/hw/pci.c > >> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, > >> } > >> > >> config = pdev->config + offset; > >> - config[PCI_CAP_LIST_ID] = cap_id; > >> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > >> - pdev->config[PCI_CAPABILITY_LIST] = offset; > >> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > >> + if (config[PCI_CAP_LIST_ID] != cap_id) { > >> + config[PCI_CAP_LIST_ID] = cap_id; > >> + config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > >> + pdev->config[PCI_CAPABILITY_LIST] = offset; > >> + pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > >> + } > >> memset(pdev->used + offset, 0xFF, size); > >> /* Make capability read-only by default */ > >> memset(pdev->wmask + offset, 0, size); > > >
On 12/05/12 00:13, Alexander Graf wrote: > > On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: > >> 11.05.2012 20:52, Alexander Graf написал: >>> >>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>> >>>> Normally the pci_add_capability is called on devices to add new >>>> capability. This is ok for emulated devices which capabilities list >>>> is being built by QEMU. >>>> >>>> In the case of VFIO the capability may already exist and adding new >>>> capability into the beginning of the linked list may create a loop. >>>> >>>> For example, the old code destroys the following config >>>> of PCIe Intel E1000E: >>>> >>>> before adding PCI_CAP_ID_MSI (0x05): >>>> 0x34: 0xC8 >>>> 0xC8: 0x01 0xD0 >>>> 0xD0: 0x05 0xE0 >>>> 0xE0: 0x10 0x00 >>>> >>>> after: >>>> 0x34: 0xD0 >>>> 0xC8: 0x01 0xD0 >>>> 0xD0: 0x05 0xC8 >>>> 0xE0: 0x10 0x00 >>>> >>>> As result capabilities 0x01 and 0x05 point to each other. >>>> >>>> The proposed patch does not change capability pointers when >>>> the same type capability is about to add. >>>> >>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>> --- >>>> hw/pci.c | 10 ++++++---- >>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/hw/pci.c b/hw/pci.c >>>> index aa0c0b8..1f7c924 100644 >>>> --- a/hw/pci.c >>>> +++ b/hw/pci.c >>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>> } >>>> >>>> config = pdev->config + offset; >>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>> >>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>> * replace the existing one or >>> * drop out and not write the new one in. > > * hw_error :) > >>> >>> I'm not sure which way would be more natural. >> >> There is a third option - add another function, lets call it >> pci_fixup_capability() which would do whatever pci_add_capability() does >> but won't touch list pointers. > > What good is a function that breaks internal consistency? It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through the whole list and add a capability if it does not exist. Emulated devices which care about having a capability at some fixed offset would have initialized their config space before calling this capabilities API (as VFIO does). If we really want to support emulated devices which want some capabilities be at fixed offset and others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest driver may care about its offset. >> When vfio, pci_add_capability() is called from the code which knows >> exactly that the capability exists and where it is and it calls >> pci_add_capability() based on this knowledge so doing additional loops >> just for imaginery scalability is a bit weird, no? > > Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. > > > Alex >
Alexander, Is that any better? :) @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { - uint8_t *config; + uint8_t *config, existing; int i, overlapping_cap; + existing = pci_find_capability(pdev, cap_id); + if (existing) { + if (offset && (existing != offset)) { + return -EEXIST; + } + for (i = existing; i < size; ++i) { + if (pdev->used[i]) { + return -EFAULT; + } + } + memset(pdev->used + offset, 0xFF, size); + /* Make capability read-only by default */ + memset(pdev->wmask + offset, 0, size); + /* Check capability by default */ + memset(pdev->cmask + offset, 0xFF, size); + return existing; + } + if (!offset) { offset = pci_find_space(pdev, size); if (!offset) { return -ENOSPC; On 14/05/12 13:49, Alexey Kardashevskiy wrote: > On 12/05/12 00:13, Alexander Graf wrote: >> >> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >> >>> 11.05.2012 20:52, Alexander Graf написал: >>>> >>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>> >>>>> Normally the pci_add_capability is called on devices to add new >>>>> capability. This is ok for emulated devices which capabilities list >>>>> is being built by QEMU. >>>>> >>>>> In the case of VFIO the capability may already exist and adding new >>>>> capability into the beginning of the linked list may create a loop. >>>>> >>>>> For example, the old code destroys the following config >>>>> of PCIe Intel E1000E: >>>>> >>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>> 0x34: 0xC8 >>>>> 0xC8: 0x01 0xD0 >>>>> 0xD0: 0x05 0xE0 >>>>> 0xE0: 0x10 0x00 >>>>> >>>>> after: >>>>> 0x34: 0xD0 >>>>> 0xC8: 0x01 0xD0 >>>>> 0xD0: 0x05 0xC8 >>>>> 0xE0: 0x10 0x00 >>>>> >>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>> >>>>> The proposed patch does not change capability pointers when >>>>> the same type capability is about to add. >>>>> >>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>> --- >>>>> hw/pci.c | 10 ++++++---- >>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>> index aa0c0b8..1f7c924 100644 >>>>> --- a/hw/pci.c >>>>> +++ b/hw/pci.c >>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>> } >>>>> >>>>> config = pdev->config + offset; >>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>> >>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>> * replace the existing one or >>>> * drop out and not write the new one in. >> >> * hw_error :) >> >>>> >>>> I'm not sure which way would be more natural. >>> >>> There is a third option - add another function, lets call it >>> pci_fixup_capability() which would do whatever pci_add_capability() does >>> but won't touch list pointers. >> >> What good is a function that breaks internal consistency? > > > It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through > the whole list and add a capability if it does not exist. Emulated devices which care about having a > capability at some fixed offset would have initialized their config space before calling this > capabilities API (as VFIO does). > > If we really want to support emulated devices which want some capabilities be at fixed offset and > others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency > by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest > driver may care about its offset. > > > >>> When vfio, pci_add_capability() is called from the code which knows >>> exactly that the capability exists and where it is and it calls >>> pci_add_capability() based on this knowledge so doing additional loops >>> just for imaginery scalability is a bit weird, no? >> >> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >> >> >> Alex >> > >
On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: > Alexander, > > Is that any better? :) Alex (Graf that is), ping ? The original patch from Alexey was fine btw. VFIO will always call things with the existing capability offset so there's no real risk of doing the wrong thing or break the list or anything. IE. A small simple patch that addresses the problem :-) The new patch is a bit more "robust" I believe, I don't think we need to go too far to fix a problem we don't have. But we need a fix for the real issue and the simple patch does it neatly from what I can understand. Cheers, Ben. > > @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) > * in pci config space */ > int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, > uint8_t offset, uint8_t size) > { > - uint8_t *config; > + uint8_t *config, existing; > int i, overlapping_cap; > > + existing = pci_find_capability(pdev, cap_id); > + if (existing) { > + if (offset && (existing != offset)) { > + return -EEXIST; > + } > + for (i = existing; i < size; ++i) { > + if (pdev->used[i]) { > + return -EFAULT; > + } > + } > + memset(pdev->used + offset, 0xFF, size); > + /* Make capability read-only by default */ > + memset(pdev->wmask + offset, 0, size); > + /* Check capability by default */ > + memset(pdev->cmask + offset, 0xFF, size); > + return existing; > + } > + > if (!offset) { > offset = pci_find_space(pdev, size); > if (!offset) { > return -ENOSPC; > > > > > > > On 14/05/12 13:49, Alexey Kardashevskiy wrote: > > On 12/05/12 00:13, Alexander Graf wrote: > >> > >> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: > >> > >>> 11.05.2012 20:52, Alexander Graf написал: > >>>> > >>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: > >>>> > >>>>> Normally the pci_add_capability is called on devices to add new > >>>>> capability. This is ok for emulated devices which capabilities list > >>>>> is being built by QEMU. > >>>>> > >>>>> In the case of VFIO the capability may already exist and adding new > >>>>> capability into the beginning of the linked list may create a loop. > >>>>> > >>>>> For example, the old code destroys the following config > >>>>> of PCIe Intel E1000E: > >>>>> > >>>>> before adding PCI_CAP_ID_MSI (0x05): > >>>>> 0x34: 0xC8 > >>>>> 0xC8: 0x01 0xD0 > >>>>> 0xD0: 0x05 0xE0 > >>>>> 0xE0: 0x10 0x00 > >>>>> > >>>>> after: > >>>>> 0x34: 0xD0 > >>>>> 0xC8: 0x01 0xD0 > >>>>> 0xD0: 0x05 0xC8 > >>>>> 0xE0: 0x10 0x00 > >>>>> > >>>>> As result capabilities 0x01 and 0x05 point to each other. > >>>>> > >>>>> The proposed patch does not change capability pointers when > >>>>> the same type capability is about to add. > >>>>> > >>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > >>>>> --- > >>>>> hw/pci.c | 10 ++++++---- > >>>>> 1 files changed, 6 insertions(+), 4 deletions(-) > >>>>> > >>>>> diff --git a/hw/pci.c b/hw/pci.c > >>>>> index aa0c0b8..1f7c924 100644 > >>>>> --- a/hw/pci.c > >>>>> +++ b/hw/pci.c > >>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, > >>>>> } > >>>>> > >>>>> config = pdev->config + offset; > >>>>> - config[PCI_CAP_LIST_ID] = cap_id; > >>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > >>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; > >>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > >>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { > >>>> > >>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either > >>>> * replace the existing one or > >>>> * drop out and not write the new one in. > >> > >> * hw_error :) > >> > >>>> > >>>> I'm not sure which way would be more natural. > >>> > >>> There is a third option - add another function, lets call it > >>> pci_fixup_capability() which would do whatever pci_add_capability() does > >>> but won't touch list pointers. > >> > >> What good is a function that breaks internal consistency? > > > > > > It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through > > the whole list and add a capability if it does not exist. Emulated devices which care about having a > > capability at some fixed offset would have initialized their config space before calling this > > capabilities API (as VFIO does). > > > > If we really want to support emulated devices which want some capabilities be at fixed offset and > > others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency > > by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest > > driver may care about its offset. > > > > > > > >>> When vfio, pci_add_capability() is called from the code which knows > >>> exactly that the capability exists and where it is and it calls > >>> pci_add_capability() based on this knowledge so doing additional loops > >>> just for imaginery scalability is a bit weird, no? > >> > >> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. > >> > >> > >> Alex > >> > > > > > >
On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >> Alexander, >> >> Is that any better? :) > > Alex (Graf that is), ping ? > > The original patch from Alexey was fine btw. > > VFIO will always call things with the existing capability offset so > there's no real risk of doing the wrong thing or break the list or > anything. > > IE. A small simple patch that addresses the problem :-) > > The new patch is a bit more "robust" I believe, I don't think we need to > go too far to fix a problem we don't have. But we need a fix for the > real issue and the simple patch does it neatly from what I can > understand. > > Cheers, > Ben. > >> >> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >> * in pci config space */ >> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >> uint8_t offset, uint8_t size) >> { >> - uint8_t *config; >> + uint8_t *config, existing; Existing is a pointer to the target dev's config space, right? >> int i, overlapping_cap; >> >> + existing = pci_find_capability(pdev, cap_id); >> + if (existing) { >> + if (offset && (existing != offset)) { >> + return -EEXIST; >> + } >> + for (i = existing; i < size; ++i) { So how does this possibly make sense? >> + if (pdev->used[i]) { >> + return -EFAULT; >> + } >> + } >> + memset(pdev->used + offset, 0xFF, size); Why? >> + /* Make capability read-only by default */ >> + memset(pdev->wmask + offset, 0, size); Why? >> + /* Check capability by default */ >> + memset(pdev->cmask + offset, 0xFF, size); I don't understand this part either. Alex >> + return existing; >> + } >> + >> if (!offset) { >> offset = pci_find_space(pdev, size); >> if (!offset) { >> return -ENOSPC; >> >> >> >> >> >> >> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>> On 12/05/12 00:13, Alexander Graf wrote: >>>> >>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>> >>>>> 11.05.2012 20:52, Alexander Graf написал: >>>>>> >>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>> >>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>> is being built by QEMU. >>>>>>> >>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>> >>>>>>> For example, the old code destroys the following config >>>>>>> of PCIe Intel E1000E: >>>>>>> >>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>> 0x34: 0xC8 >>>>>>> 0xC8: 0x01 0xD0 >>>>>>> 0xD0: 0x05 0xE0 >>>>>>> 0xE0: 0x10 0x00 >>>>>>> >>>>>>> after: >>>>>>> 0x34: 0xD0 >>>>>>> 0xC8: 0x01 0xD0 >>>>>>> 0xD0: 0x05 0xC8 >>>>>>> 0xE0: 0x10 0x00 >>>>>>> >>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>> >>>>>>> The proposed patch does not change capability pointers when >>>>>>> the same type capability is about to add. >>>>>>> >>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>> --- >>>>>>> hw/pci.c | 10 ++++++---- >>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>> --- a/hw/pci.c >>>>>>> +++ b/hw/pci.c >>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>> } >>>>>>> >>>>>>> config = pdev->config + offset; >>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>> >>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>> * replace the existing one or >>>>>> * drop out and not write the new one in. >>>> >>>> * hw_error :) >>>> >>>>>> >>>>>> I'm not sure which way would be more natural. >>>>> >>>>> There is a third option - add another function, lets call it >>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>> but won't touch list pointers. >>>> >>>> What good is a function that breaks internal consistency? >>> >>> >>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>> capability at some fixed offset would have initialized their config space before calling this >>> capabilities API (as VFIO does). >>> >>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>> driver may care about its offset. >>> >>> >>> >>>>> When vfio, pci_add_capability() is called from the code which knows >>>>> exactly that the capability exists and where it is and it calls >>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>> just for imaginery scalability is a bit weird, no? >>>> >>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>> >>>> >>>> Alex >>>> >>> >>> >> >> > >
On 22/05/12 13:21, Alexander Graf wrote: > > > On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > >> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>> Alexander, >>> >>> Is that any better? :) >> >> Alex (Graf that is), ping ? >> >> The original patch from Alexey was fine btw. >> >> VFIO will always call things with the existing capability offset so >> there's no real risk of doing the wrong thing or break the list or >> anything. >> >> IE. A small simple patch that addresses the problem :-) >> >> The new patch is a bit more "robust" I believe, I don't think we need to >> go too far to fix a problem we don't have. But we need a fix for the >> real issue and the simple patch does it neatly from what I can >> understand. >> >> Cheers, >> Ben. >> >>> >>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>> * in pci config space */ >>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>> uint8_t offset, uint8_t size) >>> { >>> - uint8_t *config; >>> + uint8_t *config, existing; > > Existing is a pointer to the target dev's config space, right? Yes. >>> int i, overlapping_cap; >>> >>> + existing = pci_find_capability(pdev, cap_id); >>> + if (existing) { >>> + if (offset && (existing != offset)) { >>> + return -EEXIST; >>> + } >>> + for (i = existing; i < size; ++i) { > > So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. >>> + if (pdev->used[i]) { >>> + return -EFAULT; >>> + } >>> + } >>> + memset(pdev->used + offset, 0xFF, size); > Why? Because I am marking the space this capability takes as used. >>> + /* Make capability read-only by default */ >>> + memset(pdev->wmask + offset, 0, size); > Why? Because the pci_add_capability() does it for a new capability by default. >>> + /* Check capability by default */ >>> + memset(pdev->cmask + offset, 0xFF, size); > > I don't understand this part either. The pci_add_capability() does it for a new capability by default. > > Alex > >>> + return existing; >>> + } >>> + >>> if (!offset) { >>> offset = pci_find_space(pdev, size); >>> if (!offset) { >>> return -ENOSPC; >>> >>> >>> >>> >>> >>> >>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>> >>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>> >>>>>> 11.05.2012 20:52, Alexander Graf написал: >>>>>>> >>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>> >>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>> is being built by QEMU. >>>>>>>> >>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>> >>>>>>>> For example, the old code destroys the following config >>>>>>>> of PCIe Intel E1000E: >>>>>>>> >>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>> 0x34: 0xC8 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> after: >>>>>>>> 0x34: 0xD0 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>> >>>>>>>> The proposed patch does not change capability pointers when >>>>>>>> the same type capability is about to add. >>>>>>>> >>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>>> --- >>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>> --- a/hw/pci.c >>>>>>>> +++ b/hw/pci.c >>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>> } >>>>>>>> >>>>>>>> config = pdev->config + offset; >>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>> >>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>> * replace the existing one or >>>>>>> * drop out and not write the new one in. >>>>> >>>>> * hw_error :) >>>>> >>>>>>> >>>>>>> I'm not sure which way would be more natural. >>>>>> >>>>>> There is a third option - add another function, lets call it >>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>> but won't touch list pointers. >>>>> >>>>> What good is a function that breaks internal consistency? >>>> >>>> >>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>> capability at some fixed offset would have initialized their config space before calling this >>>> capabilities API (as VFIO does). >>>> >>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>> driver may care about its offset. >>>> >>>> >>>> >>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>> exactly that the capability exists and where it is and it calls >>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>> just for imaginery scalability is a bit weird, no? >>>>> >>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>>> >>>>> >>>>> Alex >>>>> >>>> >>>> >>> >>> >> >>
On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > On 22/05/12 13:21, Alexander Graf wrote: >> >> >> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: >> >>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>> Alexander, >>>> >>>> Is that any better? :) >>> >>> Alex (Graf that is), ping ? >>> >>> The original patch from Alexey was fine btw. >>> >>> VFIO will always call things with the existing capability offset so >>> there's no real risk of doing the wrong thing or break the list or >>> anything. >>> >>> IE. A small simple patch that addresses the problem :-) >>> >>> The new patch is a bit more "robust" I believe, I don't think we need to >>> go too far to fix a problem we don't have. But we need a fix for the >>> real issue and the simple patch does it neatly from what I can >>> understand. >>> >>> Cheers, >>> Ben. >>> >>>> >>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>> * in pci config space */ >>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>> uint8_t offset, uint8_t size) >>>> { >>>> - uint8_t *config; >>>> + uint8_t *config, existing; >> >> Existing is a pointer to the target dev's config space, right? > > Yes. > >>>> int i, overlapping_cap; >>>> >>>> + existing = pci_find_capability(pdev, cap_id); >>>> + if (existing) { >>>> + if (offset && (existing != offset)) { >>>> + return -EEXIST; >>>> + } >>>> + for (i = existing; i < size; ++i) { >> >> So how does this possibly make sense? > > Although I do not expect VFIO to add capabilities (does not make sense), I still want to double > check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. > >>>> + if (pdev->used[i]) { >>>> + return -EFAULT; >>>> + } >>>> + } >>>> + memset(pdev->used + offset, 0xFF, size); >> Why? > > Because I am marking the space this capability takes as used. But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no? > >>>> + /* Make capability read-only by default */ >>>> + memset(pdev->wmask + offset, 0, size); >> Why? > > Because the pci_add_capability() does it for a new capability by default. Hrm. So you're copying code? Can't you merge the overwrite and write cases? Alex > > >>>> + /* Check capability by default */ >>>> + memset(pdev->cmask + offset, 0xFF, size); >> >> I don't understand this part either. > > The pci_add_capability() does it for a new capability by default. > > > >> >> Alex >> >>>> + return existing; >>>> + } >>>> + >>>> if (!offset) { >>>> offset = pci_find_space(pdev, size); >>>> if (!offset) { >>>> return -ENOSPC; >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>>> >>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>>> >>>>>>> 11.05.2012 20:52, Alexander Graf написал: >>>>>>>> >>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>>> >>>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>>> is being built by QEMU. >>>>>>>>> >>>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>>> >>>>>>>>> For example, the old code destroys the following config >>>>>>>>> of PCIe Intel E1000E: >>>>>>>>> >>>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>>> 0x34: 0xC8 >>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>> >>>>>>>>> after: >>>>>>>>> 0x34: 0xD0 >>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>> >>>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>>> >>>>>>>>> The proposed patch does not change capability pointers when >>>>>>>>> the same type capability is about to add. >>>>>>>>> >>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>>>> --- >>>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>>> --- a/hw/pci.c >>>>>>>>> +++ b/hw/pci.c >>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>>> } >>>>>>>>> >>>>>>>>> config = pdev->config + offset; >>>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>>> >>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>>> * replace the existing one or >>>>>>>> * drop out and not write the new one in. >>>>>> >>>>>> * hw_error :) >>>>>> >>>>>>>> >>>>>>>> I'm not sure which way would be more natural. >>>>>>> >>>>>>> There is a third option - add another function, lets call it >>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>>> but won't touch list pointers. >>>>>> >>>>>> What good is a function that breaks internal consistency? >>>>> >>>>> >>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>>> capability at some fixed offset would have initialized their config space before calling this >>>>> capabilities API (as VFIO does). >>>>> >>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>>> driver may care about its offset. >>>>> >>>>> >>>>> >>>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>>> exactly that the capability exists and where it is and it calls >>>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>>> just for imaginery scalability is a bit weird, no? >>>>>> >>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>>>> >>>>>> >>>>>> Alex >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> > > > -- > Alexey
diff --git a/hw/pci.c b/hw/pci.c index aa0c0b8..1f7c924 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, } config = pdev->config + offset; - config[PCI_CAP_LIST_ID] = cap_id; - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; - pdev->config[PCI_CAPABILITY_LIST] = offset; - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; + if (config[PCI_CAP_LIST_ID] != cap_id) { + config[PCI_CAP_LIST_ID] = cap_id; + config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; + pdev->config[PCI_CAPABILITY_LIST] = offset; + pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; + } memset(pdev->used + offset, 0xFF, size); /* Make capability read-only by default */ memset(pdev->wmask + offset, 0, size);
Normally the pci_add_capability is called on devices to add new capability. This is ok for emulated devices which capabilities list is being built by QEMU. In the case of VFIO the capability may already exist and adding new capability into the beginning of the linked list may create a loop. For example, the old code destroys the following config of PCIe Intel E1000E: before adding PCI_CAP_ID_MSI (0x05): 0x34: 0xC8 0xC8: 0x01 0xD0 0xD0: 0x05 0xE0 0xE0: 0x10 0x00 after: 0x34: 0xD0 0xC8: 0x01 0xD0 0xD0: 0x05 0xC8 0xE0: 0x10 0x00 As result capabilities 0x01 and 0x05 point to each other. The proposed patch does not change capability pointers when the same type capability is about to add. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- hw/pci.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-)