diff mbox

[v3,0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset

Message ID 1408175661.11008.113.camel@ori.omang.mine.nu
State New
Headers show

Commit Message

Knut Omang Aug. 16, 2014, 7:54 a.m. UTC
On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> Hi Knut,
> 
> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
> > On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> >> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> >> > On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> >> > > On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> >> > >> Hi,
> >> > >>
> >> > >> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
> >> > >> chipset. The major job in these patches is to add support for emulating Intel
> >> > >> IOMMU according to the VT-d specification, including basic responses to CSRs
> >> > >> accesses, the logics of DMAR (DMA remapping) and DMA memory address
> >> > >> translations.
> >> > >
> >> > > Thanks!
> >> > > Looks very good overall, I noted some coding style issues - I didn't
> >> > > bother reporting each issue in every place where it appears - reported
> >> > > each issue once only, so please find and fix all instances of each
> >> > > issue.
> >> >
> >> > BTW, because I was in urgent need for virtual test environment for
> >> > Jailhouse, I hacked interrupt remapping on top of Le's patches:
> >> >
> >> > http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> >> >
> >> > The approach likely needs further discussions and refinements but it
> >> > already allows me to work on top with our hypervisor, and also Linux.
> >> > You can see from the last commit that Le's work made it pretty easy to
> >> > build this on top.
> >>
> >> Le,
> >>
> >> I have tried Jan's branch with my device setup which consists of a
> >> minimal q35 setup, an ioh3420 root port (specified as -device
> >> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
> >> gives the following lscpi -t:
> >>
> >> -[0000:00]-+-00.0
> >>            +-01.0
> >>            +-02.0
> >>            +-03.0-[01]----00.0
> >>            +-04.0
> >>            +-1f.0
> >>            +-1f.2
> >>            \-1f.3
> >>
> >> All seems to work beautifully (I see the ISA bridge happily receive
> >> translations) until the first DMA from my device model (at 1:00.0)
> >> arrives, at which point I get:
> >>
> >> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
> >> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
> >>
> >> I would have expected request device 01:00.0 for this.
> >> It is not clear to me yet if this is a weakness of the implementation of
> >> ioh3420 or the iommu. Just wanted to let you know right away in case you
> >> can shed some light to it or it is an easy fix,
> >>
> >> The device uses pci_dma_rw with itself as device pointer.
> >
> > To verify my hypothesis: with this rude hack my device now works much
> > better:
> >
> > @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
> > int bus_num, int devfn,
> >          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >      } else {
> >          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> > +        if (ret_fr)
> > +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> >          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >          if (ret_fr) {
> >              ret_fr = -ret_fr;
> >
> > Looking at how things look on hardware, multiple devices often receive
> > overlapping DMA address ranges for different physical addresses.
> >
> > So if I understand the way this works, every requester ID would also
> > need to have it's own unique VTDAddressSpace, as each pci
> > device/function sees a unique DMA address space..
> 
> ioh3420 is a pcie-to-pcie bridge, right? 

Yes.

> In my opinion, each pci-e
> device behind the pcie-to-pcie bridge can be assigned individually.
> For now I added the VT-d to q35 by just adding it to the root pci bus.
> You can see here in q35.c:
> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> So if we add a pcie-to-pcie bridge, we may have to call the
> pci_setup_iommu() for that new bus. I don't know where to hook into
> this now. :) If you know the mechanism behind that, you can try to add
> that for the new bus. (I will dive into this after the clean up.)
> What do you think?

Thanks for the quick answer, that helped a lot!

Looking into the details here I realize it is slightly more complicated:
secondary buses are enumerated after device instantiation, as part of
the host PCI enumeration, so if I add a similar setup call in the bridge
setup, it will be called for a new device long before it has received
it's bus number from the OS (via config[PCI_SECONDARY_BUS] )

I agree that the lookup function for contexts needs to be as efficient
as possible so the simple <busno,defvn> lookup key may be the best
solution but then the address_spaces table cannot be populated with the
secondary bus entries before it receives a nonzero != 255 bus number,
eg. along the lines of this: 


but it is getting complicated...
Thoughts?

Thanks,

Knut

Comments

Jan Kiszka Aug. 16, 2014, 8:45 a.m. UTC | #1
On 2014-08-16 09:54, Knut Omang wrote:
> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
>> Hi Knut,
>>
>> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
>>>>>>> chipset. The major job in these patches is to add support for emulating Intel
>>>>>>> IOMMU according to the VT-d specification, including basic responses to CSRs
>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
>>>>>>> translations.
>>>>>>
>>>>>> Thanks!
>>>>>> Looks very good overall, I noted some coding style issues - I didn't
>>>>>> bother reporting each issue in every place where it appears - reported
>>>>>> each issue once only, so please find and fix all instances of each
>>>>>> issue.
>>>>>
>>>>> BTW, because I was in urgent need for virtual test environment for
>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
>>>>>
>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
>>>>>
>>>>> The approach likely needs further discussions and refinements but it
>>>>> already allows me to work on top with our hypervisor, and also Linux.
>>>>> You can see from the last commit that Le's work made it pretty easy to
>>>>> build this on top.
>>>>
>>>> Le,
>>>>
>>>> I have tried Jan's branch with my device setup which consists of a
>>>> minimal q35 setup, an ioh3420 root port (specified as -device
>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
>>>> gives the following lscpi -t:
>>>>
>>>> -[0000:00]-+-00.0
>>>>            +-01.0
>>>>            +-02.0
>>>>            +-03.0-[01]----00.0
>>>>            +-04.0
>>>>            +-1f.0
>>>>            +-1f.2
>>>>            \-1f.3
>>>>
>>>> All seems to work beautifully (I see the ISA bridge happily receive
>>>> translations) until the first DMA from my device model (at 1:00.0)
>>>> arrives, at which point I get:
>>>>
>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
>>>>
>>>> I would have expected request device 01:00.0 for this.
>>>> It is not clear to me yet if this is a weakness of the implementation of
>>>> ioh3420 or the iommu. Just wanted to let you know right away in case you
>>>> can shed some light to it or it is an easy fix,
>>>>
>>>> The device uses pci_dma_rw with itself as device pointer.
>>>
>>> To verify my hypothesis: with this rude hack my device now works much
>>> better:
>>>
>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
>>> int bus_num, int devfn,
>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>>      } else {
>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
>>> +        if (ret_fr)
>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>>          if (ret_fr) {
>>>              ret_fr = -ret_fr;
>>>
>>> Looking at how things look on hardware, multiple devices often receive
>>> overlapping DMA address ranges for different physical addresses.
>>>
>>> So if I understand the way this works, every requester ID would also
>>> need to have it's own unique VTDAddressSpace, as each pci
>>> device/function sees a unique DMA address space..
>>
>> ioh3420 is a pcie-to-pcie bridge, right? 
> 
> Yes.
> 
>> In my opinion, each pci-e
>> device behind the pcie-to-pcie bridge can be assigned individually.
>> For now I added the VT-d to q35 by just adding it to the root pci bus.
>> You can see here in q35.c:
>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
>> So if we add a pcie-to-pcie bridge, we may have to call the
>> pci_setup_iommu() for that new bus. I don't know where to hook into
>> this now. :) If you know the mechanism behind that, you can try to add
>> that for the new bus. (I will dive into this after the clean up.)
>> What do you think?
> 
> Thanks for the quick answer, that helped a lot!
> 
> Looking into the details here I realize it is slightly more complicated:
> secondary buses are enumerated after device instantiation, as part of
> the host PCI enumeration, so if I add a similar setup call in the bridge
> setup, it will be called for a new device long before it has received
> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
> 
> I agree that the lookup function for contexts needs to be as efficient
> as possible so the simple <busno,defvn> lookup key may be the best
> solution but then the address_spaces table cannot be populated with the
> secondary bus entries before it receives a nonzero != 255 bus number,
> eg. along the lines of this: 
> 
> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> index 4becdc1..d9a8c23 100644
> --- a/hw/pci/pci_bridge.c
> +++ b/hw/pci/pci_bridge.c
> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
>          pci_bridge_update_mappings(s);
>      }
>  
> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
> +        int bus_num = pci_bus_num(&s->sec_bus);
> +        if (bus_num != 0xff && bus_num != 0x00)
> +            <handle bus number change>
> +    }
> +
>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
>          /* Trigger hot reset on 0->1 transition. */
> 
> but it is getting complicated...
> Thoughts?

Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
there?

Jan
Jan Kiszka Aug. 16, 2014, 8:47 a.m. UTC | #2
On 2014-08-16 10:45, Jan Kiszka wrote:
> On 2014-08-16 09:54, Knut Omang wrote:
>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
>>> Hi Knut,
>>>
>>> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
>>>>>>>> chipset. The major job in these patches is to add support for emulating Intel
>>>>>>>> IOMMU according to the VT-d specification, including basic responses to CSRs
>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
>>>>>>>> translations.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Looks very good overall, I noted some coding style issues - I didn't
>>>>>>> bother reporting each issue in every place where it appears - reported
>>>>>>> each issue once only, so please find and fix all instances of each
>>>>>>> issue.
>>>>>>
>>>>>> BTW, because I was in urgent need for virtual test environment for
>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
>>>>>>
>>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
>>>>>>
>>>>>> The approach likely needs further discussions and refinements but it
>>>>>> already allows me to work on top with our hypervisor, and also Linux.
>>>>>> You can see from the last commit that Le's work made it pretty easy to
>>>>>> build this on top.
>>>>>
>>>>> Le,
>>>>>
>>>>> I have tried Jan's branch with my device setup which consists of a
>>>>> minimal q35 setup, an ioh3420 root port (specified as -device
>>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
>>>>> gives the following lscpi -t:
>>>>>
>>>>> -[0000:00]-+-00.0
>>>>>            +-01.0
>>>>>            +-02.0
>>>>>            +-03.0-[01]----00.0
>>>>>            +-04.0
>>>>>            +-1f.0
>>>>>            +-1f.2
>>>>>            \-1f.3
>>>>>
>>>>> All seems to work beautifully (I see the ISA bridge happily receive
>>>>> translations) until the first DMA from my device model (at 1:00.0)
>>>>> arrives, at which point I get:
>>>>>
>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
>>>>>
>>>>> I would have expected request device 01:00.0 for this.
>>>>> It is not clear to me yet if this is a weakness of the implementation of
>>>>> ioh3420 or the iommu. Just wanted to let you know right away in case you
>>>>> can shed some light to it or it is an easy fix,
>>>>>
>>>>> The device uses pci_dma_rw with itself as device pointer.
>>>>
>>>> To verify my hypothesis: with this rude hack my device now works much
>>>> better:
>>>>
>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
>>>> int bus_num, int devfn,
>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>>>      } else {
>>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
>>>> +        if (ret_fr)
>>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>>>          if (ret_fr) {
>>>>              ret_fr = -ret_fr;
>>>>
>>>> Looking at how things look on hardware, multiple devices often receive
>>>> overlapping DMA address ranges for different physical addresses.
>>>>
>>>> So if I understand the way this works, every requester ID would also
>>>> need to have it's own unique VTDAddressSpace, as each pci
>>>> device/function sees a unique DMA address space..
>>>
>>> ioh3420 is a pcie-to-pcie bridge, right? 
>>
>> Yes.
>>
>>> In my opinion, each pci-e
>>> device behind the pcie-to-pcie bridge can be assigned individually.
>>> For now I added the VT-d to q35 by just adding it to the root pci bus.
>>> You can see here in q35.c:
>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
>>> So if we add a pcie-to-pcie bridge, we may have to call the
>>> pci_setup_iommu() for that new bus. I don't know where to hook into
>>> this now. :) If you know the mechanism behind that, you can try to add
>>> that for the new bus. (I will dive into this after the clean up.)
>>> What do you think?
>>
>> Thanks for the quick answer, that helped a lot!
>>
>> Looking into the details here I realize it is slightly more complicated:
>> secondary buses are enumerated after device instantiation, as part of
>> the host PCI enumeration, so if I add a similar setup call in the bridge
>> setup, it will be called for a new device long before it has received
>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
>>
>> I agree that the lookup function for contexts needs to be as efficient
>> as possible so the simple <busno,defvn> lookup key may be the best
>> solution but then the address_spaces table cannot be populated with the
>> secondary bus entries before it receives a nonzero != 255 bus number,
>> eg. along the lines of this: 
>>
>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
>> index 4becdc1..d9a8c23 100644
>> --- a/hw/pci/pci_bridge.c
>> +++ b/hw/pci/pci_bridge.c
>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
>>          pci_bridge_update_mappings(s);
>>      }
>>  
>> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
>> +        int bus_num = pci_bus_num(&s->sec_bus);
>> +        if (bus_num != 0xff && bus_num != 0x00)
>> +            <handle bus number change>
>> +    }
>> +
>>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
>>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
>>          /* Trigger hot reset on 0->1 transition. */
>>
>> but it is getting complicated...
>> Thoughts?
> 
> Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
> there?

Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of
the IntelIOMMUState.

Jan
Knut Omang Aug. 18, 2014, 4:34 p.m. UTC | #3
On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
> On 2014-08-16 10:45, Jan Kiszka wrote:
> > On 2014-08-16 09:54, Knut Omang wrote:
> >> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> >>> Hi Knut,
> >>>
> >>> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
> >>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> >>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> >>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> >>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
> >>>>>>>> chipset. The major job in these patches is to add support for emulating Intel
> >>>>>>>> IOMMU according to the VT-d specification, including basic responses to CSRs
> >>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
> >>>>>>>> translations.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> Looks very good overall, I noted some coding style issues - I didn't
> >>>>>>> bother reporting each issue in every place where it appears - reported
> >>>>>>> each issue once only, so please find and fix all instances of each
> >>>>>>> issue.
> >>>>>>
> >>>>>> BTW, because I was in urgent need for virtual test environment for
> >>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
> >>>>>>
> >>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> >>>>>>
> >>>>>> The approach likely needs further discussions and refinements but it
> >>>>>> already allows me to work on top with our hypervisor, and also Linux.
> >>>>>> You can see from the last commit that Le's work made it pretty easy to
> >>>>>> build this on top.
> >>>>>
> >>>>> Le,
> >>>>>
> >>>>> I have tried Jan's branch with my device setup which consists of a
> >>>>> minimal q35 setup, an ioh3420 root port (specified as -device
> >>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
> >>>>> gives the following lscpi -t:
> >>>>>
> >>>>> -[0000:00]-+-00.0
> >>>>>            +-01.0
> >>>>>            +-02.0
> >>>>>            +-03.0-[01]----00.0
> >>>>>            +-04.0
> >>>>>            +-1f.0
> >>>>>            +-1f.2
> >>>>>            \-1f.3
> >>>>>
> >>>>> All seems to work beautifully (I see the ISA bridge happily receive
> >>>>> translations) until the first DMA from my device model (at 1:00.0)
> >>>>> arrives, at which point I get:
> >>>>>
> >>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
> >>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
> >>>>>
> >>>>> I would have expected request device 01:00.0 for this.
> >>>>> It is not clear to me yet if this is a weakness of the implementation of
> >>>>> ioh3420 or the iommu. Just wanted to let you know right away in case you
> >>>>> can shed some light to it or it is an easy fix,
> >>>>>
> >>>>> The device uses pci_dma_rw with itself as device pointer.
> >>>>
> >>>> To verify my hypothesis: with this rude hack my device now works much
> >>>> better:
> >>>>
> >>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
> >>>> int bus_num, int devfn,
> >>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>>>      } else {
> >>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> >>>> +        if (ret_fr)
> >>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> >>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>>>          if (ret_fr) {
> >>>>              ret_fr = -ret_fr;
> >>>>
> >>>> Looking at how things look on hardware, multiple devices often receive
> >>>> overlapping DMA address ranges for different physical addresses.
> >>>>
> >>>> So if I understand the way this works, every requester ID would also
> >>>> need to have it's own unique VTDAddressSpace, as each pci
> >>>> device/function sees a unique DMA address space..
> >>>
> >>> ioh3420 is a pcie-to-pcie bridge, right? 
> >>
> >> Yes.
> >>
> >>> In my opinion, each pci-e
> >>> device behind the pcie-to-pcie bridge can be assigned individually.
> >>> For now I added the VT-d to q35 by just adding it to the root pci bus.
> >>> You can see here in q35.c:
> >>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> >>> So if we add a pcie-to-pcie bridge, we may have to call the
> >>> pci_setup_iommu() for that new bus. I don't know where to hook into
> >>> this now. :) If you know the mechanism behind that, you can try to add
> >>> that for the new bus. (I will dive into this after the clean up.)
> >>> What do you think?
> >>
> >> Thanks for the quick answer, that helped a lot!
> >>
> >> Looking into the details here I realize it is slightly more complicated:
> >> secondary buses are enumerated after device instantiation, as part of
> >> the host PCI enumeration, so if I add a similar setup call in the bridge
> >> setup, it will be called for a new device long before it has received
> >> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
> >>
> >> I agree that the lookup function for contexts needs to be as efficient
> >> as possible so the simple <busno,defvn> lookup key may be the best
> >> solution but then the address_spaces table cannot be populated with the
> >> secondary bus entries before it receives a nonzero != 255 bus number,
> >> eg. along the lines of this: 
> >>
> >> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> >> index 4becdc1..d9a8c23 100644
> >> --- a/hw/pci/pci_bridge.c
> >> +++ b/hw/pci/pci_bridge.c
> >> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
> >>          pci_bridge_update_mappings(s);
> >>      }
> >>  
> >> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
> >> +        int bus_num = pci_bus_num(&s->sec_bus);
> >> +        if (bus_num != 0xff && bus_num != 0x00)
> >> +            <handle bus number change>
> >> +    }
> >> +
> >>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
> >>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
> >>          /* Trigger hot reset on 0->1 transition. */
> >>
> >> but it is getting complicated...
> >> Thoughts?
> > 
> > Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
> > there?
> 
> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of
> the IntelIOMMUState.

Thanks - that got me going - after some playing around with the data
structures I ended up with these patches (based on top of Jan's
vtd-intremap branch):

https://github.com/knuto/qemu/tree/vtd_patches

In essence I ended up replacing the address_spaces[] array in
IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking
the VTDAddressSpace objects to serve vtd_context_device_invalidate, then
use pci_bus_num() whenever a bus number is requires, except for the 
"special" bus needed by the interrupt remapping code.

To achieve this, I had to change the signature of the pci_setup_iommu
function (the first commit).

With this I am now seeing translations for the device in the virtual
root port, but I think this should work equally well with other PCI
bridges.

Knut
 
> Jan
> 
>
Jan Kiszka Aug. 18, 2014, 6:50 p.m. UTC | #4
On 2014-08-18 18:34, Knut Omang wrote:
> On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
>> On 2014-08-16 10:45, Jan Kiszka wrote:
>>> On 2014-08-16 09:54, Knut Omang wrote:
>>>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
>>>>> Hi Knut,
>>>>>
>>>>> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
>>>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
>>>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
>>>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
>>>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
>>>>>>>>>> chipset. The major job in these patches is to add support for emulating Intel
>>>>>>>>>> IOMMU according to the VT-d specification, including basic responses to CSRs
>>>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
>>>>>>>>>> translations.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Looks very good overall, I noted some coding style issues - I didn't
>>>>>>>>> bother reporting each issue in every place where it appears - reported
>>>>>>>>> each issue once only, so please find and fix all instances of each
>>>>>>>>> issue.
>>>>>>>>
>>>>>>>> BTW, because I was in urgent need for virtual test environment for
>>>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
>>>>>>>>
>>>>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
>>>>>>>>
>>>>>>>> The approach likely needs further discussions and refinements but it
>>>>>>>> already allows me to work on top with our hypervisor, and also Linux.
>>>>>>>> You can see from the last commit that Le's work made it pretty easy to
>>>>>>>> build this on top.
>>>>>>>
>>>>>>> Le,
>>>>>>>
>>>>>>> I have tried Jan's branch with my device setup which consists of a
>>>>>>> minimal q35 setup, an ioh3420 root port (specified as -device
>>>>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
>>>>>>> gives the following lscpi -t:
>>>>>>>
>>>>>>> -[0000:00]-+-00.0
>>>>>>>            +-01.0
>>>>>>>            +-02.0
>>>>>>>            +-03.0-[01]----00.0
>>>>>>>            +-04.0
>>>>>>>            +-1f.0
>>>>>>>            +-1f.2
>>>>>>>            \-1f.3
>>>>>>>
>>>>>>> All seems to work beautifully (I see the ISA bridge happily receive
>>>>>>> translations) until the first DMA from my device model (at 1:00.0)
>>>>>>> arrives, at which point I get:
>>>>>>>
>>>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
>>>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
>>>>>>>
>>>>>>> I would have expected request device 01:00.0 for this.
>>>>>>> It is not clear to me yet if this is a weakness of the implementation of
>>>>>>> ioh3420 or the iommu. Just wanted to let you know right away in case you
>>>>>>> can shed some light to it or it is an easy fix,
>>>>>>>
>>>>>>> The device uses pci_dma_rw with itself as device pointer.
>>>>>>
>>>>>> To verify my hypothesis: with this rude hack my device now works much
>>>>>> better:
>>>>>>
>>>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
>>>>>> int bus_num, int devfn,
>>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>>>>>      } else {
>>>>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
>>>>>> +        if (ret_fr)
>>>>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
>>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>>>>>          if (ret_fr) {
>>>>>>              ret_fr = -ret_fr;
>>>>>>
>>>>>> Looking at how things look on hardware, multiple devices often receive
>>>>>> overlapping DMA address ranges for different physical addresses.
>>>>>>
>>>>>> So if I understand the way this works, every requester ID would also
>>>>>> need to have it's own unique VTDAddressSpace, as each pci
>>>>>> device/function sees a unique DMA address space..
>>>>>
>>>>> ioh3420 is a pcie-to-pcie bridge, right? 
>>>>
>>>> Yes.
>>>>
>>>>> In my opinion, each pci-e
>>>>> device behind the pcie-to-pcie bridge can be assigned individually.
>>>>> For now I added the VT-d to q35 by just adding it to the root pci bus.
>>>>> You can see here in q35.c:
>>>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
>>>>> So if we add a pcie-to-pcie bridge, we may have to call the
>>>>> pci_setup_iommu() for that new bus. I don't know where to hook into
>>>>> this now. :) If you know the mechanism behind that, you can try to add
>>>>> that for the new bus. (I will dive into this after the clean up.)
>>>>> What do you think?
>>>>
>>>> Thanks for the quick answer, that helped a lot!
>>>>
>>>> Looking into the details here I realize it is slightly more complicated:
>>>> secondary buses are enumerated after device instantiation, as part of
>>>> the host PCI enumeration, so if I add a similar setup call in the bridge
>>>> setup, it will be called for a new device long before it has received
>>>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
>>>>
>>>> I agree that the lookup function for contexts needs to be as efficient
>>>> as possible so the simple <busno,defvn> lookup key may be the best
>>>> solution but then the address_spaces table cannot be populated with the
>>>> secondary bus entries before it receives a nonzero != 255 bus number,
>>>> eg. along the lines of this: 
>>>>
>>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
>>>> index 4becdc1..d9a8c23 100644
>>>> --- a/hw/pci/pci_bridge.c
>>>> +++ b/hw/pci/pci_bridge.c
>>>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
>>>>          pci_bridge_update_mappings(s);
>>>>      }
>>>>  
>>>> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
>>>> +        int bus_num = pci_bus_num(&s->sec_bus);
>>>> +        if (bus_num != 0xff && bus_num != 0x00)
>>>> +            <handle bus number change>
>>>> +    }
>>>> +
>>>>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
>>>>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
>>>>          /* Trigger hot reset on 0->1 transition. */
>>>>
>>>> but it is getting complicated...
>>>> Thoughts?
>>>
>>> Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
>>> there?
>>
>> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of
>> the IntelIOMMUState.
> 
> Thanks - that got me going - after some playing around with the data
> structures I ended up with these patches (based on top of Jan's
> vtd-intremap branch):

Are you depending on interrupt remapping? If not, my patches are a bit
hacky and may cause their own issues if you are unlucky.

> 
> https://github.com/knuto/qemu/tree/vtd_patches
> 
> In essence I ended up replacing the address_spaces[] array in
> IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking
> the VTDAddressSpace objects to serve vtd_context_device_invalidate, then
> use pci_bus_num() whenever a bus number is requires, except for the 
> "special" bus needed by the interrupt remapping code.
> 
> To achieve this, I had to change the signature of the pci_setup_iommu
> function (the first commit).
> 
> With this I am now seeing translations for the device in the virtual
> root port, but I think this should work equally well with other PCI
> bridges.

Good to know that we are on the right path! Maybe this can be integrated
in the next posting round so that we do not need any bridge exceptions
due to incompatibility.

Jan
Knut Omang Aug. 19, 2014, 4:08 a.m. UTC | #5
On Mon, 2014-08-18 at 20:50 +0200, Jan Kiszka wrote:
> On 2014-08-18 18:34, Knut Omang wrote:
> > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
> >> On 2014-08-16 10:45, Jan Kiszka wrote:
> >>> On 2014-08-16 09:54, Knut Omang wrote:
> >>>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> >>>>> Hi Knut,
> >>>>>
> >>>>> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
> >>>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> >>>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> >>>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> >>>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
> >>>>>>>>>> chipset. The major job in these patches is to add support for emulating Intel
> >>>>>>>>>> IOMMU according to the VT-d specification, including basic responses to CSRs
> >>>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
> >>>>>>>>>> translations.
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>> Looks very good overall, I noted some coding style issues - I didn't
> >>>>>>>>> bother reporting each issue in every place where it appears - reported
> >>>>>>>>> each issue once only, so please find and fix all instances of each
> >>>>>>>>> issue.
> >>>>>>>>
> >>>>>>>> BTW, because I was in urgent need for virtual test environment for
> >>>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
> >>>>>>>>
> >>>>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> >>>>>>>>
> >>>>>>>> The approach likely needs further discussions and refinements but it
> >>>>>>>> already allows me to work on top with our hypervisor, and also Linux.
> >>>>>>>> You can see from the last commit that Le's work made it pretty easy to
> >>>>>>>> build this on top.
> >>>>>>>
> >>>>>>> Le,
> >>>>>>>
> >>>>>>> I have tried Jan's branch with my device setup which consists of a
> >>>>>>> minimal q35 setup, an ioh3420 root port (specified as -device
> >>>>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
> >>>>>>> gives the following lscpi -t:
> >>>>>>>
> >>>>>>> -[0000:00]-+-00.0
> >>>>>>>            +-01.0
> >>>>>>>            +-02.0
> >>>>>>>            +-03.0-[01]----00.0
> >>>>>>>            +-04.0
> >>>>>>>            +-1f.0
> >>>>>>>            +-1f.2
> >>>>>>>            \-1f.3
> >>>>>>>
> >>>>>>> All seems to work beautifully (I see the ISA bridge happily receive
> >>>>>>> translations) until the first DMA from my device model (at 1:00.0)
> >>>>>>> arrives, at which point I get:
> >>>>>>>
> >>>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
> >>>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
> >>>>>>>
> >>>>>>> I would have expected request device 01:00.0 for this.
> >>>>>>> It is not clear to me yet if this is a weakness of the implementation of
> >>>>>>> ioh3420 or the iommu. Just wanted to let you know right away in case you
> >>>>>>> can shed some light to it or it is an easy fix,
> >>>>>>>
> >>>>>>> The device uses pci_dma_rw with itself as device pointer.
> >>>>>>
> >>>>>> To verify my hypothesis: with this rude hack my device now works much
> >>>>>> better:
> >>>>>>
> >>>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
> >>>>>> int bus_num, int devfn,
> >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>>>>>      } else {
> >>>>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> >>>>>> +        if (ret_fr)
> >>>>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>>>>>          if (ret_fr) {
> >>>>>>              ret_fr = -ret_fr;
> >>>>>>
> >>>>>> Looking at how things look on hardware, multiple devices often receive
> >>>>>> overlapping DMA address ranges for different physical addresses.
> >>>>>>
> >>>>>> So if I understand the way this works, every requester ID would also
> >>>>>> need to have it's own unique VTDAddressSpace, as each pci
> >>>>>> device/function sees a unique DMA address space..
> >>>>>
> >>>>> ioh3420 is a pcie-to-pcie bridge, right? 
> >>>>
> >>>> Yes.
> >>>>
> >>>>> In my opinion, each pci-e
> >>>>> device behind the pcie-to-pcie bridge can be assigned individually.
> >>>>> For now I added the VT-d to q35 by just adding it to the root pci bus.
> >>>>> You can see here in q35.c:
> >>>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> >>>>> So if we add a pcie-to-pcie bridge, we may have to call the
> >>>>> pci_setup_iommu() for that new bus. I don't know where to hook into
> >>>>> this now. :) If you know the mechanism behind that, you can try to add
> >>>>> that for the new bus. (I will dive into this after the clean up.)
> >>>>> What do you think?
> >>>>
> >>>> Thanks for the quick answer, that helped a lot!
> >>>>
> >>>> Looking into the details here I realize it is slightly more complicated:
> >>>> secondary buses are enumerated after device instantiation, as part of
> >>>> the host PCI enumeration, so if I add a similar setup call in the bridge
> >>>> setup, it will be called for a new device long before it has received
> >>>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
> >>>>
> >>>> I agree that the lookup function for contexts needs to be as efficient
> >>>> as possible so the simple <busno,defvn> lookup key may be the best
> >>>> solution but then the address_spaces table cannot be populated with the
> >>>> secondary bus entries before it receives a nonzero != 255 bus number,
> >>>> eg. along the lines of this: 
> >>>>
> >>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> >>>> index 4becdc1..d9a8c23 100644
> >>>> --- a/hw/pci/pci_bridge.c
> >>>> +++ b/hw/pci/pci_bridge.c
> >>>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
> >>>>          pci_bridge_update_mappings(s);
> >>>>      }
> >>>>  
> >>>> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
> >>>> +        int bus_num = pci_bus_num(&s->sec_bus);
> >>>> +        if (bus_num != 0xff && bus_num != 0x00)
> >>>> +            <handle bus number change>
> >>>> +    }
> >>>> +
> >>>>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
> >>>>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
> >>>>          /* Trigger hot reset on 0->1 transition. */
> >>>>
> >>>> but it is getting complicated...
> >>>> Thoughts?
> >>>
> >>> Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
> >>> there?
> >>
> >> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of
> >> the IntelIOMMUState.
> > 
> > Thanks - that got me going - after some playing around with the data
> > structures I ended up with these patches (based on top of Jan's
> > vtd-intremap branch):
> 
> Are you depending on interrupt remapping? If not, my patches are a bit
> hacky and may cause their own issues if you are unlucky.
 
It does not depend directly but interprets a NULL PciDevice pointer as
the special bus number (0xff) for non-pci devices, so I have tried to
take heights for that - I can rebase if so desired, but I would like to
see the interrupt remapping emerge as well ;-)

Knut

> > https://github.com/knuto/qemu/tree/vtd_patches
> > 
> > In essence I ended up replacing the address_spaces[] array in
> > IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking
> > the VTDAddressSpace objects to serve vtd_context_device_invalidate, then
> > use pci_bus_num() whenever a bus number is requires, except for the 
> > "special" bus needed by the interrupt remapping code.
> > 
> > To achieve this, I had to change the signature of the pci_setup_iommu
> > function (the first commit).
> > 
> > With this I am now seeing translations for the device in the virtual
> > root port, but I think this should work equally well with other PCI
> > bridges.
> 
> Good to know that we are on the right path! Maybe this can be integrated
> in the next posting round so that we do not need any bridge exceptions
> due to incompatibility.
> 
> Jan
> 
>
Knut Omang Aug. 19, 2014, 4:18 a.m. UTC | #6
On Tue, 2014-08-19 at 06:08 +0200, Knut Omang wrote:
> On Mon, 2014-08-18 at 20:50 +0200, Jan Kiszka wrote:
> > On 2014-08-18 18:34, Knut Omang wrote:
> > > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
> > >> On 2014-08-16 10:45, Jan Kiszka wrote:
> > >>> On 2014-08-16 09:54, Knut Omang wrote:
> > >>>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> > >>>>> Hi Knut,
> > >>>>>
> > >>>>> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
> > >>>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> > >>>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> > >>>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> > >>>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
> > >>>>>>>>>> chipset. The major job in these patches is to add support for emulating Intel
> > >>>>>>>>>> IOMMU according to the VT-d specification, including basic responses to CSRs
> > >>>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
> > >>>>>>>>>> translations.
> > >>>>>>>>>
> > >>>>>>>>> Thanks!
> > >>>>>>>>> Looks very good overall, I noted some coding style issues - I didn't
> > >>>>>>>>> bother reporting each issue in every place where it appears - reported
> > >>>>>>>>> each issue once only, so please find and fix all instances of each
> > >>>>>>>>> issue.
> > >>>>>>>>
> > >>>>>>>> BTW, because I was in urgent need for virtual test environment for
> > >>>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
> > >>>>>>>>
> > >>>>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> > >>>>>>>>
> > >>>>>>>> The approach likely needs further discussions and refinements but it
> > >>>>>>>> already allows me to work on top with our hypervisor, and also Linux.
> > >>>>>>>> You can see from the last commit that Le's work made it pretty easy to
> > >>>>>>>> build this on top.
> > >>>>>>>
> > >>>>>>> Le,
> > >>>>>>>
> > >>>>>>> I have tried Jan's branch with my device setup which consists of a
> > >>>>>>> minimal q35 setup, an ioh3420 root port (specified as -device
> > >>>>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
> > >>>>>>> gives the following lscpi -t:
> > >>>>>>>
> > >>>>>>> -[0000:00]-+-00.0
> > >>>>>>>            +-01.0
> > >>>>>>>            +-02.0
> > >>>>>>>            +-03.0-[01]----00.0
> > >>>>>>>            +-04.0
> > >>>>>>>            +-1f.0
> > >>>>>>>            +-1f.2
> > >>>>>>>            \-1f.3
> > >>>>>>>
> > >>>>>>> All seems to work beautifully (I see the ISA bridge happily receive
> > >>>>>>> translations) until the first DMA from my device model (at 1:00.0)
> > >>>>>>> arrives, at which point I get:
> > >>>>>>>
> > >>>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
> > >>>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
> > >>>>>>>
> > >>>>>>> I would have expected request device 01:00.0 for this.
> > >>>>>>> It is not clear to me yet if this is a weakness of the implementation of
> > >>>>>>> ioh3420 or the iommu. Just wanted to let you know right away in case you
> > >>>>>>> can shed some light to it or it is an easy fix,
> > >>>>>>>
> > >>>>>>> The device uses pci_dma_rw with itself as device pointer.
> > >>>>>>
> > >>>>>> To verify my hypothesis: with this rude hack my device now works much
> > >>>>>> better:
> > >>>>>>
> > >>>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
> > >>>>>> int bus_num, int devfn,
> > >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> > >>>>>>      } else {
> > >>>>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> > >>>>>> +        if (ret_fr)
> > >>>>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> > >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> > >>>>>>          if (ret_fr) {
> > >>>>>>              ret_fr = -ret_fr;
> > >>>>>>
> > >>>>>> Looking at how things look on hardware, multiple devices often receive
> > >>>>>> overlapping DMA address ranges for different physical addresses.
> > >>>>>>
> > >>>>>> So if I understand the way this works, every requester ID would also
> > >>>>>> need to have it's own unique VTDAddressSpace, as each pci
> > >>>>>> device/function sees a unique DMA address space..
> > >>>>>
> > >>>>> ioh3420 is a pcie-to-pcie bridge, right? 
> > >>>>
> > >>>> Yes.
> > >>>>
> > >>>>> In my opinion, each pci-e
> > >>>>> device behind the pcie-to-pcie bridge can be assigned individually.
> > >>>>> For now I added the VT-d to q35 by just adding it to the root pci bus.
> > >>>>> You can see here in q35.c:
> > >>>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> > >>>>> So if we add a pcie-to-pcie bridge, we may have to call the
> > >>>>> pci_setup_iommu() for that new bus. I don't know where to hook into
> > >>>>> this now. :) If you know the mechanism behind that, you can try to add
> > >>>>> that for the new bus. (I will dive into this after the clean up.)
> > >>>>> What do you think?
> > >>>>
> > >>>> Thanks for the quick answer, that helped a lot!
> > >>>>
> > >>>> Looking into the details here I realize it is slightly more complicated:
> > >>>> secondary buses are enumerated after device instantiation, as part of
> > >>>> the host PCI enumeration, so if I add a similar setup call in the bridge
> > >>>> setup, it will be called for a new device long before it has received
> > >>>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
> > >>>>
> > >>>> I agree that the lookup function for contexts needs to be as efficient
> > >>>> as possible so the simple <busno,defvn> lookup key may be the best
> > >>>> solution but then the address_spaces table cannot be populated with the
> > >>>> secondary bus entries before it receives a nonzero != 255 bus number,
> > >>>> eg. along the lines of this: 
> > >>>>
> > >>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > >>>> index 4becdc1..d9a8c23 100644
> > >>>> --- a/hw/pci/pci_bridge.c
> > >>>> +++ b/hw/pci/pci_bridge.c
> > >>>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
> > >>>>          pci_bridge_update_mappings(s);
> > >>>>      }
> > >>>>  
> > >>>> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
> > >>>> +        int bus_num = pci_bus_num(&s->sec_bus);
> > >>>> +        if (bus_num != 0xff && bus_num != 0x00)
> > >>>> +            <handle bus number change>
> > >>>> +    }
> > >>>> +
> > >>>>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
> > >>>>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
> > >>>>          /* Trigger hot reset on 0->1 transition. */
> > >>>>
> > >>>> but it is getting complicated...
> > >>>> Thoughts?
> > >>>
> > >>> Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
> > >>> there?
> > >>
> > >> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of
> > >> the IntelIOMMUState.
> > > 
> > > Thanks - that got me going - after some playing around with the data
> > > structures I ended up with these patches (based on top of Jan's
> > > vtd-intremap branch):
> > 
> > Are you depending on interrupt remapping? If not, my patches are a bit
> > hacky and may cause their own issues if you are unlucky.
>  
> It does not depend directly but interprets a NULL PciDevice pointer as
> the special bus number (0xff) for non-pci devices, so I have tried to
> take heights for that - I can rebase if so desired, but I would like to
> see the interrupt remapping emerge as well ;-)
> 
> Knut
> 
> > > https://github.com/knuto/qemu/tree/vtd_patches
> > > 
> > > In essence I ended up replacing the address_spaces[] array in
> > > IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking
> > > the VTDAddressSpace objects to serve vtd_context_device_invalidate, then
> > > use pci_bus_num() whenever a bus number is requires, except for the 
> > > "special" bus needed by the interrupt remapping code.
> > > 
> > > To achieve this, I had to change the signature of the pci_setup_iommu
> > > function (the first commit).
> > > 
> > > With this I am now seeing translations for the device in the virtual
> > > root port, but I think this should work equally well with other PCI
> > > bridges.
> > 
> > Good to know that we are on the right path! Maybe this can be integrated
> > in the next posting round so that we do not need any bridge exceptions
> > due to incompatibility.
> >
> > Jan

Lets aim at that - as far as I understand the Linux iommu driver, the
kernel will try to setup mappings for everything (using identity
mappings for some devices), so having exceptions on the Qemu side would
probably have to be a hack.

Knut
Jan Kiszka Aug. 19, 2014, 5:36 a.m. UTC | #7
On 2014-08-19 06:08, Knut Omang wrote:
>> Are you depending on interrupt remapping? If not, my patches are a bit
>> hacky and may cause their own issues if you are unlucky.
>  
> It does not depend directly but interprets a NULL PciDevice pointer as
> the special bus number (0xff) for non-pci devices, so I have tried to

Yeah, this won't map nicely on future chipsets - unless we redefine
Q35_PSEUDO_XXX as VTD_PSEUDO_XXX, or so, and use the same values for all
of them. Or create an unconnected pseudo PCI bus for the chipset devices
that carry the pseudo bus number.

> take heights for that - I can rebase if so desired, but I would like to
> see the interrupt remapping emerge as well ;-)

Good, then we are already two ;). Let's get the baseline ready, then we
can discuss extensions like interrupt remapping.

Jan
diff mbox

Patch

diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 4becdc1..d9a8c23 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -265,6 +265,12 @@  void pci_bridge_write_config(PCIDevice *d,
         pci_bridge_update_mappings(s);
     }
 
+    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
+        int bus_num = pci_bus_num(&s->sec_bus);
+        if (bus_num != 0xff && bus_num != 0x00)
+            <handle bus number change>
+    }
+
     newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
     if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
         /* Trigger hot reset on 0->1 transition. */