[v3,0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset

On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> Hi Knut,
> 
> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.omang@oracle.com>:
> > On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> >> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> >> > On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> >> > > On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> >> > >> Hi,
> >> > >>
> >> > >> These patches are intended to introduce Intel IOMMU (VT-d) emulation to q35
> >> > >> chipset. The major job in these patches is to add support for emulating Intel
> >> > >> IOMMU according to the VT-d specification, including basic responses to CSRs
> >> > >> accesses, the logics of DMAR (DMA remapping) and DMA memory address
> >> > >> translations.
> >> > >
> >> > > Thanks!
> >> > > Looks very good overall, I noted some coding style issues - I didn't
> >> > > bother reporting each issue in every place where it appears - reported
> >> > > each issue once only, so please find and fix all instances of each
> >> > > issue.
> >> >
> >> > BTW, because I was in urgent need for virtual test environment for
> >> > Jailhouse, I hacked interrupt remapping on top of Le's patches:
> >> >
> >> > http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> >> >
> >> > The approach likely needs further discussions and refinements but it
> >> > already allows me to work on top with our hypervisor, and also Linux.
> >> > You can see from the last commit that Le's work made it pretty easy to
> >> > build this on top.
> >>
> >> Le,
> >>
> >> I have tried Jan's branch with my device setup, which consists of a
> >> minimal q35 setup, an ioh3420 root port (specified as -device
> >> ioh3420,slot=0) and a PCIe device plugged into that root port, which
> >> gives the following lspci -t output:
> >>
> >> -[0000:00]-+-00.0
> >>            +-01.0
> >>            +-02.0
> >>            +-03.0-[01]----00.0
> >>            +-04.0
> >>            +-1f.0
> >>            +-1f.2
> >>            \-1f.3
> >>
> >> All seems to work beautifully (I see the ISA bridge happily receive
> >> translations) until the first DMA from my device model (at 1:00.0)
> >> arrives, at which point I get:
> >>
> >> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault addr fffa0000
> >> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is clear
> >>
> >> I would have expected request device 01:00.0 for this.
> >> It is not clear to me yet whether this is a weakness of the ioh3420
> >> implementation or of the IOMMU. I just wanted to let you know right away in
> >> case you can shed some light on it, or in case it is an easy fix.
> >>
> >> The device uses pci_dma_rw() with itself as the device pointer.
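For reference, a PCI requester ID packs the bus number into the high byte and devfn (5-bit slot, 3-bit function) into the low byte. VT-d indexes its root table by bus number and each context table by devfn, which is why the quick hack in the next message retries the lookup with bus 1, devfn 0. A minimal C sketch of the encoding follows; the helper names are hypothetical and not taken from QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Pack slot (device) and function into the 8-bit devfn:
 * 5 bits of slot, 3 bits of function, as PCI_DEVFN() does. */
static uint8_t make_devfn(unsigned slot, unsigned func)
{
    assert(slot < 32 && func < 8);
    return (uint8_t)((slot << 3) | func);
}

/* 16-bit requester (source) ID: bus number in the high byte,
 * devfn in the low byte. */
static uint16_t requester_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Decode the three fields back out of a requester ID. */
static unsigned rid_bus(uint16_t rid)  { return rid >> 8; }
static unsigned rid_slot(uint16_t rid) { return (rid >> 3) & 0x1f; }
static unsigned rid_func(uint16_t rid) { return rid & 0x7; }
```

With these helpers, 00:03.0 encodes as 0x0018 and 01:00.0 as 0x0100, so the two requesters select different root entries.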
> >
> > To verify my hypothesis: with this rude hack my device now works much
> > better:
> >
> > @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as, int bus_num, int devfn,
> >          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >      } else {
> >          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> > +        if (ret_fr)
> > +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> >          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >          if (ret_fr) {
> >              ret_fr = -ret_fr;
> >
> > Looking at real hardware, multiple devices often receive overlapping DMA
> > address ranges that map to different physical addresses.
> >
> > So if I understand correctly how this works, every requester ID would also
> > need its own unique VTDAddressSpace, as each PCI device/function sees its
> > own DMA address space.
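A per-requester table as suggested here could be sketched as a two-level array indexed first by bus number and then by devfn, allocating an entry the first time a requester is seen. This is a hypothetical illustration: DevAS merely stands in for a per-device VTDAddressSpace, which in practice would hold much more state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-requester state (stand-in for a VTDAddressSpace). */
typedef struct DevAS {
    uint8_t bus_num;
    uint8_t devfn;
} DevAS;

/* One lazily allocated array of 256 devfn slots per bus number. */
typedef struct ASTable {
    DevAS **per_bus[256];
} ASTable;

/* Return the address space for (bus_num, devfn), creating it on first
 * use so that every requester ID gets a distinct object.
 * Returns NULL if allocation fails. */
static DevAS *as_lookup(ASTable *t, uint8_t bus_num, uint8_t devfn)
{
    DevAS **slots = t->per_bus[bus_num];

    if (!slots) {
        slots = calloc(256, sizeof(*slots));
        if (!slots) {
            return NULL;
        }
        t->per_bus[bus_num] = slots;
    }
    if (!slots[devfn]) {
        DevAS *as = calloc(1, sizeof(*as));
        if (!as) {
            return NULL;
        }
        as->bus_num = bus_num;
        as->devfn = devfn;
        slots[devfn] = as;
    }
    return slots[devfn];
}

/* Release every entry and every per-bus array. */
static void as_table_free(ASTable *t)
{
    for (int b = 0; b < 256; b++) {
        if (t->per_bus[b]) {
            for (int d = 0; d < 256; d++) {
                free(t->per_bus[b][d]);
            }
            free(t->per_bus[b]);
            t->per_bus[b] = NULL;
        }
    }
}
```

Indexing by <bus, devfn> keeps the lookup to two array dereferences, at the cost of depending on the bus number being known when the entry is created.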
> 
> ioh3420 is a PCIe-to-PCIe bridge, right?

Yes.

> In my opinion, each PCIe device behind the PCIe-to-PCIe bridge can be
> assigned individually. For now I added VT-d to q35 simply by attaching it
> to the root PCI bus. You can see it here in q35.c:
> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> So if we add a PCIe-to-PCIe bridge, we may have to call
> pci_setup_iommu() for that new bus as well. I don't know where to hook
> this in yet. :) If you know the mechanism behind that, you could try to add
> it for the new bus. (I will dive into this after the cleanup.)
> What do you think?
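My understanding (an assumption, not verified against the tree) is that when a device's own bus has no IOMMU hook registered, QEMU's per-device address-space lookup falls back to the parent bridge, so the DMA is attributed to the bridge's own bus/devfn. The toy C model below uses made-up types, not QEMU's; it reproduces the 00:03.0 in the fault log and shows what registering a hook on the secondary bus would change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Dev Dev;

/* Toy bus: has_iommu models whether an IOMMU hook was set up on it. */
typedef struct Bus {
    uint8_t number;   /* bus number assigned by enumeration */
    bool has_iommu;   /* an IOMMU hook is registered on this bus */
    Dev *parent_dev;  /* bridge that owns this bus, NULL for the root */
} Bus;

struct Dev {
    Bus *bus;
    uint8_t devfn;
};

/* Requester ID the IOMMU would see for DMA from dev, assuming a bus
 * without a hook defers to its parent bridge. Returns -1 if no bus
 * on the path has a hook (no translation at all). */
static int effective_requester(const Dev *dev)
{
    const Bus *bus = dev->bus;

    if (bus->has_iommu) {
        return (bus->number << 8) | dev->devfn;
    }
    if (bus->parent_dev) {
        /* The bridge stands in for the device: its own bus/devfn is used. */
        return effective_requester(bus->parent_dev);
    }
    return -1;
}
```

In this model, with root port 00:03.0 owning secondary bus 1, the endpoint at 01:00.0 is reported as 0x0018 (00:03.0) until the secondary bus gets its own hook, after which it is reported as 0x0100 (01:00.0).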

Thanks for the quick answer, that helped a lot!

Looking into the details, I realize it is slightly more complicated:
secondary buses are numbered after device instantiation, as part of
the host PCI enumeration, so if I add a similar setup call in the bridge
setup, it will be called for a new device long before the secondary bus
has received its bus number from the OS (via config[PCI_SECONDARY_BUS]).

I agree that the context lookup function needs to be as efficient as
possible, so the simple <busno,devfn> lookup key may be the best
solution. But then the address_spaces table cannot be populated with the
secondary bus entries before the bus receives a valid bus number (nonzero
and not 255), e.g. along the lines of this:

but it is getting complicated...
Thoughts?
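Gating the registration on the secondary bus number could look roughly like the toy C sketch below. The hook name and the Bridge type are hypothetical, config writes are simplified to single bytes (a guest may write PCI_SECONDARY_BUS as part of a wider access), and moving the real table entries is left as a comment. PCI_SECONDARY_BUS is the standard offset 0x19 in the type 1 (bridge) header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PCI_SECONDARY_BUS 0x19  /* offset in the type 1 (bridge) header */

/* Toy bridge: config space plus the bus number that <bus_num, devfn>
 * entries were last registered under (0 meaning "not registered"). */
typedef struct Bridge {
    uint8_t config[256];
    uint8_t registered_bus;
} Bridge;

/* Bus 0 is the root bus and 255 is not a usable secondary bus number. */
static bool secondary_bus_valid(uint8_t n)
{
    return n != 0 && n != 0xff;
}

/* Hypothetical hook run after each byte-sized config write to the bridge.
 * Entries keyed by <bus_num, devfn> are only (re)registered once the OS
 * has assigned a usable secondary bus number. */
static void bridge_config_written(Bridge *br, uint8_t addr, uint8_t val)
{
    br->config[addr] = val;
    if (addr != PCI_SECONDARY_BUS) {
        return;
    }
    if (!secondary_bus_valid(val)) {
        br->registered_bus = 0;  /* drop stale entries, if any */
        return;
    }
    if (val != br->registered_bus) {
        /* Real code would move the table entries from the old bus
         * number to the new one; the toy only records the number. */
        br->registered_bus = val;
    }
}
```

The point of the sketch is only the ordering constraint: nothing keyed by the secondary bus number exists until the guest has programmed a valid one, and a renumbering has to move or drop what was registered before.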

Thanks,

Knut
