
[03/10] intel-iommu: add iommu lock

Message ID 20180425045129.17449-4-peterx@redhat.com
State New
Series [01/10] intel-iommu: send PSI always even if across PDEs

Commit Message

Peter Xu April 25, 2018, 4:51 a.m. UTC
Add a per-iommu big lock to protect IOMMU status.  Currently the only
thing to be protected is the IOTLB cache, since that can be accessed
even without BQL, e.g., in IO dataplane.

Note that device page tables should not need any protection.  The
safety of that should be provided by the guest OS.  E.g., when a page
entry is freed, the guest OS should be responsible for making sure that
no device will be using that page any more.

Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/intel_iommu.h |  8 ++++++++
 hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)
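
[Editor's note: GHashTable does no internal locking, so an IOTLB lookup
from the IO dataplane (no BQL) racing with an update or flush from a
vCPU thread can corrupt the hash table.  A standalone sketch of the
pattern the patch adopts - plain glib + pthreads here instead of QEMU's
QemuMutex, with names invented for the sketch:

#include <glib.h>
#include <pthread.h>

static GHashTable *iotlb;
static pthread_mutex_t iotlb_lock = PTHREAD_MUTEX_INITIALIZER;

static void iotlb_init(void)
{
    /* keys are heap-allocated guint64s, values heap-allocated entries */
    iotlb = g_hash_table_new_full(g_int64_hash, g_int64_equal,
                                  g_free, g_free);
}

/* translation path: may run in an IOThread, without the BQL */
static gpointer iotlb_lookup(guint64 key)
{
    gpointer val;

    pthread_mutex_lock(&iotlb_lock);
    val = g_hash_table_lookup(iotlb, &key);
    pthread_mutex_unlock(&iotlb_lock);
    return val;
}

/* invalidation path: runs in a vCPU thread with the BQL held */
static void iotlb_flush_all(void)
{
    pthread_mutex_lock(&iotlb_lock);
    g_hash_table_remove_all(iotlb);
    pthread_mutex_unlock(&iotlb_lock);
}
]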

Comments

Emilio Cota April 25, 2018, 4:26 p.m. UTC | #1
On Wed, Apr 25, 2018 at 12:51:22 +0800, Peter Xu wrote:
> Add a per-iommu big lock to protect IOMMU status.  Currently the only
> thing to be protected is the IOTLB cache, since that can be accessed
> even without BQL, e.g., in IO dataplane.

Is the added lock going to protect anything else beyond a g_hash_table?
If not, did you consider using qht instead of the lock + hash table?
It would give you lockless lookups as well as scalable updates.
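
[Editor's note: for reference, a qht-backed IOTLB might look roughly
like the sketch below.  This is only a sketch: the qht prototypes are
recalled from the util/qht API of roughly this period and have changed
across QEMU versions, and the entry layout is invented here:

#include "qemu/osdep.h"
#include "qemu/qht.h"

typedef struct VTDIOTLBEntrySketch {
    uint64_t key;     /* e.g. vtd_get_iotlb_key(gfn, source_id, level) */
    uint64_t slpte;
} VTDIOTLBEntrySketch;

static struct qht iotlb_qht;

static bool iotlb_cmp(const void *obj, const void *userp)
{
    const VTDIOTLBEntrySketch *e = obj;
    const uint64_t *key = userp;

    return e->key == *key;
}

static uint32_t iotlb_hash(uint64_t key)
{
    return (uint32_t)(key ^ (key >> 32));
}

static void iotlb_qht_init(void)
{
    qht_init(&iotlb_qht, 1024, QHT_MODE_AUTO_RESIZE);
}

/* readers take no lock; safe to call from IO dataplane without the BQL */
static VTDIOTLBEntrySketch *iotlb_qht_lookup(uint64_t key)
{
    return qht_lookup(&iotlb_qht, iotlb_cmp, &key, iotlb_hash(key));
}

static void iotlb_qht_insert(VTDIOTLBEntrySketch *e)
{
    /* concurrent writers scale; returns false if e is already present */
    qht_insert(&iotlb_qht, e, iotlb_hash(e->key));
}
]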

[Sorry if my question isn't relevant, I am not familiar with
this code and just briefly skimmed over this series.]

Thanks,

		Emilio
Peter Xu April 26, 2018, 5:45 a.m. UTC | #2
On Wed, Apr 25, 2018 at 12:26:58PM -0400, Emilio G. Cota wrote:
> On Wed, Apr 25, 2018 at 12:51:22 +0800, Peter Xu wrote:
> > Add a per-iommu big lock to protect IOMMU status.  Currently the only
> > thing to be protected is the IOTLB cache, since that can be accessed
> > even without BQL, e.g., in IO dataplane.
> 
> Is the added lock going to protect anything else beyond a g_hash_table?

It's only for that now.  Maybe there will be other candidates later.

> If not, did you consider using qht instead of the lock + hash table?
> It would give you lockless lookups as well as scalable updates.

Thanks for pointing that out.  I didn't even know that we had such a
nice hash table there. :)  Though for this series I would still prefer
to keep the mutex approach as is for now.

Firstly, I'd better fix the problem before considering performance -
after all, this series is not really pursuing performance but fixing
holes.  So we can have follow-up work for a QHT conversion.  Meanwhile,
the IOTLB is a bit special in that it may be updated more often than
we'd expect, so I'm not really sure whether an RCU-based hash would
bring better performance.  That's without even considering that the
Linux guest intel-iommu driver does extra tricks (like frequently
flushing the IOTLB) that make updates much more frequent than usual.

But again, the suggestion is valid, and it's worth further
investigation to make sure my understanding is correct.

> 
> [Sorry if my question isn't relevant, I am not familiar with
> this code and just briefly skimmed over this series.]

It's of course relevant.  Thanks for your input!
Jason Wang April 27, 2018, 5:13 a.m. UTC | #3
On 2018年04月25日 12:51, Peter Xu wrote:
> Add a per-iommu big lock to protect IOMMU status.  Currently the only
> thing to be protected is the IOTLB cache, since that can be accessed
> even without BQL, e.g., in IO dataplane.
>
> Note that device page tables should not need any protection.  The safety
> of that should be provided by guest OS.  E.g., when a page entry is
> freed, the guest OS should be responsible to make sure that no device
> will be using that page any more.
>
> Reported-by: Fam Zheng<famz@redhat.com>
> Signed-off-by: Peter Xu<peterx@redhat.com>
> ---
>   include/hw/i386/intel_iommu.h |  8 ++++++++
>   hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
>   2 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 220697253f..1a8ba8e415 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -262,6 +262,14 @@ struct IntelIOMMUState {
>       uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
>       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
>       uint32_t version;
> +    /*
> +     * Protects IOMMU states in general.  Normally we don't need to
> +     * take this lock when we are with BQL held.  However we have code
> +     * paths that may run even without BQL.  In those cases, we need
> +     * to take the lock when we have access to IOMMU state
> +     * informations, e.g., the IOTLB.
> +     */
> +    QemuMutex iommu_lock;

Some questions:

1) Do we need to protect context cache too?
2) Can we just reuse qemu BQL here?
3) I think the issue is common to all other kinds of IOMMU, so can we
simply synchronize before calling ->translate() in memory.c?  This seems
a more generic solution.

Thanks
Peter Xu April 27, 2018, 6:26 a.m. UTC | #4
On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月25日 12:51, Peter Xu wrote:
> > Add a per-iommu big lock to protect IOMMU status.  Currently the only
> > thing to be protected is the IOTLB cache, since that can be accessed
> > even without BQL, e.g., in IO dataplane.
> > 
> > Note that device page tables should not need any protection.  The safety
> > of that should be provided by guest OS.  E.g., when a page entry is
> > freed, the guest OS should be responsible to make sure that no device
> > will be using that page any more.
> > 
> > Reported-by: Fam Zheng<famz@redhat.com>
> > Signed-off-by: Peter Xu<peterx@redhat.com>
> > ---
> >   include/hw/i386/intel_iommu.h |  8 ++++++++
> >   hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
> >   2 files changed, 37 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index 220697253f..1a8ba8e415 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -262,6 +262,14 @@ struct IntelIOMMUState {
> >       uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
> >       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
> >       uint32_t version;
> > +    /*
> > +     * Protects IOMMU states in general.  Normally we don't need to
> > +     * take this lock when we are with BQL held.  However we have code
> > +     * paths that may run even without BQL.  In those cases, we need
> > +     * to take the lock when we have access to IOMMU state
> > +     * informations, e.g., the IOTLB.
> > +     */
> > +    QemuMutex iommu_lock;
> 
> Some questions:
> 
> 1) Do we need to protect context cache too?

IMHO the context cache entry should work even without the lock.  That's
a bit tricky since there are two cases where this cache gets updated:

  (1) first translation of the address space of a device
  (2) invalidation of context entries

For (2) IMHO we don't need to worry, since the guest OS should be
controlling that part, i.e., a device should not be doing any
translation (DMA operations) while its context entry is being
invalidated.

For (1) the worst case is that the context entry cache gets updated
multiple times with the same value by multiple threads.  IMHO that'll
be fine too.
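
[Editor's note: for concreteness, the lookup in question in
vtd_do_iommu_translate() is roughly the following, paraphrased from
hw/i386/intel_iommu.c of this era (not an exact quote).  The benign
race is two threads both seeing a stale context_cache_gen and both
refilling the entry with the same value walked from guest memory:

    cc_entry = &vtd_as->context_cache_entry;
    if (cc_entry->context_cache_gen == s->context_cache_gen) {
        /* Cache hit: reuse the cached context entry */
        ce = cc_entry->context_entry;
    } else {
        /* Cache miss: walk guest memory, then refill the cache */
        ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
        if (!ret_fr) {
            cc_entry->context_entry = ce;
            cc_entry->context_cache_gen = s->context_cache_gen;
        }
    }
]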

But yes for sure we can protect that too with the iommu lock.

> 2) Can we just reuse qemu BQL here?

I would prefer not.  As I mentioned, at least I have spent too much
time on fighting BQL already.  I really hope we can start to use
isolated locks when capable.  BQL is always the worst choice to me.

> 3) I think the issue is common to all other kinds of IOMMU, so can we simply
> synchronize before calling ->translate() in memory.c. This seems a more
> common solution.

I suspect Power and s390 live well with that.  I think it means at
least these platforms won't have problems with concurrency.  I'm adding
DavidG to the loop in case there are further comments.  IMHO we should
just make sure the IOMMU code is thread safe, and fix problems as they
show up.

Thanks,
Tian, Kevin April 27, 2018, 7:19 a.m. UTC | #5
> From: Peter Xu
> Sent: Friday, April 27, 2018 2:26 PM
> 
> On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
> >
> >
> > On 2018年04月25日 12:51, Peter Xu wrote:
> > > Add a per-iommu big lock to protect IOMMU status.  Currently the only
> > > thing to be protected is the IOTLB cache, since that can be accessed
> > > even without BQL, e.g., in IO dataplane.
> > >
> > > Note that device page tables should not need any protection.  The
> safety
> > > of that should be provided by guest OS.  E.g., when a page entry is
> > > freed, the guest OS should be responsible to make sure that no device
> > > will be using that page any more.

The device page table definitely doesn't require protection, since it
is an in-memory structure managed by the guest.  However, the reasoning
above is not accurate - there is no way the guest OS can make sure that
no device uses a non-present page entry; otherwise it wouldn't need a
virtual IOMMU to protect itself at all.  There could be bogus/malicious
drivers which may well program the device to attempt exactly that.

> > >
> > > Reported-by: Fam Zheng<famz@redhat.com>
> > > Signed-off-by: Peter Xu<peterx@redhat.com>
> > > ---
> > >   include/hw/i386/intel_iommu.h |  8 ++++++++
> > >   hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
> > >   2 files changed, 37 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/hw/i386/intel_iommu.h
> b/include/hw/i386/intel_iommu.h
> > > index 220697253f..1a8ba8e415 100644
> > > --- a/include/hw/i386/intel_iommu.h
> > > +++ b/include/hw/i386/intel_iommu.h
> > > @@ -262,6 +262,14 @@ struct IntelIOMMUState {
> > >       uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes
> */
> > >       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns
> 0) */
> > >       uint32_t version;
> > > +    /*
> > > +     * Protects IOMMU states in general.  Normally we don't need to
> > > +     * take this lock when we are with BQL held.  However we have code
> > > +     * paths that may run even without BQL.  In those cases, we need
> > > +     * to take the lock when we have access to IOMMU state
> > > +     * informations, e.g., the IOTLB.

It would be better if you could list those paths here to clarify.

> > > +     */
> > > +    QemuMutex iommu_lock;
> >
> > Some questions:
> >
> > 1) Do we need to protect context cache too?
> 
> IMHO the context cache entry should work even without lock.  That's a
> bit trickly since we have two cases that this cache will be updated:
> 
>   (1) first translation of the address space of a device
>   (2) invalidation of context entries
> 
> For (2) IMHO we don't need to worry about since guest OS should be
> controlling that part, say, device should not be doing any translation
> (DMA operations) when the context entry is invalidated.

again above cannot be assumed.

> 
> For (1) the worst case is that the context entry cache be updated
> multiple times with the same value by multiple threads.  IMHO that'll
> be fine too.
> 
> But yes for sure we can protect that too with the iommu lock.
> 
> > 2) Can we just reuse qemu BQL here?
> 
> I would prefer not.  As I mentioned, at least I have spent too much
> time on fighting BQL already.  I really hope we can start to use
> isolated locks when capable.  BQL is always the worst choice to me.
> 
> > 3) I think the issue is common to all other kinds of IOMMU, so can we
> simply
> > synchronize before calling ->translate() in memory.c. This seems a more
> > common solution.
> 
> I suspect Power and s390 live well with that.  I think it mean at
> least these platforms won't have problem in concurrency.  I'm adding
> DavidG in loop in case there is further comment.  IMHO we should just
> make sure IOMMU code be thread safe, and we fix problem if there is.
> 
> Thanks,
> 
> --
> Peter Xu
Peter Xu April 27, 2018, 9:53 a.m. UTC | #6
On Fri, Apr 27, 2018 at 07:19:25AM +0000, Tian, Kevin wrote:
> > From: Peter Xu
> > Sent: Friday, April 27, 2018 2:26 PM
> > 
> > On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
> > >
> > >
> > > On 2018年04月25日 12:51, Peter Xu wrote:
> > > > Add a per-iommu big lock to protect IOMMU status.  Currently the only
> > > > thing to be protected is the IOTLB cache, since that can be accessed
> > > > even without BQL, e.g., in IO dataplane.
> > > >
> > > > Note that device page tables should not need any protection.  The
> > safety
> > > > of that should be provided by guest OS.  E.g., when a page entry is
> > > > freed, the guest OS should be responsible to make sure that no device
> > > > will be using that page any more.
> 
> device page table definitely doesn't require protection, since it is
> in-memory structure managed by guest. However the reason
> above is not accurate - there is no way that guest OS can make
> sure no device uses non-present page entry, otherwise it doesn't
> require virtual IOMMU to protect itself. There could be bogus/
> malicious drivers which surely may program the device to attempt so.

How about this:

  Note that we don't need to protect device page tables since those
  are fully controlled by the guest kernel.  However there is still the
  possibility that malicious drivers will program the device to
  disobey the rule.  In that case QEMU can't really do anything
  useful; instead the guest itself will be responsible for all the
  uncertainties.

> 
> > > >
> > > > Reported-by: Fam Zheng<famz@redhat.com>
> > > > Signed-off-by: Peter Xu<peterx@redhat.com>
> > > > ---
> > > >   include/hw/i386/intel_iommu.h |  8 ++++++++
> > > >   hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
> > > >   2 files changed, 37 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/hw/i386/intel_iommu.h
> > b/include/hw/i386/intel_iommu.h
> > > > index 220697253f..1a8ba8e415 100644
> > > > --- a/include/hw/i386/intel_iommu.h
> > > > +++ b/include/hw/i386/intel_iommu.h
> > > > @@ -262,6 +262,14 @@ struct IntelIOMMUState {
> > > >       uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes
> > */
> > > >       uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns
> > 0) */
> > > >       uint32_t version;
> > > > +    /*
> > > > +     * Protects IOMMU states in general.  Normally we don't need to
> > > > +     * take this lock when we are with BQL held.  However we have code
> > > > +     * paths that may run even without BQL.  In those cases, we need
> > > > +     * to take the lock when we have access to IOMMU state
> > > > +     * informations, e.g., the IOTLB.
> 
> better if you can whitelist those paths here to clarify.

Sure. Basically it's the translation path (vtd_iommu_translate).
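
[Editor's note: for the record, that BQL-less path is roughly the
following call chain, sketched from the code of this era (not an exact
quote):

    virtio dataplane (IOThread, no BQL)
      -> address_space_map() / address_space_rw()
         -> flatview_do_translate()
            -> imrc->translate() == vtd_iommu_translate()
               -> vtd_lookup_iotlb() / vtd_update_iotlb()  /* wants iommu_lock */
]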

> 
> > > > +     */
> > > > +    QemuMutex iommu_lock;
> > >
> > > Some questions:
> > >
> > > 1) Do we need to protect context cache too?
> > 
> > IMHO the context cache entry should work even without lock.  That's a
> > bit trickly since we have two cases that this cache will be updated:
> > 
> >   (1) first translation of the address space of a device
> >   (2) invalidation of context entries
> > 
> > For (2) IMHO we don't need to worry about since guest OS should be
> > controlling that part, say, device should not be doing any translation
> > (DMA operations) when the context entry is invalidated.
> 
> again above cannot be assumed.

Yeah, but in that case IMHO it's still the same as with page tables -
we can't really control anything, and the guest itself will be
responsible for any undefined consequences.

Anyway, let me protect that field too in my next version.

Thanks,
Jason Wang April 28, 2018, 1:43 a.m. UTC | #7
On 2018年04月27日 14:26, Peter Xu wrote:
> On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
>>
>> On 2018年04月25日 12:51, Peter Xu wrote:
>>> Add a per-iommu big lock to protect IOMMU status.  Currently the only
>>> thing to be protected is the IOTLB cache, since that can be accessed
>>> even without BQL, e.g., in IO dataplane.
>>>
>>> Note that device page tables should not need any protection.  The safety
>>> of that should be provided by guest OS.  E.g., when a page entry is
>>> freed, the guest OS should be responsible to make sure that no device
>>> will be using that page any more.
>>>
>>> Reported-by: Fam Zheng<famz@redhat.com>
>>> Signed-off-by: Peter Xu<peterx@redhat.com>
>>> ---
>>>    include/hw/i386/intel_iommu.h |  8 ++++++++
>>>    hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
>>>    2 files changed, 37 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>> index 220697253f..1a8ba8e415 100644
>>> --- a/include/hw/i386/intel_iommu.h
>>> +++ b/include/hw/i386/intel_iommu.h
>>> @@ -262,6 +262,14 @@ struct IntelIOMMUState {
>>>        uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
>>>        uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
>>>        uint32_t version;
>>> +    /*
>>> +     * Protects IOMMU states in general.  Normally we don't need to
>>> +     * take this lock when we are with BQL held.  However we have code
>>> +     * paths that may run even without BQL.  In those cases, we need
>>> +     * to take the lock when we have access to IOMMU state
>>> +     * informations, e.g., the IOTLB.
>>> +     */
>>> +    QemuMutex iommu_lock;
>> Some questions:
>>
>> 1) Do we need to protect context cache too?
> IMHO the context cache entry should work even without lock.  That's a
> bit trickly since we have two cases that this cache will be updated:
>
>    (1) first translation of the address space of a device
>    (2) invalidation of context entries
>
> For (2) IMHO we don't need to worry about since guest OS should be
> controlling that part, say, device should not be doing any translation
> (DMA operations) when the context entry is invalidated.
>
> For (1) the worst case is that the context entry cache be updated
> multiple times with the same value by multiple threads.  IMHO that'll
> be fine too.
>
> But yes for sure we can protect that too with the iommu lock.
>
>> 2) Can we just reuse qemu BQL here?
> I would prefer not.  As I mentioned, at least I have spent too much
> time on fighting BQL already.  I really hope we can start to use
> isolated locks when capable.  BQL is always the worst choice to me.

Just a thought: using the BQL may actually simplify the code greatly 
(considering we don't plan to remove the BQL now).

>
>> 3) I think the issue is common to all other kinds of IOMMU, so can we simply
>> synchronize before calling ->translate() in memory.c. This seems a more
>> common solution.
> I suspect Power and s390 live well with that.  I think it mean at
> least these platforms won't have problem in concurrency.  I'm adding
> DavidG in loop in case there is further comment.  IMHO we should just
> make sure IOMMU code be thread safe, and we fix problem if there is.
>
> Thanks,
>

Yes, it needs some investigation, but we have other IOMMUs like AMD, and 
we could have a flag to bypass the BQL if an IOMMU can synchronize by itself.

Thanks
Tian, Kevin April 28, 2018, 1:54 a.m. UTC | #8
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, April 27, 2018 5:54 PM
> 
> On Fri, Apr 27, 2018 at 07:19:25AM +0000, Tian, Kevin wrote:
> > > From: Peter Xu
> > > Sent: Friday, April 27, 2018 2:26 PM
> > >
> > > On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
> > > >
> > > >
> > > > On 2018年04月25日 12:51, Peter Xu wrote:
> > > > > Add a per-iommu big lock to protect IOMMU status.  Currently the
> only
> > > > > thing to be protected is the IOTLB cache, since that can be accessed
> > > > > even without BQL, e.g., in IO dataplane.
> > > > >
> > > > > Note that device page tables should not need any protection.  The
> > > safety
> > > > > of that should be provided by guest OS.  E.g., when a page entry is
> > > > > freed, the guest OS should be responsible to make sure that no
> device
> > > > > will be using that page any more.
> >
> > device page table definitely doesn't require protection, since it is
> > in-memory structure managed by guest. However the reason
> > above is not accurate - there is no way that guest OS can make
> > sure no device uses non-present page entry, otherwise it doesn't
> > require virtual IOMMU to protect itself. There could be bogus/
> > malicious drivers which surely may program the device to attempt so.
> 
> How about this:
> 
>   Note that we don't need to protect device page tables since those
>   are fully controlled by the guest kernel.  However there is still the
>   possibility that malicious drivers will program the device to
>   disobey the rule.  In that case QEMU can't really do anything
>   useful; instead the guest itself will be responsible for all the
>   uncertainties.
> 

yes, OK to me

Thanks
Kevin
Peter Xu April 28, 2018, 2:24 a.m. UTC | #9
On Sat, Apr 28, 2018 at 09:43:54AM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月27日 14:26, Peter Xu wrote:
> > On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
> > > 
> > > On 2018年04月25日 12:51, Peter Xu wrote:
> > > > Add a per-iommu big lock to protect IOMMU status.  Currently the only
> > > > thing to be protected is the IOTLB cache, since that can be accessed
> > > > even without BQL, e.g., in IO dataplane.
> > > > 
> > > > Note that device page tables should not need any protection.  The safety
> > > > of that should be provided by guest OS.  E.g., when a page entry is
> > > > freed, the guest OS should be responsible to make sure that no device
> > > > will be using that page any more.
> > > > 
> > > > Reported-by: Fam Zheng<famz@redhat.com>
> > > > Signed-off-by: Peter Xu<peterx@redhat.com>
> > > > ---
> > > >    include/hw/i386/intel_iommu.h |  8 ++++++++
> > > >    hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
> > > >    2 files changed, 37 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > > > index 220697253f..1a8ba8e415 100644
> > > > --- a/include/hw/i386/intel_iommu.h
> > > > +++ b/include/hw/i386/intel_iommu.h
> > > > @@ -262,6 +262,14 @@ struct IntelIOMMUState {
> > > >        uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
> > > >        uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
> > > >        uint32_t version;
> > > > +    /*
> > > > +     * Protects IOMMU states in general.  Normally we don't need to
> > > > +     * take this lock when we are with BQL held.  However we have code
> > > > +     * paths that may run even without BQL.  In those cases, we need
> > > > +     * to take the lock when we have access to IOMMU state
> > > > +     * informations, e.g., the IOTLB.
> > > > +     */
> > > > +    QemuMutex iommu_lock;
> > > Some questions:
> > > 
> > > 1) Do we need to protect context cache too?
> > IMHO the context cache entry should work even without lock.  That's a
> > bit trickly since we have two cases that this cache will be updated:
> > 
> >    (1) first translation of the address space of a device
> >    (2) invalidation of context entries
> > 
> > For (2) IMHO we don't need to worry about since guest OS should be
> > controlling that part, say, device should not be doing any translation
> > (DMA operations) when the context entry is invalidated.
> > 
> > For (1) the worst case is that the context entry cache be updated
> > multiple times with the same value by multiple threads.  IMHO that'll
> > be fine too.
> > 
> > But yes for sure we can protect that too with the iommu lock.
> > 
> > > 2) Can we just reuse qemu BQL here?
> > I would prefer not.  As I mentioned, at least I have spent too much
> > time on fighting BQL already.  I really hope we can start to use
> > isolated locks when capable.  BQL is always the worst choice to me.
> 
> Just a thought, using BQL may greatly simplify the code actually (consider
> we don't plan to remove BQL now).

Frankly speaking I don't understand why using BQL may greatly simplify
the code... :( IMHO the lock here is really not a complicated one.

Note that IMO the BQL is mostly helpful when we really want something
to run sequentially with some other things _already_ protected by BQL.
In this case, all the stuff is inside the VT-d code itself (or other
IOMMUs), so why bother taking the BQL and making our life harder?

So, even if we want to provide a general lock for the translation
procedure, I would prefer a per-AddressSpace lock rather than the BQL.
However, that would still need some extra flag showing whether we need
the protection or not.  For example, we may need to explicitly turn it
off for Power and s390.  Would that really be worth it?

So my final preference is still the current patch - we solve
thread-safety problems in the VT-d and IOMMU code.  Again, we really
should make sure all IOMMUs work with multiple threads.

> 
> > 
> > > 3) I think the issue is common to all other kinds of IOMMU, so can we simply
> > > synchronize before calling ->translate() in memory.c. This seems a more
> > > common solution.
> > I suspect Power and s390 live well with that.  I think it mean at
> > least these platforms won't have problem in concurrency.  I'm adding
> > DavidG in loop in case there is further comment.  IMHO we should just
> > make sure IOMMU code be thread safe, and we fix problem if there is.
> > 
> > Thanks,
> > 
> 
> Yes, it needs some investigation, but we have other IOMMUs like AMD, and we
> could have a flag to bypass BQL if IOMMU can synchronize by itself.

The AMD IOMMU is still only experimental.  If we really want to use it
in production IMHO it'll need more testing and tuning, not only on
thread safety but on other things too.  So again, we can just fix
things when needed.  I still don't see a reason to depend on the BQL
here.

I'll see what others think about it.

CCing Paolo, Stefan and Fam too.

Thanks,
Jason Wang April 28, 2018, 2:42 a.m. UTC | #10
On 2018年04月28日 10:24, Peter Xu wrote:
> On Sat, Apr 28, 2018 at 09:43:54AM +0800, Jason Wang wrote:
>>
>> On 2018年04月27日 14:26, Peter Xu wrote:
>>> On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
>>>> On 2018年04月25日 12:51, Peter Xu wrote:
>>>>> Add a per-iommu big lock to protect IOMMU status.  Currently the only
>>>>> thing to be protected is the IOTLB cache, since that can be accessed
>>>>> even without BQL, e.g., in IO dataplane.
>>>>>
>>>>> Note that device page tables should not need any protection.  The safety
>>>>> of that should be provided by guest OS.  E.g., when a page entry is
>>>>> freed, the guest OS should be responsible to make sure that no device
>>>>> will be using that page any more.
>>>>>
>>>>> Reported-by: Fam Zheng<famz@redhat.com>
>>>>> Signed-off-by: Peter Xu<peterx@redhat.com>
>>>>> ---
>>>>>     include/hw/i386/intel_iommu.h |  8 ++++++++
>>>>>     hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
>>>>>     2 files changed, 37 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>>>> index 220697253f..1a8ba8e415 100644
>>>>> --- a/include/hw/i386/intel_iommu.h
>>>>> +++ b/include/hw/i386/intel_iommu.h
>>>>> @@ -262,6 +262,14 @@ struct IntelIOMMUState {
>>>>>         uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
>>>>>         uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
>>>>>         uint32_t version;
>>>>> +    /*
>>>>> +     * Protects IOMMU states in general.  Normally we don't need to
>>>>> +     * take this lock when we are with BQL held.  However we have code
>>>>> +     * paths that may run even without BQL.  In those cases, we need
>>>>> +     * to take the lock when we have access to IOMMU state
>>>>> +     * informations, e.g., the IOTLB.
>>>>> +     */
>>>>> +    QemuMutex iommu_lock;
>>>> Some questions:
>>>>
>>>> 1) Do we need to protect context cache too?
>>> IMHO the context cache entry should work even without lock.  That's a
>>> bit trickly since we have two cases that this cache will be updated:
>>>
>>>     (1) first translation of the address space of a device
>>>     (2) invalidation of context entries
>>>
>>> For (2) IMHO we don't need to worry about since guest OS should be
>>> controlling that part, say, device should not be doing any translation
>>> (DMA operations) when the context entry is invalidated.
>>>
>>> For (1) the worst case is that the context entry cache be updated
>>> multiple times with the same value by multiple threads.  IMHO that'll
>>> be fine too.
>>>
>>> But yes for sure we can protect that too with the iommu lock.
>>>
>>>> 2) Can we just reuse qemu BQL here?
>>> I would prefer not.  As I mentioned, at least I have spent too much
>>> time on fighting BQL already.  I really hope we can start to use
>>> isolated locks when capable.  BQL is always the worst choice to me.
>> Just a thought, using BQL may greatly simplify the code actually (consider
>> we don't plan to remove BQL now).
> Frankly speaking I don't understand why using BQL may greatly simplify
> the code... :( IMHO the lock here is really not a complicated one.
>
> Note that IMO BQL is mostly helpful when we really want something to
> be run sequentially with some other things _already_ protected by BQL.

Except for the translate path from the dataplane, I believe all other 
code paths are already protected by the BQL.

> In this case, all the stuff is inside VT-d code itself (or other
> IOMMUs), why bother taking the BQL to make our life harder?

It looks to me it was as simple as:

@@ -494,6 +494,7 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
      IOMMUMemoryRegionClass *imrc;
      hwaddr page_mask = (hwaddr)(-1);
      hwaddr plen = (hwaddr)(-1);
+    int locked = false;

      if (plen_out) {
          plen = *plen_out;
@@ -510,8 +511,15 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
          }
          imrc = memory_region_get_iommu_class_nocheck(iommu_mr);

+        if (!qemu_mutex_iothread_locked()) {
+            locked = true;
+            qemu_mutex_lock_iothread();
+        }
          iotlb = imrc->translate(iommu_mr, addr, is_write ?
                                  IOMMU_WO : IOMMU_RO);
+        if (locked) {
+            qemu_mutex_unlock_iothread();
+        }
          addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
                  | (addr & iotlb.addr_mask));
          page_mask &= iotlb.addr_mask;


>
> So, even if we want to provide a general lock for the translation
> procedure, I would prefer we add a per AddressSpace lock but not BQL.

It could be, but it needs more work in each specific IOMMU's code.

> However still that will need some extra flag showing that whether we
> need the protection of not.  For example, we may need to expliclitly
> turn that off for Power and s390.  Would that really worth it?

It would cost just a few lines of code; anything wrong with that?

>
> So my final preference is still current patch - we solve thread-safety
> problems in VT-d and IOMMU code.  Again, we really should make sure
> all IOMMUs work with multithreads.
>
>>>> 3) I think the issue is common to all other kinds of IOMMU, so can we simply
>>>> synchronize before calling ->translate() in memory.c. This seems a more
>>>> common solution.
>>> I suspect Power and s390 live well with that.  I think it mean at
>>> least these platforms won't have problem in concurrency.  I'm adding
>>> DavidG in loop in case there is further comment.  IMHO we should just
>>> make sure IOMMU code be thread safe, and we fix problem if there is.
>>>
>>> Thanks,
>>>
>> Yes, it needs some investigation, but we have other IOMMUs like AMD, and we
>> could have a flag to bypass BQL if IOMMU can synchronize by itself.
> AMD is still only for experimental.  If we really want to use it in
> production IMHO it'll need more testings and tunings not only on
> thread-safety but on other stuffs too.  So again, we can just fix them
> when needed.  I still don't see it a reason to depend on BQL here.

Well, it's not about BQL specifically, it's about whether we have or 
need a generic thread safety solution for all IOMMUs.

We have more IOMMUs than just AMD, s390 and ppc:

# git grep imrc-\>translate\ =
hw/alpha/typhoon.c:    imrc->translate = typhoon_translate_iommu;
hw/dma/rc4030.c:    imrc->translate = rc4030_dma_translate;
hw/i386/amd_iommu.c:    imrc->translate = amdvi_translate;
hw/i386/intel_iommu.c:    imrc->translate = vtd_iommu_translate;
hw/ppc/spapr_iommu.c:    imrc->translate = spapr_tce_translate_iommu;
hw/s390x/s390-pci-bus.c:    imrc->translate = s390_translate_iommu;
hw/sparc/sun4m_iommu.c:    imrc->translate = sun4m_translate_iommu;
hw/sparc64/sun4u_iommu.c:    imrc->translate = sun4u_translate_iommu;

And we know there will be more in the near future.

Thanks

>
> I'll see what others think about it.
>
> CCing Paolo, Stefan and Fam too.
>
> Thanks,
>
Peter Xu April 28, 2018, 3:06 a.m. UTC | #11
On Sat, Apr 28, 2018 at 10:42:11AM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月28日 10:24, Peter Xu wrote:
> > On Sat, Apr 28, 2018 at 09:43:54AM +0800, Jason Wang wrote:
> > > 
> > > On 2018年04月27日 14:26, Peter Xu wrote:
> > > > On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
> > > > > On 2018年04月25日 12:51, Peter Xu wrote:
> > > > > > Add a per-iommu big lock to protect IOMMU status.  Currently the only
> > > > > > thing to be protected is the IOTLB cache, since that can be accessed
> > > > > > even without BQL, e.g., in IO dataplane.
> > > > > > 
> > > > > > Note that device page tables should not need any protection.  The safety
> > > > > > of that should be provided by guest OS.  E.g., when a page entry is
> > > > > > freed, the guest OS should be responsible to make sure that no device
> > > > > > will be using that page any more.
> > > > > > 
> > > > > > Reported-by: Fam Zheng<famz@redhat.com>
> > > > > > Signed-off-by: Peter Xu<peterx@redhat.com>
> > > > > > ---
> > > > > >     include/hw/i386/intel_iommu.h |  8 ++++++++
> > > > > >     hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
> > > > > >     2 files changed, 37 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > > > > > index 220697253f..1a8ba8e415 100644
> > > > > > --- a/include/hw/i386/intel_iommu.h
> > > > > > +++ b/include/hw/i386/intel_iommu.h
> > > > > > @@ -262,6 +262,14 @@ struct IntelIOMMUState {
> > > > > >         uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
> > > > > >         uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
> > > > > >         uint32_t version;
> > > > > > +    /*
> > > > > > +     * Protects IOMMU states in general.  Normally we don't need to
> > > > > > +     * take this lock when we are with BQL held.  However we have code
> > > > > > +     * paths that may run even without BQL.  In those cases, we need
> > > > > > +     * to take the lock when we have access to IOMMU state
> > > > > > +     * informations, e.g., the IOTLB.
> > > > > > +     */
> > > > > > +    QemuMutex iommu_lock;
> > > > > Some questions:
> > > > > 
> > > > > 1) Do we need to protect context cache too?
> > > > IMHO the context cache entry should work even without lock.  That's a
> > > > bit trickly since we have two cases that this cache will be updated:
> > > > 
> > > >     (1) first translation of the address space of a device
> > > >     (2) invalidation of context entries
> > > > 
> > > > For (2) IMHO we don't need to worry about since guest OS should be
> > > > controlling that part, say, device should not be doing any translation
> > > > (DMA operations) when the context entry is invalidated.
> > > > 
> > > > For (1) the worst case is that the context entry cache be updated
> > > > multiple times with the same value by multiple threads.  IMHO that'll
> > > > be fine too.
> > > > 
> > > > But yes for sure we can protect that too with the iommu lock.
> > > > 
> > > > > 2) Can we just reuse qemu BQL here?
> > > > I would prefer not.  As I mentioned, at least I have spent too much
> > > > time on fighting BQL already.  I really hope we can start to use
> > > > isolated locks when capable.  BQL is always the worst choice to me.
> > > Just a thought, using BQL may greatly simplify the code actually (consider
> > > we don't plan to remove BQL now).
> > Frankly speaking I don't understand why using BQL may greatly simplify
> > the code... :( IMHO the lock here is really not a complicated one.
> > 
> > Note that IMO BQL is mostly helpful when we really want something to
> > be run sequentially with some other things _already_ protected by BQL.
> 
> Except for the translate path from dataplane, I belive all other codes were
> already protected by BQL.
> 
> > In this case, all the stuff is inside VT-d code itself (or other
> > IOMMUs), why bother taking the BQL to make our life harder?
> 
> It looks to me it was as simple as:
> 
> @@ -494,6 +494,7 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
>      IOMMUMemoryRegionClass *imrc;
>      hwaddr page_mask = (hwaddr)(-1);
>      hwaddr plen = (hwaddr)(-1);
> +    int locked = false;
> 
>      if (plen_out) {
>          plen = *plen_out;
> @@ -510,8 +511,15 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
>          }
>          imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
> 
> +        if (!qemu_mutex_iothread_locked()) {
> +            locked = true;
> +            qemu_mutex_lock_iothread();
> +        }
>          iotlb = imrc->translate(iommu_mr, addr, is_write ?
>                                  IOMMU_WO : IOMMU_RO);
> +        if (locked) {
> +            qemu_mutex_unlock_iothread();
> +        }
>          addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>                  | (addr & iotlb.addr_mask));
>          page_mask &= iotlb.addr_mask;

We'll need to add the flag too.  How do we mark the existing
thread-safe IOMMUs?

> 
> 
> > 
> > So, even if we want to provide a general lock for the translation
> > procedure, I would prefer we add a per AddressSpace lock but not BQL.
> 
> It could be, but it needs more work on each specific IOMMU codes.
> 
> > However still that will need some extra flag showing that whether we
> > need the protection of not.  For example, we may need to expliclitly
> > turn that off for Power and s390.  Would that really worth it?
> 
> It would cost just several lines of code, anything wrong with this?

It's not about anything wrong; it's just about preference.

I never said BQL won't work here.  It will work.  But if you have
spent tens of hours working on BQL-related problems maybe you'll have
the same preference as me... :)

IMHO the point is to decide which might be simpler and more efficient
in general, really.

> 
> > 
> > So my final preference is still current patch - we solve thread-safety
> > problems in VT-d and IOMMU code.  Again, we really should make sure
> > all IOMMUs work with multithreads.
> > 
> > > > > 3) I think the issue is common to all other kinds of IOMMU, so can we simply
> > > > > synchronize before calling ->translate() in memory.c. This seems a more
> > > > > common solution.
> > > > I suspect Power and s390 live well with that.  I think it mean at
> > > > least these platforms won't have problem in concurrency.  I'm adding
> > > > DavidG in loop in case there is further comment.  IMHO we should just
> > > > make sure IOMMU code be thread safe, and we fix problem if there is.
> > > > 
> > > > Thanks,
> > > > 
> > > Yes, it needs some investigation, but we have other IOMMUs like AMD, and we
> > > could have a flag to bypass BQL if IOMMU can synchronize by itself.
> > AMD is still only for experimental.  If we really want to use it in
> > production IMHO it'll need more testings and tunings not only on
> > thread-safety but on other stuffs too.  So again, we can just fix them
> > when needed.  I still don't see it a reason to depend on BQL here.
> 
> Well, it's not about BQL specifically, it's about whether we have or need a
> generic thread safety solution for all IOMMUs.
> 
> We have more IOMMUs than just AMD, s390 and ppc:
> 
> # git grep imrc-\>translate\ =
> hw/alpha/typhoon.c:    imrc->translate = typhoon_translate_iommu;
> hw/dma/rc4030.c:    imrc->translate = rc4030_dma_translate;
> hw/i386/amd_iommu.c:    imrc->translate = amdvi_translate;
> hw/i386/intel_iommu.c:    imrc->translate = vtd_iommu_translate;
> hw/ppc/spapr_iommu.c:    imrc->translate = spapr_tce_translate_iommu;
> hw/s390x/s390-pci-bus.c:    imrc->translate = s390_translate_iommu;
> hw/sparc/sun4m_iommu.c:    imrc->translate = sun4m_translate_iommu;
> hw/sparc64/sun4u_iommu.c:    imrc->translate = sun4u_translate_iommu;
> 
> And we know there will be more in the near future.

Again - here I would suggest we consider thread safety when
implementing new ones.  I suppose that should not be a hard thing to
achieve.

I don't have any new input here since I have given some in previous
posts already.  If this is still under discussion before the next
post, I'll pick this patch out of the series, since it is not related
to the other patches at all and can be dealt with in isolation.

Thanks,
Jason Wang April 28, 2018, 3:11 a.m. UTC | #12
On 2018年04月28日 11:06, Peter Xu wrote:
> On Sat, Apr 28, 2018 at 10:42:11AM +0800, Jason Wang wrote:
>>
>> On 2018年04月28日 10:24, Peter Xu wrote:
>>> On Sat, Apr 28, 2018 at 09:43:54AM +0800, Jason Wang wrote:
>>>> On 2018年04月27日 14:26, Peter Xu wrote:
>>>>> On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
>>>>>> On 2018年04月25日 12:51, Peter Xu wrote:
>>>>>>> Add a per-iommu big lock to protect IOMMU status.  Currently the only
>>>>>>> thing to be protected is the IOTLB cache, since that can be accessed
>>>>>>> even without BQL, e.g., in IO dataplane.
>>>>>>>
>>>>>>> Note that device page tables should not need any protection.  The safety
>>>>>>> of that should be provided by guest OS.  E.g., when a page entry is
>>>>>>> freed, the guest OS should be responsible to make sure that no device
>>>>>>> will be using that page any more.
>>>>>>>
>>>>>>> Reported-by: Fam Zheng<famz@redhat.com>
>>>>>>> Signed-off-by: Peter Xu<peterx@redhat.com>
>>>>>>> ---
>>>>>>>      include/hw/i386/intel_iommu.h |  8 ++++++++
>>>>>>>      hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
>>>>>>>      2 files changed, 37 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>>>>>> index 220697253f..1a8ba8e415 100644
>>>>>>> --- a/include/hw/i386/intel_iommu.h
>>>>>>> +++ b/include/hw/i386/intel_iommu.h
>>>>>>> @@ -262,6 +262,14 @@ struct IntelIOMMUState {
>>>>>>>          uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
>>>>>>>          uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
>>>>>>>          uint32_t version;
>>>>>>> +    /*
>>>>>>> +     * Protects IOMMU states in general.  Normally we don't need to
>>>>>>> +     * take this lock when we are with BQL held.  However we have code
>>>>>>> +     * paths that may run even without BQL.  In those cases, we need
>>>>>>> +     * to take the lock when we have access to IOMMU state
>>>>>>> +     * informations, e.g., the IOTLB.
>>>>>>> +     */
>>>>>>> +    QemuMutex iommu_lock;
>>>>>> Some questions:
>>>>>>
>>>>>> 1) Do we need to protect context cache too?
>>>>> IMHO the context cache entry should work even without lock.  That's a
>>>>> bit trickly since we have two cases that this cache will be updated:
>>>>>
>>>>>      (1) first translation of the address space of a device
>>>>>      (2) invalidation of context entries
>>>>>
>>>>> For (2) IMHO we don't need to worry about since guest OS should be
>>>>> controlling that part, say, device should not be doing any translation
>>>>> (DMA operations) when the context entry is invalidated.
>>>>>
>>>>> For (1) the worst case is that the context entry cache be updated
>>>>> multiple times with the same value by multiple threads.  IMHO that'll
>>>>> be fine too.
>>>>>
>>>>> But yes for sure we can protect that too with the iommu lock.
>>>>>
>>>>>> 2) Can we just reuse qemu BQL here?
>>>>> I would prefer not.  As I mentioned, at least I have spent too much
>>>>> time on fighting BQL already.  I really hope we can start to use
>>>>> isolated locks when capable.  BQL is always the worst choice to me.
>>>> Just a thought, using BQL may greatly simplify the code actually (consider
>>>> we don't plan to remove BQL now).
>>> Frankly speaking I don't understand why using BQL may greatly simplify
>>> the code... :( IMHO the lock here is really not a complicated one.
>>>
>>> Note that IMO BQL is mostly helpful when we really want something to
>>> be run sequentially with some other things _already_ protected by BQL.
>> Except for the translate path from dataplane, I belive all other codes were
>> already protected by BQL.
>>
>>> In this case, all the stuff is inside VT-d code itself (or other
>>> IOMMUs), why bother taking the BQL to make our life harder?
>> It looks to me it was as simple as:
>>
>> @@ -494,6 +494,7 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
>>       IOMMUMemoryRegionClass *imrc;
>>       hwaddr page_mask = (hwaddr)(-1);
>>       hwaddr plen = (hwaddr)(-1);
>> +    int locked = false;
>>
>>       if (plen_out) {
>>           plen = *plen_out;
>> @@ -510,8 +511,15 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
>>           }
>>           imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
>>
>> +        if (!qemu_mutex_iothread_locked()) {
>> +            locked = true;
>> +            qemu_mutex_lock_iothread();
>> +        }
>>           iotlb = imrc->translate(iommu_mr, addr, is_write ?
>>                                   IOMMU_WO : IOMMU_RO);
>> +        if (locked) {
>> +            qemu_mutex_unlock_iothread();
>> +        }
>>           addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>>                   | (addr & iotlb.addr_mask));
>>           page_mask &= iotlb.addr_mask;
> We'll need to add the flags thing too.  How do we flag-out existing
> thread-safe IOMMUs?

We can let thread-safe IOMMU code set a flag somewhere.
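
[Editor's note: concretely, that could look something like the sketch
below on top of the earlier flatview_do_translate() diff.  The
thread_safe field is hypothetical - it does not exist in QEMU and is
invented here for illustration:

    /* In IOMMUMemoryRegionClass: a hypothetical opt-out flag that a
     * thread-safe IOMMU (e.g. VT-d after this series) would set. */
    bool thread_safe;   /* translate() may be called without the BQL */

    /* In flatview_do_translate(): only fall back to the BQL for
     * IOMMUs that have not declared themselves thread-safe. */
    imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
    if (!imrc->thread_safe && !qemu_mutex_iothread_locked()) {
        locked = true;
        qemu_mutex_lock_iothread();
    }
    iotlb = imrc->translate(iommu_mr, addr, is_write ?
                            IOMMU_WO : IOMMU_RO);
    if (locked) {
        qemu_mutex_unlock_iothread();
    }
]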

>
>>
>>> So, even if we want to provide a general lock for the translation
>>> procedure, I would prefer we add a per AddressSpace lock but not BQL.
>> It could be, but it needs more work on each specific IOMMU codes.
>>
>>> However still that will need some extra flag showing that whether we
>>> need the protection of not.  For example, we may need to expliclitly
>>> turn that off for Power and s390.  Would that really worth it?
>> It would cost just several lines of code, anything wrong with this?
> It's not about anything wrong; it's just about preference.
>
> I never said BQL won't work here.  It will work.  But if you have
> spent tens of hours working on BQL-related problems maybe you'll have
> the same preference as me... :)
>
> IMHO the point is to decide which might be simpler and more efficient
> in general, really.

So I'm not against your approach.  It could go on top of the BQL patch, 
I think.

>
>>> So my final preference is still current patch - we solve thread-safety
>>> problems in VT-d and IOMMU code.  Again, we really should make sure
>>> all IOMMUs work with multithreads.
>>>
>>>>>> 3) I think the issue is common to all other kinds of IOMMU, so can we simply
>>>>>> synchronize before calling ->translate() in memory.c. This seems a more
>>>>>> common solution.
>>>>> I suspect Power and s390 live well with that.  I think it mean at
>>>>> least these platforms won't have problem in concurrency.  I'm adding
>>>>> DavidG in loop in case there is further comment.  IMHO we should just
>>>>> make sure IOMMU code be thread safe, and we fix problem if there is.
>>>>>
>>>>> Thanks,
>>>>>
>>>> Yes, it needs some investigation, but we have other IOMMUs like AMD, and we
>>>> could have a flag to bypass BQL if IOMMU can synchronize by itself.
>>> AMD is still only for experimental.  If we really want to use it in
>>> production IMHO it'll need more testings and tunings not only on
>>> thread-safety but on other stuffs too.  So again, we can just fix them
>>> when needed.  I still don't see it a reason to depend on BQL here.
>> Well, it's not about BQL specifically, it's about whether we have or need a
>> generic thread safety solution for all IOMMUs.
>>
>> We have more IOMMUs than just AMD, s390 and ppc:
>>
>> # git grep imrc-\>translate\ =
>> hw/alpha/typhoon.c:    imrc->translate = typhoon_translate_iommu;
>> hw/dma/rc4030.c:    imrc->translate = rc4030_dma_translate;
>> hw/i386/amd_iommu.c:    imrc->translate = amdvi_translate;
>> hw/i386/intel_iommu.c:    imrc->translate = vtd_iommu_translate;
>> hw/ppc/spapr_iommu.c:    imrc->translate = spapr_tce_translate_iommu;
>> hw/s390x/s390-pci-bus.c:    imrc->translate = s390_translate_iommu;
>> hw/sparc/sun4m_iommu.c:    imrc->translate = sun4m_translate_iommu;
>> hw/sparc64/sun4u_iommu.c:    imrc->translate = sun4u_translate_iommu;
>>
>> And we know there will be more in the near future.
> Again - here I would suggest we consider thread-safe when implementing
> new ones.  I suppose it should not be a hard thing to achieve.
>
> I don't have more and new input here since I have had some in previous
> posts already.  If this is still during discussion before the next
> post, I'll pick this patch out of the series since this patch is not
> related to other patches at all, so can be dealt with isolatedly.
>
> Thanks,
>

I fully understand your motivation; I just want to see if we can do 
something simple for all the other IOMMUs.  This series can surely go 
alone without caring about the other IOMMUs.

Thanks
Peter Xu April 28, 2018, 3:14 a.m. UTC | #13
On Sat, Apr 28, 2018 at 10:42:11AM +0800, Jason Wang wrote:

[...]

> Well, it's not about BQL specifically, it's about whether we have or need a
> generic thread safety solution for all IOMMUs.
> 
> We have more IOMMUs than just AMD, s390 and ppc:
> 
> # git grep imrc-\>translate\ =
> hw/alpha/typhoon.c:    imrc->translate = typhoon_translate_iommu;
> hw/dma/rc4030.c:    imrc->translate = rc4030_dma_translate;
> hw/i386/amd_iommu.c:    imrc->translate = amdvi_translate;
> hw/i386/intel_iommu.c:    imrc->translate = vtd_iommu_translate;
> hw/ppc/spapr_iommu.c:    imrc->translate = spapr_tce_translate_iommu;
> hw/s390x/s390-pci-bus.c:    imrc->translate = s390_translate_iommu;
> hw/sparc/sun4m_iommu.c:    imrc->translate = sun4m_translate_iommu;
> hw/sparc64/sun4u_iommu.c:    imrc->translate = sun4u_translate_iommu;

Sorry, I didn't notice this one.  The point is valid.  But again, we
need to know whether those are thread-safe already.  For VT-d, it never
hurts to have this patch fix its own problem, so the above is not a
reason to drop the current patch - they solve different problems.
Basically I see the solution to the above as a separate patch from the
current one.

Meanwhile, even if we want to provide general protection for that,
again I would prefer not the BQL but some common, new iommu lock.  The
BQL can be a nightmare sometimes.

Thanks,
Jason Wang April 28, 2018, 3:16 a.m. UTC | #14
On 2018年04月28日 11:14, Peter Xu wrote:
> On Sat, Apr 28, 2018 at 10:42:11AM +0800, Jason Wang wrote:
>
> [...]
>
>> Well, it's not about BQL specifically, it's about whether we have or need a
>> generic thread safety solution for all IOMMUs.
>>
>> We have more IOMMUs than just AMD, s390 and ppc:
>>
>> # git grep imrc-\>translate\ =
>> hw/alpha/typhoon.c:    imrc->translate = typhoon_translate_iommu;
>> hw/dma/rc4030.c:    imrc->translate = rc4030_dma_translate;
>> hw/i386/amd_iommu.c:    imrc->translate = amdvi_translate;
>> hw/i386/intel_iommu.c:    imrc->translate = vtd_iommu_translate;
>> hw/ppc/spapr_iommu.c:    imrc->translate = spapr_tce_translate_iommu;
>> hw/s390x/s390-pci-bus.c:    imrc->translate = s390_translate_iommu;
>> hw/sparc/sun4m_iommu.c:    imrc->translate = sun4m_translate_iommu;
>> hw/sparc64/sun4u_iommu.c:    imrc->translate = sun4u_translate_iommu;
> Sorry I didn't notice this one.  This point is valid.  But again, we
> need to know whether they are thread-safe already.  For VT-d, it never
> hurt to have this patch to fix its own problem, so above is not a
> reason to not have current patch, since it solves different problems.
> Basically I'll see the solution of above problem as a separate patch
> as current one.
>
> Meanwhile, even if we want to provide a general protection on that,
> again I would prefer not using BQL but use some common and new
> iommu-lock.  BQL can be a nightmare sometimes.

Yes, you can use another lock if the BQL turns out to be troublesome.

Thanks

>
> Thanks,
>
Paolo Bonzini April 30, 2018, 7:20 a.m. UTC | #15
On 28/04/2018 04:24, Peter Xu wrote:
>>>> 2) Can we just reuse qemu BQL here?
>>> I would prefer not.  As I mentioned, at least I have spent too much
>>> time on fighting BQL already.  I really hope we can start to use
>>> isolated locks when capable.  BQL is always the worst choice to me.
>> Just a thought, using BQL may greatly simplify the code actually (consider
>> we don't plan to remove BQL now).
> Frankly speaking I don't understand why using BQL may greatly simplify
> the code... :( IMHO the lock here is really not a complicated one.
> 
> Note that IMO BQL is mostly helpful when we really want something to
> be run sequentially with some other things _already_ protected by BQL.
> In this case, all the stuff is inside VT-d code itself (or other
> IOMMUs), why bother taking the BQL to make our life harder?
> 
> So, even if we want to provide a general lock for the translation
> procedure, I would prefer we add a per AddressSpace lock but not BQL.
> However still that will need some extra flag showing that whether we
> need the protection of not.  For example, we may need to expliclitly
> turn that off for Power and s390.  Would that really worth it?
> 
> So my final preference is still current patch - we solve thread-safety
> problems in VT-d and IOMMU code.  Again, we really should make sure
> all IOMMUs work with multithreads.
> 

I agree.  In particular, using BQL is _worse_ because it has very strict
lock ordering requirements.  Using fine-grained locks is greatly
preferred as long as:

1) they are leaves in the lock ordering

2) they are not kept across calls to external callbacks (or there are no
external callbacks involved)
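
[Editor's note: as a sketch of rule 2 applied to this series, an
invalidation path would drop iommu_lock before notifying, so that
external callbacks (vhost, vfio, ...) never run with the lock held.
This mirrors the vtd_iotlb_page_invalidate() hunk in the patch below:

    vtd_iommu_lock(s);
    g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
    vtd_iommu_unlock(s);
    /* external callbacks run without iommu_lock held */
    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
]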

Paolo
Paolo Bonzini April 30, 2018, 7:22 a.m. UTC | #16
On 28/04/2018 05:14, Peter Xu wrote:
>> # git grep imrc-\>translate\ =
>> hw/alpha/typhoon.c:    imrc->translate = typhoon_translate_iommu;
>> hw/dma/rc4030.c:    imrc->translate = rc4030_dma_translate;
>> hw/i386/amd_iommu.c:    imrc->translate = amdvi_translate;
>> hw/i386/intel_iommu.c:    imrc->translate = vtd_iommu_translate;
>> hw/ppc/spapr_iommu.c:    imrc->translate = spapr_tce_translate_iommu;
>> hw/s390x/s390-pci-bus.c:    imrc->translate = s390_translate_iommu;
>> hw/sparc/sun4m_iommu.c:    imrc->translate = sun4m_translate_iommu;
>> hw/sparc64/sun4u_iommu.c:    imrc->translate = sun4u_translate_iommu;
> Sorry I didn't notice this one.  This point is valid.  But again, we
> need to know whether they are thread-safe already.  For VT-d, it never
> hurt to have this patch to fix its own problem, so above is not a
> reason to not have current patch, since it solves different problems.
> Basically I'll see the solution of above problem as a separate patch
> as current one.

If they are not thread-safe, we fix them too.  They are not many.

Paolo
Peter Xu May 3, 2018, 5:39 a.m. UTC | #17
On Mon, Apr 30, 2018 at 09:20:42AM +0200, Paolo Bonzini wrote:
> On 28/04/2018 04:24, Peter Xu wrote:
> >>>> 2) Can we just reuse qemu BQL here?
> >>> I would prefer not.  As I mentioned, at least I have spent too much
> >>> time on fighting BQL already.  I really hope we can start to use
> >>> isolated locks when capable.  BQL is always the worst choice to me.
> >> Just a thought, using BQL may greatly simplify the code actually (consider
> >> we don't plan to remove BQL now).
> > Frankly speaking I don't understand why using BQL may greatly simplify
> > the code... :( IMHO the lock here is really not a complicated one.
> > 
> > Note that IMO BQL is mostly helpful when we really want something to
> > be run sequentially with some other things _already_ protected by BQL.
> > In this case, all the stuff is inside VT-d code itself (or other
> > IOMMUs), why bother taking the BQL to make our life harder?
> > 
> > So, even if we want to provide a general lock for the translation
> > procedure, I would prefer we add a per AddressSpace lock but not BQL.
> > However still that will need some extra flag showing that whether we
> > need the protection of not.  For example, we may need to expliclitly
> > turn that off for Power and s390.  Would that really worth it?
> > 
> > So my final preference is still current patch - we solve thread-safety
> > problems in VT-d and IOMMU code.  Again, we really should make sure
> > all IOMMUs work with multithreads.
> > 
> 
> I agree.  In particular, using BQL is _worse_ because it has very strict
> lock ordering requirements.  Using fine-grained locks is greatly
> preferred as long as:
> 
> 1) they are leaves in the lock ordering
> 
> 2) they are not kept across calls to external callbacks (or there are no
> external callbacks involved)

Thanks Paolo for the input.

I'll temporarily keep this patch in my next post.  We can further
discuss it there if we have better alternatives.

Regards,

Patch

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 220697253f..1a8ba8e415 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -262,6 +262,14 @@  struct IntelIOMMUState {
     uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
     uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
     uint32_t version;
+    /*
+     * Protects IOMMU states in general.  Normally we don't need to
+     * take this lock when the BQL is held.  However we have code
+     * paths that may run even without the BQL.  In those cases, we
+     * need to take the lock when we access IOMMU state information,
+     * e.g., the IOTLB.
+     */
+    QemuMutex iommu_lock;
 
     bool caching_mode;          /* RO - is cap CM enabled? */
 
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5987b48d43..e4ee211dde 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -128,6 +128,16 @@  static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
     return new_val;
 }
 
+static inline void vtd_iommu_lock(IntelIOMMUState *s)
+{
+    qemu_mutex_lock(&s->iommu_lock);
+}
+
+static inline void vtd_iommu_unlock(IntelIOMMUState *s)
+{
+    qemu_mutex_unlock(&s->iommu_lock);
+}
+
 /* GHashTable functions */
 static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
 {
@@ -197,12 +207,19 @@  static void vtd_reset_context_cache(IntelIOMMUState *s)
     s->context_cache_gen = 1;
 }
 
-static void vtd_reset_iotlb(IntelIOMMUState *s)
+static void vtd_reset_iotlb_locked(IntelIOMMUState *s)
 {
     assert(s->iotlb);
     g_hash_table_remove_all(s->iotlb);
 }
 
+static void vtd_reset_iotlb(IntelIOMMUState *s)
+{
+    vtd_iommu_lock(s);
+    vtd_reset_iotlb_locked(s);
+    vtd_iommu_unlock(s);
+}
+
 static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
                                   uint32_t level)
 {
@@ -222,6 +239,7 @@  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
     uint64_t key;
     int level;
 
+    vtd_iommu_lock(s);
     for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
         key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
                                 source_id, level);
@@ -232,6 +250,7 @@  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
     }
 
 out:
+    vtd_iommu_unlock(s);
     return entry;
 }
 
@@ -244,9 +263,11 @@  static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
     uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
 
     trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
+
+    vtd_iommu_lock(s);
     if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
         trace_vtd_iotlb_reset("iotlb exceeds size limit");
-        vtd_reset_iotlb(s);
+        vtd_reset_iotlb_locked(s);
     }
 
     entry->gfn = gfn;
@@ -256,6 +277,7 @@  static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
     entry->mask = vtd_slpt_level_page_mask(level);
     *key = vtd_get_iotlb_key(gfn, source_id, level);
     g_hash_table_replace(s->iotlb, key, entry);
+    vtd_iommu_unlock(s);
 }
 
 /* Given the reg addr of both the message data and address, generate an
@@ -1377,8 +1399,10 @@  static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 
     trace_vtd_inv_desc_iotlb_domain(domain_id);
 
+    vtd_iommu_lock(s);
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_domain,
                                 &domain_id);
+    vtd_iommu_unlock(s);
 
     QLIST_FOREACH(vtd_as, &s->notifiers_list, next) {
         if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
@@ -1426,7 +1450,9 @@  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
     info.domain_id = domain_id;
     info.addr = addr;
     info.mask = ~((1 << am) - 1);
+    vtd_iommu_lock(s);
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
+    vtd_iommu_unlock(s);
     vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
 }
 
@@ -3072,6 +3098,7 @@  static void vtd_realize(DeviceState *dev, Error **errp)
     }
 
     QLIST_INIT(&s->notifiers_list);
+    qemu_mutex_init(&s->iommu_lock);
     memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
     memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
                           "intel_iommu", DMAR_REG_SIZE);