[08/10] tcg: add memory barriers in page_find_alloc accesses

Message ID 1439397664-70734-9-git-send-email-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini Aug. 12, 2015, 4:41 p.m. UTC
page_find is reading the radix tree outside all locks, so it has to
use the RCU primitives.  It does not need RCU critical sections
because the PageDescs are never removed, so there is never a need
to wait for the end of code sections that use a PageDesc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 translate-all.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
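
For illustration, a minimal sketch of the pattern the commit message
describes: a writer that runs under an external lock publishes a pointer,
and readers load it without any lock. It uses C11 acquire/release atomics
as an approximation of atomic_rcu_read/atomic_rcu_set; that mapping is an
assumption, not QEMU's definition. Desc, slot, desc_find and
desc_find_alloc are hypothetical names, not taken from translate-all.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for PageDesc, reduced to one field. */
typedef struct {
    int flags;
} Desc;

/* One slot of the tree. Readers access it without taking any lock. */
static _Atomic(Desc *) slot;

/* Lockless reader: the acquire load guarantees that the contents of
 * *d, written before the pointer was published, are visible. */
static Desc *desc_find(void)
{
    return atomic_load_explicit(&slot, memory_order_acquire);
}

/* Writer: assumed to be called with an external lock held, so two
 * writers never race on the same slot. */
static Desc *desc_find_alloc(void)
{
    Desc *d = atomic_load_explicit(&slot, memory_order_relaxed);

    if (d == NULL) {
        d = calloc(1, sizeof(*d));
        assert(d != NULL);
        d->flags = 1;
        /* The release store orders the initialization above before
         * the pointer becomes visible to readers. */
        atomic_store_explicit(&slot, d, memory_order_release);
    }
    return d;
}
```

Since a published Desc is never freed in this sketch, a reader never has
to be waited for, which is why no read-side critical section appears.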

Comments

Emilio Cota Aug. 12, 2015, 8:37 p.m. UTC | #1
On Wed, Aug 12, 2015 at 18:41:00 +0200, Paolo Bonzini wrote:
> page_find is reading the radix tree outside all locks, so it has to
> use the RCU primitives.  It does not need RCU critical sections
> because the PageDescs are never removed, so there is never a need
> to wait for the end of code sections that use a PageDesc.

Note that page_find_alloc might end up writing to the tree, see below.

BTW the fact that there are no removals makes the use of RCU unnecessary.

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  translate-all.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/translate-all.c b/translate-all.c
> index 7727091..78a787d 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -437,14 +437,14 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
>  
>      /* Level 2..N-1.  */
>      for (i = V_L1_SHIFT / V_L2_BITS - 1; i > 0; i--) {
> -        void **p = *lp;
> +        void **p = atomic_rcu_read(lp);
>  
>          if (p == NULL) {
>              if (!alloc) {
>                  return NULL;
>              }
>              p = g_new0(void *, V_L2_SIZE);
> -            *lp = p;
> +            atomic_rcu_set(lp, p);
>          }
>  
>          lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
> @@ -456,7 +456,7 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
>              return NULL;
>          }
>          pd = g_new0(PageDesc, V_L2_SIZE);
> -        *lp = pd;
> +        atomic_rcu_set(lp, pd);

atomic_rcu_set is not enough; a cmpxchg with a fail path would be needed here.
If page_find_alloc is called without any locks held (as described in the commit
message), several threads could concurrently write to the same node, corrupting
the tree.

I argue however that it is better to call page_find/_alloc with a mutex held,
since otherwise we'd have to add per-PageDesc locks (it's very common to
call page_find and then update the PageDesc). I have an RFC patchset for multithreaded tcg
that deals with this, will submit once I bring it up to date with upstream.

		Emilio
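
A minimal sketch of the cmpxchg-with-fail-path idea mentioned above, written
with C11 atomics rather than QEMU's own macros. NODE_SIZE and
node_get_or_install are hypothetical names for illustration only. It applies
only if the function can be reached by several writers with no lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NODE_SIZE 16 /* hypothetical fan-out of an interior node */

/* Return the node stored in *lp, installing a new one if the slot is
 * empty. Safe against concurrent installers: exactly one wins. */
static void **node_get_or_install(_Atomic(void **) *lp)
{
    void **p = atomic_load_explicit(lp, memory_order_acquire);
    if (p != NULL) {
        return p;
    }

    void **fresh = calloc(NODE_SIZE, sizeof(void *));
    assert(fresh != NULL);

    void **expected = NULL;
    if (atomic_compare_exchange_strong_explicit(lp, &expected, fresh,
                                                memory_order_acq_rel,
                                                memory_order_acquire)) {
        return fresh; /* our node is now part of the tree */
    }

    /* Fail path: another thread installed a node first. Drop ours and
     * use the winner's, which the failed exchange stored in expected. */
    free(fresh);
    return expected;
}
```

A plain store in place of the compare-and-exchange would let the second
writer overwrite the first writer's node, losing anything already linked
below it.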
Paolo Bonzini Aug. 13, 2015, 8:13 a.m. UTC | #2
On 12/08/2015 22:37, Emilio G. Cota wrote:
> > page_find is reading the radix tree outside all locks, so it has to
> > use the RCU primitives.  It does not need RCU critical sections
> > because the PageDescs are never removed, so there is never a need
> > to wait for the end of code sections that use a PageDesc.
>
> Note that page_find_alloc might end up writing to the tree, see below.

Yes, but in that case it's always called with the mmap_lock held, see
patch 7.

page_find_alloc is only called by tb_alloc_page (called by tb_link_page
which takes mmap_lock), or by page_set_flags (called with mmap_lock held
by linux-user/mmap.c).

> BTW the fact that there are no removals makes the use of RCU unnecessary.

It only means that the RCU synchronization primitives are not needed.
You still need the memory barriers.

> I argue however that it is better to call page_find/_alloc with a mutex held,
> since otherwise we'd have to add per-PageDesc locks (it's very common to
> call page_find and then update the PageDesc). 

The fields are protected by either the mmap_lock (e.g. the flags, see
page_unprotect and tb_alloc_page) or the tb_lock (e.g. the tb lists).

The code is complicated and could definitely use more documentation,
especially for struct PageDesc, but it seems correct to me apart from
the lock inversion fixed in patch 10.

Paolo
Emilio Cota Aug. 13, 2015, 7:50 p.m. UTC | #3
On Thu, Aug 13, 2015 at 10:13:32 +0200, Paolo Bonzini wrote:
> On 12/08/2015 22:37, Emilio G. Cota wrote:
> > > page_find is reading the radix tree outside all locks, so it has to
> > > use the RCU primitives.  It does not need RCU critical sections
> > > because the PageDescs are never removed, so there is never a need
> > > to wait for the end of code sections that use a PageDesc.
> >
> > Note that page_find_alloc might end up writing to the tree, see below.
> 
> Yes, but in that case it's always called with the mmap_lock held, see
> patch 7.

Oh I see. Didn't have much time to take a deep look at the patchset; the
commit message got me confused.

> page_find_alloc is only called by tb_alloc_page (called by tb_link_page
> which takes mmap_lock), or by page_set_flags (called with mmap_lock held
> by linux-user/mmap.c).
> 
> > BTW the fact that there are no removals makes the use of RCU unnecessary.
> 
> It only means that the RCU synchronization primitives are not needed.
> You still need the memory barriers.

Yes. Personally I find it confusing to use the RCU macros just
for the convenience that they bring in barriers we need; I'd prefer to
explicitly have the barriers in the code.
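
As a rough sketch of what explicit barriers could look like, here is the
same publish/read pair written with plain (relaxed) accesses plus C11
fences. This is an illustration under assumptions, not QEMU code: Desc,
slot, publish_explicit and read_explicit are hypothetical names, and the
acquire fence on the read side is stronger than the data-dependency
ordering the reader strictly needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for PageDesc. */
typedef struct {
    int value;
} Desc;

static _Atomic(Desc *) slot;

/* Writer side: explicit write barrier, then a plain pointer store. */
static void publish_explicit(Desc *d)
{
    /* Order the initialization of *d before the store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&slot, d, memory_order_relaxed);
}

/* Reader side: plain pointer load, then an explicit read barrier. */
static Desc *read_explicit(void)
{
    Desc *d = atomic_load_explicit(&slot, memory_order_relaxed);
    /* Order the load above before any later dereference of d. */
    atomic_thread_fence(memory_order_acquire);
    return d;
}
```

The trade-off is visibility versus brevity: the barriers are spelled out
at each access, at the cost of two statements where a combined macro
needs one.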

> > I argue however that it is better to call page_find/_alloc with a mutex held,
> > since otherwise we'd have to add per-PageDesc locks (it's very common to
> > call page_find and then update the PageDesc). 
> 
> The fields are protected by either the mmap_lock (e.g. the flags, see
> page_unprotect and tb_alloc_page) or the tb_lock (e.g. the tb lists).
> 
> The code is complicated and could definitely use more documentation,
> especially for struct PageDesc, but it seems correct to me apart from
> the lock inversion fixed in patch 10.

OK. I have a bit of time today and tomorrow so I'll rebase my work on
top of this patchset and then submit it.

Thanks,

		Emilio
Peter Maydell Aug. 28, 2015, 3:40 p.m. UTC | #4
On 12 August 2015 at 17:41, Paolo Bonzini <pbonzini@redhat.com> wrote:
> page_find is reading the radix tree outside all locks, so it has to
> use the RCU primitives.  It does not need RCU critical sections
> because the PageDescs are never removed, so there is never a need
> to wait for the end of code sections that use a PageDesc.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  translate-all.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index 7727091..78a787d 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -437,14 +437,14 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
>
>      /* Level 2..N-1.  */
>      for (i = V_L1_SHIFT / V_L2_BITS - 1; i > 0; i--) {
> -        void **p = *lp;
> +        void **p = atomic_rcu_read(lp);
>
>          if (p == NULL) {
>              if (!alloc) {
>                  return NULL;
>              }
>              p = g_new0(void *, V_L2_SIZE);
> -            *lp = p;
> +            atomic_rcu_set(lp, p);
>          }
>
>          lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
> @@ -456,7 +456,7 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
>              return NULL;
>          }
>          pd = g_new0(PageDesc, V_L2_SIZE);
> -        *lp = pd;
> +        atomic_rcu_set(lp, pd);
>      }
>
>      return pd + (index & (V_L2_SIZE - 1));

Don't we also need to use an atomic_rcu_read() for the load
    pd = *lp;
(which is between hunk 1 and 2 in this patch) ?

thanks
-- PMM
Paolo Bonzini Aug. 29, 2015, 6:58 a.m. UTC | #5
On 28/08/2015 17:40, Peter Maydell wrote:
> Don't we also need to use an atomic_rcu_read() for the load
>     pd = *lp;
> (which is between hunk 1 and 2 in this patch) ?

Yes.

Paolo
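
To make the point about the leaf load concrete, here is a reduced,
single-level sketch in which the final "pd = *lp" load is also an atomic
acquire load. It is an illustration under assumptions, not the actual
follow-up patch: L2_BITS, L2_SIZE, Slot, l1_map and find_alloc are
hypothetical names, C11 atomics stand in for atomic_rcu_read and
atomic_rcu_set, and callers passing alloc != 0 are assumed to hold a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define L2_BITS 4
#define L2_SIZE (1u << L2_BITS)

/* Hypothetical stand-in for PageDesc. */
typedef struct {
    int flags;
} Desc;

/* Each top-level slot points to an array of L2_SIZE descriptors. */
typedef struct {
    _Atomic(Desc *) leaf;
} Slot;

static Slot l1_map[L2_SIZE];

static Desc *find_alloc(unsigned index, int alloc)
{
    Slot *lp = &l1_map[(index >> L2_BITS) & (L2_SIZE - 1)];

    /* The leaf pointer is read outside locks like every other level,
     * so it needs the same ordered load as the interior nodes. */
    Desc *pd = atomic_load_explicit(&lp->leaf, memory_order_acquire);

    if (pd == NULL) {
        if (!alloc) {
            return NULL;
        }
        pd = calloc(L2_SIZE, sizeof(Desc));
        assert(pd != NULL);
        /* Publish the zeroed array only after it is fully set up. */
        atomic_store_explicit(&lp->leaf, pd, memory_order_release);
    }

    return pd + (index & (L2_SIZE - 1));
}
```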

Patch

diff --git a/translate-all.c b/translate-all.c
index 7727091..78a787d 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -437,14 +437,14 @@  static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
 
     /* Level 2..N-1.  */
     for (i = V_L1_SHIFT / V_L2_BITS - 1; i > 0; i--) {
-        void **p = *lp;
+        void **p = atomic_rcu_read(lp);
 
         if (p == NULL) {
             if (!alloc) {
                 return NULL;
             }
             p = g_new0(void *, V_L2_SIZE);
-            *lp = p;
+            atomic_rcu_set(lp, p);
         }
 
         lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
@@ -456,7 +456,7 @@  static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
             return NULL;
         }
         pd = g_new0(PageDesc, V_L2_SIZE);
-        *lp = pd;
+        atomic_rcu_set(lp, pd);
     }
 
     return pd + (index & (V_L2_SIZE - 1));