[PULL,2/2] vhost: replace rcu with mutex

Message ID 1401744482-17764-3-git-send-email-mst@redhat.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Michael S. Tsirkin June 2, 2014, 9:30 p.m. UTC
All memory accesses are done under some VQ mutex.
So locking and unlocking every VQ mutex is a faster equivalent of
synchronize_rcu() for memory access changes.
Some guests cause a lot of these changes, so it's helpful
to make them faster.

Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/vhost.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Eric Dumazet June 2, 2014, 9:58 p.m. UTC | #1
On Tue, 2014-06-03 at 00:30 +0300, Michael S. Tsirkin wrote:
> All memory accesses are done under some VQ mutex.
> So lock/unlock all VQs is a faster equivalent of synchronize_rcu()
> for memory access changes.
> Some guests cause a lot of these changes, so it's helpful
> to make them faster.
> 
> Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/vhost/vhost.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 78987e4..1c05e60 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  {
>  	struct vhost_memory mem, *newmem, *oldmem;
>  	unsigned long size = offsetof(struct vhost_memory, regions);
> +	int i;
>  
>  	if (copy_from_user(&mem, m, size))
>  		return -EFAULT;
> @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  	oldmem = rcu_dereference_protected(d->memory,
>  					   lockdep_is_held(&d->mutex));
>  	rcu_assign_pointer(d->memory, newmem);
> -	synchronize_rcu();
> +
> +	/* All memory accesses are done under some VQ mutex.
> +	 * So below is a faster equivalent of synchronize_rcu()
> +	 */
> +	for (i = 0; i < d->nvqs; ++i) {
> +		mutex_lock(&d->vqs[i]->mutex);
> +		mutex_unlock(&d->vqs[i]->mutex);
> +	}
>  	kfree(oldmem);
>  	return 0;
>  }

This looks dubious

What about using kfree_rcu() instead ?

translate_desc() still uses rcu_read_lock(), so it's not clear whether the
mutex is really held.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini June 3, 2014, 12:48 p.m. UTC | #2
On 02/06/2014 23:58, Eric Dumazet wrote:
> This looks dubious
>
> What about using kfree_rcu() instead ?

It would lead to unbounded allocation from userspace.

> translate_desc() still uses rcu_read_lock(), its not clear if the mutex
> is really held.

Yes, vhost_get_vq_desc must be called with the vq mutex held.

The rcu_read_lock/unlock in translate_desc is unnecessary.

Paolo
Vladislav Yasevich June 3, 2014, 1:35 p.m. UTC | #3
On 06/03/2014 08:48 AM, Paolo Bonzini wrote:
> On 02/06/2014 23:58, Eric Dumazet wrote:
>> This looks dubious
>>
>> What about using kfree_rcu() instead ?
> 
> It would lead to unbound allocation from userspace.
> 
>> translate_desc() still uses rcu_read_lock(), its not clear if the mutex
>> is really held.
> 
> Yes, vhost_get_vq_desc must be called with the vq mutex held.
> 
> The rcu_read_lock/unlock in translate_desc is unnecessary.
> 

If that's true, then does dev->memory really need to be rcu protected?
It appears to always be read under mutex.

-vlad

> Paolo

Paolo Bonzini June 3, 2014, 1:55 p.m. UTC | #4
On 03/06/2014 15:35, Vlad Yasevich wrote:
> > Yes, vhost_get_vq_desc must be called with the vq mutex held.
> >
> > The rcu_read_lock/unlock in translate_desc is unnecessary.
>
> If that's true, then does dev->memory really needs to be rcu protected?
> It appears to always be read under mutex.

It's always read under one of many mutexes, yes.

However, it's still RCU-like in the sense that you separate the removal 
and reclamation phases so you still need rcu_dereference/rcu_assign_pointer.

With this mechanism, readers do not contend for the mutexes with the 
VHOST_SET_MEMORY ioctl, except for the very short lock-and-unlock 
sequence at the end of it.  They also never contend for the mutexes among 
themselves (which would be the case if VHOST_SET_MEMORY locked all the 
mutexes).
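
To make the mechanism concrete, here is a minimal userspace sketch of the
same idea using pthreads and C11 atomics. It is not the vhost code: the
names (struct dev, struct vq, struct mem_table, NVQS, dev_set_memory,
vq_read_nregions) are hypothetical stand-ins, and the atomic exchange is
an assumption standing in for rcu_assign_pointer() under d->mutex.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NVQS 2 /* hypothetical number of virtqueues */

/* Hypothetical stand-in for struct vhost_memory. */
struct mem_table { int nregions; };

struct vq { pthread_mutex_t mutex; };

struct dev {
	struct vq vqs[NVQS];
	_Atomic(struct mem_table *) memory;
};

static int dev_init(struct dev *d, int nregions)
{
	struct mem_table *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	m->nregions = nregions;
	for (int i = 0; i < NVQS; i++)
		pthread_mutex_init(&d->vqs[i].mutex, NULL);
	atomic_init(&d->memory, m);
	return 0;
}

/* Reader: every access to the table happens under this queue's mutex. */
static int vq_read_nregions(struct dev *d, int i)
{
	int n;

	pthread_mutex_lock(&d->vqs[i].mutex);
	n = atomic_load_explicit(&d->memory, memory_order_acquire)->nregions;
	pthread_mutex_unlock(&d->vqs[i].mutex);
	return n;
}

/* Writer: publish the new table, then wait out the readers. */
static void dev_set_memory(struct dev *d, struct mem_table *newmem)
{
	struct mem_table *oldmem =
		atomic_exchange_explicit(&d->memory, newmem,
					 memory_order_acq_rel);

	/* A reader that saw oldmem holds some queue mutex. Taking and
	 * dropping each mutex once therefore waits for all such readers;
	 * any reader that locks afterwards sees newmem.
	 */
	for (int i = 0; i < NVQS; i++) {
		pthread_mutex_lock(&d->vqs[i].mutex);
		pthread_mutex_unlock(&d->vqs[i].mutex);
	}
	free(oldmem);
}
```

The writer never holds more than one queue mutex at a time, so it delays
each reader only for that queue's brief lock-and-unlock step, and by the
time dev_set_memory() returns no reader can still be using the old table.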

You could also wrap all virtqueue processing with a rwsem and take the 
rwsem for write in VHOST_SET_MEMORY.  That simplifies some things, but it 
has drawbacks:

- it unnecessarily complicates the code for all users of vhost_get_vq_desc

- suppose the reader-writer lock is fair, and VHOST_SET_MEMORY places a 
writer in the queue.  Then a long-running reader R1 could still block 
another reader R2, because the writer would be served before R2.


The RCU-like approach avoids all this, which is important because of the 
generally simpler code and because VHOST_SET_MEMORY is the only vhost 
ioctl that can happen in the hot path.

Paolo
Eric Dumazet June 3, 2014, 1:57 p.m. UTC | #5
On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
> On 02/06/2014 23:58, Eric Dumazet wrote:
> > This looks dubious
> >
> > What about using kfree_rcu() instead ?
> 
> It would lead to unbound allocation from userspace.

Look at how we did this in commit
c3059477fce2d956a0bb3e04357324780c5d8eeb

> 
> > translate_desc() still uses rcu_read_lock(), its not clear if the mutex
> > is really held.
> 
> Yes, vhost_get_vq_desc must be called with the vq mutex held.
> 
> The rcu_read_lock/unlock in translate_desc is unnecessary.

Yep, this is what I pointed out. This is not only unnecessary, but
confusing, and it might be incorrectly copy/pasted in the future.

This patch is a partial one and leaves confusion.

Some places use the proper

mp = rcu_dereference_protected(dev->memory,
		 lockdep_is_held(&dev->mutex));

others use the now-incorrect:

rcu_read_lock();
mp = rcu_dereference(dev->memory);
...



Paolo Bonzini June 3, 2014, 2:20 p.m. UTC | #6
On 03/06/2014 15:57, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
>> On 02/06/2014 23:58, Eric Dumazet wrote:
>>> This looks dubious
>>>
>>> What about using kfree_rcu() instead ?
>>
>> It would lead to unbound allocation from userspace.
>
> Look at how we did this in commit
> c3059477fce2d956a0bb3e04357324780c5d8eeb

That would make VHOST_SET_MEMORY as slow as before (even if only once 
every few calls).

>>> translate_desc() still uses rcu_read_lock(), its not clear if the mutex
>>> is really held.
>>
>> Yes, vhost_get_vq_desc must be called with the vq mutex held.
>>
>> The rcu_read_lock/unlock in translate_desc is unnecessary.
>
> Yep, this is what I pointed out. This is not only unnecessary, but
> confusing and might be incorrectly copy/pasted in the future.
>
> This patch is a partial one and leaves confusion.

I agree.

Paolo
Michael S. Tsirkin June 4, 2014, 6:12 p.m. UTC | #7
On Mon, Jun 02, 2014 at 02:58:00PM -0700, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 00:30 +0300, Michael S. Tsirkin wrote:
> > All memory accesses are done under some VQ mutex.
> > So lock/unlock all VQs is a faster equivalent of synchronize_rcu()
> > for memory access changes.
> > Some guests cause a lot of these changes, so it's helpful
> > to make them faster.
> > 
> > Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/vhost/vhost.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 78987e4..1c05e60 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
> >  {
> >  	struct vhost_memory mem, *newmem, *oldmem;
> >  	unsigned long size = offsetof(struct vhost_memory, regions);
> > +	int i;
> >  
> >  	if (copy_from_user(&mem, m, size))
> >  		return -EFAULT;
> > @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
> >  	oldmem = rcu_dereference_protected(d->memory,
> >  					   lockdep_is_held(&d->mutex));
> >  	rcu_assign_pointer(d->memory, newmem);
> > -	synchronize_rcu();
> > +
> > +	/* All memory accesses are done under some VQ mutex.
> > +	 * So below is a faster equivalent of synchronize_rcu()
> > +	 */
> > +	for (i = 0; i < d->nvqs; ++i) {
> > +		mutex_lock(&d->vqs[i]->mutex);
> > +		mutex_unlock(&d->vqs[i]->mutex);
> > +	}
> >  	kfree(oldmem);
> >  	return 0;
> >  }
> 
> This looks dubious
> 
> What about using kfree_rcu() instead ?

Unfortunately userspace relies on the fact that no one
uses the old mappings by the time ioctl returns.
The issue isn't freeing the memory.

> translate_desc() still uses rcu_read_lock(), its not clear if the mutex
> is really held.
> 

Thanks, good point. We can drop that rcu_read_lock now, but I think this
could be a patch on top.
Michael S. Tsirkin June 4, 2014, 7:51 p.m. UTC | #8
On Tue, Jun 03, 2014 at 06:57:43AM -0700, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
> > On 02/06/2014 23:58, Eric Dumazet wrote:
> > > This looks dubious
> > >
> > > What about using kfree_rcu() instead ?
> > 
> > It would lead to unbound allocation from userspace.
> 
> Look at how we did this in commit
> c3059477fce2d956a0bb3e04357324780c5d8eeb
> 
> > 
> > > translate_desc() still uses rcu_read_lock(), its not clear if the mutex
> > > is really held.
> > 
> > Yes, vhost_get_vq_desc must be called with the vq mutex held.
> > 
> > The rcu_read_lock/unlock in translate_desc is unnecessary.
> 
> Yep, this is what I pointed out. This is not only unnecessary, but
> confusing and might be incorrectly copy/pasted in the future.
> 
> This patch is a partial one and leaves confusion.
> 
> Some places uses the proper 
> 
> mp = rcu_dereference_protected(dev->memory,
> 		 lockdep_is_held(&dev->mutex));
> 
> others use the now incorrect :
> 
> rcu_read_lock();
> mp = rcu_dereference(dev->memory);
> ...
> 

I agree, working on a cleanup patch on top now.
Michael S. Tsirkin June 5, 2014, 10:45 a.m. UTC | #9
On Wed, Jun 04, 2014 at 10:51:12PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 03, 2014 at 06:57:43AM -0700, Eric Dumazet wrote:
> > On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
> > > On 02/06/2014 23:58, Eric Dumazet wrote:
> > > > This looks dubious
> > > >
> > > > What about using kfree_rcu() instead ?
> > > 
> > > It would lead to unbound allocation from userspace.
> > 
> > Look at how we did this in commit
> > c3059477fce2d956a0bb3e04357324780c5d8eeb
> > 
> > > 
> > > > translate_desc() still uses rcu_read_lock(), its not clear if the mutex
> > > > is really held.
> > > 
> > > Yes, vhost_get_vq_desc must be called with the vq mutex held.
> > > 
> > > The rcu_read_lock/unlock in translate_desc is unnecessary.
> > 
> > Yep, this is what I pointed out. This is not only unnecessary, but
> > confusing and might be incorrectly copy/pasted in the future.
> > 
> > This patch is a partial one and leaves confusion.
> > 
> > Some places uses the proper 
> > 
> > mp = rcu_dereference_protected(dev->memory,
> > 		 lockdep_is_held(&dev->mutex));
> > 
> > others use the now incorrect :
> > 
> > rcu_read_lock();
> > mp = rcu_dereference(dev->memory);
> > ...
> > 
> 
> I agree, working on a cleanup patch on top now.

OK I just posted two cleanups as patches on top that address this.
Eric, could you please confirm that you are fine with
cleanups being patches on top?
Bisect will be fine since this hack is ugly but technically correct.

Thanks a lot for pointing out the issues!

> -- 
> MST

Patch

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 78987e4..1c05e60 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -593,6 +593,7 @@  static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 {
 	struct vhost_memory mem, *newmem, *oldmem;
 	unsigned long size = offsetof(struct vhost_memory, regions);
+	int i;
 
 	if (copy_from_user(&mem, m, size))
 		return -EFAULT;
@@ -619,7 +620,14 @@  static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	oldmem = rcu_dereference_protected(d->memory,
 					   lockdep_is_held(&d->mutex));
 	rcu_assign_pointer(d->memory, newmem);
-	synchronize_rcu();
+
+	/* All memory accesses are done under some VQ mutex.
+	 * So below is a faster equivalent of synchronize_rcu()
+	 */
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 	kfree(oldmem);
 	return 0;
 }