Message ID | 1401744482-17764-3-git-send-email-mst@redhat.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On Tue, 2014-06-03 at 00:30 +0300, Michael S. Tsirkin wrote:
> All memory accesses are done under some VQ mutex.
> So lock/unlock all VQs is a faster equivalent of synchronize_rcu()
> for memory access changes.
> Some guests cause a lot of these changes, so it's helpful
> to make them faster.
>
> Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/vhost/vhost.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 78987e4..1c05e60 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  {
>  	struct vhost_memory mem, *newmem, *oldmem;
>  	unsigned long size = offsetof(struct vhost_memory, regions);
> +	int i;
>
>  	if (copy_from_user(&mem, m, size))
>  		return -EFAULT;
> @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>  	oldmem = rcu_dereference_protected(d->memory,
>  					   lockdep_is_held(&d->mutex));
>  	rcu_assign_pointer(d->memory, newmem);
> -	synchronize_rcu();
> +
> +	/* All memory accesses are done under some VQ mutex.
> +	 * So below is a faster equivalent of synchronize_rcu()
> +	 */
> +	for (i = 0; i < d->nvqs; ++i) {
> +		mutex_lock(&d->vqs[i]->mutex);
> +		mutex_unlock(&d->vqs[i]->mutex);
> +	}
>  	kfree(oldmem);
>  	return 0;
>  }

This looks dubious

What about using kfree_rcu() instead?

translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
is really held.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/06/2014 23:58, Eric Dumazet wrote:
> This looks dubious
>
> What about using kfree_rcu() instead?

It would lead to unbounded allocation from userspace.

> translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
> is really held.

Yes, vhost_get_vq_desc must be called with the vq mutex held.

The rcu_read_lock/unlock in translate_desc is unnecessary.

Paolo
On 06/03/2014 08:48 AM, Paolo Bonzini wrote:
> On 02/06/2014 23:58, Eric Dumazet wrote:
>> This looks dubious
>>
>> What about using kfree_rcu() instead?
>
> It would lead to unbounded allocation from userspace.
>
>> translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
>> is really held.
>
> Yes, vhost_get_vq_desc must be called with the vq mutex held.
>
> The rcu_read_lock/unlock in translate_desc is unnecessary.
>

If that's true, then does dev->memory really need to be RCU-protected?
It appears to always be read under a mutex.

-vlad

> Paolo
On 03/06/2014 15:35, Vlad Yasevich wrote:
> > Yes, vhost_get_vq_desc must be called with the vq mutex held.
> >
> > The rcu_read_lock/unlock in translate_desc is unnecessary.
>
> If that's true, then does dev->memory really need to be RCU-protected?
> It appears to always be read under a mutex.

It's always read under one of many mutexes, yes. However, it's still
RCU-like in the sense that you separate the removal and reclamation
phases, so you still need rcu_dereference/rcu_assign_pointer.

With this mechanism, readers do not contend for the mutexes with the
VHOST_SET_MEMORY ioctl, except for the very short lock-and-unlock
sequence at the end of it. They also never contend for the mutexes
among themselves (which would be the case if VHOST_SET_MEMORY locked
all the mutexes).

You could also wrap all virtqueue processing with a rwsem and take the
rwsem for write in VHOST_SET_MEMORY. That would simplify some things,
but it:

- unnecessarily complicates the code for all users of vhost_get_vq_desc;

- lets readers block each other: if the reader-writer lock is fair and
VHOST_SET_MEMORY places a writer in the queue, a long-running reader R1
could still block another reader R2, because the writer would be served
before R2.

The RCU-like approach avoids all of this. That matters both because the
code is generally simpler and because VHOST_SET_MEMORY is the only vhost
ioctl that can be issued in the hot path.

Paolo
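The removal/reclamation split Paolo describes can be made concrete with a small userspace model. This is a sketch, not vhost code: it uses pthreads and C11 atomics, and every name in it (`struct dev`, `process_queue()`, `set_memory()`, `NQUEUES`) is hypothetical. It assumes, as the patch does, that readers touch the table only while holding their own queue mutex and never keep the pointer after unlocking. Under that assumption, locking and unlocking each queue mutex once after the pointer swap waits out every reader that could still see the old table.

```c
/*
 * Userspace model of the patch's idea, NOT vhost code. All names are
 * hypothetical. Readers use the shared table only while holding their own
 * queue mutex; the writer swaps the pointer, then locks and unlocks every
 * queue mutex once before freeing the old table.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NQUEUES 4

struct mem_table {
	int version;                        /* stands in for the region list */
};

struct queue {
	pthread_mutex_t mutex;              /* analog of the per-VQ mutex */
};

struct dev {
	pthread_mutex_t mutex;              /* serializes writers, like d->mutex */
	_Atomic(struct mem_table *) memory; /* analog of d->memory */
	struct queue vqs[NQUEUES];
};

static void dev_init(struct dev *d, struct mem_table *initial)
{
	pthread_mutex_init(&d->mutex, NULL);
	atomic_init(&d->memory, initial);
	for (int i = 0; i < NQUEUES; i++)
		pthread_mutex_init(&d->vqs[i].mutex, NULL);
}

/* Reader: the table is used only inside the queue's critical section. */
static int process_queue(struct dev *d, int q)
{
	pthread_mutex_lock(&d->vqs[q].mutex);
	struct mem_table *mt =
		atomic_load_explicit(&d->memory, memory_order_acquire);
	int v = mt->version;                /* copy out; mt is not kept */
	pthread_mutex_unlock(&d->vqs[q].mutex);
	return v;
}

/* Writer: publish (removal), cycle every queue mutex, then free (reclamation). */
static void set_memory(struct dev *d, struct mem_table *newmem)
{
	pthread_mutex_lock(&d->mutex);
	struct mem_table *oldmem =
		atomic_exchange_explicit(&d->memory, newmem, memory_order_acq_rel);
	for (int i = 0; i < NQUEUES; i++) {
		/* A reader that may have loaded oldmem still holds this mutex,
		 * so acquiring it waits for that reader to finish. */
		pthread_mutex_lock(&d->vqs[i].mutex);
		pthread_mutex_unlock(&d->vqs[i].mutex);
	}
	free(oldmem);                       /* no queue can still see oldmem */
	pthread_mutex_unlock(&d->mutex);
}
```

Built with `-pthread`, a reader on any queue sees the new table once `set_memory()` returns, which is the guarantee the thread says userspace relies on. The writer holds each queue mutex only briefly and readers never take each other's mutexes, which is the contention property Paolo contrasts with a rwsem.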
On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
> On 02/06/2014 23:58, Eric Dumazet wrote:
> > This looks dubious
> >
> > What about using kfree_rcu() instead?
>
> It would lead to unbounded allocation from userspace.

Look at how we did this in commit
c3059477fce2d956a0bb3e04357324780c5d8eeb

> > translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
> > is really held.
>
> Yes, vhost_get_vq_desc must be called with the vq mutex held.
>
> The rcu_read_lock/unlock in translate_desc is unnecessary.

Yep, this is what I pointed out. This is not only unnecessary, but
confusing, and might be incorrectly copy/pasted in the future.

This patch is a partial one and leaves confusion.

Some places use the proper

	mp = rcu_dereference_protected(dev->memory,
				       lockdep_is_held(&dev->mutex));

others use the now incorrect:

	rcu_read_lock();
	mp = rcu_dereference(dev->memory);
	...
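Eric's distinction between the two accessors can be illustrated with a hypothetical userspace analog: rather than entering a read-side section it does not need, the reader asserts the locking rule it actually depends on. `struct tracked_mutex`, `tm_is_held()` and `deref_protected()` are made-up names standing in for the kernel mutex, `lockdep_is_held()` and `rcu_dereference_protected()`; the per-thread held-lock list is a deliberately crude assumption and is not how lockdep is implemented.

```c
/*
 * Hypothetical userspace analog of
 *   rcu_dereference_protected(p, lockdep_is_held(&lock))
 * None of these names are kernel or pthreads APIs.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8

struct tracked_mutex {
	pthread_mutex_t m;
};

/* Per-thread record of locks held: a crude stand-in for lockdep. */
static _Thread_local struct tracked_mutex *held_locks[MAX_HELD];
static _Thread_local int nheld;

static void tm_init(struct tracked_mutex *t)
{
	pthread_mutex_init(&t->m, NULL);
}

static void tm_lock(struct tracked_mutex *t)
{
	pthread_mutex_lock(&t->m);
	assert(nheld < MAX_HELD);
	held_locks[nheld++] = t;
}

static void tm_unlock(struct tracked_mutex *t)
{
	/* Drop t from this thread's record; release order may vary. */
	for (int i = 0; i < nheld; i++) {
		if (held_locks[i] == t) {
			held_locks[i] = held_locks[--nheld];
			break;
		}
	}
	pthread_mutex_unlock(&t->m);
}

/* Analog of lockdep_is_held(): does the calling thread hold t? */
static bool tm_is_held(const struct tracked_mutex *t)
{
	for (int i = 0; i < nheld; i++)
		if (held_locks[i] == t)
			return true;
	return false;
}

/* Check the locking rule, then read the pointer; no read-side section. */
#define deref_protected(p, cond) (assert(cond), (p))

struct mem_table {
	int nregions;
};

struct vq {
	struct tracked_mutex mutex;
	struct mem_table **memory;  /* points at the device-wide table pointer */
};

/* Reader in the style of translate_desc(): caller must hold vq->mutex. */
static int count_regions(struct vq *vq)
{
	struct mem_table *mp =
		deref_protected(*vq->memory, tm_is_held(&vq->mutex));
	return mp->nregions;
}
```

Calling `count_regions()` without holding `vq->mutex` trips the assertion, whereas an `rcu_read_lock()`/`rcu_dereference()` pair would pass even when the mutex is not held, and so would document a protection the code does not actually rely on.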
On 03/06/2014 15:57, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
>> On 02/06/2014 23:58, Eric Dumazet wrote:
>>> This looks dubious
>>>
>>> What about using kfree_rcu() instead?
>>
>> It would lead to unbounded allocation from userspace.
>
> Look at how we did this in commit
> c3059477fce2d956a0bb3e04357324780c5d8eeb

That would make VHOST_SET_MEMORY as slow as before (albeit only once
every few calls).

>>> translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
>>> is really held.
>>
>> Yes, vhost_get_vq_desc must be called with the vq mutex held.
>>
>> The rcu_read_lock/unlock in translate_desc is unnecessary.
>
> Yep, this is what I pointed out. This is not only unnecessary, but
> confusing, and might be incorrectly copy/pasted in the future.
>
> This patch is a partial one and leaves confusion.

I agree.

Paolo
On Mon, Jun 02, 2014 at 02:58:00PM -0700, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 00:30 +0300, Michael S. Tsirkin wrote:
> > All memory accesses are done under some VQ mutex.
> > So lock/unlock all VQs is a faster equivalent of synchronize_rcu()
> > for memory access changes.
> > Some guests cause a lot of these changes, so it's helpful
> > to make them faster.
> >
> > Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/vhost/vhost.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 78987e4..1c05e60 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
> >  {
> >  	struct vhost_memory mem, *newmem, *oldmem;
> >  	unsigned long size = offsetof(struct vhost_memory, regions);
> > +	int i;
> >
> >  	if (copy_from_user(&mem, m, size))
> >  		return -EFAULT;
> > @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
> >  	oldmem = rcu_dereference_protected(d->memory,
> >  					   lockdep_is_held(&d->mutex));
> >  	rcu_assign_pointer(d->memory, newmem);
> > -	synchronize_rcu();
> > +
> > +	/* All memory accesses are done under some VQ mutex.
> > +	 * So below is a faster equivalent of synchronize_rcu()
> > +	 */
> > +	for (i = 0; i < d->nvqs; ++i) {
> > +		mutex_lock(&d->vqs[i]->mutex);
> > +		mutex_unlock(&d->vqs[i]->mutex);
> > +	}
> >  	kfree(oldmem);
> >  	return 0;
> >  }
>
> This looks dubious
>
> What about using kfree_rcu() instead?

Unfortunately, userspace relies on the fact that no one uses the old
mappings by the time the ioctl returns. The issue isn't freeing the memory.

> translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
> is really held.
>

Thanks, good point. We can drop that rcu_read_lock now, but I think
this could be a patch on top.
On Tue, Jun 03, 2014 at 06:57:43AM -0700, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
> > On 02/06/2014 23:58, Eric Dumazet wrote:
> > > This looks dubious
> > >
> > > What about using kfree_rcu() instead?
> >
> > It would lead to unbounded allocation from userspace.
>
> Look at how we did this in commit
> c3059477fce2d956a0bb3e04357324780c5d8eeb
>
> > > translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
> > > is really held.
> >
> > Yes, vhost_get_vq_desc must be called with the vq mutex held.
> >
> > The rcu_read_lock/unlock in translate_desc is unnecessary.
>
> Yep, this is what I pointed out. This is not only unnecessary, but
> confusing, and might be incorrectly copy/pasted in the future.
>
> This patch is a partial one and leaves confusion.
>
> Some places use the proper
>
> 	mp = rcu_dereference_protected(dev->memory,
> 				       lockdep_is_held(&dev->mutex));
>
> others use the now incorrect:
>
> 	rcu_read_lock();
> 	mp = rcu_dereference(dev->memory);
> 	...
>

I agree, working on a cleanup patch on top now.
On Wed, Jun 04, 2014 at 10:51:12PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 03, 2014 at 06:57:43AM -0700, Eric Dumazet wrote:
> > On Tue, 2014-06-03 at 14:48 +0200, Paolo Bonzini wrote:
> > > On 02/06/2014 23:58, Eric Dumazet wrote:
> > > > This looks dubious
> > > >
> > > > What about using kfree_rcu() instead?
> > >
> > > It would lead to unbounded allocation from userspace.
> >
> > Look at how we did this in commit
> > c3059477fce2d956a0bb3e04357324780c5d8eeb
> >
> > > > translate_desc() still uses rcu_read_lock(), it's not clear if the mutex
> > > > is really held.
> > >
> > > Yes, vhost_get_vq_desc must be called with the vq mutex held.
> > >
> > > The rcu_read_lock/unlock in translate_desc is unnecessary.
> >
> > Yep, this is what I pointed out. This is not only unnecessary, but
> > confusing, and might be incorrectly copy/pasted in the future.
> >
> > This patch is a partial one and leaves confusion.
> >
> > Some places use the proper
> >
> > 	mp = rcu_dereference_protected(dev->memory,
> > 				       lockdep_is_held(&dev->mutex));
> >
> > others use the now incorrect:
> >
> > 	rcu_read_lock();
> > 	mp = rcu_dereference(dev->memory);
> > 	...
> >
>
> I agree, working on a cleanup patch on top now.

OK, I just posted two cleanups as patches on top that address this.

Eric, could you please confirm that you are fine with the cleanups
being patches on top? Bisect will be fine, since this hack is ugly
but technically correct.

Thanks a lot for pointing out the issues!

> --
> MST
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 78987e4..1c05e60 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 {
 	struct vhost_memory mem, *newmem, *oldmem;
 	unsigned long size = offsetof(struct vhost_memory, regions);
+	int i;
 
 	if (copy_from_user(&mem, m, size))
 		return -EFAULT;
@@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	oldmem = rcu_dereference_protected(d->memory,
 					   lockdep_is_held(&d->mutex));
 	rcu_assign_pointer(d->memory, newmem);
-	synchronize_rcu();
+
+	/* All memory accesses are done under some VQ mutex.
+	 * So below is a faster equivalent of synchronize_rcu()
+	 */
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 	kfree(oldmem);
 	return 0;
 }
All memory accesses are done under some VQ mutex.
So lock/unlock all VQs is a faster equivalent of synchronize_rcu()
for memory access changes.
Some guests cause a lot of these changes, so it's helpful
to make them faster.

Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/vhost.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)