diff mbox

[VHOST] fix race with guest on multi-buffer used buffer updates

Message ID 1274374686.8492.12.camel@w-dls.beaverton.ibm.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

David Stevens May 20, 2010, 4:58 p.m. UTC
[for Michael Tsirkin's vhost development git tree]

This patch fixes a race between guest and host when
adding used buffers wraps the ring. Without it, guests
can see partial packets before num_buffers is set in
the vnet header.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Michael S. Tsirkin May 24, 2010, 10:17 a.m. UTC | #1
On Thu, May 20, 2010 at 09:58:06AM -0700, David L Stevens wrote:
> [for Michael Tsirkin's vhost development git tree]
> 
> This patch fixes a race between guest and host when
> adding used buffers wraps the ring. Without it, guests
> can see partial packets before num_buffers is set in
> the vnet header.
> 
> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

Could you please explain what the race is?

> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 7f2568d..74790ab 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1065,14 +1065,6 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>  		vq_err(vq, "Failed to write used");
>  		return -EFAULT;
>  	}
> -	/* Make sure buffer is written before we update index. */
> -	smp_wmb();
> -	if (put_user(vq->last_used_idx + count, &vq->used->idx)) {
> -		vq_err(vq, "Failed to increment used idx");
> -		return -EFAULT;
> -	}
> -	if (unlikely(vq->log_used))
> -		vhost_log_used(vq, used);
>  	vq->last_used_idx += count;
>  	return 0;
>  }
> @@ -1093,7 +1085,17 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
>  		heads += n;
>  		count -= n;
>  	}
> -	return __vhost_add_used_n(vq, heads, count);
> +	r = __vhost_add_used_n(vq, heads, count);
> +
> +	/* Make sure buffer is written before we update index. */
> +	smp_wmb();
> +	if (put_user(vq->last_used_idx, &vq->used->idx)) {
> +		vq_err(vq, "Failed to increment used idx");
> +		return -EFAULT;
> +	}
> +	if (unlikely(vq->log_used))
> +		vhost_log_used(vq, vq->used->ring + start);
> +	return r;
>  }
>  
>  /* This actually signals the guest, using eventfd. */
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 24, 2010, 3:52 p.m. UTC | #2
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/24/2010 03:17:10 AM:

> On Thu, May 20, 2010 at 09:58:06AM -0700, David L Stevens wrote:
> > [for Michael Tsirkin's vhost development git tree]
> > 
> > This patch fixes a race between guest and host when
> > adding used buffers wraps the ring. Without it, guests
> > can see partial packets before num_buffers is set in
> > the vnet header.
> > 
> > Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> 
> Could you please explain what the race is?

        Sure. The pre-patch code in the ring-wrap case
does this:

        add part1 bufs
        update used index
        add part2 bufs
        update used index

        After we update the used index for part1, the part1
buffers are available to the guest. If the guest is
consuming at that point, it can process the partial
packet before the rest of the packet is there. In that
case, num_buffers will be greater than the number of
buffers available to the guest and it'll drop the
packet with a framing error. I was seeing 2 or 3 framing
errors every 100 million packets or so pre-patch, none
post-patch.
        Actually, the second sentence is incorrect in the
original description-- num_buffers is up to date when
the guest sees it, but the used index is not.

                                                +-DLS


> 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 7f2568d..74790ab 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -1065,14 +1065,6 @@ static int __vhost_add_used_n(struct 
vhost_virtqueue *vq,
> >        vq_err(vq, "Failed to write used");
> >        return -EFAULT;
> >     }
> > -   /* Make sure buffer is written before we update index. */
> > -   smp_wmb();
> > -   if (put_user(vq->last_used_idx + count, &vq->used->idx)) {
> > -      vq_err(vq, "Failed to increment used idx");
> > -      return -EFAULT;
> > -   }
> > -   if (unlikely(vq->log_used))
> > -      vhost_log_used(vq, used);
> >     vq->last_used_idx += count;
> >     return 0;
> >  }
> > @@ -1093,7 +1085,17 @@ int vhost_add_used_n(struct vhost_virtqueue 
*vq, 
> struct vring_used_elem *heads,
> >        heads += n;
> >        count -= n;
> >     }
> > -   return __vhost_add_used_n(vq, heads, count);
> > +   r = __vhost_add_used_n(vq, heads, count);
> > +
> > +   /* Make sure buffer is written before we update index. */
> > +   smp_wmb();
> > +   if (put_user(vq->last_used_idx, &vq->used->idx)) {
> > +      vq_err(vq, "Failed to increment used idx");
> > +      return -EFAULT;
> > +   }
> > +   if (unlikely(vq->log_used))
> > +      vhost_log_used(vq, vq->used->ring + start);
> > +   return r;
> >  }
> > 
> >  /* This actually signals the guest, using eventfd. */
> > 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin May 24, 2010, 4:13 p.m. UTC | #3
On Mon, May 24, 2010 at 08:52:40AM -0700, David Stevens wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/24/2010 03:17:10 AM:
> 
> > On Thu, May 20, 2010 at 09:58:06AM -0700, David L Stevens wrote:
> > > [for Michael Tsirkin's vhost development git tree]
> > > 
> > > This patch fixes a race between guest and host when
> > > adding used buffers wraps the ring. Without it, guests
> > > can see partial packets before num_buffers is set in
> > > the vnet header.
> > > 
> > > Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> > 
> > Could you please explain what the race is?
> 
>         Sure. The pre-patch code in the ring-wrap case
> does this:
> 
>         add part1 bufs
>         update used index
>         add part2 bufs
>         update used index
> 
>         After we update the used index for part1, the part1
> buffers are available to the guest. If the guest is
> consuming at that point, it can process the partial
> packet before the rest of the packet is there. In that
> case, num_buffers will be greater than the number of
> buffers available to the guest and it'll drop the
> packet with a framing error. I was seeing 2 or 3 framing
> errors every 100 million packets or so pre-patch, none
> post-patch.
>         Actually, the second sentence is incorrect in the
> original description-- num_buffers is up to date when
> the guest sees it, but the used index is not.
> 
>                                                 +-DLS

so this happens always - what does wrap-around refer to?

> 
> > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 7f2568d..74790ab 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -1065,14 +1065,6 @@ static int __vhost_add_used_n(struct 
> vhost_virtqueue *vq,
> > >        vq_err(vq, "Failed to write used");
> > >        return -EFAULT;
> > >     }
> > > -   /* Make sure buffer is written before we update index. */
> > > -   smp_wmb();
> > > -   if (put_user(vq->last_used_idx + count, &vq->used->idx)) {
> > > -      vq_err(vq, "Failed to increment used idx");
> > > -      return -EFAULT;
> > > -   }
> > > -   if (unlikely(vq->log_used))
> > > -      vhost_log_used(vq, used);
> > >     vq->last_used_idx += count;
> > >     return 0;
> > >  }
> > > @@ -1093,7 +1085,17 @@ int vhost_add_used_n(struct vhost_virtqueue 
> *vq, 
> > struct vring_used_elem *heads,
> > >        heads += n;
> > >        count -= n;
> > >     }
> > > -   return __vhost_add_used_n(vq, heads, count);
> > > +   r = __vhost_add_used_n(vq, heads, count);
> > > +
> > > +   /* Make sure buffer is written before we update index. */
> > > +   smp_wmb();
> > > +   if (put_user(vq->last_used_idx, &vq->used->idx)) {
> > > +      vq_err(vq, "Failed to increment used idx");
> > > +      return -EFAULT;
> > > +   }
> > > +   if (unlikely(vq->log_used))
> > > +      vhost_log_used(vq, vq->used->ring + start);
> > > +   return r;
> > >  }


I think a single vhost_log_used will not DTRT here:
it only updates log for a single entry.
So we'll need to split this to functions that
1. log used entries writes: called from __vhost_add_used_n
2. log used index write: called from vhost_add_used_n

> > > 
> > >  /* This actually signals the guest, using eventfd. */
> > > 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 24, 2010, 4:27 p.m. UTC | #4
netdev-owner@vger.kernel.org wrote on 05/24/2010 09:13:51 AM:

> On Mon, May 24, 2010 at 08:52:40AM -0700, David Stevens wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/24/2010 03:17:10 AM:
> > 
> > > On Thu, May 20, 2010 at 09:58:06AM -0700, David L Stevens wrote:
> > > > [for Michael Tsirkin's vhost development git tree]
> > > > 
> > > > This patch fixes a race between guest and host when
> > > > adding used buffers wraps the ring. Without it, guests
> > > > can see partial packets before num_buffers is set in
> > > > the vnet header.
> > > > 
> > > > Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> > > 
> > > Could you please explain what the race is?
> > 
> >         Sure. The pre-patch code in the ring-wrap case
> > does this:
> > 
> >         add part1 bufs
> >         update used index
> >         add part2 bufs
> >         update used index
> > 
> >         After we update the used index for part1, the part1
> > buffers are available to the guest. If the guest is
> > consuming at that point, it can process the partial
> > packet before the rest of the packet is there. In that
> > case, num_buffers will be greater than the number of
> > buffers available to the guest and it'll drop the
> > packet with a framing error. I was seeing 2 or 3 framing
> > errors every 100 million packets or so pre-patch, none
> > post-patch.
> >         Actually, the second sentence is incorrect in the
> > original description-- num_buffers is up to date when
> > the guest sees it, but the used index is not.
> > 
> >                                                 +-DLS
> 
> so this happens always - what does wrap-around refer to?

        The 2-part update only happens when a packet spans
the end/beginning of the vring (the wrap). The framing error only
happens if the guest sees the vring-wrapping packets
before the second used-index write (the race).
        So, the framing error doesn't happen always--it's
pretty rare. But with the patch, it never happens.

                                                        +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin May 24, 2010, 4:42 p.m. UTC | #5
On Mon, May 24, 2010 at 09:27:15AM -0700, David Stevens wrote:
> netdev-owner@vger.kernel.org wrote on 05/24/2010 09:13:51 AM:
> 
> > On Mon, May 24, 2010 at 08:52:40AM -0700, David Stevens wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/24/2010 03:17:10 AM:
> > > 
> > > > On Thu, May 20, 2010 at 09:58:06AM -0700, David L Stevens wrote:
> > > > > [for Michael Tsirkin's vhost development git tree]
> > > > > 
> > > > > This patch fixes a race between guest and host when
> > > > > adding used buffers wraps the ring. Without it, guests
> > > > > can see partial packets before num_buffers is set in
> > > > > the vnet header.
> > > > > 
> > > > > Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> > > > 
> > > > Could you please explain what the race is?
> > > 
> > >         Sure. The pre-patch code in the ring-wrap case
> > > does this:
> > > 
> > >         add part1 bufs
> > >         update used index
> > >         add part2 bufs
> > >         update used index
> > > 
> > >         After we update the used index for part1, the part1
> > > buffers are available to the guest. If the guest is
> > > consuming at that point, it can process the partial
> > > packet before the rest of the packet is there. In that
> > > case, num_buffers will be greater than the number of
> > > buffers available to the guest and it'll drop the
> > > packet with a framing error. I was seeing 2 or 3 framing
> > > errors every 100 million packets or so pre-patch, none
> > > post-patch.
> > >         Actually, the second sentence is incorrect in the
> > > original description-- num_buffers is up to date when
> > > the guest sees it, but the used index is not.
> > > 
> > >                                                 +-DLS
> > 
> > so this happens always - what does wrap-around refer to?
> 
>         The 2-part update only happens when a packet spans
> the end/beginning of the vring (the wrap). The framing error only
> happens if the guest sees the vring-wrapping packets
> before the second used-index write (the race).
>         So, the framing error doesn't happen always--it's
> pretty rare. But with the patch, it never happens.
> 
>                                                         +-DLS

I see. The logging is still bugg though I think.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 24, 2010, 5:50 p.m. UTC | #6
Michael S. Tsirkin <mst@redhat.com> wrote on 05/24/2010 09:42:05 AM:
> 
> I see. The logging is still bugg though I think.

Possibly; migration isn't working for me under load even
without mergeable buffers (investigating), so I haven't
yet been able to test wrap w/ logging, but did you see
something specific that's wrong?

                                                +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin May 24, 2010, 5:52 p.m. UTC | #7
On Mon, May 24, 2010 at 10:50:44AM -0700, David Stevens wrote:
> Michael S. Tsirkin <mst@redhat.com> wrote on 05/24/2010 09:42:05 AM:
> > 
> > I see. The logging is still bugg though I think.
> 
> Possibly; migration isn't working for me under load even
> without mergeable buffers (investigating), so I haven't
> yet been able to test wrap w/ logging, but did you see
> something specific that's wrong?
> 
>                                                 +-DLS

Yes, code only logs a single entry as dirty even if
multiple entries have been written.
diff mbox

Patch

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 7f2568d..74790ab 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1065,14 +1065,6 @@  static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 		vq_err(vq, "Failed to write used");
 		return -EFAULT;
 	}
-	/* Make sure buffer is written before we update index. */
-	smp_wmb();
-	if (put_user(vq->last_used_idx + count, &vq->used->idx)) {
-		vq_err(vq, "Failed to increment used idx");
-		return -EFAULT;
-	}
-	if (unlikely(vq->log_used))
-		vhost_log_used(vq, used);
 	vq->last_used_idx += count;
 	return 0;
 }
@@ -1093,7 +1085,17 @@  int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		heads += n;
 		count -= n;
 	}
-	return __vhost_add_used_n(vq, heads, count);
+	r = __vhost_add_used_n(vq, heads, count);
+
+	/* Make sure buffer is written before we update index. */
+	smp_wmb();
+	if (put_user(vq->last_used_idx, &vq->used->idx)) {
+		vq_err(vq, "Failed to increment used idx");
+		return -EFAULT;
+	}
+	if (unlikely(vq->log_used))
+		vhost_log_used(vq, vq->used->ring + start);
+	return r;
 }
 
 /* This actually signals the guest, using eventfd. */