Message ID: 20101104122424.GA29830@redhat.com
State: Not Applicable, archived
Delegated to: David Miller
"Michael S. Tsirkin" <mst@redhat.com> wrote on 11/04/2010 05:54:24 PM:

> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > I thought about this some more. I think the original
> > code is actually correct in returning ENOSPC: indirect
> > buffers are nice, but it's a mistake
> > to rely on them as a memory allocation might fail.
> >
> > And if you look at virtio-net, it is dropping packets
> > under memory pressure which is not really a happy outcome:
> > the packet will get freed, reallocated and we get another one,
> > adding pressure on the allocator instead of releasing it
> > until we free up some buffers.
> >
> > So I now think we should calculate the capacity
> > assuming non-indirect entries, and if we manage to
> > use indirect, all the better.
> >
> > So below is what I propose now - as a replacement for
> > my original patch. Krishna Kumar, Rusty, what do you think?
> >
> > Separately I'm also considering moving the
> > if (vq->num_free < out + in)
> > check earlier in the function to keep all users honest,
> > but need to check what the implications are for e.g. block.
> > Thoughts on this?

This looks like the right thing to do. Besides this, I think
virtio-net still needs to remove the check for ENOMEM? I will test
this patch tomorrow.

Another question about add_recvbuf_small and add_recvbuf_big - both
call virtqueue_add_buf_gfp with in+out > 1, and that can fail with
-ENOSPC. So try_fill_recv gets -ENOSPC. When that happens, oom is not
set to true; I thought it should have been set. Is this a bug?

Thanks,

- KK
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 04, 2010 at 09:47:04PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 11/04/2010 05:54:24 PM:
>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > >
> > > I thought about this some more. I think the original
> > > code is actually correct in returning ENOSPC: indirect
> > > buffers are nice, but it's a mistake
> > > to rely on them as a memory allocation might fail.
> > >
> > > And if you look at virtio-net, it is dropping packets
> > > under memory pressure which is not really a happy outcome:
> > > the packet will get freed, reallocated and we get another one,
> > > adding pressure on the allocator instead of releasing it
> > > until we free up some buffers.
> > >
> > > So I now think we should calculate the capacity
> > > assuming non-indirect entries, and if we manage to
> > > use indirect, all the better.
> > >
> > > So below is what I propose now - as a replacement for
> > > my original patch. Krishna Kumar, Rusty, what do you think?
> > >
> > > Separately I'm also considering moving the
> > > if (vq->num_free < out + in)
> > > check earlier in the function to keep all users honest,
> > > but need to check what the implications are for e.g. block.
> > > Thoughts on this?
>
> This looks like the right thing to do. Besides this, I
> think virtio-net still needs to remove the check for ENOMEM?

Yes, the only valid reason for failure would be an unexpected error.
No need to special-case ENOMEM anymore.

> I will test this patch tomorrow.
>
> Another question about add_recvbuf_small and
> add_recvbuf_big - both call virtqueue_add_buf_gfp with
> in+out > 1, and that can fail with -ENOSPC. So try_fill_recv
> gets -ENOSPC. When that happens, oom is not set to true;
> I thought it should have been set. Is this a bug?
>
> Thanks,
>
> - KK

I don't see a bug: on ENOSPC we don't need to (and can't) add any more
buffers, and we know we will make progress since there must be some
buffers in the ring already. ENOMEM makes us try again later with more
buffers (and possibly a more aggressive GFP flag). What's wrong?
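The ENOSPC/ENOMEM distinction above can be sketched in a small runnable model. This is not the driver's code: `fake_add_buf` is a hypothetical stand-in for `virtqueue_add_buf_gfp`, and `try_fill_recv_sketch` only mimics the error-handling shape under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for virtqueue_add_buf_gfp(): returns the
 * remaining capacity on success, -ENOSPC when the ring is full, and
 * -ENOMEM when an allocation fails. */
int fake_add_buf(int *num_free, bool alloc_fails)
{
    if (alloc_fails)
        return -ENOMEM;
    if (*num_free == 0)
        return -ENOSPC;
    return --(*num_free);
}

/* Sketch of the try_fill_recv() error-handling shape discussed above:
 * oom is set only on -ENOMEM (so the driver retries later), while
 * -ENOSPC just stops refilling, because a full ring already
 * guarantees forward progress. */
bool try_fill_recv_sketch(int *num_free, bool alloc_fails)
{
    bool oom = false;
    int err;

    do {
        err = fake_add_buf(num_free, alloc_fails);
        if (err == -ENOMEM) {
            oom = true;   /* retry later, possibly more aggressively */
            break;
        }
        if (err == -ENOSPC)
            break;        /* ring full: no need to add more buffers */
    } while (err > 0);

    return oom;
}
```

Filling a ring with three free slots exhausts it without setting oom; a subsequent -ENOSPC likewise leaves oom false, and only a failed allocation sets it.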
On Thu, 4 Nov 2010 10:54:24 pm Michael S. Tsirkin wrote:
> I thought about this some more. I think the original
> code is actually correct in returning ENOSPC: indirect
> buffers are nice, but it's a mistake
> to rely on them as a memory allocation might fail.
>
> And if you look at virtio-net, it is dropping packets
> under memory pressure which is not really a happy outcome:
> the packet will get freed, reallocated and we get another one,
> adding pressure on the allocator instead of releasing it
> until we free up some buffers.
>
> So I now think we should calculate the capacity
> assuming non-indirect entries, and if we manage to
> use indirect, all the better.

I've long said it's a weakness in the network stack that it insists
drivers stop the tx queue before they *might* run out of room, leading
to worst-case assumptions and underutilization of the tx ring.

However, I lost that debate, and so your patch is the way it's
supposed to work. The other main indirect user (block) doesn't care,
as its queue allows for post-attempt blocking.

I enhanced your commentary a little:

Subject: virtio: return correct capacity to users
Date: Thu, 4 Nov 2010 14:24:24 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>

We can't rely on indirect buffers for capacity calculations because
they need a memory allocation which might fail. In particular,
virtio_net can get into this situation under stress, and it drops
packets and performs badly.

So return the number of buffers we can guarantee users.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>

Thanks!
Rusty.
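The worst-case rule Rusty objects to can be modelled in a few lines. This is an illustrative sketch, not the driver's exact code: the constant and function name are assumptions, and `MAX_SKB_FRAGS` here is just a representative value.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only: in Linux, MAX_SKB_FRAGS is derived from
 * the page size (historically 65536 / PAGE_SIZE + 1). */
#define MAX_SKB_FRAGS 17

/* Model of the worst-case rule: the driver must stop the tx queue
 * while the *guaranteed* (non-indirect) capacity can still hold a
 * maximally fragmented skb - one descriptor for the virtio-net
 * header, one for the linear data, and one per page fragment. With
 * indirect buffers a packet may end up consuming just one slot, so
 * this reservation is deliberately pessimistic. */
bool must_stop_tx(int guaranteed_capacity)
{
    return guaranteed_capacity < 2 + MAX_SKB_FRAGS;
}
```

This is exactly the underutilization Rusty describes: with indirect descriptors a ring slot can often carry a whole packet, yet the driver still reserves 2 + MAX_SKB_FRAGS slots per packet before stopping the queue.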
Rusty Russell <rusty@rustcorp.com.au> wrote on 11/08/2010 04:38:47 AM:

> Re: [PATCH] virtio_net: Fix queue full check
>
> On Thu, 4 Nov 2010 10:54:24 pm Michael S. Tsirkin wrote:
> > I thought about this some more. I think the original
> > code is actually correct in returning ENOSPC: indirect
> > buffers are nice, but it's a mistake
> > to rely on them as a memory allocation might fail.
> >
> > And if you look at virtio-net, it is dropping packets
> > under memory pressure which is not really a happy outcome:
> > the packet will get freed, reallocated and we get another one,
> > adding pressure on the allocator instead of releasing it
> > until we free up some buffers.
> >
> > So I now think we should calculate the capacity
> > assuming non-indirect entries, and if we manage to
> > use indirect, all the better.
>
> I've long said it's a weakness in the network stack that it insists
> drivers stop the tx queue before they *might* run out of room,
> leading to worst-case assumptions and underutilization of the tx ring.
>
> However, I lost that debate, and so your patch is the way it's
> supposed to work. The other main indirect user (block) doesn't care,
> as its queue allows for post-attempt blocking.
>
> I enhanced your commentary a little:
>
> Subject: virtio: return correct capacity to users
> Date: Thu, 4 Nov 2010 14:24:24 +0200
> From: "Michael S. Tsirkin" <mst@redhat.com>
>
> We can't rely on indirect buffers for capacity
> calculations because they need a memory allocation
> which might fail. In particular, virtio_net can get
> into this situation under stress, and it drops packets
> and performs badly.
>
> So return the number of buffers we can guarantee users.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>

I have tested this patch for 3-4 hours but so far I have not got the
tx full error. I am not sure if "Tested-By" applies to this situation,
but just in case:

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
Tested-By: Krishna Kumar2 <krkumar2@in.ibm.com>

I think both this patch and the original patch I submitted are needed?
That patch removes the ENOMEM check and the increment of
dev->stats.tx_fifo_errors, and reports "memory failure".

Thanks,

- KK
On Tue, Nov 09, 2010 at 09:56:03AM +0530, Krishna Kumar2 wrote:
> Rusty Russell <rusty@rustcorp.com.au> wrote on 11/08/2010 04:38:47 AM:
>
> > Re: [PATCH] virtio_net: Fix queue full check
> >
> > On Thu, 4 Nov 2010 10:54:24 pm Michael S. Tsirkin wrote:
> > > I thought about this some more. I think the original
> > > code is actually correct in returning ENOSPC: indirect
> > > buffers are nice, but it's a mistake
> > > to rely on them as a memory allocation might fail.
> > >
> > > And if you look at virtio-net, it is dropping packets
> > > under memory pressure which is not really a happy outcome:
> > > the packet will get freed, reallocated and we get another one,
> > > adding pressure on the allocator instead of releasing it
> > > until we free up some buffers.
> > >
> > > So I now think we should calculate the capacity
> > > assuming non-indirect entries, and if we manage to
> > > use indirect, all the better.
> >
> > I've long said it's a weakness in the network stack that it insists
> > drivers stop the tx queue before they *might* run out of room,
> > leading to worst-case assumptions and underutilization of the tx ring.
> >
> > However, I lost that debate, and so your patch is the way it's
> > supposed to work. The other main indirect user (block) doesn't care,
> > as its queue allows for post-attempt blocking.
> >
> > I enhanced your commentary a little:
> >
> > Subject: virtio: return correct capacity to users
> > Date: Thu, 4 Nov 2010 14:24:24 +0200
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> >
> > We can't rely on indirect buffers for capacity
> > calculations because they need a memory allocation
> > which might fail. In particular, virtio_net can get
> > into this situation under stress, and it drops packets
> > and performs badly.
> >
> > So return the number of buffers we can guarantee users.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
>
> I have tested this patch for 3-4 hours but so far I have not got
> the tx full error. I am not sure if "Tested-By" applies to this
> situation, but just in case:
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
> Tested-By: Krishna Kumar2 <krkumar2@in.ibm.com>
>
> I think both this patch and the original patch I submitted
> are needed? That patch removes the ENOMEM check and the increment
> of dev->stats.tx_fifo_errors, and reports "memory failure".
>
> Thanks,
>
> - KK

So I think your patch on top of this one would be wrong: we actually
make sure out of memory does not affect the TX path at all, so any
error would be unexpected.

Incrementing tx fifo errors is probably also helpful for debugging.

If you like, we could kill the special handling for ENOMEM.
Not sure whether it matters.
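The policy being argued for here can be sketched as a tiny model. Names (`tx_stats`, `account_xmit_error`) are illustrative, not the driver's actual code: the point is that once capacity is computed conservatively, any tx add-buffer failure is unexpected and there is no ENOMEM special case.

```c
#include <assert.h>
#include <errno.h>

/* Minimal stand-in for the relevant net_device stats fields. */
struct tx_stats {
    unsigned long tx_fifo_errors;
    unsigned long tx_dropped;
};

/* Sketch of the post-patch error policy under discussion: every
 * failure on the tx path - regardless of errno - hits the same
 * "should never happen" sanity counter (visible via ifconfig for
 * debugging) and the packet is dropped. */
void account_xmit_error(struct tx_stats *stats, int err)
{
    if (err >= 0)
        return;                  /* success: nothing to account */
    stats->tx_fifo_errors++;     /* sanity check / overflow detector */
    stats->tx_dropped++;
}
```

Keeping the counter costs nothing on the fast path and gives a way to notice a capacity-accounting bug from userspace, which is the sanity-check argument made above.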
On Tue, Nov 09, 2010 at 09:00:58PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 11/09/2010 06:45:55 PM:
>
> > Re: [PATCH] virtio_net: Fix queue full check
> >
> > On Tue, Nov 09, 2010 at 09:56:03AM +0530, Krishna Kumar2 wrote:
> > > Rusty Russell <rusty@rustcorp.com.au> wrote on 11/08/2010 04:38:47 AM:
> > >
> > > > Re: [PATCH] virtio_net: Fix queue full check
> > > >
> > > > On Thu, 4 Nov 2010 10:54:24 pm Michael S. Tsirkin wrote:
> > > > > I thought about this some more. I think the original
> > > > > code is actually correct in returning ENOSPC: indirect
> > > > > buffers are nice, but it's a mistake
> > > > > to rely on them as a memory allocation might fail.
> > > > >
> > > > > And if you look at virtio-net, it is dropping packets
> > > > > under memory pressure which is not really a happy outcome:
> > > > > the packet will get freed, reallocated and we get another one,
> > > > > adding pressure on the allocator instead of releasing it
> > > > > until we free up some buffers.
> > > > >
> > > > > So I now think we should calculate the capacity
> > > > > assuming non-indirect entries, and if we manage to
> > > > > use indirect, all the better.
> > > >
> > > > I've long said it's a weakness in the network stack that it insists
> > > > drivers stop the tx queue before they *might* run out of room,
> > > > leading to worst-case assumptions and underutilization of the tx ring.
> > > >
> > > > However, I lost that debate, and so your patch is the way it's
> > > > supposed to work. The other main indirect user (block) doesn't care,
> > > > as its queue allows for post-attempt blocking.
> > > >
> > > > I enhanced your commentary a little:
> > > >
> > > > Subject: virtio: return correct capacity to users
> > > > Date: Thu, 4 Nov 2010 14:24:24 +0200
> > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > >
> > > > We can't rely on indirect buffers for capacity
> > > > calculations because they need a memory allocation
> > > > which might fail. In particular, virtio_net can get
> > > > into this situation under stress, and it drops packets
> > > > and performs badly.
> > > >
> > > > So return the number of buffers we can guarantee users.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > > Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
> > >
> > > I have tested this patch for 3-4 hours but so far I have not got
> > > the tx full error. I am not sure if "Tested-By" applies to this
> > > situation, but just in case:
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
> > > Tested-By: Krishna Kumar2 <krkumar2@in.ibm.com>
> > >
> > > I think both this patch and the original patch I submitted
> > > are needed? That patch removes the ENOMEM check and the increment
> > > of dev->stats.tx_fifo_errors, and reports "memory failure".
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > So I think your patch on top of this one would be wrong:
> > we actually make sure out of memory does not affect the TX path
> > at all, so any error would be unexpected.
> >
> > Incrementing tx fifo errors is probably also helpful for debugging.
> >
> > If you like, we could kill the special handling for ENOMEM.
> > Not sure whether it matters.
>
> Since that is dead code, we could remove it (and the fifo error
> should disappear too - tx_dropped should be the only counter to
> be incremented?). Sorry if I misunderstood something.
>
> Thanks,
>
> - KK

It's just a sanity check. The capacity checking is tricky enough that
I'm happier with some way to detect overflow both from ifconfig and
dmesg.

I don't much care which counter gets incremented; since this would be
a TX bug, the fifo error counter seems appropriate.
"Michael S. Tsirkin" <mst@redhat.com> wrote on 11/09/2010 06:45:55 PM:

> Re: [PATCH] virtio_net: Fix queue full check
>
> On Tue, Nov 09, 2010 at 09:56:03AM +0530, Krishna Kumar2 wrote:
> > Rusty Russell <rusty@rustcorp.com.au> wrote on 11/08/2010 04:38:47 AM:
> >
> > > Re: [PATCH] virtio_net: Fix queue full check
> > >
> > > On Thu, 4 Nov 2010 10:54:24 pm Michael S. Tsirkin wrote:
> > > > I thought about this some more. I think the original
> > > > code is actually correct in returning ENOSPC: indirect
> > > > buffers are nice, but it's a mistake
> > > > to rely on them as a memory allocation might fail.
> > > >
> > > > And if you look at virtio-net, it is dropping packets
> > > > under memory pressure which is not really a happy outcome:
> > > > the packet will get freed, reallocated and we get another one,
> > > > adding pressure on the allocator instead of releasing it
> > > > until we free up some buffers.
> > > >
> > > > So I now think we should calculate the capacity
> > > > assuming non-indirect entries, and if we manage to
> > > > use indirect, all the better.
> > >
> > > I've long said it's a weakness in the network stack that it insists
> > > drivers stop the tx queue before they *might* run out of room,
> > > leading to worst-case assumptions and underutilization of the tx ring.
> > >
> > > However, I lost that debate, and so your patch is the way it's
> > > supposed to work. The other main indirect user (block) doesn't care,
> > > as its queue allows for post-attempt blocking.
> > >
> > > I enhanced your commentary a little:
> > >
> > > Subject: virtio: return correct capacity to users
> > > Date: Thu, 4 Nov 2010 14:24:24 +0200
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > >
> > > We can't rely on indirect buffers for capacity
> > > calculations because they need a memory allocation
> > > which might fail. In particular, virtio_net can get
> > > into this situation under stress, and it drops packets
> > > and performs badly.
> > >
> > > So return the number of buffers we can guarantee users.
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
> >
> > I have tested this patch for 3-4 hours but so far I have not got
> > the tx full error. I am not sure if "Tested-By" applies to this
> > situation, but just in case:
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
> > Tested-By: Krishna Kumar2 <krkumar2@in.ibm.com>
> >
> > I think both this patch and the original patch I submitted
> > are needed? That patch removes the ENOMEM check and the increment
> > of dev->stats.tx_fifo_errors, and reports "memory failure".
> >
> > Thanks,
> >
> > - KK
>
> So I think your patch on top of this one would be wrong:
> we actually make sure out of memory does not affect the TX path
> at all, so any error would be unexpected.
>
> Incrementing tx fifo errors is probably also helpful for debugging.
>
> If you like, we could kill the special handling for ENOMEM.
> Not sure whether it matters.

Since that is dead code, we could remove it (and the fifo error
should disappear too - tx_dropped should be the only counter to
be incremented?). Sorry if I misunderstood something.

Thanks,

- KK
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1475ed6..cc2f73e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -230,9 +230,6 @@ add_head:
 	pr_debug("Added buffer head %i to %p\n", head, vq);
 	END_USE(vq);
 
-	/* If we're indirect, we can fit many (assuming not OOM). */
-	if (vq->indirect)
-		return vq->num_free ? vq->vring.num : 0;
 	return vq->num_free;
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
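The behavioural change in the diff can be captured in a small model. The function names here are made up for illustration; only the two return expressions come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the diff above. Before the patch, a ring with indirect
 * support reported the full ring size whenever even one slot was
 * free - an optimistic answer that relied on an indirect-descriptor
 * allocation succeeding later. After the patch, only the guaranteed
 * slot count is reported. */
int capacity_before(int num_free, int vring_num, bool indirect)
{
    if (indirect)
        return num_free ? vring_num : 0;   /* removed by the patch */
    return num_free;
}

int capacity_after(int num_free)
{
    return num_free;                       /* what we can guarantee */
}
```

With a 256-entry ring and a single free slot, the old code would promise 256 buffers while the new code promises exactly one, which is what the commit message means by "the number of buffers we can guarantee users".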