Message ID | 1260312605.19229.8.camel@w-sridhar.beaverton.ibm.com
---|---
State | Rejected, archived
Delegated to: | David Miller
Sridhar Samudrala <sri@us.ibm.com> wrote:
> When testing vhost-net patches with a guest running linux 2.6.32, I am
> running into "Unexpected full queue" warning messages from start_xmit() in
> virtio_net.c causing a lot of requeues and a drastic reduction in throughput.
>
> With a guest running 2.6.31, I get guest to host throughput around 7000Mb/s,
> but it drops to around 3200Mb/s with 2.6.32.
>
> I don't see this problem with usermode virtio in qemu, but I get a throughput
> of only 2700Mb/s with both 2.6.31 and 2.6.32.
>
> The following patch fixes this problem by dropping the skb and not requeuing
> to qdisc when it cannot be added to the ring buffer. With this patch, I see
> similar performance as 2.6.31 with vhost-net.
>
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b9e002f..307cfd6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -528,7 +528,11 @@ again:
>  			netif_start_queue(dev);
>  			goto again;
>  		}
> -		return NETDEV_TX_BUSY;
> +
> +		/* drop the skb under stress. */
> +		vi->dev->stats.tx_dropped++;
> +		kfree_skb(skb);
> +		return NETDEV_TX_OK;
>  	}

This patch only hides the real problem. The issue is that we
should never start the queue unless we can accommodate a full
64KB packet.

In your case, whenever we have the space for a single slot we'd
start the queue. However, if the head of the queue required more
than one slot it would immediately return BUSY and stop the queue
again.

Your patch simply drops the packet (likely to be TSO), which would
end up upsetting the TCP transmitter.

Please see what a real Ethernet driver (e.g., tg3 or ixgbe) does
for queue management.

Cheers,
On Sun, Dec 13, 2009 at 08:25:12PM +0800, Herbert Xu wrote:
> Sridhar Samudrala <sri@us.ibm.com> wrote:
> > When testing vhost-net patches with a guest running linux 2.6.32, I am
> > running into "Unexpected full queue" warning messages from start_xmit() in
> > virtio_net.c causing a lot of requeues and a drastic reduction in throughput.
> >
> > With a guest running 2.6.31, I get guest to host throughput around 7000Mb/s,
> > but it drops to around 3200Mb/s with 2.6.32.
> >
> > I don't see this problem with usermode virtio in qemu, but I get a throughput
> > of only 2700Mb/s with both 2.6.31 and 2.6.32.
> >
> > The following patch fixes this problem by dropping the skb and not requeuing
> > to qdisc when it cannot be added to the ring buffer. With this patch, I see
> > similar performance as 2.6.31 with vhost-net.
> >
> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b9e002f..307cfd6 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -528,7 +528,11 @@ again:
> >  			netif_start_queue(dev);
> >  			goto again;
> >  		}
> > -		return NETDEV_TX_BUSY;
> > +
> > +		/* drop the skb under stress. */
> > +		vi->dev->stats.tx_dropped++;
> > +		kfree_skb(skb);
> > +		return NETDEV_TX_OK;
> >  	}
>
> This patch only hides the real problem. The issue is that we
> should never start the queue unless we can accommodate a full
> 64KB packet.
>
> In your case, whenever we have the space for a single slot we'd
> start the queue. However, if the head of the queue required more
> than one slot it would immediately return BUSY and stop the queue
> again.
>
> Your patch simply drops the packet (likely to be TSO), which would
> end up upsetting the TCP transmitter.
>
> Please see what a real Ethernet driver (e.g., tg3 or ixgbe) does
> for queue management.
>
> Cheers,

An interesting thing is that 48925e372f04f5e35fec6269127c62b2c71ab794
was supposed to do this already.
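Herbert's invariant — only restart the TX queue when the ring can still take a maximally fragmented skb — can be sketched in user-space C. This is an illustrative model, not the 2.6.32 driver code; the MAX_SKB_FRAGS value is hard-coded here for the sketch:

```c
#include <stdbool.h>

/* Worst case for one skb: MAX_SKB_FRAGS page fragments plus one
 * descriptor for the virtio header and one for the linear data.
 * 17 is illustrative (MAX_SKB_FRAGS on 4K-page kernels of that era). */
#define MAX_SKB_FRAGS    17
#define MAX_PACKET_DESCS (MAX_SKB_FRAGS + 2)

/* Only restart the TX queue when a maximally fragmented packet is
 * guaranteed to fit; waking on a single free slot lets start_xmit()
 * immediately hit a full ring again, which is exactly the requeue
 * storm Sridhar reported. */
static bool can_wake_queue(unsigned int free_descs)
{
	return free_descs >= MAX_PACKET_DESCS;
}
```

With this check, a ring that has one free slot stays stopped until at least MAX_PACKET_DESCS descriptors have been reclaimed.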
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 14, 2009 at 01:40:55AM +0200, Michael S. Tsirkin wrote:
>
> An interesting thing is that
> 48925e372f04f5e35fec6269127c62b2c71ab794
> was supposed to do this already.

But Rusty missed the queue wake call. That too needs to check
that the required amount of space exists.

Cheers,
On Tue, 2009-12-15 at 22:42 +0800, Herbert Xu wrote:
> On Mon, Dec 14, 2009 at 01:40:55AM +0200, Michael S. Tsirkin wrote:
> >
> > An interesting thing is that
> > 48925e372f04f5e35fec6269127c62b2c71ab794
> > was supposed to do this already.
>
> But Rusty missed the queue wake call. That too needs to check
> that the required amount of space exists.

Yes. I noticed that. skb_xmit_done() doesn't check for MAX_SKB_FRAGS+2
free descriptors before doing a netif_wake_queue(). We don't have any
easy way to get the number of free descriptors in the virtqueue, so we
may need a vring_get_num_free() interface.

Also, can we do free_old_xmit_skbs() to free up any used descriptors in
the callback (skb_xmit_done) context? I tried this, but hit a panic in
free_old_xmit_skbs() (__skb_unlink() trying to remove the skb from the
vi->send queue). Looks like we need to add the skb to the vi->send queue
before doing a kick.

Thanks
Sridhar
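No such accessor existed in the virtio_ring API at the time; vring_get_num_free() is Sridhar's suggested name, not a real function. A minimal sketch of what it could look like, with a simplified stand-in struct rather than the real kernel definition (the core already tracks this count internally):

```c
/* Simplified stand-in for struct vring_virtqueue; the real one
 * already maintains num_free as descriptors are added and reclaimed. */
struct vring_virtqueue {
	unsigned int num_free;   /* descriptors not currently in flight */
};

/* Hypothetical accessor per Sridhar's suggestion: expose the free
 * count so the net driver can decide when netif_wake_queue() is safe
 * instead of waking on the first completion. */
static unsigned int vring_get_num_free(const struct vring_virtqueue *vq)
{
	return vq->num_free;
}
```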
On Tue, Dec 15, 2009 at 10:42:52PM +0800, Herbert Xu wrote:
> On Mon, Dec 14, 2009 at 01:40:55AM +0200, Michael S. Tsirkin wrote:
> >
> > An interesting thing is that
> > 48925e372f04f5e35fec6269127c62b2c71ab794
> > was supposed to do this already.
>
> But Rusty missed the queue wake call. That too needs to check
> that the required amount of space exists.

Right. Hmm. So for this to work we'll need to:
1. Free the skb upon interrupt instead of waiting for the next xmit call.
2. Add an API to query free ring capacity.

Rusty, sounds like a good plan?

We could also extend the host to delay the interrupt until there is
sufficient TX capacity, but of course we also need to support old hosts
as well.
On Tue, Dec 15, 2009 at 08:26:20AM -0800, Sridhar Samudrala wrote:
>
> Also, can we do free_old_xmit_skbs() to free up any used descriptors in
> the callback (skb_xmit_done) context? I tried this, but hit a panic in
> free_old_xmit_skbs() (__skb_unlink() trying to remove the skb from the
> vi->send queue). Looks like we need to add the skb to the vi->send queue
> before doing a kick.

No, you can't free skbs from hard IRQ context.

Cheers,
On Wed, Dec 16, 2009 at 01:32:27AM +0200, Michael S. Tsirkin wrote:
>
> Right. Hmm. So for this to work we'll need to
> 1. Free skb upon interrupt instead of
> waiting for the next xmit call

No, the next xmit call can free the skbs.

> 2. Add API to query free ring capacity

This is all you need.

> We could also extend host to delay interrupt
> until there is sufficient TX capacity
> but of course we also need to support
> old hosts as well.

Yes, that's what real hardware does too. They are able to delay the
interrupt until the number of free entries exceeds a minimum threshold.

Cheers,
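The mitigation Herbert describes can be modeled as a simple host-side predicate. This is a hypothetical sketch with invented names, not vhost or qemu code: the backend suppresses the TX-completion interrupt until enough descriptors have been reclaimed, while still honouring the guest's existing disable flag for old guests:

```c
#include <stdbool.h>

/* Hypothetical host-side policy: signal the guest only once at least
 * `threshold` descriptors are free (e.g. MAX_SKB_FRAGS + 2, so the
 * guest's wake is always useful), mirroring the free-entry interrupt
 * thresholds of real NICs. The guest-set disable flag still wins. */
static bool should_signal_guest(unsigned int free_descs,
				unsigned int threshold,
				bool guest_enabled_irq)
{
	if (!guest_enabled_irq)
		return false;           /* advisory DISABLE flag honoured */
	return free_descs >= threshold;
}
```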
On Sun, 13 Dec 2009 10:55:12 pm Herbert Xu wrote:
> Please see what a real Ethernet driver (e.g., tg3 or ixgbe) does
> for queue management.

Hi Herbert,

Thanks for the hint. They seem to use NAPI for xmit cleanup, so that's
what we should do? I'll try, but such a rewrite doesn't belong in 2.6.32.

Sridhar, if you simply remove the dev_warn() does that restore some
performance? I'll work on a proper fix for 2.6.33 now.

Thanks,
Rusty.
On Wed, Dec 16, 2009 at 01:11:40PM +1030, Rusty Russell wrote:
>
> Thanks for the hint. They seem to use NAPI for xmit cleanup, so that's
> what we should do? I'll try, but such a rewrite doesn't belong in 2.6.32.

Well, it depends. Real drivers can't touch the hardware, so they're stuck
with whatever the hardware does. For virtio we do have the flexibility of
modifying the backend.

Having said that, for existing backends that will signal when there is
just a single free entry on the queue, something like NAPI could reduce
the overhead associated with the IRQs.

Cheers,
On Wed, 16 Dec 2009 10:02:27 am Michael S. Tsirkin wrote:
> Right. Hmm. So for this to work we'll need to
> 1. Free skb upon interrupt instead of
> waiting for the next xmit call
> 2. Add API to query free ring capacity
>
> Rusty, sounds like a good plan?

Well, the query stuff is not too bad, but I can't completely convince
myself it's race-free. We don't want to do locking. A NAPI-style solution
seems cleaner, and I'm testing that now.

> We could also extend host to delay interrupt
> until there is sufficient TX capacity
> but of course we also need to support
> old hosts as well.

Xen does this, and I rejected it in favor of simple enable/disable flags
in the original virtio design. It was probably wrong: while the guest can
enable on a "few remaining" heuristic, it's going to have latency. The
host can do a more timely decision.

There's nothing stopping the host from doing this heuristic today, of
course: the DISABLE flag is advisory only. But let's check the limitations
of the guest-enable approach first?

Thanks,
Rusty.
On Wed, Dec 16, 2009 at 03:07:53PM +1030, Rusty Russell wrote:
> On Wed, 16 Dec 2009 10:02:27 am Michael S. Tsirkin wrote:
> > Right. Hmm. So for this to work we'll need to
> > 1. Free skb upon interrupt instead of
> > waiting for the next xmit call
> > 2. Add API to query free ring capacity
> >
> > Rusty, sounds like a good plan?
>
> Well, the query stuff is not too bad, but I can't completely convince
> myself it's race-free. We don't want to do locking.

We do not want to lock TX? For performance? It might not be a problem:
interrupts only start to be enabled when we are running out of space on
TX, so this is a kind of slow path anyway now. If necessary, we can also
only do this range check if netif_tx_queue_stopped.

> A NAPI-style solution seems cleaner, and I'm testing that now.

Hmm, as you say separately, this might not be 2.6.32 material though.
Maybe a simple capacity check will be safe enough for 2.6.32.

> > We could also extend host to delay interrupt
> > until there is sufficient TX capacity
> > but of course we also need to support
> > old hosts as well.
>
> Xen does this, and I rejected it in favor of simple enable/disable
> flags in the original virtio design. It was probably wrong: while the
> guest can enable on a "few remaining" heuristic, it's going to have
> latency. The host can do a more timely decision.
>
> There's nothing stopping the host from doing this heuristic today, of
> course: the DISABLE flag is advisory only.

Heh, but this might hurt performance on guests that do assume it's used
correctly. A new feature might be cleaner?

> But let's check the limitations
> of the guest-enable approach first?

Sure.
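The guest-enable approach under discussion — run with TX completions off, and only stop the queue and enable the completion callback once a worst-case packet no longer fits — might be sketched like this. Plain user-space C with invented helper names, not driver code; the kernel calls each step corresponds to are noted in comments:

```c
#include <stdbool.h>

#define MAX_PACKET_DESCS 19	/* MAX_SKB_FRAGS + 2, illustrative */

struct txq {
	unsigned int free_descs;
	bool stopped;		/* models netif_stop_queue() state */
	bool irq_enabled;	/* models vq_ops->enable_cb() state */
};

/* After queuing one packet that consumed `used` descriptors, decide
 * whether to stop the queue and ask for a completion interrupt.
 * Interrupts stay off on the fast path, so the capacity check only
 * matters once the ring is nearly full. */
static void tx_post_xmit(struct txq *q, unsigned int used)
{
	q->free_descs -= used;
	if (q->free_descs < MAX_PACKET_DESCS) {
		q->stopped = true;	/* netif_stop_queue() */
		q->irq_enabled = true;	/* enable completion callback */
	}
}
```

The latency Rusty worries about shows up here: the guest only learns about reclaimed descriptors after it has already stopped the queue and taken an interrupt, whereas a host-side threshold could signal earlier.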
On Wed, 2009-12-16 at 13:11 +1030, Rusty Russell wrote:
> Sridhar, if you simply remove the dev_warn() does that restore some
> performance? I'll work on a proper fix for 2.6.33 now.

Here is the comparison of a 60sec TCP_STREAM test from 2.6.31.6 and
2.6.32 guests to a 2.6.32 host with vhost-net patches. Removing the
dev_warn() in 2.6.32 doesn't make any significant difference in
throughput. I think the requeues are the main cause for the regression.
Will try your xmit NAPI patch.

linux-2.6.32
------------
$ ./netperf -c -C -H 192.168.122.1 -t TCP_STREAM -l60
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.1 (192.168.122.1) port 0 AF_INET
Recv   Send   Send                       Utilization     Service Demand
Socket Socket Message Elapsed            Send    Recv    Send    Recv
Size   Size   Size    Time    Throughput local   remote  local   remote
bytes  bytes  bytes   secs.   10^6bits/s % S     % S     us/KB   us/KB

 87380  16384  16384  60.03   3279.98    81.48   77.79   2.035   3.885

$ tc -s qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 24708777531 bytes 1469482 pkt (dropped 0, overlimits 0 requeues 283575)
 rate 0bit 0pps backlog 0b 0p requeues 283575

$ ip -s link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 54:52:00:35:e3:74 brd ff:ff:ff:ff:ff:ff
    RX: bytes   packets  errors  dropped overrun mcast
    97000940    1469585  0       0       0       0
    TX: bytes   packets  errors  dropped carrier collsns
    3233946817  1469533  0       0       0       0

linux-2.6.31.6
--------------
$ ./netperf -c -C -H 192.168.122.1 -t TCP_STREAM -l60
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.1 (192.168.122.1) port 0 AF_INET
Recv   Send   Send                       Utilization     Service Demand
Socket Socket Message Elapsed            Send    Recv    Send    Recv
Size   Size   Size    Time    Throughput local   remote  local   remote
bytes  bytes  bytes   secs.   10^6bits/s % S     % S     us/KB   us/KB

 87380  16384  16384  60.03   8038.48    69.44   71.02   0.708   1.448

$ tc -s qdisc show dev eth0
qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 60378698117 bytes 937388 pkt (dropped 0, overlimits 0 requeues 61)
 rate 0bit 0pps backlog 0b 0p requeues 61

$ ip -s link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 54:52:00:35:e3:73 brd ff:ff:ff:ff:ff:ff
    RX: bytes   packets  errors  dropped overrun mcast
    54702188    828655   0       0       0       0
    TX: bytes   packets  errors  dropped carrier collsns
    248753779   937399   0       7       0       0
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b9e002f..307cfd6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -528,7 +528,11 @@ again:
 			netif_start_queue(dev);
 			goto again;
 		}
-		return NETDEV_TX_BUSY;
+
+		/* drop the skb under stress. */
+		vi->dev->stats.tx_dropped++;
+		kfree_skb(skb);
+		return NETDEV_TX_OK;
 	}
 
 	vi->svq->vq_ops->kick(vi->svq);
When testing vhost-net patches with a guest running linux 2.6.32, I am
running into "Unexpected full queue" warning messages from start_xmit() in
virtio_net.c causing a lot of requeues and a drastic reduction in
throughput.

With a guest running 2.6.31, I get guest to host throughput around
7000Mb/s, but it drops to around 3200Mb/s with 2.6.32.

I don't see this problem with usermode virtio in qemu, but I get a
throughput of only 2700Mb/s with both 2.6.31 and 2.6.32.

The following patch fixes this problem by dropping the skb and not
requeuing to qdisc when it cannot be added to the ring buffer. With this
patch, I see similar performance as 2.6.31 with vhost-net.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

Thanks
Sridhar