Message ID | 1435223885-12745-1-git-send-email-stefanha@redhat.com |
---|---|
State | New |
Headers | show |
On Thu, 06/25 10:18, Stefan Hajnoczi wrote: > e1000_can_receive() checks the link up status register bit. If the bit > is clear, packets will be queued and the peer may disable receive to > avoid wasting CPU reading packets that cannot be delivered. The queue > must be flushed once the link comes back up again. > > This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests > and tap networking. Flushing the queue invokes the async send callback, > which re-enables tap fd read. > > Reported-by: Jonathan Liu <net147@gmail.com> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> > --- > hw/net/e1000.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c > index bab8e2a..5c6bcd0 100644 > --- a/hw/net/e1000.c > +++ b/hw/net/e1000.c > @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) > { > s->mac_reg[STATUS] |= E1000_STATUS_LU; > s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; > + > + /* E1000_STATUS_LU is tested by e1000_can_receive() */ > + qemu_flush_queued_packets(qemu_get_queue(s->nic)); > } > > static bool > -- > 2.4.3 > >
On Thu, Jun 25, 2015 at 10:18:05AM +0100, Stefan Hajnoczi wrote: > e1000_can_receive() checks the link up status register bit. If the bit > is clear, packets will be queued and the peer may disable receive to > avoid wasting CPU reading packets that cannot be delivered. The queue > must be flushed once the link comes back up again. > > This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests > and tap networking. Flushing the queue invokes the async send callback, > which re-enables tap fd read. > > Reported-by: Jonathan Liu <net147@gmail.com> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > hw/net/e1000.c | 3 +++ > 1 file changed, 3 insertions(+) Thanks, applied to my net tree: https://github.com/stefanha/qemu/commits/net Stefan
On 06/25/2015 05:18 PM, Stefan Hajnoczi wrote: > e1000_can_receive() checks the link up status register bit. If the bit > is clear, packets will be queued and the peer may disable receive to > avoid wasting CPU reading packets that cannot be delivered. The queue > must be flushed once the link comes back up again. > > This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests > and tap networking. Flushing the queue invokes the async send callback, > which re-enables tap fd read. > > Reported-by: Jonathan Liu <net147@gmail.com> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > hw/net/e1000.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c > index bab8e2a..5c6bcd0 100644 > --- a/hw/net/e1000.c > +++ b/hw/net/e1000.c > @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) > { > s->mac_reg[STATUS] |= E1000_STATUS_LU; > s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; > + > + /* E1000_STATUS_LU is tested by e1000_can_receive() */ > + qemu_flush_queued_packets(qemu_get_queue(s->nic)); > } > > static bool This only solves the issue partially and just for e1000. After checking all can_receive() functions, another card with similar behaviour is vmxnet3. Looking at commit a90a7425cf592a3afeff3eaf32f543b83050ee5c ("tap: Drop tap_can_send") again. The commit disable tap read poll when qemu_net_queue_send() returns zero. Which is usually the following cases: 1) queue->delivering is 1 2) qemu_can_send_packet() returns zero, which is: 2a) vm_running is false or 2b) can_receive() return zero 3) qemu_net_queue_deliver() returns zero, which is: 3a) nc->receive_disabled is true or 3b) info->receive_raw() or nc->receive->receive() returns zero This means we should enable tap read poll when one of those conditions is not existed. This patch fixes 2b) only for e1000. for 1, I'm not quite sure it's a real problem or how to fix. for 2a, we may probably need a vm state change handler to flush the queue, otherwise network will stall after a stop and cont. for 2b, we can either fixes the card that check link status in its can_receive() or just simply can qemu_flush_queued_packets() in set_link. 3a and 3b looks ok, since this happen when there's no space for rx, after guest refill, qemu will call qemu_flush_queued_packets() to flush and re-enable tap read poll. 2a and 2b does not exits before this commit, since tap_send check qemu_can_send_packet() before. Looks like netmap has the same issue.
On Fri, 06/26 17:06, Jason Wang wrote: > > > On 06/25/2015 05:18 PM, Stefan Hajnoczi wrote: > > e1000_can_receive() checks the link up status register bit. If the bit > > is clear, packets will be queued and the peer may disable receive to > > avoid wasting CPU reading packets that cannot be delivered. The queue > > must be flushed once the link comes back up again. > > > > This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests > > and tap networking. Flushing the queue invokes the async send callback, > > which re-enables tap fd read. > > > > Reported-by: Jonathan Liu <net147@gmail.com> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > --- > > hw/net/e1000.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c > > index bab8e2a..5c6bcd0 100644 > > --- a/hw/net/e1000.c > > +++ b/hw/net/e1000.c > > @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) > > { > > s->mac_reg[STATUS] |= E1000_STATUS_LU; > > s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; > > + > > + /* E1000_STATUS_LU is tested by e1000_can_receive() */ > > + qemu_flush_queued_packets(qemu_get_queue(s->nic)); > > } > > > > static bool > > This only solves the issue partially and just for e1000. After checking > all can_receive() functions, another card with similar behaviour is vmxnet3. > > Looking at commit a90a7425cf592a3afeff3eaf32f543b83050ee5c ("tap: Drop > tap_can_send") again. The commit disable tap read poll when > qemu_net_queue_send() returns zero. Which is usually the following cases: > > 1) queue->delivering is 1 > 2) qemu_can_send_packet() returns zero, which is: > 2a) vm_running is false > or > 2b) can_receive() return zero > 3) qemu_net_queue_deliver() returns zero, which is: > 3a) nc->receive_disabled is true > or > 3b) info->receive_raw() or nc->receive->receive() returns zero > > This means we should enable tap read poll when one of those conditions > is not existed. This patch fixes 2b) only for e1000. > > for 1, I'm not quite sure it's a real problem or how to fix. > for 2a, we may probably need a vm state change handler to flush the > queue, otherwise network will stall after a stop and cont. > for 2b, we can either fixes the card that check link status in its > can_receive() or just simply can qemu_flush_queued_packets() in set_link. > 3a and 3b looks ok, since this happen when there's no space for rx, > after guest refill, qemu will call qemu_flush_queued_packets() to flush > and re-enable tap read poll. > > 2a and 2b does not exits before this commit, since tap_send check > qemu_can_send_packet() before. > > Looks like netmap has the same issue. > Ouch! Good catch! I will take a look at the devices today and see if we can fix the problem by adding qemu_flush_queued_packets() calls in the state transition points (vm_running, can_receive). The worst case is reverting the whole series (0bc12c4f7..f4d248bdc3) for 2.4, but dropping can_read is a worthwhile step for optimizing our event loop. Thanks, Fam
On Fri, Jun 26, 2015 at 05:06:01PM +0800, Jason Wang wrote: > > > On 06/25/2015 05:18 PM, Stefan Hajnoczi wrote: > > e1000_can_receive() checks the link up status register bit. If the bit > > is clear, packets will be queued and the peer may disable receive to > > avoid wasting CPU reading packets that cannot be delivered. The queue > > must be flushed once the link comes back up again. > > > > This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests > > and tap networking. Flushing the queue invokes the async send callback, > > which re-enables tap fd read. > > > > Reported-by: Jonathan Liu <net147@gmail.com> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > --- > > hw/net/e1000.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c > > index bab8e2a..5c6bcd0 100644 > > --- a/hw/net/e1000.c > > +++ b/hw/net/e1000.c > > @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) > > { > > s->mac_reg[STATUS] |= E1000_STATUS_LU; > > s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; > > + > > + /* E1000_STATUS_LU is tested by e1000_can_receive() */ > > + qemu_flush_queued_packets(qemu_get_queue(s->nic)); > > } > > > > static bool > > This only solves the issue partially and just for e1000. After checking > all can_receive() functions, another card with similar behaviour is vmxnet3. No, this bug is specific to e1000 (details on that below in case 2b). There are more issues though, so additional patches are welcome. > Looking at commit a90a7425cf592a3afeff3eaf32f543b83050ee5c ("tap: Drop > tap_can_send") again. The commit disable tap read poll when > qemu_net_queue_send() returns zero. Which is usually the following cases: > > 1) queue->delivering is 1 > 2) qemu_can_send_packet() returns zero, which is: > 2a) vm_running is false > or > 2b) can_receive() return zero > 3) qemu_net_queue_deliver() returns zero, which is: > 3a) nc->receive_disabled is true > or > 3b) info->receive_raw() or nc->receive->receive() returns zero > > This means we should enable tap read poll when one of those conditions > is not existed. This patch fixes 2b) only for e1000. > > for 1, I'm not quite sure it's a real problem or how to fix. This is not a problem. When the delivery code returns it flushes the queue. This will invoke send callback functions, so tap will re-enable read poll. > for 2a, we may probably need a vm state change handler to flush the > queue, otherwise network will stall after a stop and cont. Yes, I'm surprised the code isn't doing this already. > for 2b, we can either fixes the card that check link status in its > can_receive() or just simply can qemu_flush_queued_packets() in set_link. No, this doesn't work because the e1000 link status bit is *not* equivalent to nc->link_down. The link auto-negotiation code changes the e1000 bit independently from nc->link_down using a timer. The net subsystem cannot know when this e1000-specific bit is supposed to change. Therefore there is no general fix. Also, qemu_send_packet() drops packets while nc->link_down is true. We're not supposed to queue so it shouldn't be necessary to flush packets in set_link at the net subsystem level. > 3a and 3b looks ok, since this happen when there's no space for rx, > after guest refill, qemu will call qemu_flush_queued_packets() to flush > and re-enable tap read poll. > > 2a and 2b does not exits before this commit, since tap_send check > qemu_can_send_packet() before. The delivery and queuing mechanism in QEMU is overcomplicated. The semantics of .can_receive() and .receive() need to be clarified before we can do real cleanup though: .receive() returning 0 is not quite the same as .can_receive() returning false. When .receive() returns 0 this means the peer should wait before sending more. In most cases the net queue only needs to hold 1 packet. But there are cases like slirp where QEMU code generates packets and may not be able to continue without calling qemu_send_packet() on more packets (I'm not sure but I suspect this is the case). When .can_receive() returns false it also casues packets to be queued, but NIC emulation code often includes more conditions in .can_receive() than .receive() returning 0. Cleaning this up requires auditing all the NICs. Maybe we can get rid of .can_receive() and just have a simplified .receive(): false - queue packet and tell peer to stop sending. Flush must be called later to re-enable sending. true - packet delivered or dropped
On Mon, 06/29 11:39, Stefan Hajnoczi wrote: > On Fri, Jun 26, 2015 at 05:06:01PM +0800, Jason Wang wrote: > > > > > > On 06/25/2015 05:18 PM, Stefan Hajnoczi wrote: > > > e1000_can_receive() checks the link up status register bit. If the bit > > > is clear, packets will be queued and the peer may disable receive to > > > avoid wasting CPU reading packets that cannot be delivered. The queue > > > must be flushed once the link comes back up again. > > > > > > This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests > > > and tap networking. Flushing the queue invokes the async send callback, > > > which re-enables tap fd read. > > > > > > Reported-by: Jonathan Liu <net147@gmail.com> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > > --- > > > hw/net/e1000.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c > > > index bab8e2a..5c6bcd0 100644 > > > --- a/hw/net/e1000.c > > > +++ b/hw/net/e1000.c > > > @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) > > > { > > > s->mac_reg[STATUS] |= E1000_STATUS_LU; > > > s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; > > > + > > > + /* E1000_STATUS_LU is tested by e1000_can_receive() */ > > > + qemu_flush_queued_packets(qemu_get_queue(s->nic)); > > > } > > > > > > static bool > > > > This only solves the issue partially and just for e1000. After checking > > all can_receive() functions, another card with similar behaviour is vmxnet3. > > No, this bug is specific to e1000 (details on that below in case 2b). > > There are more issues though, so additional patches are welcome. > > > Looking at commit a90a7425cf592a3afeff3eaf32f543b83050ee5c ("tap: Drop > > tap_can_send") again. The commit disable tap read poll when > > qemu_net_queue_send() returns zero. Which is usually the following cases: > > > > 1) queue->delivering is 1 > > 2) qemu_can_send_packet() returns zero, which is: > > 2a) vm_running is false > > or > > 2b) can_receive() return zero > > 3) qemu_net_queue_deliver() returns zero, which is: > > 3a) nc->receive_disabled is true > > or > > 3b) info->receive_raw() or nc->receive->receive() returns zero > > > > This means we should enable tap read poll when one of those conditions > > is not existed. This patch fixes 2b) only for e1000. > > > > for 1, I'm not quite sure it's a real problem or how to fix. > > This is not a problem. When the delivery code returns it flushes the > queue. This will invoke send callback functions, so tap will re-enable > read poll. > > > for 2a, we may probably need a vm state change handler to flush the > > queue, otherwise network will stall after a stop and cont. > > Yes, I'm surprised the code isn't doing this already. It's rare that a .can_receive checks vm_running. Why is it only necessary in virtio-net, or is it that other NIC models are wrong? > > > for 2b, we can either fixes the card that check link status in its > > can_receive() or just simply can qemu_flush_queued_packets() in set_link. > > No, this doesn't work because the e1000 link status bit is *not* > equivalent to nc->link_down. The link auto-negotiation code changes the > e1000 bit independently from nc->link_down using a timer. > > The net subsystem cannot know when this e1000-specific bit is supposed > to change. Therefore there is no general fix. > > Also, qemu_send_packet() drops packets while nc->link_down is true. > We're not supposed to queue so it shouldn't be necessary to flush > packets in set_link at the net subsystem level. > > > 3a and 3b looks ok, since this happen when there's no space for rx, > > after guest refill, qemu will call qemu_flush_queued_packets() to flush > > and re-enable tap read poll. > > > > 2a and 2b does not exits before this commit, since tap_send check > > qemu_can_send_packet() before. > > The delivery and queuing mechanism in QEMU is overcomplicated. The > semantics of .can_receive() and .receive() need to be clarified before > we can do real cleanup though: > > .receive() returning 0 is not quite the same as .can_receive() returning > false. > > When .receive() returns 0 this means the peer should wait before sending > more. In most cases the net queue only needs to hold 1 packet. But > there are cases like slirp where QEMU code generates packets and may not > be able to continue without calling qemu_send_packet() on more packets > (I'm not sure but I suspect this is the case). > > When .can_receive() returns false it also casues packets to be queued, > but NIC emulation code often includes more conditions in .can_receive() > than .receive() returning 0. > > Cleaning this up requires auditing all the NICs. Maybe we can get rid > of .can_receive() and just have a simplified .receive(): > > false - queue packet and tell peer to stop sending. Flush must be > called later to re-enable sending. > true - packet delivered or dropped Yes, I find that some of the .can_receive callbacks look OK for a simple dropping, for example net-hub and virtio-net. I can try this, or even better if we can split the work. As a starter maybe we can drop the .can_receive callback and move existing implementations to the beginning of .receive() - something that a few NICs already do. Fam
On 06/29/2015 06:39 PM, Stefan Hajnoczi wrote: > On Fri, Jun 26, 2015 at 05:06:01PM +0800, Jason Wang wrote: >> >> On 06/25/2015 05:18 PM, Stefan Hajnoczi wrote: >>> e1000_can_receive() checks the link up status register bit. If the bit >>> is clear, packets will be queued and the peer may disable receive to >>> avoid wasting CPU reading packets that cannot be delivered. The queue >>> must be flushed once the link comes back up again. >>> >>> This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests >>> and tap networking. Flushing the queue invokes the async send callback, >>> which re-enables tap fd read. >>> >>> Reported-by: Jonathan Liu <net147@gmail.com> >>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> >>> --- >>> hw/net/e1000.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c >>> index bab8e2a..5c6bcd0 100644 >>> --- a/hw/net/e1000.c >>> +++ b/hw/net/e1000.c >>> @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) >>> { >>> s->mac_reg[STATUS] |= E1000_STATUS_LU; >>> s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; >>> + >>> + /* E1000_STATUS_LU is tested by e1000_can_receive() */ >>> + qemu_flush_queued_packets(qemu_get_queue(s->nic)); >>> } >>> >>> static bool >> This only solves the issue partially and just for e1000. After checking >> all can_receive() functions, another card with similar behaviour is vmxnet3. > No, this bug is specific to e1000 (details on that below in case 2b). > > There are more issues though, so additional patches are welcome. > >> Looking at commit a90a7425cf592a3afeff3eaf32f543b83050ee5c ("tap: Drop >> tap_can_send") again. The commit disable tap read poll when >> qemu_net_queue_send() returns zero. Which is usually the following cases: >> >> 1) queue->delivering is 1 >> 2) qemu_can_send_packet() returns zero, which is: >> 2a) vm_running is false >> or >> 2b) can_receive() return zero >> 3) qemu_net_queue_deliver() returns zero, which is: >> 3a) nc->receive_disabled is true >> or >> 3b) info->receive_raw() or nc->receive->receive() returns zero >> >> This means we should enable tap read poll when one of those conditions >> is not existed. This patch fixes 2b) only for e1000. >> >> for 1, I'm not quite sure it's a real problem or how to fix. > This is not a problem. When the delivery code returns it flushes the > queue. This will invoke send callback functions, so tap will re-enable > read poll. > >> for 2a, we may probably need a vm state change handler to flush the >> queue, otherwise network will stall after a stop and cont. > Yes, I'm surprised the code isn't doing this already. > >> for 2b, we can either fixes the card that check link status in its >> can_receive() or just simply can qemu_flush_queued_packets() in set_link. > No, this doesn't work because the e1000 link status bit is *not* > equivalent to nc->link_down. The link auto-negotiation code changes the > e1000 bit independently from nc->link_down using a timer. > > The net subsystem cannot know when this e1000-specific bit is supposed > to change. Therefore there is no general fix. > > Also, qemu_send_packet() drops packets while nc->link_down is true. > We're not supposed to queue so it shouldn't be necessary to flush > packets in set_link at the net subsystem level. But qemu_send_packet() only check sender's nc->link_down. So tap read poll will be disabled and won't get enabled when its peers' link status is recovered. > >> 3a and 3b looks ok, since this happen when there's no space for rx, >> after guest refill, qemu will call qemu_flush_queued_packets() to flush >> and re-enable tap read poll. >> >> 2a and 2b does not exits before this commit, since tap_send check >> qemu_can_send_packet() before. > The delivery and queuing mechanism in QEMU is overcomplicated. The > semantics of .can_receive() and .receive() need to be clarified before > we can do real cleanup though: > > .receive() returning 0 is not quite the same as .can_receive() returning > false. > > When .receive() returns 0 this means the peer should wait before sending > more. In most cases the net queue only needs to hold 1 packet. But > there are cases like slirp where QEMU code generates packets and may not > be able to continue without calling qemu_send_packet() on more packets > (I'm not sure but I suspect this is the case). > > When .can_receive() returns false it also casues packets to be queued, > but NIC emulation code often includes more conditions in .can_receive() > than .receive() returning 0. Some NICs do even more tricks. E.g when receiver is disabled, 8139's can_receive() returns true and expect receive() to drop the packets. > > Cleaning this up requires auditing all the NICs. Maybe we can get rid > of .can_receive() and just have a simplified .receive(): > > false - queue packet and tell peer to stop sending. Flush must be > called later to re-enable sending. And call can_receive() in receive(). > true - packet delivered or dropped This looks good, but do you think we can do this in 2.4? Looks a little bit risky.
diff --git a/hw/net/e1000.c b/hw/net/e1000.c index bab8e2a..5c6bcd0 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -185,6 +185,9 @@ e1000_link_up(E1000State *s) { s->mac_reg[STATUS] |= E1000_STATUS_LU; s->phy_reg[PHY_STATUS] |= MII_SR_LINK_STATUS; + + /* E1000_STATUS_LU is tested by e1000_can_receive() */ + qemu_flush_queued_packets(qemu_get_queue(s->nic)); } static bool
e1000_can_receive() checks the link up status register bit. If the bit is clear, packets will be queued and the peer may disable receive to avoid wasting CPU reading packets that cannot be delivered. The queue must be flushed once the link comes back up again. This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests and tap networking. Flushing the queue invokes the async send callback, which re-enables tap fd read. Reported-by: Jonathan Liu <net147@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- hw/net/e1000.c | 3 +++ 1 file changed, 3 insertions(+)