Message ID | 1359675832-11999-1-git-send-email-mdroth@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > I'm not sure what issue the original commit was meant to fix, or if > the logic is actually wrong, but it causes e1000 to stop working > after a guest issues a reset. Hi Michael, What's your test scenario? I tried this test with current qemu code, link status is not reseted to 'up' after step 3. Is it the problem you said? This problem also exists with current virtio (existed in the past) / rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9) 1) boot a guest with e1000 nic 2) set link down in monitor (hmp) set_link e1000.0 down 3) reset guest by 'system_reset' in monitor (hmp) system_reset My original patch is used to restore the link status after guest reboot(execute 'reboot' insider guest system). The link status should always be up after virtual 'hardware' reset (execute 'system_reset' in monitor). Thanks, Amos > >From what I can tell a guest with an e1000 nic has no way of changing > the link status, as far as it's NetClient peer is concerned, except > in the auto-negotiation path, so with this patch in place there's no > recovery after a reset, since the link goes down and stays that way. > > Revert this patch now to fix the bigger problem, and handle any > lingering issues with a follow-up. > > Reproduced/tested with qemu-jeos and Ubuntu 12.10. > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> > --- > hw/e1000.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/hw/e1000.c b/hw/e1000.c > index ef06ca1..563a58f 100644 > --- a/hw/e1000.c > +++ b/hw/e1000.c > @@ -166,11 +166,6 @@ static void > set_phy_ctrl(E1000State *s, int index, uint16_t val) > { > if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) { > - /* no need auto-negotiation if link was down */ > - if (s->nic->nc.link_down) { > - s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE; > - return; > - } > s->nic->nc.link_down = true; > e1000_link_down(s); > s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE; > -- > 1.7.9.5
On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote: > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > > > I'm not sure what issue the original commit was meant to fix, or if > > the logic is actually wrong, but it causes e1000 to stop working > > after a guest issues a reset. > > Hi Michael, > > What's your test scenario? > > I tried this test with current qemu code, link status is not reseted > to 'up' after step 3. Is it the problem you said? > This problem also exists with current virtio (existed in the past) / > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9) > > 1) boot a guest with e1000 nic > 2) set link down in monitor > (hmp) set_link e1000.0 down > 3) reset guest by 'system_reset' in monitor > (hmp) system_reset > > > My original patch is used to restore the link status after guest > reboot(execute 'reboot' insider guest system). > The link status should always be up after virtual 'hardware' reset > (execute 'system_reset' in monitor). Is it expected? When we reset the virtual system, do we need to reset the status of simulation of network cable? I think it's deciced by that if we think simulation of network cable is a part of the virtual machine. ---- commit 436e5e53c97d8fb469306b18a0c31dc60f5e546c Author: aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162> Date: Thu Jan 8 19:44:06 2009 +0000 Add 'set_link' monitor command (Mark McLoughlin) Add a monitor command to setting a given network device's link status to 'up' or 'down'. Allows simulation of network cable disconnect. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote: > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > > > I'm not sure what issue the original commit was meant to fix, or if > > the logic is actually wrong, but it causes e1000 to stop working > > after a guest issues a reset. > > Hi Michael, > > What's your test scenario? Nothing special, I just started a guest with -net nic,model=e1000 -net user or -net nic,model=e1000 -net tap and networking stopped working (could not lease an IP, no outbound traffic) after I rebooted. > > I tried this test with current qemu code, link status is not reseted > to 'up' after step 3. Is it the problem you said? I think it's related, but I'm not so much concerned with the qmp-visible link status changing as I am with the guest. > This problem also exists with current virtio (existed in the past) / > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9) > > 1) boot a guest with e1000 nic > 2) set link down in monitor > (hmp) set_link e1000.0 down > 3) reset guest by 'system_reset' in monitor > (hmp) system_reset > > > My original patch is used to restore the link status after guest > reboot(execute 'reboot' insider guest system). > The link status should always be up after virtual 'hardware' reset > (execute 'system_reset' in monitor). You sure you don't have that backwards? It seems to me that your original patch was meant to *prevent* the link status from changing after a system reset, which makes sense from the perspective of a qmp-issued "set_link down" meaning "unplug the cable". > > Thanks, Amos > > > >From what I can tell a guest with an e1000 nic has no way of changing > > the link status, as far as it's NetClient peer is concerned, except > > in the auto-negotiation path, so with this patch in place there's no > > recovery after a reset, since the link goes down and stays that way. > > > > Revert this patch now to fix the bigger problem, and handle any > > lingering issues with a follow-up. > > > > Reproduced/tested with qemu-jeos and Ubuntu 12.10. > > > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> > > --- > > hw/e1000.c | 5 ----- > > 1 file changed, 5 deletions(-) > > > > diff --git a/hw/e1000.c b/hw/e1000.c > > index ef06ca1..563a58f 100644 > > --- a/hw/e1000.c > > +++ b/hw/e1000.c > > @@ -166,11 +166,6 @@ static void > > set_phy_ctrl(E1000State *s, int index, uint16_t val) > > { > > if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) { > > - /* no need auto-negotiation if link was down */ > > - if (s->nic->nc.link_down) { > > - s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE; > > - return; > > - } > > s->nic->nc.link_down = true; > > e1000_link_down(s); > > s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE; > > -- > > 1.7.9.5 >
On Fri, Feb 01, 2013 at 03:53:22PM +0800, Amos Kong wrote: > On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote: > > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > > > > > I'm not sure what issue the original commit was meant to fix, or if > > > the logic is actually wrong, but it causes e1000 to stop working > > > after a guest issues a reset. > > > > Hi Michael, > > > > What's your test scenario? > > > > I tried this test with current qemu code, link status is not reseted > > to 'up' after step 3. Is it the problem you said? > > This problem also exists with current virtio (existed in the past) / > > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9) > > > > 1) boot a guest with e1000 nic > > 2) set link down in monitor > > (hmp) set_link e1000.0 down > > 3) reset guest by 'system_reset' in monitor > > (hmp) system_reset > > > > > > My original patch is used to restore the link status after guest > > reboot(execute 'reboot' insider guest system). > > > The link status should always be up after virtual 'hardware' reset > > (execute 'system_reset' in monitor). > > Is it expected? > > When we reset the virtual system, do we need to reset the status > of simulation of network cable? I don't so. If we "unplug the cable", it should stay unplugged when we reset the machine. When we reset the machine, we should set the NIC's link status in accordance with it's NetClient peer. So if that's what your patch was attempting to enforce I think we're in agreement there. I think the problem with your original patch is that it doesn't check the link status of the nic's NetClient peer, but instead checks it's own internal NetClient state, so we end up trying to enforce "leave the cable unplugged" even when "set_link down" was never issued. And if that's the case I think you can re-spin your original patch for 1.4 accordingly. But we definitely need to revert the current one ASAP since it breaks all QEMU guests that are started using the default nic. > I think it's deciced by that if we think simulation of network > cable is a part of the virtual machine. > > ---- > commit 436e5e53c97d8fb469306b18a0c31dc60f5e546c > Author: aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162> > Date: Thu Jan 8 19:44:06 2009 +0000 > > Add 'set_link' monitor command (Mark McLoughlin) > > Add a monitor command to setting a given network device's link status > to 'up' or 'down'. > > Allows simulation of network cable disconnect. > > Signed-off-by: Mark McLoughlin <markmc@redhat.com> > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> >
Michael Roth <mdroth@linux.vnet.ibm.com> writes: > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > I'm not sure what issue the original commit was meant to fix, or if > the logic is actually wrong, but it causes e1000 to stop working > after a guest issues a reset. > >>From what I can tell a guest with an e1000 nic has no way of changing > the link status, as far as it's NetClient peer is concerned, except > in the auto-negotiation path, so with this patch in place there's no > recovery after a reset, since the link goes down and stays that way. > > Revert this patch now to fix the bigger problem, and handle any > lingering issues with a follow-up. > > Reproduced/tested with qemu-jeos and Ubuntu 12.10. > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> > --- > hw/e1000.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/hw/e1000.c b/hw/e1000.c > index ef06ca1..563a58f 100644 > --- a/hw/e1000.c > +++ b/hw/e1000.c > @@ -166,11 +166,6 @@ static void > set_phy_ctrl(E1000State *s, int index, uint16_t val) > { > if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) { > - /* no need auto-negotiation if link was down */ > - if (s->nic->nc.link_down) { > - s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE; > - return; > - } The problem with this patch is that it skips autonegotiate if the link is down however it doesn't take reset into account. Consider if you reset the guest during autonegotiation. The link is down but it's not really down--the guest is in the process of bringing it back up. But since it was down before reset, we won't let autonegotiation happen again. We shouldn't use nc.link_down state to indicate this. We should use another variable that we can clear during reset. I'm going to take this revert since it fixes a serious problem (networking doesn't work after a reboot) at the cost of a less serious problem (bring the link up if it was previously set to be down). But I hope we can get a proper fix during the -rc cycle. Regards, Anthony Liguori > s->nic->nc.link_down = true; > e1000_link_down(s); > s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE; > -- > 1.7.9.5
On Fri, Feb 01, 2013 at 08:49:00AM -0600, mdroth wrote: > On Fri, Feb 01, 2013 at 03:53:22PM +0800, Amos Kong wrote: > > On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote: > > > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > > > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > > > > > > > I'm not sure what issue the original commit was meant to fix, or if > > > > the logic is actually wrong, but it causes e1000 to stop working > > > > after a guest issues a reset. > > > > > > Hi Michael, > > > > > > What's your test scenario? > > > > > > I tried this test with current qemu code, link status is not reseted > > > to 'up' after step 3. Is it the problem you said? > > > This problem also exists with current virtio (existed in the past) / > > > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9) > > > > > > 1) boot a guest with e1000 nic > > > 2) set link down in monitor > > > (hmp) set_link e1000.0 down > > > 3) reset guest by 'system_reset' in monitor > > > (hmp) system_reset > > > > > > > > > My original patch is used to restore the link status after guest > > > reboot(execute 'reboot' insider guest system). > > > > > The link status should always be up after virtual 'hardware' reset > > > (execute 'system_reset' in monitor). > > > > Is it expected? > > > > When we reset the virtual system, do we need to reset the status > > of simulation of network cable? > > I don't so. If we "unplug the cable", it should stay unplugged when we > reset the machine. Ok. > When we reset the machine, we should set the NIC's link status in > accordance with it's NetClient peer. So if that's what your patch was > attempting to enforce I think we're in agreement there. > > I think the problem with your original patch is that it doesn't check > the link status of the nic's NetClient peer, but instead checks it's > own internal NetClient state, so we end up trying to enforce "leave the > cable unplugged" even when "set_link down" was never issued. > > And if that's the case I think you can re-spin your original patch for > 1.4 accordingly. But we definitely need to revert the current one ASAP > since it breaks all QEMU guests that are started using the default nic. Agree, thanks.
On Fri, Feb 01, 2013 at 08:29:48AM -0600, mdroth wrote: > On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote: > > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > > > > > I'm not sure what issue the original commit was meant to fix, or if > > > the logic is actually wrong, but it causes e1000 to stop working > > > after a guest issues a reset. > > > > Hi Michael, > > > > What's your test scenario? > > Nothing special, I just started a guest with > > -net nic,model=e1000 -net user > > or > > -net nic,model=e1000 -net tap > > and networking stopped working (could not lease an IP, no outbound > traffic) after I rebooted. I can reproduce this problem sometimes. > > I tried this test with current qemu code, link status is not reseted > > to 'up' after step 3. Is it the problem you said? > > I think it's related, but I'm not so much concerned with the qmp-visible > link status changing as I am with the guest. > > > This problem also exists with current virtio (existed in the past) / > > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9) > > > > 1) boot a guest with e1000 nic > > 2) set link down in monitor > > (hmp) set_link e1000.0 down > > 3) reset guest by 'system_reset' in monitor > > (hmp) system_reset > > > > > > My original patch is used to restore the link status after guest > > reboot(execute 'reboot' insider guest system). > > The link status should always be up after virtual 'hardware' reset > > (execute 'system_reset' in monitor). > > You sure you don't have that backwards? It seems to me that your > original patch was meant to *prevent* the link status from changing > after a system reset, which makes sense from the perspective of a > qmp-issued "set_link down" meaning "unplug the cable". I was confused after see your revert patch, now I know the real problem, and it's right to prevent down link status in 'system_reset'.
On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > I'm not sure what issue the original commit was meant to fix, or if > the logic is actually wrong, but it causes e1000 to stop working > after a guest issues a reset. > > >From what I can tell a guest with an e1000 nic has no way of changing > the link status, as far as it's NetClient peer is concerned, except > in the auto-negotiation path, so with this patch in place there's no > recovery after a reset, since the link goes down and stays that way. > > Revert this patch now to fix the bigger problem, and handle any > lingering issues with a follow-up. > > Reproduced/tested with qemu-jeos and Ubuntu 12.10. > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> So from discussion on list, it seems clear the proper fix will involve adding more state, which likely will need to be migrated as well. Seems like revert is the best we can do for 1.4. Acked-by: Michael S. Tsirkin <mst@redhat.com> > --- > hw/e1000.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/hw/e1000.c b/hw/e1000.c > index ef06ca1..563a58f 100644 > --- a/hw/e1000.c > +++ b/hw/e1000.c > @@ -166,11 +166,6 @@ static void > set_phy_ctrl(E1000State *s, int index, uint16_t val) > { > if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) { > - /* no need auto-negotiation if link was down */ > - if (s->nic->nc.link_down) { > - s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE; > - return; > - } > s->nic->nc.link_down = true; > e1000_link_down(s); > s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE; > -- > 1.7.9.5 >
On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote: > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. > > I'm not sure what issue the original commit was meant to fix, or if > the logic is actually wrong, but it causes e1000 to stop working > after a guest issues a reset. > > From what I can tell a guest with an e1000 nic has no way of changing > the link status, as far as it's NetClient peer is concerned, except > in the auto-negotiation path, so with this patch in place there's no > recovery after a reset, since the link goes down and stays that way. > > Revert this patch now to fix the bigger problem, and handle any > lingering issues with a follow-up. > > Reproduced/tested with qemu-jeos and Ubuntu 12.10. > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> > --- > hw/e1000.c | 5 ----- > 1 file changed, 5 deletions(-) Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Applied. Thanks. Regards, Anthony Liguori
diff --git a/hw/e1000.c b/hw/e1000.c index ef06ca1..563a58f 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -166,11 +166,6 @@ static void set_phy_ctrl(E1000State *s, int index, uint16_t val) { if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) { - /* no need auto-negotiation if link was down */ - if (s->nic->nc.link_down) { - s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE; - return; - } s->nic->nc.link_down = true; e1000_link_down(s); s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d. I'm not sure what issue the original commit was meant to fix, or if the logic is actually wrong, but it causes e1000 to stop working after a guest issues a reset. From what I can tell a guest with an e1000 nic has no way of changing the link status, as far as it's NetClient peer is concerned, except in the auto-negotiation path, so with this patch in place there's no recovery after a reset, since the link goes down and stays that way. Revert this patch now to fix the bigger problem, and handle any lingering issues with a follow-up. Reproduced/tested with qemu-jeos and Ubuntu 12.10. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> --- hw/e1000.c | 5 ----- 1 file changed, 5 deletions(-)