diff mbox

[for-1.4] Revert "e1000: no need auto-negotiation if link was down"

Message ID 1359675832-11999-1-git-send-email-mdroth@linux.vnet.ibm.com
State New
Headers show

Commit Message

Michael Roth Jan. 31, 2013, 11:43 p.m. UTC
This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.

I'm not sure what issue the original commit was meant to fix, or if
the logic is actually wrong, but it causes e1000 to stop working
after a guest issues a reset.

From what I can tell a guest with an e1000 nic has no way of changing
the link status, as far as it's NetClient peer is concerned, except
in the auto-negotiation path, so with this patch in place there's no
recovery after a reset, since the link goes down and stays that way.

Revert this patch now to fix the bigger problem, and handle any
lingering issues with a follow-up.

Reproduced/tested with qemu-jeos and Ubuntu 12.10.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/e1000.c |    5 -----
 1 file changed, 5 deletions(-)

Comments

Amos Kong Feb. 1, 2013, 7:20 a.m. UTC | #1
On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> 
> I'm not sure what issue the original commit was meant to fix, or if
> the logic is actually wrong, but it causes e1000 to stop working
> after a guest issues a reset.

Hi Michael,

What's your test scenario?

I tried this test with current qemu code, link status is not reseted
to 'up' after step 3. Is it the problem you said?
This problem also exists with current virtio (existed in the past) /
rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9)

1) boot a guest with e1000 nic
2) set link down in monitor
   (hmp) set_link e1000.0 down
3) reset guest by 'system_reset' in monitor
   (hmp) system_reset


My original patch is used to restore the link status after guest
reboot(execute 'reboot' insider guest system).
The link status should always be up after virtual 'hardware' reset
(execute 'system_reset' in monitor).

Thanks, Amos
 
> >From what I can tell a guest with an e1000 nic has no way of changing
> the link status, as far as it's NetClient peer is concerned, except
> in the auto-negotiation path, so with this patch in place there's no
> recovery after a reset, since the link goes down and stays that way.
> 
> Revert this patch now to fix the bigger problem, and handle any
> lingering issues with a follow-up.
> 
> Reproduced/tested with qemu-jeos and Ubuntu 12.10.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/e1000.c |    5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index ef06ca1..563a58f 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -166,11 +166,6 @@ static void
>  set_phy_ctrl(E1000State *s, int index, uint16_t val)
>  {
>      if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) {
> -        /* no need auto-negotiation if link was down */
> -        if (s->nic->nc.link_down) {
> -            s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
> -            return;
> -        }
>          s->nic->nc.link_down = true;
>          e1000_link_down(s);
>          s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
> -- 
> 1.7.9.5
Amos Kong Feb. 1, 2013, 7:53 a.m. UTC | #2
On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote:
> On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> > 
> > I'm not sure what issue the original commit was meant to fix, or if
> > the logic is actually wrong, but it causes e1000 to stop working
> > after a guest issues a reset.
> 
> Hi Michael,
> 
> What's your test scenario?
> 
> I tried this test with current qemu code, link status is not reseted
> to 'up' after step 3. Is it the problem you said?
> This problem also exists with current virtio (existed in the past) /
> rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9)
> 
> 1) boot a guest with e1000 nic
> 2) set link down in monitor
>    (hmp) set_link e1000.0 down
> 3) reset guest by 'system_reset' in monitor
>    (hmp) system_reset
> 
> 
> My original patch is used to restore the link status after guest
> reboot(execute 'reboot' insider guest system).

> The link status should always be up after virtual 'hardware' reset
> (execute 'system_reset' in monitor).

Is it expected?

When we reset the virtual system, do we need to reset the status
of simulation of network cable?
I think it's deciced by that if we think simulation of network
cable is a part of the virtual machine.

----
commit 436e5e53c97d8fb469306b18a0c31dc60f5e546c
Author: aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Thu Jan 8 19:44:06 2009 +0000

    Add 'set_link' monitor command (Mark McLoughlin)
    
    Add a monitor command to setting a given network device's link status
    to 'up' or 'down'.
    
    Allows simulation of network cable disconnect.
    
    Signed-off-by: Mark McLoughlin <markmc@redhat.com>
    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Michael Roth Feb. 1, 2013, 2:29 p.m. UTC | #3
On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote:
> On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> > 
> > I'm not sure what issue the original commit was meant to fix, or if
> > the logic is actually wrong, but it causes e1000 to stop working
> > after a guest issues a reset.
> 
> Hi Michael,
> 
> What's your test scenario?

Nothing special, I just started a guest with

    -net nic,model=e1000 -net user

or

    -net nic,model=e1000 -net tap

and networking stopped working (could not lease an IP, no outbound
traffic) after I rebooted.

> 
> I tried this test with current qemu code, link status is not reseted
> to 'up' after step 3. Is it the problem you said?

I think it's related, but I'm not so much concerned with the qmp-visible
link status changing as I am with the guest.

> This problem also exists with current virtio (existed in the past) /
> rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9)
> 
> 1) boot a guest with e1000 nic
> 2) set link down in monitor
>    (hmp) set_link e1000.0 down
> 3) reset guest by 'system_reset' in monitor
>    (hmp) system_reset
> 
> 
> My original patch is used to restore the link status after guest
> reboot(execute 'reboot' insider guest system).
> The link status should always be up after virtual 'hardware' reset
> (execute 'system_reset' in monitor).

You sure you don't have that backwards? It seems to me that your
original patch was meant to *prevent* the link status from changing
after a system reset, which makes sense from the perspective of a
qmp-issued "set_link down" meaning "unplug the cable".

> 
> Thanks, Amos
> 
> > >From what I can tell a guest with an e1000 nic has no way of changing
> > the link status, as far as it's NetClient peer is concerned, except
> > in the auto-negotiation path, so with this patch in place there's no
> > recovery after a reset, since the link goes down and stays that way.
> > 
> > Revert this patch now to fix the bigger problem, and handle any
> > lingering issues with a follow-up.
> > 
> > Reproduced/tested with qemu-jeos and Ubuntu 12.10.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/e1000.c |    5 -----
> >  1 file changed, 5 deletions(-)
> > 
> > diff --git a/hw/e1000.c b/hw/e1000.c
> > index ef06ca1..563a58f 100644
> > --- a/hw/e1000.c
> > +++ b/hw/e1000.c
> > @@ -166,11 +166,6 @@ static void
> >  set_phy_ctrl(E1000State *s, int index, uint16_t val)
> >  {
> >      if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) {
> > -        /* no need auto-negotiation if link was down */
> > -        if (s->nic->nc.link_down) {
> > -            s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
> > -            return;
> > -        }
> >          s->nic->nc.link_down = true;
> >          e1000_link_down(s);
> >          s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
> > -- 
> > 1.7.9.5
>
Michael Roth Feb. 1, 2013, 2:49 p.m. UTC | #4
On Fri, Feb 01, 2013 at 03:53:22PM +0800, Amos Kong wrote:
> On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote:
> > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> > > 
> > > I'm not sure what issue the original commit was meant to fix, or if
> > > the logic is actually wrong, but it causes e1000 to stop working
> > > after a guest issues a reset.
> > 
> > Hi Michael,
> > 
> > What's your test scenario?
> > 
> > I tried this test with current qemu code, link status is not reseted
> > to 'up' after step 3. Is it the problem you said?
> > This problem also exists with current virtio (existed in the past) /
> > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9)
> > 
> > 1) boot a guest with e1000 nic
> > 2) set link down in monitor
> >    (hmp) set_link e1000.0 down
> > 3) reset guest by 'system_reset' in monitor
> >    (hmp) system_reset
> > 
> > 
> > My original patch is used to restore the link status after guest
> > reboot(execute 'reboot' insider guest system).
> 
> > The link status should always be up after virtual 'hardware' reset
> > (execute 'system_reset' in monitor).
> 
> Is it expected?
> 
> When we reset the virtual system, do we need to reset the status
> of simulation of network cable?

I don't so. If we "unplug the cable", it should stay unplugged when we
reset the machine.

When we reset the machine, we should set the NIC's link status in
accordance with it's NetClient peer. So if that's what your patch was
attempting to enforce I think we're in agreement there.

I think the problem with your original patch is that it doesn't check
the link status of the nic's NetClient peer, but instead checks it's
own internal NetClient state, so we end up trying to enforce "leave the
cable unplugged" even when "set_link down" was never issued.

And if that's the case I think you can re-spin your original patch for
1.4 accordingly. But we definitely need to revert the current one ASAP
since it breaks all QEMU guests that are started using the default nic.

> I think it's deciced by that if we think simulation of network
> cable is a part of the virtual machine.
> 
> ----
> commit 436e5e53c97d8fb469306b18a0c31dc60f5e546c
> Author: aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162>
> Date:   Thu Jan 8 19:44:06 2009 +0000
> 
>     Add 'set_link' monitor command (Mark McLoughlin)
>     
>     Add a monitor command to setting a given network device's link status
>     to 'up' or 'down'.
>     
>     Allows simulation of network cable disconnect.
>     
>     Signed-off-by: Mark McLoughlin <markmc@redhat.com>
>     Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
>
Anthony Liguori Feb. 1, 2013, 2:56 p.m. UTC | #5
Michael Roth <mdroth@linux.vnet.ibm.com> writes:

> This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
>
> I'm not sure what issue the original commit was meant to fix, or if
> the logic is actually wrong, but it causes e1000 to stop working
> after a guest issues a reset.
>
>>From what I can tell a guest with an e1000 nic has no way of changing
> the link status, as far as it's NetClient peer is concerned, except
> in the auto-negotiation path, so with this patch in place there's no
> recovery after a reset, since the link goes down and stays that way.
>
> Revert this patch now to fix the bigger problem, and handle any
> lingering issues with a follow-up.
>
> Reproduced/tested with qemu-jeos and Ubuntu 12.10.
>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/e1000.c |    5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/hw/e1000.c b/hw/e1000.c
> index ef06ca1..563a58f 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -166,11 +166,6 @@ static void
>  set_phy_ctrl(E1000State *s, int index, uint16_t val)
>  {
>      if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) {
> -        /* no need auto-negotiation if link was down */
> -        if (s->nic->nc.link_down) {
> -            s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
> -            return;
> -        }

The problem with this patch is that it skips autonegotiate if the link
is down however it doesn't take reset into account.

Consider if you reset the guest during autonegotiation.  The link is down
but it's not really down--the guest is in the process of bringing it
back up.

But since it was down before reset, we won't let autonegotiation happen
again.

We shouldn't use nc.link_down state to indicate this.  We should use
another variable that we can clear during reset.

I'm going to take this revert since it fixes a serious problem
(networking doesn't work after a reboot) at the cost of a less serious
problem (bring the link up if it was previously set to be down).  But I
hope we can get a proper fix during the -rc cycle.

Regards,

Anthony Liguori

>          s->nic->nc.link_down = true;
>          e1000_link_down(s);
>          s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
> -- 
> 1.7.9.5
Amos Kong Feb. 1, 2013, 3:52 p.m. UTC | #6
On Fri, Feb 01, 2013 at 08:49:00AM -0600, mdroth wrote:
> On Fri, Feb 01, 2013 at 03:53:22PM +0800, Amos Kong wrote:
> > On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote:
> > > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> > > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> > > > 
> > > > I'm not sure what issue the original commit was meant to fix, or if
> > > > the logic is actually wrong, but it causes e1000 to stop working
> > > > after a guest issues a reset.
> > > 
> > > Hi Michael,
> > > 
> > > What's your test scenario?
> > > 
> > > I tried this test with current qemu code, link status is not reseted
> > > to 'up' after step 3. Is it the problem you said?
> > > This problem also exists with current virtio (existed in the past) /
> > > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9)
> > > 
> > > 1) boot a guest with e1000 nic
> > > 2) set link down in monitor
> > >    (hmp) set_link e1000.0 down
> > > 3) reset guest by 'system_reset' in monitor
> > >    (hmp) system_reset
> > > 
> > > 
> > > My original patch is used to restore the link status after guest
> > > reboot(execute 'reboot' insider guest system).
> > 
> > > The link status should always be up after virtual 'hardware' reset
> > > (execute 'system_reset' in monitor).
> > 
> > Is it expected?
> > 
> > When we reset the virtual system, do we need to reset the status
> > of simulation of network cable?
> 
> I don't so. If we "unplug the cable", it should stay unplugged when we
> reset the machine.

Ok.
 
> When we reset the machine, we should set the NIC's link status in
> accordance with it's NetClient peer. So if that's what your patch was
> attempting to enforce I think we're in agreement there.
> 
> I think the problem with your original patch is that it doesn't check
> the link status of the nic's NetClient peer, but instead checks it's
> own internal NetClient state, so we end up trying to enforce "leave the
> cable unplugged" even when "set_link down" was never issued.
> 
> And if that's the case I think you can re-spin your original patch for
> 1.4 accordingly. But we definitely need to revert the current one ASAP
> since it breaks all QEMU guests that are started using the default nic.

Agree, thanks.
Amos Kong Feb. 1, 2013, 3:56 p.m. UTC | #7
On Fri, Feb 01, 2013 at 08:29:48AM -0600, mdroth wrote:
> On Fri, Feb 01, 2013 at 03:20:59PM +0800, Amos Kong wrote:
> > On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> > > This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> > > 
> > > I'm not sure what issue the original commit was meant to fix, or if
> > > the logic is actually wrong, but it causes e1000 to stop working
> > > after a guest issues a reset.
> > 
> > Hi Michael,
> > 
> > What's your test scenario?
> 
> Nothing special, I just started a guest with
> 
>     -net nic,model=e1000 -net user
> 
> or
> 
>     -net nic,model=e1000 -net tap
> 
> and networking stopped working (could not lease an IP, no outbound
> traffic) after I rebooted.
 
I can reproduce this problem sometimes.

> > I tried this test with current qemu code, link status is not reseted
> > to 'up' after step 3. Is it the problem you said?
> 
> I think it's related, but I'm not so much concerned with the qmp-visible
> link status changing as I am with the guest.
> 
> > This problem also exists with current virtio (existed in the past) /
> > rtl8139 (introduced in 83f58e570f21c3e7227e7fbef1fc0e18b5ed7ea9)
> > 
> > 1) boot a guest with e1000 nic
> > 2) set link down in monitor
> >    (hmp) set_link e1000.0 down
> > 3) reset guest by 'system_reset' in monitor
> >    (hmp) system_reset
> > 
> > 
> > My original patch is used to restore the link status after guest
> > reboot(execute 'reboot' insider guest system).
> > The link status should always be up after virtual 'hardware' reset
> > (execute 'system_reset' in monitor).
> 
> You sure you don't have that backwards? It seems to me that your
> original patch was meant to *prevent* the link status from changing
> after a system reset, which makes sense from the perspective of a
> qmp-issued "set_link down" meaning "unplug the cable".

I was confused after see your revert patch, now I know the real
problem, and it's right to prevent down link status in 'system_reset'.
Michael S. Tsirkin Feb. 4, 2013, 8:06 a.m. UTC | #8
On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> 
> I'm not sure what issue the original commit was meant to fix, or if
> the logic is actually wrong, but it causes e1000 to stop working
> after a guest issues a reset.
> 
> >From what I can tell a guest with an e1000 nic has no way of changing
> the link status, as far as it's NetClient peer is concerned, except
> in the auto-negotiation path, so with this patch in place there's no
> recovery after a reset, since the link goes down and stays that way.
> 
> Revert this patch now to fix the bigger problem, and handle any
> lingering issues with a follow-up.
> 
> Reproduced/tested with qemu-jeos and Ubuntu 12.10.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

So from discussion on list, it seems clear the proper
fix will involve adding more state, which likely will
need to be migrated as well.
Seems like revert is the best we can do for 1.4.

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  hw/e1000.c |    5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index ef06ca1..563a58f 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -166,11 +166,6 @@ static void
>  set_phy_ctrl(E1000State *s, int index, uint16_t val)
>  {
>      if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) {
> -        /* no need auto-negotiation if link was down */
> -        if (s->nic->nc.link_down) {
> -            s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
> -            return;
> -        }
>          s->nic->nc.link_down = true;
>          e1000_link_down(s);
>          s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
> -- 
> 1.7.9.5
>
Stefan Hajnoczi Feb. 4, 2013, 9:32 a.m. UTC | #9
On Thu, Jan 31, 2013 at 05:43:51PM -0600, Michael Roth wrote:
> This reverts commit 84dd2120247a7d25ff1bb337de21c0e76816ad2d.
> 
> I'm not sure what issue the original commit was meant to fix, or if
> the logic is actually wrong, but it causes e1000 to stop working
> after a guest issues a reset.
> 
> From what I can tell a guest with an e1000 nic has no way of changing
> the link status, as far as it's NetClient peer is concerned, except
> in the auto-negotiation path, so with this patch in place there's no
> recovery after a reset, since the link goes down and stays that way.
> 
> Revert this patch now to fix the bigger problem, and handle any
> lingering issues with a follow-up.
> 
> Reproduced/tested with qemu-jeos and Ubuntu 12.10.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/e1000.c |    5 -----
>  1 file changed, 5 deletions(-)

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Anthony Liguori Feb. 4, 2013, 10:53 p.m. UTC | #10
Applied.  Thanks.

Regards,

Anthony Liguori
diff mbox

Patch

diff --git a/hw/e1000.c b/hw/e1000.c
index ef06ca1..563a58f 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -166,11 +166,6 @@  static void
 set_phy_ctrl(E1000State *s, int index, uint16_t val)
 {
     if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) {
-        /* no need auto-negotiation if link was down */
-        if (s->nic->nc.link_down) {
-            s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
-            return;
-        }
         s->nic->nc.link_down = true;
         e1000_link_down(s);
         s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;