diff mbox

[RFC,v1,2/2] e1000: adjust initial autoneg timing (for piix/osx)

Message ID 1404147350-28904-3-git-send-email-somlo@cmu.edu
State New
Headers show

Commit Message

Gabriel L. Somlo June 30, 2014, 4:55 p.m. UTC
When running on PIIX (as opposed to q35), the stock OS X e1000
driver (AppleIntel8254XEthernet.kext) takes longer to load and
activete, and will "miss" the link status change interrupt
injected when the emulated "hardware" autonegotiation completes
(see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).

This patch extends the delay of the autonetotiation timer set up
during set_phy_ctrl() to a value just large enough to work with
the OS X driver.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
---

So, the loading OS X driver must take longer between its last
write to the PHY_CTRL register and the time it starts looking
for LSC interrupts, because at delay==500 it obviously misses
the relevant interrupt. Making this 5500 (actually anything
larger than 5300, but there's a bit of variation across OS X
versions, so I rounded up a bit) has the timer fire after
enough time has passed that the driver knows what to do when
the interrupt from the network card fires...

Thanks,
  Gabriel

 hw/net/e1000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michael S. Tsirkin June 30, 2014, 5:55 p.m. UTC | #1
On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
> When running on PIIX (as opposed to q35), the stock OS X e1000
> driver (AppleIntel8254XEthernet.kext) takes longer to load and
> activete, and will "miss" the link status change interrupt
> injected when the emulated "hardware" autonegotiation completes
> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
> 
> This patch extends the delay of the autonetotiation timer set up
> during set_phy_ctrl() to a value just large enough to work with
> the OS X driver.
> 
> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> ---
> 
> So, the loading OS X driver must take longer between its last
> write to the PHY_CTRL register and the time it starts looking
> for LSC interrupts, because at delay==500 it obviously misses
> the relevant interrupt. Making this 5500 (actually anything
> larger than 5300, but there's a bit of variation across OS X
> versions, so I rounded up a bit) has the timer fire after
> enough time has passed that the driver knows what to do when
> the interrupt from the network card fires...
> 
> Thanks,
>   Gabriel
> 
>  hw/net/e1000.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 2376910..2300477 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
>          e1000_link_down(s);
>          DBGOUT(PHY, "Start link auto negotiation\n");
>          timer_mod(s->autoneg_timer,
> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
>      }
>  }


Besides being a bit hacky, it actually has a decent chance
to delay boot for guests. 500ms is probably the max we
can reasonably tolerate, even that is a bit high.

> -- 
> 1.9.3
Paolo Bonzini June 30, 2014, 6 p.m. UTC | #2
Il 30/06/2014 19:55, Michael S. Tsirkin ha scritto:
> On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
>> When running on PIIX (as opposed to q35), the stock OS X e1000
>> driver (AppleIntel8254XEthernet.kext) takes longer to load and
>> activete, and will "miss" the link status change interrupt
>> injected when the emulated "hardware" autonegotiation completes
>> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
>>
>> This patch extends the delay of the autonetotiation timer set up
>> during set_phy_ctrl() to a value just large enough to work with
>> the OS X driver.
>>
>> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
>> ---
>>
>> So, the loading OS X driver must take longer between its last
>> write to the PHY_CTRL register and the time it starts looking
>> for LSC interrupts, because at delay==500 it obviously misses
>> the relevant interrupt. Making this 5500 (actually anything
>> larger than 5300, but there's a bit of variation across OS X
>> versions, so I rounded up a bit) has the timer fire after
>> enough time has passed that the driver knows what to do when
>> the interrupt from the network card fires...
>>
>> Thanks,
>>   Gabriel
>>
>>  hw/net/e1000.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index 2376910..2300477 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
>>          e1000_link_down(s);
>>          DBGOUT(PHY, "Start link auto negotiation\n");
>>          timer_mod(s->autoneg_timer,
>> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
>>      }
>>  }
>
>
> Besides being a bit hacky, it actually has a decent chance
> to delay boot for guests. 500ms is probably the max we
> can reasonably tolerate, even that is a bit high.

It can be turned into a property though.

Paolo
Alexander Graf June 30, 2014, 6:21 p.m. UTC | #3
> Am 30.06.2014 um 19:55 schrieb "Michael S. Tsirkin" <mst@redhat.com>:
> 
>> On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
>> When running on PIIX (as opposed to q35), the stock OS X e1000
>> driver (AppleIntel8254XEthernet.kext) takes longer to load and
>> activete, and will "miss" the link status change interrupt
>> injected when the emulated "hardware" autonegotiation completes
>> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
>> 
>> This patch extends the delay of the autonetotiation timer set up
>> during set_phy_ctrl() to a value just large enough to work with
>> the OS X driver.
>> 
>> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
>> ---
>> 
>> So, the loading OS X driver must take longer between its last
>> write to the PHY_CTRL register and the time it starts looking
>> for LSC interrupts, because at delay==500 it obviously misses
>> the relevant interrupt. Making this 5500 (actually anything
>> larger than 5300, but there's a bit of variation across OS X
>> versions, so I rounded up a bit) has the timer fire after
>> enough time has passed that the driver knows what to do when
>> the interrupt from the network card fires...
>> 
>> Thanks,
>>  Gabriel
>> 
>> hw/net/e1000.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index 2376910..2300477 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
>>         e1000_link_down(s);
>>         DBGOUT(PHY, "Start link auto negotiation\n");
>>         timer_mod(s->autoneg_timer,
>> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
>>     }
>> }
> 
> 
> Besides being a bit hacky, it actually has a decent chance
> to delay boot for guests. 500ms is probably the max we
> can reasonably tolerate, even that is a bit high.

Are you sure there's not just simply some irq unmasking event after 5500ms we don't handle properly?

Alex

> 
>> -- 
>> 1.9.3
Gabriel L. Somlo July 2, 2014, 9:02 a.m. UTC | #4
On Mon, Jun 30, 2014 at 08:21:43PM +0200, Alexander Graf wrote:
> 
> 
> > Am 30.06.2014 um 19:55 schrieb "Michael S. Tsirkin" <mst@redhat.com>:
> > 
> >> On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
> >> When running on PIIX (as opposed to q35), the stock OS X e1000
> >> driver (AppleIntel8254XEthernet.kext) takes longer to load and
> >> activete, and will "miss" the link status change interrupt
> >> injected when the emulated "hardware" autonegotiation completes
> >> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
> >> 
> >> This patch extends the delay of the autonetotiation timer set up
> >> during set_phy_ctrl() to a value just large enough to work with
> >> the OS X driver.
> >> 
> >> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> >> ---
> >> 
> >> So, the loading OS X driver must take longer between its last
> >> write to the PHY_CTRL register and the time it starts looking
> >> for LSC interrupts, because at delay==500 it obviously misses
> >> the relevant interrupt. Making this 5500 (actually anything
> >> larger than 5300, but there's a bit of variation across OS X
> >> versions, so I rounded up a bit) has the timer fire after
> >> enough time has passed that the driver knows what to do when
> >> the interrupt from the network card fires...
> >> 
> >> Thanks,
> >>  Gabriel
> >> 
> >> hw/net/e1000.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> >> index 2376910..2300477 100644
> >> --- a/hw/net/e1000.c
> >> +++ b/hw/net/e1000.c
> >> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
> >>         e1000_link_down(s);
> >>         DBGOUT(PHY, "Start link auto negotiation\n");
> >>         timer_mod(s->autoneg_timer,
> >> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
> >> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
> >>     }
> >> }
> > 
> > 
> > Besides being a bit hacky, it actually has a decent chance
> > to delay boot for guests. 500ms is probably the max we
> > can reasonably tolerate, even that is a bit high.
> 
> Are you sure there's not just simply some irq unmasking event after 5500ms we don't handle properly?

I poked around a bit, and the e1000 interrupt mask register is NOT the
problem (the LSC mask bit is clear at all times). If anything, maybe
the PIIX southbridge (or something further up "north") is masking PCI
interrupts (at least from e1000) until roughly 5500 ms into the boot
process ? Any ideas on how I could go about verifying this (without
access to the guest source, obviously :) ) would be very helpful...

Thanks,
--Gabriel
Alexander Graf July 2, 2014, 9:16 a.m. UTC | #5
On 02.07.14 11:02, Gabriel L. Somlo wrote:
> On Mon, Jun 30, 2014 at 08:21:43PM +0200, Alexander Graf wrote:
>>
>>> Am 30.06.2014 um 19:55 schrieb "Michael S. Tsirkin" <mst@redhat.com>:
>>>
>>>> On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
>>>> When running on PIIX (as opposed to q35), the stock OS X e1000
>>>> driver (AppleIntel8254XEthernet.kext) takes longer to load and
>>>> activete, and will "miss" the link status change interrupt
>>>> injected when the emulated "hardware" autonegotiation completes
>>>> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
>>>>
>>>> This patch extends the delay of the autonetotiation timer set up
>>>> during set_phy_ctrl() to a value just large enough to work with
>>>> the OS X driver.
>>>>
>>>> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
>>>> ---
>>>>
>>>> So, the loading OS X driver must take longer between its last
>>>> write to the PHY_CTRL register and the time it starts looking
>>>> for LSC interrupts, because at delay==500 it obviously misses
>>>> the relevant interrupt. Making this 5500 (actually anything
>>>> larger than 5300, but there's a bit of variation across OS X
>>>> versions, so I rounded up a bit) has the timer fire after
>>>> enough time has passed that the driver knows what to do when
>>>> the interrupt from the network card fires...
>>>>
>>>> Thanks,
>>>>   Gabriel
>>>>
>>>> hw/net/e1000.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>>> index 2376910..2300477 100644
>>>> --- a/hw/net/e1000.c
>>>> +++ b/hw/net/e1000.c
>>>> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
>>>>          e1000_link_down(s);
>>>>          DBGOUT(PHY, "Start link auto negotiation\n");
>>>>          timer_mod(s->autoneg_timer,
>>>> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
>>>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
>>>>      }
>>>> }
>>>
>>> Besides being a bit hacky, it actually has a decent chance
>>> to delay boot for guests. 500ms is probably the max we
>>> can reasonably tolerate, even that is a bit high.
>> Are you sure there's not just simply some irq unmasking event after 5500ms we don't handle properly?
> I poked around a bit, and the e1000 interrupt mask register is NOT the
> problem (the LSC mask bit is clear at all times). If anything, maybe
> the PIIX southbridge (or something further up "north") is masking PCI
> interrupts (at least from e1000) until roughly 5500 ms into the boot
> process ? Any ideas on how I could go about verifying this (without
> access to the guest source, obviously :) ) would be very helpful...

Yeah, maybe the interrupt is masked and doesn't get delivered properly? 
See if you can trace when the e1000 emulation starts kicking an 
interrupt and when the guest tries to fetch it (there should be an ack 
register for IRQs somewhere).

If we kick it but the guest doesn't react, the problem is further down - 
check whether the IRQ ever got injected into the guest with trace points.

If we don't kick it, we mask it somewhere in the e1000 emulation and 
need to make sure we do kick once we unmask :). I don't know whether the 
LSC mask is the only one involved.


Alex
Michael S. Tsirkin July 2, 2014, 9:33 a.m. UTC | #6
On Wed, Jul 02, 2014 at 05:02:07AM -0400, Gabriel L. Somlo wrote:
> On Mon, Jun 30, 2014 at 08:21:43PM +0200, Alexander Graf wrote:
> > 
> > 
> > > Am 30.06.2014 um 19:55 schrieb "Michael S. Tsirkin" <mst@redhat.com>:
> > > 
> > >> On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
> > >> When running on PIIX (as opposed to q35), the stock OS X e1000
> > >> driver (AppleIntel8254XEthernet.kext) takes longer to load and
> > >> activete, and will "miss" the link status change interrupt
> > >> injected when the emulated "hardware" autonegotiation completes
> > >> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
> > >> 
> > >> This patch extends the delay of the autonetotiation timer set up
> > >> during set_phy_ctrl() to a value just large enough to work with
> > >> the OS X driver.
> > >> 
> > >> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> > >> ---
> > >> 
> > >> So, the loading OS X driver must take longer between its last
> > >> write to the PHY_CTRL register and the time it starts looking
> > >> for LSC interrupts, because at delay==500 it obviously misses
> > >> the relevant interrupt. Making this 5500 (actually anything
> > >> larger than 5300, but there's a bit of variation across OS X
> > >> versions, so I rounded up a bit) has the timer fire after
> > >> enough time has passed that the driver knows what to do when
> > >> the interrupt from the network card fires...
> > >> 
> > >> Thanks,
> > >>  Gabriel
> > >> 
> > >> hw/net/e1000.c | 2 +-
> > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >> 
> > >> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > >> index 2376910..2300477 100644
> > >> --- a/hw/net/e1000.c
> > >> +++ b/hw/net/e1000.c
> > >> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
> > >>         e1000_link_down(s);
> > >>         DBGOUT(PHY, "Start link auto negotiation\n");
> > >>         timer_mod(s->autoneg_timer,
> > >> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
> > >> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
> > >>     }
> > >> }
> > > 
> > > 
> > > Besides being a bit hacky, it actually has a decent chance
> > > to delay boot for guests. 500ms is probably the max we
> > > can reasonably tolerate, even that is a bit high.
> > 
> > Are you sure there's not just simply some irq unmasking event after 5500ms we don't handle properly?
> 
> I poked around a bit, and the e1000 interrupt mask register is NOT the
> problem (the LSC mask bit is clear at all times). If anything, maybe
> the PIIX southbridge (or something further up "north") is masking PCI
> interrupts (at least from e1000) until roughly 5500 ms into the boot
> process ? Any ideas on how I could go about verifying this (without
> access to the guest source, obviously :) ) would be very helpful...
> 
> Thanks,
> --Gabriel


Just poking around the spec I find more things
we don't implement correctly wrt to auto-negotiation.
For example, MII_SR_AUTONEG_CAPS isn't set, is it?
Maybe that's why your guest doesn't work:
it doesn't expect to get autonegotation at all?

So I have a question: does your patch actually help any guests?
If not, maybe we should defer it to after release,
and try to clean up autonegotiation more thouroughly for 2.2?
Gabriel L. Somlo July 2, 2014, 12:05 p.m. UTC | #7
On Wed, Jul 02, 2014 at 12:33:16PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 02, 2014 at 05:02:07AM -0400, Gabriel L. Somlo wrote:
> > On Mon, Jun 30, 2014 at 08:21:43PM +0200, Alexander Graf wrote:
> > > 
> > > 
> > > > Am 30.06.2014 um 19:55 schrieb "Michael S. Tsirkin" <mst@redhat.com>:
> > > > 
> > > >> On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
> > > >> When running on PIIX (as opposed to q35), the stock OS X e1000
> > > >> driver (AppleIntel8254XEthernet.kext) takes longer to load and
> > > >> activete, and will "miss" the link status change interrupt
> > > >> injected when the emulated "hardware" autonegotiation completes
> > > >> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
> > > >> 
> > > >> This patch extends the delay of the autonetotiation timer set up
> > > >> during set_phy_ctrl() to a value just large enough to work with
> > > >> the OS X driver.
> > > >> 
> > > >> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> > > >> ---
> > > >> 
> > > >> So, the loading OS X driver must take longer between its last
> > > >> write to the PHY_CTRL register and the time it starts looking
> > > >> for LSC interrupts, because at delay==500 it obviously misses
> > > >> the relevant interrupt. Making this 5500 (actually anything
> > > >> larger than 5300, but there's a bit of variation across OS X
> > > >> versions, so I rounded up a bit) has the timer fire after
> > > >> enough time has passed that the driver knows what to do when
> > > >> the interrupt from the network card fires...
> > > >> 
> > > >> Thanks,
> > > >>  Gabriel
> > > >> 
> > > >> hw/net/e1000.c | 2 +-
> > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >> 
> > > >> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > > >> index 2376910..2300477 100644
> > > >> --- a/hw/net/e1000.c
> > > >> +++ b/hw/net/e1000.c
> > > >> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
> > > >>         e1000_link_down(s);
> > > >>         DBGOUT(PHY, "Start link auto negotiation\n");
> > > >>         timer_mod(s->autoneg_timer,
> > > >> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
> > > >> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
> > > >>     }
> > > >> }
> > > > 
> > > > 
> > > > Besides being a bit hacky, it actually has a decent chance
> > > > to delay boot for guests. 500ms is probably the max we
> > > > can reasonably tolerate, even that is a bit high.
> > > 
> > > Are you sure there's not just simply some irq unmasking event after 5500ms we don't handle properly?
> > 
> > I poked around a bit, and the e1000 interrupt mask register is NOT the
> > problem (the LSC mask bit is clear at all times). If anything, maybe
> > the PIIX southbridge (or something further up "north") is masking PCI
> > interrupts (at least from e1000) until roughly 5500 ms into the boot
> > process ? Any ideas on how I could go about verifying this (without
> > access to the guest source, obviously :) ) would be very helpful...
> > 
> > Thanks,
> > --Gabriel
> 
> 
> Just poking around the spec I find more things
> we don't implement correctly wrt to auto-negotiation.
> For example, MII_SR_AUTONEG_CAPS isn't set, is it?
> Maybe that's why your guest doesn't work:
> it doesn't expect to get autonegotation at all?
> 
> So I have a question: does your patch actually help any guests?
> If not, maybe we should defer it to after release,
> and try to clean up autonegotiation more thouroughly for 2.2?

The patch you just accepted (handling self clearing + reserved bits
for phy_ctrl) was just "collateral cleanup" in my attempts to fix
the suspected interrupt masking issue on PIIX. As such, either now
or waiting for 2.2 is fine with me.

Not sure I'll be able to figure out the real/main issue while 2.1
bugfixes are still being accepted, but *that* would actually be really
nice :)

Thanks,
--Gabriel
Michael S. Tsirkin July 2, 2014, 12:09 p.m. UTC | #8
On Wed, Jul 02, 2014 at 08:05:46AM -0400, Gabriel L. Somlo wrote:
> On Wed, Jul 02, 2014 at 12:33:16PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 02, 2014 at 05:02:07AM -0400, Gabriel L. Somlo wrote:
> > > On Mon, Jun 30, 2014 at 08:21:43PM +0200, Alexander Graf wrote:
> > > > 
> > > > 
> > > > > Am 30.06.2014 um 19:55 schrieb "Michael S. Tsirkin" <mst@redhat.com>:
> > > > > 
> > > > >> On Mon, Jun 30, 2014 at 12:55:50PM -0400, Gabriel L. Somlo wrote:
> > > > >> When running on PIIX (as opposed to q35), the stock OS X e1000
> > > > >> driver (AppleIntel8254XEthernet.kext) takes longer to load and
> > > > >> activete, and will "miss" the link status change interrupt
> > > > >> injected when the emulated "hardware" autonegotiation completes
> > > > >> (see commit 39bb8ee737595e9b264d075dfcd7d86f4d3f1133).
> > > > >> 
> > > > >> This patch extends the delay of the autonetotiation timer set up
> > > > >> during set_phy_ctrl() to a value just large enough to work with
> > > > >> the OS X driver.
> > > > >> 
> > > > >> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> > > > >> ---
> > > > >> 
> > > > >> So, the loading OS X driver must take longer between its last
> > > > >> write to the PHY_CTRL register and the time it starts looking
> > > > >> for LSC interrupts, because at delay==500 it obviously misses
> > > > >> the relevant interrupt. Making this 5500 (actually anything
> > > > >> larger than 5300, but there's a bit of variation across OS X
> > > > >> versions, so I rounded up a bit) has the timer fire after
> > > > >> enough time has passed that the driver knows what to do when
> > > > >> the interrupt from the network card fires...
> > > > >> 
> > > > >> Thanks,
> > > > >>  Gabriel
> > > > >> 
> > > > >> hw/net/e1000.c | 2 +-
> > > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >> 
> > > > >> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > > > >> index 2376910..2300477 100644
> > > > >> --- a/hw/net/e1000.c
> > > > >> +++ b/hw/net/e1000.c
> > > > >> @@ -209,7 +209,7 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
> > > > >>         e1000_link_down(s);
> > > > >>         DBGOUT(PHY, "Start link auto negotiation\n");
> > > > >>         timer_mod(s->autoneg_timer,
> > > > >> -                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
> > > > >> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
> > > > >>     }
> > > > >> }
> > > > > 
> > > > > 
> > > > > Besides being a bit hacky, it actually has a decent chance
> > > > > to delay boot for guests. 500ms is probably the max we
> > > > > can reasonably tolerate, even that is a bit high.
> > > > 
> > > > Are you sure there's not just simply some irq unmasking event after 5500ms we don't handle properly?
> > > 
> > > I poked around a bit, and the e1000 interrupt mask register is NOT the
> > > problem (the LSC mask bit is clear at all times). If anything, maybe
> > > the PIIX southbridge (or something further up "north") is masking PCI
> > > interrupts (at least from e1000) until roughly 5500 ms into the boot
> > > process ? Any ideas on how I could go about verifying this (without
> > > access to the guest source, obviously :) ) would be very helpful...
> > > 
> > > Thanks,
> > > --Gabriel
> > 
> > 
> > Just poking around the spec I find more things
> > we don't implement correctly wrt to auto-negotiation.
> > For example, MII_SR_AUTONEG_CAPS isn't set, is it?
> > Maybe that's why your guest doesn't work:
> > it doesn't expect to get autonegotation at all?
> > 
> > So I have a question: does your patch actually help any guests?
> > If not, maybe we should defer it to after release,
> > and try to clean up autonegotiation more thouroughly for 2.2?
> 
> The patch you just accepted (handling self clearing + reserved bits
> for phy_ctrl) was just "collateral cleanup" in my attempts to fix
> the suspected interrupt masking issue on PIIX. As such, either now
> or waiting for 2.2 is fine with me.

OK so better safe than sorry: I'll drop it for now, please repost when
2.2 opens.  Thanks!

> Not sure I'll be able to figure out the real/main issue while 2.1
> bugfixes are still being accepted, but *that* would actually be really
> nice :)
> 
> Thanks,
> --Gabriel
Gabriel L. Somlo July 2, 2014, 2:21 p.m. UTC | #9
On Wed, Jul 02, 2014 at 12:33:16PM +0300, Michael S. Tsirkin wrote:
> Just poking around the spec I find more things
> we don't implement correctly wrt to auto-negotiation.
> For example, MII_SR_AUTONEG_CAPS isn't set, is it?
> Maybe that's why your guest doesn't work:
> it doesn't expect to get autonegotation at all?
> 
> So I have a question: does your patch actually help any guests?
> If not, maybe we should defer it to after release,
> and try to clean up autonegotiation more thouroughly for 2.2?

I'll re-submit after 2.1 is officially out. But, since we're talking
about MII_SR_AUTONEG_CAPS:  PHY_STATUS is initialized to 0x794d, which
includes setting the MII_SR_AUTONEG_CAPS bit (|= 0x8).

Did you mean: we should check for MII_SR_AUTONEG_CAPS in have_autoneg() ?
(i.e., on the chance it gets turned off by a guest-side write to
PHY_STATUS) ?

Thx,
--G

PS. Maybe also spell out the individual bits in phy_reg_init[] ? Like,
instead of:

    [PHY_STATUS] = 0x794d,

do this:

    [PHY_STATUS] = MII_SR_EXTENDED_CAPS |
                   MII_SR_LINK_STATUS |
                   MII_SR_AUTONEG_CAPS | 
                   MII_SR_PREAMBLE_SUPPRESS | 
                   MII_SR_EXTENDED_STATUS | 
                   MII_SR_10T_HD_CAPS | 
                   MII_SR_10T_FD_CAPS | 
                   MII_SR_100X_HD_CAPS | 
                   MII_SR_100X_FD_CAPS,

... for all registers ? Much more verbose, but IMHO that'd be a good
thing :)
Michael S. Tsirkin July 2, 2014, 3:17 p.m. UTC | #10
On Wed, Jul 02, 2014 at 10:21:34AM -0400, Gabriel L. Somlo wrote:
> On Wed, Jul 02, 2014 at 12:33:16PM +0300, Michael S. Tsirkin wrote:
> > Just poking around the spec I find more things
> > we don't implement correctly wrt to auto-negotiation.
> > For example, MII_SR_AUTONEG_CAPS isn't set, is it?
> > Maybe that's why your guest doesn't work:
> > it doesn't expect to get autonegotation at all?
> > 
> > So I have a question: does your patch actually help any guests?
> > If not, maybe we should defer it to after release,
> > and try to clean up autonegotiation more thouroughly for 2.2?
> 
> I'll re-submit after 2.1 is officially out. But, since we're talking
> about MII_SR_AUTONEG_CAPS:  PHY_STATUS is initialized to 0x794d, which
> includes setting the MII_SR_AUTONEG_CAPS bit (|= 0x8).
> Did you mean: we should check for MII_SR_AUTONEG_CAPS in have_autoneg() ?
> (i.e., on the chance it gets turned off by a guest-side write to
> PHY_STATUS) ?

PHY_STATUS isn't writeable is it?
No I just got confused with the binary math.
We really should use symbolic constants there,
we already have them defined.

> 
> Thx,
> --G
> 
> PS. Maybe also spell out the individual bits in phy_reg_init[] ? Like,
> instead of:
> 
>     [PHY_STATUS] = 0x794d,
> 
> do this:
> 
>     [PHY_STATUS] = MII_SR_EXTENDED_CAPS |
>                    MII_SR_LINK_STATUS |
>                    MII_SR_AUTONEG_CAPS | 
>                    MII_SR_PREAMBLE_SUPPRESS | 
>                    MII_SR_EXTENDED_STATUS | 
>                    MII_SR_10T_HD_CAPS | 
>                    MII_SR_10T_FD_CAPS | 
>                    MII_SR_100X_HD_CAPS | 
>                    MII_SR_100X_FD_CAPS,
> 
> ... for all registers ? Much more verbose, but IMHO that'd be a good
> thing :)
> 
>
Gabriel L. Somlo July 2, 2014, 8:49 p.m. UTC | #11
On Wed, Jul 02, 2014 at 11:16:52AM +0200, Alexander Graf wrote:
>>> Are you sure there's not just simply some irq unmasking event
>>> after 5500ms we don't handle properly?
>> I poked around a bit, and the e1000 interrupt mask register is NOT the
>> problem (the LSC mask bit is clear at all times). If anything, maybe
>> the PIIX southbridge (or something further up "north") is masking PCI
>> interrupts (at least from e1000) until roughly 5500 ms into the boot
>> process ? Any ideas on how I could go about verifying this (without
>> access to the guest source, obviously :) ) would be very helpful...
> 
> Yeah, maybe the interrupt is masked and doesn't get delivered properly? See
> if you can trace when the e1000 emulation starts kicking an interrupt and
> when the guest tries to fetch it (there should be an ack register for IRQs
> somewhere).
> 
> If we kick it but the guest doesn't react, the problem is further down -
> check whether the IRQ ever got injected into the guest with trace points.
> 
> If we don't kick it, we mask it somewhere in the e1000 emulation and need to
> make sure we do kick once we unmask :). I don't know whether the LSC mask is
> the only one involved.

So it turns out everything I thought I knew (which was little indeed)
was more or less wrong. The problem, as far as I'm observing it now,
is that on PIIX, the OS X guest obsessively reads the ICR in a tight
loop. It reads the injected LSC (and probably discards it) before
unmasking the corresponding interrupt bit; later on, when it unmasks
LSC, giving the emulated e1000 hardware a chance to raise the irq
line, the actual LSC event has been flushed from the ICR, and the
driver does not detect the link coming up.

Here's how things work normally on Q35 (with INTERRUPT and PHY debugging
enabled, // my comments on the side):

e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: Start link auto negotiation
e1000: Auto negotiation is completed
e1000: set_ics 4, ICR 0, IMR 0               // autoneg timer injects LSC
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 4, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 6, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df                            // guest unmasks interrupts
e1000: set_ics 0, ICR 6, IMR df
e1000: set_interrupt_cause: mit_irq_level=1  // first raising irq edge
e1000: ICR read: 6                           // guest receives LSC (+more)

... and things work nicely from here on out :)


On PIIX, however, things look like this:

e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0

... <8155 "ICR read" repetitions deleted> ...

e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: Start link auto negotiation
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0

... <145 "ICR read" repetitions deleted> ...

e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: Auto negotiation is completed
e1000: set_ics 4, ICR 0, IMR 0               // autoneg timer injects LSC
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 4                           // guest reads (and discards?) LSC
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0

... <23 "ICR read" repetitions deleted> ...

e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 2
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df                            // guest unmasks interrupts
e1000: set_ics 0, ICR 2, IMR df
e1000: set_interrupt_cause: mit_irq_level=1  // first raising irq edge
e1000: ICR read: 2                           // too late, LSC is gone :(
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc 2
e1000: set_ics 0, ICR 0, IMR dd
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df
e1000: set_ics 0, ICR 0, IMR df
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 80, ICR 0, IMR df
e1000: set_interrupt_cause: mit_irq_level=1
e1000: set_ics 3, ICR 80, IMR df
e1000: set_interrupt_cause: mit_irq_level=1
e1000: ICR read: 83
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc 82
e1000: set_ics 0, ICR 0, IMR 5d
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df
e1000: set_ics 0, ICR 0, IMR df
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0

... and so on...


Any clue as to why ICR gets read like that on PIIX, but not Q35 ?

Thanks,
--Gabriel
Alexander Graf July 2, 2014, 9:02 p.m. UTC | #12
On 02.07.14 22:49, Gabriel L. Somlo wrote:
> On Wed, Jul 02, 2014 at 11:16:52AM +0200, Alexander Graf wrote:
>>>> Are you sure there's not just simply some irq unmasking event
>>>> after 5500ms we don't handle properly?
>>> I poked around a bit, and the e1000 interrupt mask register is NOT the
>>> problem (the LSC mask bit is clear at all times). If anything, maybe
>>> the PIIX southbridge (or something further up "north") is masking PCI
>>> interrupts (at least from e1000) until roughly 5500 ms into the boot
>>> process ? Any ideas on how I could go about verifying this (without
>>> access to the guest source, obviously :) ) would be very helpful...
>> Yeah, maybe the interrupt is masked and doesn't get delivered properly? See
>> if you can trace when the e1000 emulation starts kicking an interrupt and
>> when the guest tries to fetch it (there should be an ack register for IRQs
>> somewhere).
>>
>> If we kick it but the guest doesn't react, the problem is further down -
>> check whether the IRQ ever got injected into the guest with trace points.
>>
>> If we don't kick it, we mask it somewhere in the e1000 emulation and need to
>> make sure we do kick once we unmask :). I don't know whether the LSC mask is
>> the only one involved.
> So it turns out everything I thought I knew (which was little indeed)
> was more or less wrong. The problem, as far as I'm observing it now,
> is that on PIIX, the OS X guest obsessively reads the ICR in a tight
> loop. It reads the injected LSC (and probably discards it) before
> unmasking the corresponding interrupt bit; later on, when it unmasks
> LSC, giving the emulated e1000 hardware a chance to raise the irq
> line, the actual LSC event has been flushed from the ICR, and the
> driver does not detect the link coming up.
>
> Here's how things work normally on Q35 (with INTERRUPT and PHY debugging
> enabled, // my comments on the side):
>
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: Start link auto negotiation
> e1000: Auto negotiation is completed
> e1000: set_ics 4, ICR 0, IMR 0               // autoneg timer injects LSC
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ics 2, ICR 4, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ics 2, ICR 6, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ims df                            // guest unmasks interrupts
> e1000: set_ics 0, ICR 6, IMR df
> e1000: set_interrupt_cause: mit_irq_level=1  // first raising irq edge
> e1000: ICR read: 6                           // guest receives LSC (+more)
>
> ... and things work nicely from here on out :)
>
>
> On PIIX, however, things look like this:
>
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
>
> ... <8155 "ICR read" repetitions deleted> ...
>
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc ffffffff
> e1000: set_ics 0, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: Start link auto negotiation
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
>
> ... <145 "ICR read" repetitions deleted> ...
>
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: Auto negotiation is completed
> e1000: set_ics 4, ICR 0, IMR 0               // autoneg timer injects LSC
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 4                           // guest reads (and discards?) LSC
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
>
> ... <23 "ICR read" repetitions deleted> ...
>
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ics 2, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 2
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ics 2, ICR 0, IMR 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ims df                            // guest unmasks interrupts
> e1000: set_ics 0, ICR 2, IMR df
> e1000: set_interrupt_cause: mit_irq_level=1  // first raising irq edge
> e1000: ICR read: 2                           // too late, LSC is gone :(
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc 2
> e1000: set_ics 0, ICR 0, IMR dd
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ims df
> e1000: set_ics 0, ICR 0, IMR df
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ics 80, ICR 0, IMR df
> e1000: set_interrupt_cause: mit_irq_level=1
> e1000: set_ics 3, ICR 80, IMR df
> e1000: set_interrupt_cause: mit_irq_level=1
> e1000: ICR read: 83
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_imc 82
> e1000: set_ics 0, ICR 0, IMR 5d
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: set_ims df
> e1000: set_ics 0, ICR 0, IMR df
> e1000: set_interrupt_cause: mit_irq_level=0
> e1000: ICR read: 0
> e1000: set_interrupt_cause: mit_irq_level=0
>
> ... and so on...
>
>
> Any clue as to why ICR gets read like that on PIIX, but not Q35 ?

Maybe it's using MSI on q35? :)
Maybe we also share the same IRQ line with another device on PIIX that 
gets polled all the time? IDE maybe?

Either way, why does the bit get cleared even though it hasn't been 
raised? What does real hardware do with interrupts that have been masked?


Alex
Gabriel L. Somlo July 2, 2014, 9:14 p.m. UTC | #13
On Wed, Jul 02, 2014 at 11:02:30PM +0200, Alexander Graf wrote:
> 
> On 02.07.14 22:49, Gabriel L. Somlo wrote:
> >So it turns out everything I thought I knew (which was little indeed)
> >was more or less wrong. The problem, as far as I'm observing it now,
> >is that on PIIX, the OS X guest obsessively reads the ICR in a tight
> >loop. It reads the injected LSC (and probably discards it) before
> >unmasking the corresponding interrupt bit; later on, when it unmasks
> >LSC, giving the emulated e1000 hardware a chance to raise the irq
> >line, the actual LSC event has been flushed from the ICR, and the
> >driver does not detect the link coming up.
> >
> > [...]
> >
> >Any clue as to why ICR gets read like that on PIIX, but not Q35 ?
> 
> Either way, why does the bit get cleared even though it hasn't been raised?
> What does real hardware do with interrupts that have been masked?

The e1000 manual says ICR bits are cleared on read. It also says PCI
interrupts are only generated if the corresponding bit in *both* ICR
and IMS registers is 1. ICR bits are still cleared if read, even if
masked and no actual interrupt is raised.

> Maybe it's using MSI on q35? :)
> Maybe we also share the same IRQ line with another device on PIIX that gets
> polled all the time? IDE maybe?

Even if that were the case, how come it's reading precisely our
device's ICR register ? Isn't that too much of a coincidence ? 

/me goes back to reading about PIIX and ICH10 (840 pages FTW!) :)
Alexander Graf July 2, 2014, 9:54 p.m. UTC | #14
> Am 02.07.2014 um 23:14 schrieb "Gabriel L. Somlo" <gsomlo@gmail.com>:
> 
>> On Wed, Jul 02, 2014 at 11:02:30PM +0200, Alexander Graf wrote:
>> 
>>> On 02.07.14 22:49, Gabriel L. Somlo wrote:
>>> So it turns out everything I thought I knew (which was little indeed)
>>> was more or less wrong. The problem, as far as I'm observing it now,
>>> is that on PIIX, the OS X guest obsessively reads the ICR in a tight
>>> loop. It reads the injected LSC (and probably discards it) before
>>> unmasking the corresponding interrupt bit; later on, when it unmasks
>>> LSC, giving the emulated e1000 hardware a chance to raise the irq
>>> line, the actual LSC event has been flushed from the ICR, and the
>>> driver does not detect the link coming up.
>>> 
>>> [...]
>>> 
>>> Any clue as to why ICR gets read like that on PIIX, but not Q35 ?
>> 
>> Either way, why does the bit get cleared even though it hasn't been raised?
>> What does real hardware do with interrupts that have been masked?
> 
> The e1000 manual says ICR bits are cleared on read. It also says PCI
> interrupts are only generated if the corresponding bit in *both* ICR
> and IMS registers is 1. ICR bits are still cleared if read, even if
> masked and no actual interrupt is raised.
> 
>> Maybe it's using MSI on q35? :)
>> Maybe we also share the same IRQ line with another device on PIIX that gets
>> polled all the time? IDE maybe?
> 
> Even if that were the case, how come it's reading precisely our
> device's ICR register ? Isn't that too much of a coincidence ? 

When PCI devices share an irq line, the OS needs to poke all devices on that line to figure out which device the IRQ came from. Maybe it's reading ICR for that purpose?

Alex

> 
> /me goes back to reading about PIIX and ICH10 (840 pages FTW!) :)
>
Gabriel L. Somlo July 2, 2014, 10:02 p.m. UTC | #15
On Wed, Jul 02, 2014 at 05:14:26PM -0400, Gabriel L. Somlo wrote:
> On Wed, Jul 02, 2014 at 11:02:30PM +0200, Alexander Graf wrote:
> > 
> > On 02.07.14 22:49, Gabriel L. Somlo wrote:
> > >So it turns out everything I thought I knew (which was little indeed)
> > >was more or less wrong. The problem, as far as I'm observing it now,
> > >is that on PIIX, the OS X guest obsessively reads the ICR in a tight
> > >loop. It reads the injected LSC (and probably discards it) before
> > >unmasking the corresponding interrupt bit; later on, when it unmasks
> > >LSC, giving the emulated e1000 hardware a chance to raise the irq
> > >line, the actual LSC event has been flushed from the ICR, and the
> > >driver does not detect the link coming up.
> > >
> > > [...]
> > >
> > >Any clue as to why ICR gets read like that on PIIX, but not Q35 ?
> > 
> > Either way, why does the bit get cleared even though it hasn't been raised?
> > What does real hardware do with interrupts that have been masked?
> 
> The e1000 manual says ICR bits are cleared on read. It also says PCI
> interrupts are only generated if the corresponding bit in *both* ICR
> and IMS registers is 1. ICR bits are still cleared if read, even if
> masked and no actual interrupt is raised.
> 
> > Maybe it's using MSI on q35? :)
> > Maybe we also share the same IRQ line with another device on PIIX that gets
> > polled all the time? IDE maybe?
> 
> Even if that were the case, how come it's reading precisely our
> device's ICR register ? Isn't that too much of a coincidence ? 

Unless some other device on a shared IRQ line generated the interrupt,
and our e1000 driver then checked ICR. Although if that happened, our
driver should happily acknowledge the link-up event (I checked, and it
does NOT read IMS, so it has no way of knowing it's masked). Except it
does -- it knows the last thing it did was set_imc(0xFFFFFFFF), and
that it hasn't yet set_ims(anything) yet... 

And I think you're right:

(qemu) info pci
  Bus  0, device   0, function 0:
    Host bridge: PCI device 8086:1237
      id ""
  Bus  0, device   1, function 0:
    ISA bridge: PCI device 8086:7000
      id ""
  Bus  0, device   1, function 1:
    IDE controller: PCI device 8086:7010
      BAR4: I/O at 0xc080 [0xc08f].
      id ""
  Bus  0, device   1, function 2:
    USB controller: PCI device 8086:7020
      IRQ 11.
      BAR4: I/O at 0xc040 [0xc05f].
      id ""
  Bus  0, device   1, function 3:
    Bridge: PCI device 8086:7113
      IRQ 9.
      id ""
  Bus  0, device   2, function 0:
    VGA controller: PCI device 1013:00b8
      BAR0: 32 bit prefetchable memory at 0xfc000000 [0xfdffffff].
      BAR1: 32 bit memory at 0xfebf0000 [0xfebf0fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id ""
  Bus  0, device   3, function 0:
    SATA controller: PCI device 8086:2922
      IRQ 11.
      BAR4: I/O at 0xc060 [0xc07f].
      BAR5: 32 bit memory at 0xfebf1000 [0xfebf1fff].
      id "ide"
  Bus  0, device   4, function 0:
    Ethernet controller: PCI device 8086:100f
      IRQ 11.
      BAR0: 32 bit memory at 0xfebc0000 [0xfebdffff].
      BAR1: I/O at 0xffffffffffffffff [0x003e].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id "vnet0"

so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
to force one of those to use a different IRQ ?
Alexander Graf July 3, 2014, 8:04 a.m. UTC | #16
> Am 03.07.2014 um 00:02 schrieb "Gabriel L. Somlo" <gsomlo@gmail.com>:
> 
>> On Wed, Jul 02, 2014 at 05:14:26PM -0400, Gabriel L. Somlo wrote:
>>> On Wed, Jul 02, 2014 at 11:02:30PM +0200, Alexander Graf wrote:
>>> 
>>>> On 02.07.14 22:49, Gabriel L. Somlo wrote:
>>>> So it turns out everything I thought I knew (which was little indeed)
>>>> was more or less wrong. The problem, as far as I'm observing it now,
>>>> is that on PIIX, the OS X guest obsessively reads the ICR in a tight
>>>> loop. It reads the injected LSC (and probably discards it) before
>>>> unmasking the corresponding interrupt bit; later on, when it unmasks
>>>> LSC, giving the emulated e1000 hardware a chance to raise the irq
>>>> line, the actual LSC event has been flushed from the ICR, and the
>>>> driver does not detect the link coming up.
>>>> 
>>>> [...]
>>>> 
>>>> Any clue as to why ICR gets read like that on PIIX, but not Q35 ?
>>> 
>>> Either way, why does the bit get cleared even though it hasn't been raised?
>>> What does real hardware do with interrupts that have been masked?
>> 
>> The e1000 manual says ICR bits are cleared on read. It also says PCI
>> interrupts are only generated if the corresponding bit in *both* ICR
>> and IMS registers is 1. ICR bits are still cleared if read, even if
>> masked and no actual interrupt is raised.
>> 
>>> Maybe it's using MSI on q35? :)
>>> Maybe we also share the same IRQ line with another device on PIIX that gets
>>> polled all the time? IDE maybe?
>> 
>> Even if that were the case, how come it's reading precisely our
>> device's ICR register ? Isn't that too much of a coincidence ?
> 
> Unless some other device on a shared IRQ line generated the interrupt,
> and our e1000 driver then checked ICR. Although if that happened, our
> driver should happily acknowledge the link-up event (I checked, and it
> does NOT read IMS, so it has no way of knowing it's masked). Except it
> does -- it knows the last thing it did was set_imc(0xFFFFFFFF), and
> that it hasn't yet set_ims(anything) yet... 
> 
> And I think you're right:
> 
> (qemu) info pci
>  Bus  0, device   0, function 0:
>    Host bridge: PCI device 8086:1237
>      id ""
>  Bus  0, device   1, function 0:
>    ISA bridge: PCI device 8086:7000
>      id ""
>  Bus  0, device   1, function 1:
>    IDE controller: PCI device 8086:7010
>      BAR4: I/O at 0xc080 [0xc08f].
>      id ""
>  Bus  0, device   1, function 2:
>    USB controller: PCI device 8086:7020
>      IRQ 11.
>      BAR4: I/O at 0xc040 [0xc05f].
>      id ""
>  Bus  0, device   1, function 3:
>    Bridge: PCI device 8086:7113
>      IRQ 9.
>      id ""
>  Bus  0, device   2, function 0:
>    VGA controller: PCI device 1013:00b8
>      BAR0: 32 bit prefetchable memory at 0xfc000000 [0xfdffffff].
>      BAR1: 32 bit memory at 0xfebf0000 [0xfebf0fff].
>      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
>      id ""
>  Bus  0, device   3, function 0:
>    SATA controller: PCI device 8086:2922
>      IRQ 11.
>      BAR4: I/O at 0xc060 [0xc07f].
>      BAR5: 32 bit memory at 0xfebf1000 [0xfebf1fff].
>      id "ide"
>  Bus  0, device   4, function 0:
>    Ethernet controller: PCI device 8086:100f
>      IRQ 11.
>      BAR0: 32 bit memory at 0xfebc0000 [0xfebdffff].
>      BAR1: I/O at 0xffffffffffffffff [0x003e].
>      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
>      id "vnet0"
> 
> so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
> to force one of those to use a different IRQ ?

IIRC if you plug the device in a different slot, the irq distribution should be different :).

Alex

> 
>
Gabriel L. Somlo July 3, 2014, 1:17 p.m. UTC | #17
On Thu, Jul 03, 2014 at 10:04:55AM +0200, Alexander Graf wrote:
> > so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
> > to force one of those to use a different IRQ ?

Oh, and on Q35, while Ethernet (and one of the USBs) is still on IRQ 11,
SATA ended up on IRQ 10, and things are fine there...

> IIRC if you plug the device in a different slot, the irq distribution
> should be different :).

Sorry for being thick, but I'm still trying to figure out if there's a
command-line way of making that happen. Re-ordering the "-device"
arguments to qemu obviously doesn't make a difference in how they're
assigned...

Gabriel
Alexander Graf July 3, 2014, 1:20 p.m. UTC | #18
On 03.07.2014, at 15:17, Gabriel L. Somlo <gsomlo@gmail.com> wrote:

> On Thu, Jul 03, 2014 at 10:04:55AM +0200, Alexander Graf wrote:
>>> so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
>>> to force one of those to use a different IRQ ?
> 
> Oh, and on Q35, while Ethernet (and one of the USBs) is still on IRQ 11,
> SATA ended up on IRQ 10, and things are fine there...
> 
>> IIRC if you plug the device in a different slot, the irq distribution
>> should be different :).
> 
> Sorry for being thick, but I'm still trying to figure out if there's a
> command-line way of making that happen. Re-ordering the "-device"
> arguments to qemu obviously doesn't make a difference in how they're
> assigned...

The magic word is "devfn" :). It's basically the token PCI uses to find out which slot and function id a device is on. You can set "devfn" with the "addr" attribute on -device.


Alex
Gabriel L. Somlo July 3, 2014, 1:58 p.m. UTC | #19
On Thu, Jul 03, 2014 at 03:20:50PM +0200, Alexander Graf wrote:
> 
> On 03.07.2014, at 15:17, Gabriel L. Somlo <gsomlo@gmail.com> wrote:
> 
> > On Thu, Jul 03, 2014 at 10:04:55AM +0200, Alexander Graf wrote:
> >>> so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
> >>> to force one of those to use a different IRQ ?
> > 
> > Oh, and on Q35, while Ethernet (and one of the USBs) is still on IRQ 11,
> > SATA ended up on IRQ 10, and things are fine there...
> > 
> >> IIRC if you plug the device in a different slot, the irq distribution
> >> should be different :).
> > 
> > Sorry for being thick, but I'm still trying to figure out if there's a
> > command-line way of making that happen. Re-ordering the "-device"
> > arguments to qemu obviously doesn't make a difference in how they're
> > assigned...
> 
> The magic word is "devfn" :). It's basically the token PCI uses to find out which slot and function id a device is on. You can set "devfn" with the "addr" attribute on -device.

OK. So simply doing "-device e1000-82545em,...,bus=pci.0,addr=5" to
move it up one slot from its default of "4" gets it set up with IRQ 10.
As long as it doesn't share an irq line with the SATA controller, the
network link gets detected just fine, same as on Q35.

So this is not really an autonegotiation or timing bug, it's an
unfortunate side effect of the OS X built-in proprietary e1000 driver
not playing nice when IRQs are shared with SATA in particular. At
least as far as I'm able to tell so far... :)

So I'll send out some autoneg cleanup patches after 2.1 is officially
out, but the "osx link on piix not working" mystery is solved as far
as I can tell :)

Thanks much,
--Gabriel
Alexander Graf July 3, 2014, 2:02 p.m. UTC | #20
On 03.07.14 15:58, Gabriel L. Somlo wrote:
> On Thu, Jul 03, 2014 at 03:20:50PM +0200, Alexander Graf wrote:
>> On 03.07.2014, at 15:17, Gabriel L. Somlo <gsomlo@gmail.com> wrote:
>>
>>> On Thu, Jul 03, 2014 at 10:04:55AM +0200, Alexander Graf wrote:
>>>>> so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
>>>>> to force one of those to use a different IRQ ?
>>> Oh, and on Q35, while Ethernet (and one of the USBs) is still on IRQ 11,
>>> SATA ended up on IRQ 10, and things are fine there...
>>>
>>>> IIRC if you plug the device in a different slot, the irq distribution
>>>> should be different :).
>>> Sorry for being thick, but I'm still trying to figure out if there's a
>>> command-line way of making that happen. Re-ordering the "-device"
>>> arguments to qemu obviously doesn't make a difference in how they're
>>> assigned...
>> The magic word is "devfn" :). It's basically the token PCI uses to find out which slot and function id a device is on. You can set "devfn" with the "addr" attribute on -device.
> OK. So simply doing "-device e1000-82545em,...,bus=pci.0,addr=5" to
> move it up one slot from its default of "4" gets it set up with IRQ 10.
> As long as it doesn't share an irq line with the SATA controller, the
> network link gets detected just fine, same as on Q35.
>
> So this is not really an autonegotiation or timing bug, it's an
> unfortunate side effect of the OS X built-in proprietary e1000 driver
> not playing nice when IRQs are shared with SATA in particular. At
> least as far as I'm able to tell so far... :)
>
> So I'll send out some autoneg cleanup patches after 2.1 is officially
> out, but the "osx link on piix not working" mystery is solved as far
> as I can tell :)

Maybe only ICR bits that actually would trigger an interrupt get cleared?


Alex
Gabriel L. Somlo July 3, 2014, 2:14 p.m. UTC | #21
On Thu, Jul 03, 2014 at 04:02:06PM +0200, Alexander Graf wrote:
> 
> On 03.07.14 15:58, Gabriel L. Somlo wrote:
> >On Thu, Jul 03, 2014 at 03:20:50PM +0200, Alexander Graf wrote:
> >>On 03.07.2014, at 15:17, Gabriel L. Somlo <gsomlo@gmail.com> wrote:
> >>
> >>>On Thu, Jul 03, 2014 at 10:04:55AM +0200, Alexander Graf wrote:
> >>>>>so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
> >>>>>to force one of those to use a different IRQ ?
> >>>Oh, and on Q35, while Ethernet (and one of the USBs) is still on IRQ 11,
> >>>SATA ended up on IRQ 10, and things are fine there...
> >>>
> >>>>IIRC if you plug the device in a different slot, the irq distribution
> >>>>should be different :).
> >>>Sorry for being thick, but I'm still trying to figure out if there's a
> >>>command-line way of making that happen. Re-ordering the "-device"
> >>>arguments to qemu obviously doesn't make a difference in how they're
> >>>assigned...
> >>The magic word is "devfn" :). It's basically the token PCI uses to find out which slot and function id a device is on. You can set "devfn" with the "addr" attribute on -device.
> >OK. So simply doing "-device e1000-82545em,...,bus=pci.0,addr=5" to
> >move it up one slot from its default of "4" gets it set up with IRQ 10.
> >As long as it doesn't share an irq line with the SATA controller, the
> >network link gets detected just fine, same as on Q35.
> >
> >So this is not really an autonegotiation or timing bug, it's an
> >unfortunate side effect of the OS X built-in proprietary e1000 driver
> >not playing nice when IRQs are shared with SATA in particular. At
> >least as far as I'm able to tell so far... :)
> >
> >So I'll send out some autoneg cleanup patches after 2.1 is officially
> >out, but the "osx link on piix not working" mystery is solved as far
> >as I can tell :)
> 
> Maybe only ICR bits that actually would trigger an interrupt get cleared?

you mean, modify e1000's mac_icr_read() to only clear bits on read if
the mask says they should be "active" at that moment ?

The e1000 spec doesn't seem to say that though, it's "you read it,
it's now 0", regardless of the mask configuration.

http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf
on page 293.

Thanks,
--Gabriel
Alexander Graf July 3, 2014, 2:51 p.m. UTC | #22
On 03.07.14 16:14, Gabriel L. Somlo wrote:
> On Thu, Jul 03, 2014 at 04:02:06PM +0200, Alexander Graf wrote:
>> On 03.07.14 15:58, Gabriel L. Somlo wrote:
>>> On Thu, Jul 03, 2014 at 03:20:50PM +0200, Alexander Graf wrote:
>>>> On 03.07.2014, at 15:17, Gabriel L. Somlo <gsomlo@gmail.com> wrote:
>>>>
>>>>> On Thu, Jul 03, 2014 at 10:04:55AM +0200, Alexander Graf wrote:
>>>>>>> so Ethernet, SATA, and USB, all sharing IRQ 11. Is there an easy way
>>>>>>> to force one of those to use a different IRQ ?
>>>>> Oh, and on Q35, while Ethernet (and one of the USBs) is still on IRQ 11,
>>>>> SATA ended up on IRQ 10, and things are fine there...
>>>>>
>>>>>> IIRC if you plug the device in a different slot, the irq distribution
>>>>>> should be different :).
>>>>> Sorry for being thick, but I'm still trying to figure out if there's a
>>>>> command-line way of making that happen. Re-ordering the "-device"
>>>>> arguments to qemu obviously doesn't make a difference in how they're
>>>>> assigned...
>>>> The magic word is "devfn" :). It's basically the token PCI uses to find out which slot and function id a device is on. You can set "devfn" with the "addr" attribute on -device.
>>> OK. So simply doing "-device e1000-82545em,...,bus=pci.0,addr=5" to
>>> move it up one slot from its default of "4" gets it set up with IRQ 10.
>>> As long as it doesn't share an irq line with the SATA controller, the
>>> network link gets detected just fine, same as on Q35.
>>>
>>> So this is not really an autonegotiation or timing bug, it's an
>>> unfortunate side effect of the OS X built-in proprietary e1000 driver
>>> not playing nice when IRQs are shared with SATA in particular. At
>>> least as far as I'm able to tell so far... :)
>>>
>>> So I'll send out some autoneg cleanup patches after 2.1 is officially
>>> out, but the "osx link on piix not working" mystery is solved as far
>>> as I can tell :)
>> Maybe only ICR bits that actually would trigger an interrupt get cleared?
> you mean, modify e1000's mac_icr_read() to only clear bits on read if
> the mask says they should be "active" at that moment ?
>
> The e1000 spec doesn't seem to say that though, it's "you read it,
> it's now 0", regardless of the mask configuration.
>
> http://download.intel.com/design/network/manuals/8254x_GBe_SDM.pdf
> on page 293.

Hrm. So could we just delay notification of the LSC change until IMS.LCR 
= 1? In other words, when the guest writes IMS.LCR = 1, it was 0 before 
and the link is up, send an interrupt?


Alex
diff mbox

Patch

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 2376910..2300477 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -209,7 +209,7 @@  set_phy_ctrl(E1000State *s, int index, uint16_t val)
         e1000_link_down(s);
         DBGOUT(PHY, "Start link auto negotiation\n");
         timer_mod(s->autoneg_timer,
-                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5500);
     }
 }