e1000: work around win 8.0 boot hang

Message ID 1424373859-2019-1-git-send-email-rkrcmar@redhat.com
State New

Commit Message

Radim Krčmář Feb. 19, 2015, 7:24 p.m. UTC
The Windows 8.0 driver has a particular behavior for a small time frame
after it enables rx interrupts:  the interrupt handler never clears
E1000_ICR_RXT0.  The handler does something like this:
  set_imc(-1)               (1) disable all interrupts
  val = read_icr()          (2) clear ICR
  handled = magic(val)      (3) do nothing to E1000_ICR_RXT0
  set_ics(val & ~handled)   (4) set unhandled interrupts back to ICR
  set_ims(157)              (5) enable some interrupts

so if we started with RXT0, then every time the handler re-enables e1000
interrupts, it receives one.  This likely wouldn't matter in real
hardware, because it is slow enough to make some progress between
interrupts, but KVM instantly interrupts it, and boot hangs.
(If we have multiple VCPUs, the interrupt gets load-balanced and
 everything is fine.)
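
For illustration only, the loop can be modeled outside of QEMU.  The
sketch below is neither the driver's nor QEMU's code: struct nic_model,
win8_like_handler() and count_back_to_back_irqs() are made-up names,
magic() is assumed to handle every cause except RXT0, and interrupt
delivery is assumed to have zero latency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ICR_RXT0 0x80u   /* value of E1000_ICR_RXT0 */
#define IMS_WIN8 157u    /* mask written in step (5); includes RXT0 */

/* Hypothetical, minimal model of the two registers involved. */
struct nic_model {
    uint32_t icr;   /* interrupt cause */
    uint32_t ims;   /* interrupt mask */
};

/* The line is asserted while any enabled cause is pending. */
static bool irq_asserted(const struct nic_model *n)
{
    return (n->icr & n->ims) != 0;
}

/* Steps (1)-(5) of the handler; magic() is assumed to skip RXT0. */
static void win8_like_handler(struct nic_model *n)
{
    n->ims = 0;                          /* (1) set_imc(-1) */
    uint32_t val = n->icr;               /* (2) read_icr() ... */
    n->icr = 0;                          /*     ... which clears ICR */
    uint32_t handled = val & ~ICR_RXT0;  /* (3) RXT0 stays unhandled */
    n->icr |= val & ~handled;            /* (4) set_ics() */
    n->ims |= IMS_WIN8;                  /* (5) set_ims(157) */
}

/* Count handler runs until the line drops or the limit is reached. */
int count_back_to_back_irqs(int limit)
{
    struct nic_model n = { .icr = ICR_RXT0, .ims = IMS_WIN8 };
    int taken = 0;

    assert(limit >= 0);
    while (taken < limit && irq_asserted(&n)) {
        win8_like_handler(&n);
        taken++;
    }
    return taken;
}
```

In this model the line is asserted again after every handler run, so
count_back_to_back_irqs() only stops when it reaches the limit it was
given.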

I haven't found any problem in the earlier phases of initialization, and
Windows writes 0 to RADV and RDTR, so some workaround looks like the
only way if we want to support win8.0 on uniprocessors.  (I vote NO.)

This workaround uses the fact that a constant is cleared from ICR and
later set back to it.  After detecting this situation, we reuse the
mitigation framework to inject an interrupt 10 microseconds later.
(It's not exactly 10 microseconds, to keep the existing logic intact.)
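
The "not exactly" follows from the units used in the patch: the
workaround value is multiplied by 4 and the timer is armed with
mit_delay * 256 ns.  A small sketch of that arithmetic (the function
name is made up, and it ignores the other mitigation registers, which
may adjust the delay further when E1000_FLAG_MIT is set):

```c
#include <stdint.h>

#define WIN8_WORKAROUND_DELAY_US 10u  /* E1000_WIN8_WORKAROUND_DELAY_US */

/* Hypothetical helper: timer offset when only the workaround applies. */
uint64_t win8_workaround_delay_ns(void)
{
    /* mit_delay is kept in units of 256 ns, hence the factor of 4. */
    uint32_t mit_delay = WIN8_WORKAROUND_DELAY_US * 4;

    /* Same scaling as the timer_mod() call: mit_delay * 256 ns. */
    return (uint64_t)mit_delay * 256;
}
```

That is 10 * 4 * 256 ns = 10240 ns, i.e. 10.24 microseconds.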

The detection is done by checking at (1), (2), and (5).  (2) and (5)
require that the only bit in ICR is RXT0.  We could also check at (4),
and on writes to any other register, but it would most likely only add
more useless code, because normal operations shouldn't behave like that
anyway.  (An OS that deliberately keeps bits in ICR to notify itself
that there are more packets, or for more creative reasons, is nothing we
should care about.)
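
As a rough, standalone model of that detection (not the QEMU code
itself): the flag is armed at (1), kept only if ICR is exactly RXT0 at
(2) and (5), and dropped after (5).  struct detector, the on_*() hooks
and handler_triggers_workaround() are hypothetical names; set_ics is
assumed to OR its value into ICR.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICR_RXT0 0x80u   /* value of E1000_ICR_RXT0 */

/* Hypothetical detector state: ICR contents plus the workaround flag. */
struct detector {
    uint32_t icr;
    bool needed;
};

/* (1) arm only when the guest masks all interrupts at once */
static void on_imc_write(struct detector *d, uint32_t val)
{
    d->needed = (val == UINT32_MAX);
}

/* (2) keep the flag only if ICR held exactly RXT0; the read clears ICR */
static uint32_t on_icr_read(struct detector *d)
{
    uint32_t ret = d->icr;

    d->needed = d->needed && (ret == ICR_RXT0);
    d->icr = 0;
    return ret;
}

/* (4) is not checked, it only puts causes back into ICR */
static void on_ics_write(struct detector *d, uint32_t val)
{
    d->icr |= val;
}

/* (5) report whether the delayed injection applies, then disarm */
static bool on_ims_write(struct detector *d)
{
    bool delay = d->needed && (d->icr == ICR_RXT0);

    d->needed = false;
    return delay;
}

/* Run one handler pass; handled_mask stands in for magic(). */
bool handler_triggers_workaround(uint32_t initial_icr, uint32_t imc_val,
                                 uint32_t handled_mask)
{
    struct detector d = { .icr = initial_icr, .needed = false };

    on_imc_write(&d, imc_val);
    uint32_t val = on_icr_read(&d);
    on_ics_write(&d, val & ~handled_mask);
    return on_ims_write(&d);
}
```

In this model only a pass that starts with ICR == RXT0, masks everything
at (1) and leaves RXT0 unhandled ends with the delay applied; a handler
that clears RXT0, or one that sees other causes in ICR, does not.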

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 The patch is still untested -- it only approximates the behavior of RHEL
 patches that worked, I'll try to get a reproducer ...

 hw/net/e1000.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

Comments

Radim Krčmář Feb. 19, 2015, 8:37 p.m. UTC | #1
2015-02-19 20:24+0100, Radim Krčmář:
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> @@ -138,6 +138,10 @@ typedef struct E1000State_st {
> +#define E1000_WIN8_WORKAROUND_ICR       E1000_ICR_RXT0
> +#define E1000_WIN8_WORKAROUND_DELAY_US  10
> +    bool win8_workaround_needed;
> @@ -288,7 +292,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> @@ -316,13 +320,17 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> +        if (s->win8_workround_needed) {

So I read the patch again and noticed a typo here, which reminds me that
QEMU does not compile on rawhide for several reasons ...
I'll fix that to compensate.
Stefan Hajnoczi Feb. 23, 2015, 10:45 a.m. UTC | #2
On Thu, Feb 19, 2015 at 09:37:46PM +0100, Radim Krčmář wrote:
> 2015-02-19 20:24+0100, Radim Krčmář:
> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > @@ -138,6 +138,10 @@ typedef struct E1000State_st {
> > +#define E1000_WIN8_WORKAROUND_ICR       E1000_ICR_RXT0
> > +#define E1000_WIN8_WORKAROUND_DELAY_US  10
> > +    bool win8_workaround_needed;
> > @@ -288,7 +292,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> > @@ -316,13 +320,17 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> > +        if (s->win8_workround_needed) {
> 
> So I read the patch again and noticed a typo here, which reminds me that
> QEMU does not compile on rawhide for several reasons ...
> I'll fix that to compensate.

Just to clarify, you are NACKing this patch and will send a new series?

I want to make sure this fix doesn't get forgotten.
Radim Krčmář Feb. 23, 2015, 1:45 p.m. UTC | #3
2015-02-23 10:45+0000, Stefan Hajnoczi:
> On Thu, Feb 19, 2015 at 09:37:46PM +0100, Radim Krčmář wrote:
> > 2015-02-19 20:24+0100, Radim Krčmář:
> > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > > @@ -138,6 +138,10 @@ typedef struct E1000State_st {
> > > +#define E1000_WIN8_WORKAROUND_ICR       E1000_ICR_RXT0
> > > +#define E1000_WIN8_WORKAROUND_DELAY_US  10
> > > +    bool win8_workaround_needed;
> > > @@ -288,7 +292,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> > > @@ -316,13 +320,17 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> > > +        if (s->win8_workround_needed) {
> > 
> > So I read the patch again and noticed a typo here, which reminds me that
> > QEMU does not compile on rawhide for several reasons ...
> > I'll fix that to compensate.
> 
> Just to clarify, you are NACKing this patch and will send a new series?

I would only change this line in v2 so far, so I'll wait for more
comments before a respin.  It is possible that maintainers find fixing a
typo easier than handling v2, but it most likely is a NACK.

(The build fixes are already posted in an independent series, they were
 just a reason why I haven't even compile-tested this patch.)

> I want to make sure this fix doesn't get forgotten.

Thanks!
Stefan Hajnoczi Feb. 23, 2015, 2:39 p.m. UTC | #4
On Mon, Feb 23, 2015 at 1:45 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2015-02-23 10:45+0000, Stefan Hajnoczi:
>> On Thu, Feb 19, 2015 at 09:37:46PM +0100, Radim Krčmář wrote:
>> > 2015-02-19 20:24+0100, Radim Krčmář:
>> > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> > > @@ -138,6 +138,10 @@ typedef struct E1000State_st {
>> > > +#define E1000_WIN8_WORKAROUND_ICR       E1000_ICR_RXT0
>> > > +#define E1000_WIN8_WORKAROUND_DELAY_US  10
>> > > +    bool win8_workaround_needed;
>> > > @@ -288,7 +292,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>> > > @@ -316,13 +320,17 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>> > > +        if (s->win8_workround_needed) {
>> >
>> > So I read the patch again and noticed a typo here, which reminds me that
>> > QEMU does not compile on rawhide for several reasons ...
>> > I'll fix that to compensate.
>>
>> Just to clarify, you are NACKing this patch and will send a new series?
>
> I would only change this line in v2 so far, so I wait for more comments
> before respin.  It is possible that maintainers find fixing a typo
> easier than handling v2, but it most likely is a NACK.

Since the typo breaks compilation, it shows that the patch was not
tested.  That makes me nervous.

Please send v2 after testing the Windows 8 guest.

Stefan
Radim Krčmář Feb. 23, 2015, 4:07 p.m. UTC | #5
2015-02-23 14:39+0000, Stefan Hajnoczi:
> Since the typo breaks compilation, it shows that the patch was not
> tested.  That makes me nervous.

(It was based on a tested RHEL7 patch.  Upstream allowed for better
 code, so I didn't want to push the ugly, yet working, version.)

Honestly, I was hoping that we don't want to fix this kind of bugs in
QEMU at all, so posting early would have the same effect.

> Please send v2 after testing the Windows 8 guest.

Ok, (I don't have the reproducer, so it might take a while,)

thanks.
Wei Huang Feb. 23, 2015, 4:13 p.m. UTC | #6
On 02/23/2015 10:07 AM, Radim Krčmář wrote:
> 2015-02-23 14:39+0000, Stefan Hajnoczi:
>> Since the typo breaks compilation, it shows that the patch was not
>> tested.  That makes me nervous.
> 
> (It was based on a tested RHEL7 patch.  Upstream allowed for better
>  code, so I didn't want to push the ugly, yet working, version.)
> 
> Honestly, I was hoping that we don't want to fix this kind of bugs in
> QEMU at all, so posting early would have the same effect.
> 
>> Please send v2 after testing the Windows 8 guest.
> 
> Ok, (I don't have the reproducer, so it might take a while,)
Given that it is related to one of my BZs, I will help to validate the
patch using upstream QEMU.

-Wei
> 
> thanks.
>
Stefan Hajnoczi Feb. 24, 2015, 11:35 a.m. UTC | #7
On Thu, Feb 19, 2015 at 08:24:19PM +0100, Radim Krčmář wrote:
> Window 8.0 driver has a particular behavior for a small time frame after
> it enables rx interrupts:  the interrupt handler never clears
> E1000_ICR_RXT0.  The handler does this something like this:
>   set_imc(-1)               (1) disable all interrupts
>   val = read_icr()          (2) clear ICR
>   handled = magic(val)      (3) do nothing to E1000_ICR_RXT0
>   set_ics(val & ~handled)   (4) set unhandled interrupts back to ICR
>   set_ims(157)              (5) enable some interrupts
> 
> so if we started with RXT0, then every time the handler re-enables e1000
> interrupts, it receives one.  This likely wouldn't matter in real
> hardware, because it is slow enough to make some progress between
> interrupts, but KVM instantly interrupts it, and boot hangs.
> (If we have multiple VCPUs, the interrupt gets load-balanced and
>  everything is fine.)
> 
> I haven't found any problem in earlier phase of initialization and
> windows writes 0 to RADV and RDTR, so some workaround looks like the
> only way if we want to support win8.0 on uniprocessors.  (I vote NO.)
> 
> This workaround uses the fact that a constant is cleared from ICR and
> later set back to it.  After detecting this situation, we reuse the
> mitigation framework to inject an interrupt 10 microseconds later.
> (It's not exactly 10 microseconds, to keep the existing logic intact.)
> 
> The detection is done by checking at (1), (2), and (5).  (2) and (5)
> require that the only bit in ICR is RXT0.  We could also check at (4),
> and on writes to any other register, but it would most likely only add
> more useless code, because normal operations shouldn't behave like that
> anyway.  (An OS that deliberately keeps bits in ICR to notify itself
> that there are more packets, or for more creative reasons, is nothing we
> should care about.)
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  The patch is still untested -- it only approximates the behavior of RHEL
>  patches that worked, I'll try to get a reproducer ...
> 
>  hw/net/e1000.c | 29 ++++++++++++++++++++++-------
>  1 file changed, 22 insertions(+), 7 deletions(-)

Hi Alex,
I've CCed you in case you have any advice regarding QEMU's e1000
emulation.  It seems Windows 8 gets itself into a kind of interrupt
storm and a workaround in QEMU will be necessary.

Any thoughts?

Thanks,
Stefan

> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index a207e21bcf77..773aac47f0b2 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -138,6 +138,10 @@ typedef struct E1000State_st {
>  #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
>  #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
>      uint32_t compat_flags;
> +
> +#define E1000_WIN8_WORKAROUND_ICR       E1000_ICR_RXT0
> +#define E1000_WIN8_WORKAROUND_DELAY_US  10
> +    bool win8_workaround_needed;
>  } E1000State;
>  
>  typedef struct E1000BaseClass {
> @@ -288,7 +292,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>  {
>      PCIDevice *d = PCI_DEVICE(s);
>      uint32_t pending_ints;
> -    uint32_t mit_delay;
> +    uint32_t mit_delay = 0;
>  
>      s->mac_reg[ICR] = val;
>  
> @@ -316,13 +320,17 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>          if (s->mit_timer_on) {
>              return;
>          }
> +
> +        if (s->win8_workround_needed) {
> +            mit_delay = E1000_WIN8_WORKAROUND_DELAY_US * 4;
> +        }
> +
>          if (s->compat_flags & E1000_FLAG_MIT) {
>              /* Compute the next mitigation delay according to pending
>               * interrupts and the current values of RADV (provided
>               * RDTR!=0), TADV and ITR.
>               * Then rearm the timer.
>               */
> -            mit_delay = 0;
>              if (s->mit_ide &&
>                      (pending_ints & (E1000_ICR_TXQE | E1000_ICR_TXDW))) {
>                  mit_update_delay(&mit_delay, s->mac_reg[TADV] * 4);
> @@ -332,13 +340,14 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>              }
>              mit_update_delay(&mit_delay, s->mac_reg[ITR]);
>  
> -            if (mit_delay) {
> -                s->mit_timer_on = 1;
> -                timer_mod(s->mit_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> -                          mit_delay * 256);
> -            }
>              s->mit_ide = 0;
>          }
> +
> +        if (mit_delay) {
> +            s->mit_timer_on = 1;
> +            timer_mod(s->mit_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +                      mit_delay * 256);
> +        }
>      }
>  
>      s->mit_irq_level = (pending_ints != 0);
> @@ -411,6 +420,7 @@ static void e1000_reset(void *opaque)
>      d->mit_timer_on = 0;
>      d->mit_irq_level = 0;
>      d->mit_ide = 0;
> +    d->win8_workaround_needed = false;
>      memset(d->phy_reg, 0, sizeof d->phy_reg);
>      memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
>      d->phy_reg[PHY_ID2] = edc->phy_id2;
> @@ -1114,6 +1124,8 @@ mac_icr_read(E1000State *s, int index)
>  {
>      uint32_t ret = s->mac_reg[ICR];
>  
> +    s->win8_workaround_needed &= ret == E1000_WIN8_WORKAROUND_ICR;
> +
>      DBGOUT(INTERRUPT, "ICR read: %x\n", ret);
>      set_interrupt_cause(s, 0, 0);
>      return ret;
> @@ -1192,6 +1204,7 @@ static void
>  set_imc(E1000State *s, int index, uint32_t val)
>  {
>      s->mac_reg[IMS] &= ~val;
> +    s->win8_workaround_needed = ~val == 0;
>      set_ics(s, 0, 0);
>  }
>  
> @@ -1199,7 +1212,9 @@ static void
>  set_ims(E1000State *s, int index, uint32_t val)
>  {
>      s->mac_reg[IMS] |= val;
> +    s->win8_workaround_needed &= s->mac_reg[ICR] == E1000_WIN8_WORKAROUND_ICR;
>      set_ics(s, 0, 0);
> +    s->win8_workaround_needed = false;
>  }
>  
>  #define getreg(x)	[x] = mac_readreg
> -- 
> 2.3.0
> 
>
Stefan Hajnoczi Feb. 24, 2015, 11:46 a.m. UTC | #8
On Tue, Feb 24, 2015 at 11:35 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Feb 19, 2015 at 08:24:19PM +0100, Radim Krčmář wrote:
>> Window 8.0 driver has a particular behavior for a small time frame after
>> it enables rx interrupts:  the interrupt handler never clears
>> E1000_ICR_RXT0.  The handler does this something like this:
>>   set_imc(-1)               (1) disable all interrupts
>>   val = read_icr()          (2) clear ICR
>>   handled = magic(val)      (3) do nothing to E1000_ICR_RXT0
>>   set_ics(val & ~handled)   (4) set unhandled interrupts back to ICR
>>   set_ims(157)              (5) enable some interrupts
>>
>> so if we started with RXT0, then every time the handler re-enables e1000
>> interrupts, it receives one.  This likely wouldn't matter in real
>> hardware, because it is slow enough to make some progress between
>> interrupts, but KVM instantly interrupts it, and boot hangs.
>> (If we have multiple VCPUs, the interrupt gets load-balanced and
>>  everything is fine.)
>>
>> I haven't found any problem in earlier phase of initialization and
>> windows writes 0 to RADV and RDTR, so some workaround looks like the
>> only way if we want to support win8.0 on uniprocessors.  (I vote NO.)
>>
>> This workaround uses the fact that a constant is cleared from ICR and
>> later set back to it.  After detecting this situation, we reuse the
>> mitigation framework to inject an interrupt 10 microseconds later.
>> (It's not exactly 10 microseconds, to keep the existing logic intact.)
>>
>> The detection is done by checking at (1), (2), and (5).  (2) and (5)
>> require that the only bit in ICR is RXT0.  We could also check at (4),
>> and on writes to any other register, but it would most likely only add
>> more useless code, because normal operations shouldn't behave like that
>> anyway.  (An OS that deliberately keeps bits in ICR to notify itself
>> that there are more packets, or for more creative reasons, is nothing we
>> should care about.)
>>
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>>  The patch is still untested -- it only approximates the behavior of RHEL
>>  patches that worked, I'll try to get a reproducer ...
>>
>>  hw/net/e1000.c | 29 ++++++++++++++++++++++-------
>>  1 file changed, 22 insertions(+), 7 deletions(-)
>
> Hi Alex,
> I've CCed you in case you have any advice regarding QEMU's e1000
> emulation.  It seems Windows 8 gets itself into a kind of interrupt
> storm and a workaround in QEMU will be necessary.
>
> Any thoughts?

Okay, I guess Alex has changed jobs since the email has bounced.  Too
bad, it was worth a shot.

Regarding the workaround, I'm okay with it.  It's a hack for sure but
what other option do we have?

Stefan
Wei Huang March 20, 2015, 3:10 p.m. UTC | #9
On 02/24/2015 05:46 AM, Stefan Hajnoczi wrote:
> On Tue, Feb 24, 2015 at 11:35 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Thu, Feb 19, 2015 at 08:24:19PM +0100, Radim Krčmář wrote:
>>> Window 8.0 driver has a particular behavior for a small time frame after
>>> it enables rx interrupts:  the interrupt handler never clears
>>> E1000_ICR_RXT0.  The handler does this something like this:
>>>   set_imc(-1)               (1) disable all interrupts
>>>   val = read_icr()          (2) clear ICR
>>>   handled = magic(val)      (3) do nothing to E1000_ICR_RXT0
>>>   set_ics(val & ~handled)   (4) set unhandled interrupts back to ICR
>>>   set_ims(157)              (5) enable some interrupts
>>>
>>> so if we started with RXT0, then every time the handler re-enables e1000
>>> interrupts, it receives one.  This likely wouldn't matter in real
>>> hardware, because it is slow enough to make some progress between
>>> interrupts, but KVM instantly interrupts it, and boot hangs.
>>> (If we have multiple VCPUs, the interrupt gets load-balanced and
>>>  everything is fine.)
>>>
>>> I haven't found any problem in earlier phase of initialization and
>>> windows writes 0 to RADV and RDTR, so some workaround looks like the
>>> only way if we want to support win8.0 on uniprocessors.  (I vote NO.)
>>>
>>> This workaround uses the fact that a constant is cleared from ICR and
>>> later set back to it.  After detecting this situation, we reuse the
>>> mitigation framework to inject an interrupt 10 microseconds later.
>>> (It's not exactly 10 microseconds, to keep the existing logic intact.)
>>>
>>> The detection is done by checking at (1), (2), and (5).  (2) and (5)
>>> require that the only bit in ICR is RXT0.  We could also check at (4),
>>> and on writes to any other register, but it would most likely only add
>>> more useless code, because normal operations shouldn't behave like that
>>> anyway.  (An OS that deliberately keeps bits in ICR to notify itself
>>> that there are more packets, or for more creative reasons, is nothing we
>>> should care about.)
>>>
>>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>>> ---
>>>  The patch is still untested -- it only approximates the behavior of RHEL
>>>  patches that worked, I'll try to get a reproducer ...
>>>
>>>  hw/net/e1000.c | 29 ++++++++++++++++++++++-------
>>>  1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> Hi Alex,
>> I've CCed you in case you have any advice regarding QEMU's e1000
>> emulation.  It seems Windows 8 gets itself into a kind of interrupt
>> storm and a workaround in QEMU will be necessary.
>>
>> Any thoughts?
> 
> Okay, I guess Alex has changed jobs since the email has bounced.  Too
> bad, it was worth a shot.
> 
> Regarding the workaround, I'm okay with it.  It's a hack for sure but
> what other option do we have?
> 
I wasn't able to reproduce this problem with upstream QEMU. According to
Radim, this bug requires very subtle timing during guest installation,
so my testing probably didn't hit the right timing. Additionally, our QE
confirmed that this patch fixed a Win8 installation issue that was seen
on in-house QEMU (e.g. qemu-kvm-rhev). With that, I am OK with this
patch. The only thing left is to fix the compilation in this patch (as
Radim pointed out). Anyway,

Reviewed-by: Wei Huang <wei@redhat.com>

Thanks,
-Wei

> Stefan
>
Jason Wang March 31, 2015, 5:26 a.m. UTC | #10
On 02/20/2015 03:24 AM, Radim Krčmář wrote:
> Window 8.0 driver has a particular behavior for a small time frame after
> it enables rx interrupts:  the interrupt handler never clears
> E1000_ICR_RXT0.  The handler does this something like this:
>   set_imc(-1)               (1) disable all interrupts
>   val = read_icr()          (2) clear ICR
>   handled = magic(val)      (3) do nothing to E1000_ICR_RXT0
>   set_ics(val & ~handled)   (4) set unhandled interrupts back to ICR
>   set_ims(157)              (5) enable some interrupts
>
> so if we started with RXT0, then every time the handler re-enables e1000
> interrupts, it receives one.  This likely wouldn't matter in real
> hardware, because it is slow enough to make some progress between
> interrupts, but KVM instantly interrupts it, and boot hangs.
> (If we have multiple VCPUs, the interrupt gets load-balanced and
>  everything is fine.)
>
> I haven't found any problem in earlier phase of initialization and
> windows writes 0 to RADV and RDTR, so some workaround looks like the
> only way if we want to support win8.0 on uniprocessors.  (I vote NO.)
>
> This workaround uses the fact that a constant is cleared from ICR and
> later set back to it.  After detecting this situation, we reuse the
> mitigation framework to inject an interrupt 10 microseconds later.
> (It's not exactly 10 microseconds, to keep the existing logic intact.)
>
> The detection is done by checking at (1), (2), and (5).  (2) and (5)
> require that the only bit in ICR is RXT0.  We could also check at (4),
> and on writes to any other register, but it would most likely only add
> more useless code, because normal operations shouldn't behave like that
> anyway.  (An OS that deliberately keeps bits in ICR to notify itself
> that there are more packets, or for more creative reasons, is nothing we
> should care about.)
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  The patch is still untested -- it only approximates the behavior of RHEL
>  patches that worked, I'll try to get a reproducer ...
>
>

Hi:

Two questions:

- Does Win8 still support 82540EM?  According to
https://downloadcenter.intel.com/download/23071/Network-Adapter-Driver-for-Windows-8-1-
it is not in the supported list.  As a reference, 82540EM was in the
list for win2008:
https://downloadcenter.intel.com/download/18720/Network-Adapter-Driver-for-Windows-Server-2008-Final-Release.
If it is not officially supported, there's probably no need to
work around a buggy driver in the guest.
- The issue looks similar to the one that has been addressed by kernel
commit 184564efae4d775225c8fe3b762a56956fb1f827. Is this still
reproducible with this commit?

Thanks
Radim Krčmář March 31, 2015, 10:17 a.m. UTC | #11
2015-03-31 13:26+0800, Jason Wang:
> On 02/20/2015 03:24 AM, Radim Krčmář wrote:
> > Window 8.0 driver has a particular behavior for a small time frame after
> > it enables rx interrupts:  the interrupt handler never clears
> > E1000_ICR_RXT0.  The handler does this something like this:
> >   set_imc(-1)               (1) disable all interrupts
> >   val = read_icr()          (2) clear ICR
> >   handled = magic(val)      (3) do nothing to E1000_ICR_RXT0
> >   set_ics(val & ~handled)   (4) set unhandled interrupts back to ICR
> >   set_ims(157)              (5) enable some interrupts
> >
> > so if we started with RXT0, then every time the handler re-enables e1000
> > interrupts, it receives one.  This likely wouldn't matter in real
> > hardware, because it is slow enough to make some progress between
> > interrupts, but KVM instantly interrupts it, and boot hangs.
> > (If we have multiple VCPUs, the interrupt gets load-balanced and
> >  everything is fine.)
> >
> > I haven't found any problem in earlier phase of initialization and
> > windows writes 0 to RADV and RDTR, so some workaround looks like the
> > only way if we want to support win8.0 on uniprocessors.  (I vote NO.)
> >
> > This workaround uses the fact that a constant is cleared from ICR and
> > later set back to it.  After detecting this situation, we reuse the
> > mitigation framework to inject an interrupt 10 microseconds later.
> > (It's not exactly 10 microseconds, to keep the existing logic intact.)
> >
> > The detection is done by checking at (1), (2), and (5).  (2) and (5)
> > require that the only bit in ICR is RXT0.  We could also check at (4),
> > and on writes to any other register, but it would most likely only add
> > more useless code, because normal operations shouldn't behave like that
> > anyway.  (An OS that deliberately keeps bits in ICR to notify itself
> > that there are more packets, or for more creative reasons, is nothing we
> > should care about.)
> >
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> > ---
> >  The patch is still untested -- it only approximates the behavior of RHEL
> >  patches that worked, I'll try to get a reproducer ...
> 
> Hi:
> 
> Two questions:
> 
> - Does Win8 still support 82540EM. According to
> https://downloadcenter.intel.com/download/23071/Network-Adapter-Driver-for-Windows-8-1-
> , it was not in the supported list. As a reference, 82540EM was in the
> list of win2008:
> https://downloadcenter.intel.com/download/18720/Network-Adapter-Driver-for-Windows-Server-2008-Final-Release.
> If it was not supported officially, there's probably no need to
> workaround a buggy driver in guest.

Probably not:
http://www.intel.com/support/network/adapter/pro100/sb/CS-033693.htm
https://downloadcenter.intel.com/download/21642/Network-Adapter-Driver-for-Windows-8-

That makes things simple, thank you.
I see no reason to sabotage QEMU with this patch now.

> - The issue looks similar to the one that has been addressed by kernel
> commit 184564efae4d775225c8fe3b762a56956fb1f827. Is this still
> reproducible with this commit?

Windows issues EOI between steps (1) and (2), while the line is down, so
that commit doesn't recognize it as an EOI storm.  It's another problem
with zero latencies ... we could work around it in the kernel by
remembering the last interrupts and delaying the injection a bit if the
same one is injected too often within some time frame; I wouldn't do
that either.
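
To make the idea concrete, here is a purely hypothetical sketch of such
a throttle.  It is not KVM code; struct irq_throttle,
injection_delay_ns() and all thresholds are made up for illustration.

```c
#include <stdint.h>

#define THROTTLE_WINDOW_NS 1000000u  /* assumed time frame: 1 ms */
#define THROTTLE_BURST     3u        /* assumed repeats allowed per window */
#define THROTTLE_DELAY_NS  10000u    /* assumed delay once throttled */

/* Hypothetical per-VCPU record of the most recently injected interrupt. */
struct irq_throttle {
    int      last_irq;
    uint64_t window_start_ns;
    unsigned count;
};

/* Return how long to hold back this injection (0 means inject now). */
uint64_t injection_delay_ns(struct irq_throttle *t, int irq, uint64_t now_ns)
{
    /* A different interrupt or an expired window starts a new count. */
    if (irq != t->last_irq ||
        now_ns - t->window_start_ns > THROTTLE_WINDOW_NS) {
        t->last_irq = irq;
        t->window_start_ns = now_ns;
        t->count = 1;
        return 0;
    }

    /* Same interrupt again within the window: delay once over the limit. */
    t->count++;
    return t->count > THROTTLE_BURST ? THROTTLE_DELAY_NS : 0;
}
```

The first few repeats go through untouched, so normal traffic would see
no added latency; only a tight re-injection loop gets slowed down.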
Jason Wang April 1, 2015, 1:44 a.m. UTC | #12
On Tue, Mar 31, 2015 at 6:17 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2015-03-31 13:26+0800, Jason Wang:
>> On 02/20/2015 03:24 AM, Radim Krčmář wrote:
>> > Window 8.0 driver has a particular behavior for a small time frame after
>> > it enables rx interrupts:  the interrupt handler never clears
>> > E1000_ICR_RXT0.  The handler does this something like this:
>> >   set_imc(-1)               (1) disable all interrupts
>> >   val = read_icr()          (2) clear ICR
>> >   handled = magic(val)      (3) do nothing to E1000_ICR_RXT0
>> >   set_ics(val & ~handled)   (4) set unhandled interrupts back to ICR
>> >   set_ims(157)              (5) enable some interrupts
>> >
>> > so if we started with RXT0, then every time the handler re-enables e1000
>> > interrupts, it receives one.  This likely wouldn't matter in real
>> > hardware, because it is slow enough to make some progress between
>> > interrupts, but KVM instantly interrupts it, and boot hangs.
>> > (If we have multiple VCPUs, the interrupt gets load-balanced and
>> >  everything is fine.)
>> >
>> > I haven't found any problem in earlier phase of initialization and
>> > windows writes 0 to RADV and RDTR, so some workaround looks like the
>> > only way if we want to support win8.0 on uniprocessors.  (I vote NO.)
>> >
>> > This workaround uses the fact that a constant is cleared from ICR and
>> > later set back to it.  After detecting this situation, we reuse the
>> > mitigation framework to inject an interrupt 10 microseconds later.
>> > (It's not exactly 10 microseconds, to keep the existing logic intact.)
>> >
>> > The detection is done by checking at (1), (2), and (5).  (2) and (5)
>> > require that the only bit in ICR is RXT0.  We could also check at (4),
>> > and on writes to any other register, but it would most likely only add
>> > more useless code, because normal operations shouldn't behave like that
>> > anyway.  (An OS that deliberately keeps bits in ICR to notify itself
>> > that there are more packets, or for more creative reasons, is nothing we
>> > should care about.)
>> >
>> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> > ---
>> >  The patch is still untested -- it only approximates the behavior of RHEL
>> >  patches that worked, I'll try to get a reproducer ...
>>
>> Hi:
>>
>> Two questions:
>>
>> - Does Win8 still support 82540EM. According to
>> https://downloadcenter.intel.com/download/23071/Network-Adapter-Driver-for-Windows-8-1-
>> , it was not in the supported list. As a reference, 82540EM was in the
>> list of win2008:
>> https://downloadcenter.intel.com/download/18720/Network-Adapter-Driver-for-Windows-Server-2008-Final-Release.
>> If it was not supported officially, there's probably no need to
>> workaround a buggy driver in guest.
>
> Probably not:
> http://www.intel.com/support/network/adapter/pro100/sb/CS-033693.htm
> https://downloadcenter.intel.com/download/21642/Network-Adapter-Driver-for-Windows-8-
>
> That makes things simple, thank you.
> I see no reason to sabotage QEMU with this patch now.
>
>> - The issue looks similar to the one that has been addressed by kernel
>> commit 184564efae4d775225c8fe3b762a56956fb1f827. Is this still
>> reproducible with this commit?
>
> Windows issues EOI between steps (1) and (2), while the line is down, so
> the patch doesn't recognize it as EOI storm.

I see.

> It's another problem with
> zero latencies ... we could workaround it in the kernel by remembering
> last interrupts and delaying down the injection a bit if the same one is
> injected too often within some time frame; I wouldn't do that either.

Agree, thanks for the explanation.

Patch

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index a207e21bcf77..773aac47f0b2 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -138,6 +138,10 @@  typedef struct E1000State_st {
 #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
 #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
     uint32_t compat_flags;
+
+#define E1000_WIN8_WORKAROUND_ICR       E1000_ICR_RXT0
+#define E1000_WIN8_WORKAROUND_DELAY_US  10
+    bool win8_workaround_needed;
 } E1000State;
 
 typedef struct E1000BaseClass {
@@ -288,7 +292,7 @@  set_interrupt_cause(E1000State *s, int index, uint32_t val)
 {
     PCIDevice *d = PCI_DEVICE(s);
     uint32_t pending_ints;
-    uint32_t mit_delay;
+    uint32_t mit_delay = 0;
 
     s->mac_reg[ICR] = val;
 
@@ -316,13 +320,17 @@  set_interrupt_cause(E1000State *s, int index, uint32_t val)
         if (s->mit_timer_on) {
             return;
         }
+
+        if (s->win8_workaround_needed) {
+            mit_delay = E1000_WIN8_WORKAROUND_DELAY_US * 4;
+        }
+
         if (s->compat_flags & E1000_FLAG_MIT) {
             /* Compute the next mitigation delay according to pending
              * interrupts and the current values of RADV (provided
              * RDTR!=0), TADV and ITR.
              * Then rearm the timer.
              */
-            mit_delay = 0;
             if (s->mit_ide &&
                     (pending_ints & (E1000_ICR_TXQE | E1000_ICR_TXDW))) {
                 mit_update_delay(&mit_delay, s->mac_reg[TADV] * 4);
@@ -332,13 +340,14 @@  set_interrupt_cause(E1000State *s, int index, uint32_t val)
             }
             mit_update_delay(&mit_delay, s->mac_reg[ITR]);
 
-            if (mit_delay) {
-                s->mit_timer_on = 1;
-                timer_mod(s->mit_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
-                          mit_delay * 256);
-            }
             s->mit_ide = 0;
         }
+
+        if (mit_delay) {
+            s->mit_timer_on = 1;
+            timer_mod(s->mit_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                      mit_delay * 256);
+        }
     }
 
     s->mit_irq_level = (pending_ints != 0);
@@ -411,6 +420,7 @@  static void e1000_reset(void *opaque)
     d->mit_timer_on = 0;
     d->mit_irq_level = 0;
     d->mit_ide = 0;
+    d->win8_workaround_needed = false;
     memset(d->phy_reg, 0, sizeof d->phy_reg);
     memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
     d->phy_reg[PHY_ID2] = edc->phy_id2;
@@ -1114,6 +1124,8 @@  mac_icr_read(E1000State *s, int index)
 {
     uint32_t ret = s->mac_reg[ICR];
 
+    s->win8_workaround_needed &= ret == E1000_WIN8_WORKAROUND_ICR;
+
     DBGOUT(INTERRUPT, "ICR read: %x\n", ret);
     set_interrupt_cause(s, 0, 0);
     return ret;
@@ -1192,6 +1204,7 @@  static void
 set_imc(E1000State *s, int index, uint32_t val)
 {
     s->mac_reg[IMS] &= ~val;
+    s->win8_workaround_needed = ~val == 0;
     set_ics(s, 0, 0);
 }
 
@@ -1199,7 +1212,9 @@  static void
 set_ims(E1000State *s, int index, uint32_t val)
 {
     s->mac_reg[IMS] |= val;
+    s->win8_workaround_needed &= s->mac_reg[ICR] == E1000_WIN8_WORKAROUND_ICR;
     set_ics(s, 0, 0);
+    s->win8_workaround_needed = false;
 }
 
 #define getreg(x)	[x] = mac_readreg