diff mbox

e1000: make ICS write-only

Message ID 20130109105100.GA17137@redhat.com
State New
Headers show

Commit Message

Michael S. Tsirkin Jan. 9, 2013, 10:51 a.m. UTC
Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
qemu started updating ICS register when interrupt
is sent, with the intent to match spec better
(guests do not actually read this register).
However, the function set_interrupt_cause where ICS
is updated is often called internally by
device emulation so reading it does not produce the last value
written by driver.  Looking closer at the spec,
it documents ICS as write-only, so there's no need
to update it at all. I conclude that while harmless this line is useless
code so removing it is a bit cleaner than keeping it in.

Tested with windows and linux guests.

Cc: Bill Paul <wpaul@windriver.com>
Reported-by: Yan Vugenfirer <yan@daynix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/e1000.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Stefan Hajnoczi Jan. 9, 2013, 2:14 p.m. UTC | #1
On Wed, Jan 09, 2013 at 12:51:00PM +0200, Michael S. Tsirkin wrote:
> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> qemu started updating ICS register when interrupt
> is sent, with the intent to match spec better
> (guests do not actually read this register).
> However, the function set_interrupt_cause where ICS
> is updated is often called internally by
> device emulation so reading it does not produce the last value
> written by driver.  Looking closer at the spec,
> it documents ICS as write-only, so there's no need
> to update it at all. I conclude that while harmless this line is useless
> code so removing it is a bit cleaner than keeping it in.
> 
> Tested with windows and linux guests.
> 
> Cc: Bill Paul <wpaul@windriver.com>
> Reported-by: Yan Vugenfirer <yan@daynix.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/e1000.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 92fb00a..928d804 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>          val |= E1000_ICR_INT_ASSERTED;
>      }
>      s->mac_reg[ICR] = val;
> -    s->mac_reg[ICS] = val;
>      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>  }

Looks good according to datasheet but let's give Bill a chance to review
this patch before merging.

Stefan
Anthony Liguori Jan. 9, 2013, 2:45 p.m. UTC | #2
"Michael S. Tsirkin" <mst@redhat.com> writes:

> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> qemu started updating ICS register when interrupt
> is sent, with the intent to match spec better
> (guests do not actually read this register).
> However, the function set_interrupt_cause where ICS
> is updated is often called internally by
> device emulation so reading it does not produce the last value
> written by driver.  Looking closer at the spec,
> it documents ICS as write-only, so there's no need
> to update it at all. I conclude that while harmless this line is useless
> code so removing it is a bit cleaner than keeping it in.
>
> Tested with windows and linux guests.
>
> Cc: Bill Paul <wpaul@windriver.com>
> Reported-by: Yan Vugenfirer <yan@daynix.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Yeah, I'm a little confused as to why we would need to set the ICS
register too.  Unless Bill had a reason that I'm missing:

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  hw/e1000.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 92fb00a..928d804 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>          val |= E1000_ICR_INT_ASSERTED;
>      }
>      s->mac_reg[ICR] = val;
> -    s->mac_reg[ICS] = val;
>      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>  }
>  
> -- 
> MST
Jason Wang Jan. 9, 2013, 3:28 p.m. UTC | #3
On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote:
> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> qemu started updating ICS register when interrupt
> is sent, with the intent to match spec better
> (guests do not actually read this register).
> However, the function set_interrupt_cause where ICS
> is updated is often called internally by
> device emulation so reading it does not produce the last value
> written by driver.  Looking closer at the spec,
> it documents ICS as write-only, so there's no need
> to update it at all. I conclude that while harmless this line is useless
> code so removing it is a bit cleaner than keeping it in.
>
> Tested with windows and linux guests.
>
> Cc: Bill Paul <wpaul@windriver.com>
> Reported-by: Yan Vugenfirer <yan@daynix.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/e1000.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 92fb00a..928d804 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>          val |= E1000_ICR_INT_ASSERTED;
>      }
>      s->mac_reg[ICR] = val;
> -    s->mac_reg[ICS] = val;
>      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>  }
>  

If my memory is correct, though ICS is marked as read only in the spec,
we do can read it when I'm examining a real e1000 card.
Michael S. Tsirkin Jan. 9, 2013, 3:34 p.m. UTC | #4
On Wed, Jan 09, 2013 at 11:28:29PM +0800, Jason Wang wrote:
> On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote:
> > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> > qemu started updating ICS register when interrupt
> > is sent, with the intent to match spec better
> > (guests do not actually read this register).
> > However, the function set_interrupt_cause where ICS
> > is updated is often called internally by
> > device emulation so reading it does not produce the last value
> > written by driver.  Looking closer at the spec,
> > it documents ICS as write-only, so there's no need
> > to update it at all. I conclude that while harmless this line is useless
> > code so removing it is a bit cleaner than keeping it in.
> >
> > Tested with windows and linux guests.
> >
> > Cc: Bill Paul <wpaul@windriver.com>
> > Reported-by: Yan Vugenfirer <yan@daynix.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/e1000.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/hw/e1000.c b/hw/e1000.c
> > index 92fb00a..928d804 100644
> > --- a/hw/e1000.c
> > +++ b/hw/e1000.c
> > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> >          val |= E1000_ICR_INT_ASSERTED;
> >      }
> >      s->mac_reg[ICR] = val;
> > -    s->mac_reg[ICS] = val;
> >      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> >  }
> >  
> 
> If my memory is correct, though ICS is marked as read only in the spec,
> we do can read it when I'm examining a real e1000 card.

Interesting, this was not Bill's motivation.
I haven't seen any reads with linux or windows guests -
which guest did trigger them for you?
Also, what's the value one would expect?
Jason Wang Jan. 9, 2013, 3:36 p.m. UTC | #5
On 01/09/2013 11:34 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 09, 2013 at 11:28:29PM +0800, Jason Wang wrote:
>> On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote:
>>> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
>>> qemu started updating ICS register when interrupt
>>> is sent, with the intent to match spec better
>>> (guests do not actually read this register).
>>> However, the function set_interrupt_cause where ICS
>>> is updated is often called internally by
>>> device emulation so reading it does not produce the last value
>>> written by driver.  Looking closer at the spec,
>>> it documents ICS as write-only, so there's no need
>>> to update it at all. I conclude that while harmless this line is useless
>>> code so removing it is a bit cleaner than keeping it in.
>>>
>>> Tested with windows and linux guests.
>>>
>>> Cc: Bill Paul <wpaul@windriver.com>
>>> Reported-by: Yan Vugenfirer <yan@daynix.com>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>  hw/e1000.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/hw/e1000.c b/hw/e1000.c
>>> index 92fb00a..928d804 100644
>>> --- a/hw/e1000.c
>>> +++ b/hw/e1000.c
>>> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>>>          val |= E1000_ICR_INT_ASSERTED;
>>>      }
>>>      s->mac_reg[ICR] = val;
>>> -    s->mac_reg[ICS] = val;
>>>      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>>  }
>>>  
>> If my memory is correct, though ICS is marked as read only in the spec,
>> we do can read it when I'm examining a real e1000 card.
> Interesting, this was not Bill's motivation.
> I haven't seen any reads with linux or windows guests -
> which guest did trigger them for you?
> Also, what's the value one would expect?
>

I also find this violation of spec in the past, so I just hack the linux
driver to see the result. I'm not sure if there are some driver depends
on this value. I forget the detail of what value it returns, may worth
to try again to see.
Michael S. Tsirkin Jan. 9, 2013, 3:48 p.m. UTC | #6
On Wed, Jan 09, 2013 at 11:36:50PM +0800, Jason Wang wrote:
> On 01/09/2013 11:34 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 09, 2013 at 11:28:29PM +0800, Jason Wang wrote:
> >> On 01/09/2013 06:51 PM, Michael S. Tsirkin wrote:
> >>> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> >>> qemu started updating ICS register when interrupt
> >>> is sent, with the intent to match spec better
> >>> (guests do not actually read this register).
> >>> However, the function set_interrupt_cause where ICS
> >>> is updated is often called internally by
> >>> device emulation so reading it does not produce the last value
> >>> written by driver.  Looking closer at the spec,
> >>> it documents ICS as write-only, so there's no need
> >>> to update it at all. I conclude that while harmless this line is useless
> >>> code so removing it is a bit cleaner than keeping it in.
> >>>
> >>> Tested with windows and linux guests.
> >>>
> >>> Cc: Bill Paul <wpaul@windriver.com>
> >>> Reported-by: Yan Vugenfirer <yan@daynix.com>
> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>> ---
> >>>  hw/e1000.c | 1 -
> >>>  1 file changed, 1 deletion(-)
> >>>
> >>> diff --git a/hw/e1000.c b/hw/e1000.c
> >>> index 92fb00a..928d804 100644
> >>> --- a/hw/e1000.c
> >>> +++ b/hw/e1000.c
> >>> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> >>>          val |= E1000_ICR_INT_ASSERTED;
> >>>      }
> >>>      s->mac_reg[ICR] = val;
> >>> -    s->mac_reg[ICS] = val;
> >>>      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> >>>  }
> >>>  
> >> If my memory is correct, though ICS is marked as read only in the spec,
> >> we do can read it when I'm examining a real e1000 card.
> > Interesting, this was not Bill's motivation.
> > I haven't seen any reads with linux or windows guests -
> > which guest did trigger them for you?
> > Also, what's the value one would expect?
> >
> 
> I also find this violation of spec in the past, so I just hack the linux
> driver to see the result. I'm not sure if there are some driver depends
> on this value. I forget the detail of what value it returns, may worth
> to try again to see.

What Bill wrote in the commit log is that he didn't see any guest
reading this.
Bill Paul Jan. 9, 2013, 5:30 p.m. UTC | #7
Of all the gin joints in all the towns in all the world, Michael S. Tsirkin 
had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say:

> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> qemu started updating ICS register when interrupt
> is sent, with the intent to match spec better
> (guests do not actually read this register).
> However, the function set_interrupt_cause where ICS
> is updated is often called internally by
> device emulation so reading it does not produce the last value
> written by driver.  Looking closer at the spec,
> it documents ICS as write-only, so there's no need
> to update it at all. I conclude that while harmless this line is useless
> code so removing it is a bit cleaner than keeping it in.

You are wrong.

I know what the spec says. The spec lies (or at the very least, it doesn't 
agree with reality). With actual PRO/1000 hardware, the ICS register is 
readable. It provides a way to obtain the currently pending interrupt bits 
without the auto-clear behavior of the ICR register (which in some cases is 
actually detrimental).

The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very 
similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and 
later devices). The spes for the 10GbE devices _also_ say that ICS is read 
only. But if you look at the Intel drivers for those chips, you'll see they 
have actually implemented a workaround for a device errata (I think the for 
the 82598) that actually requires reading the ICS register. If you had 
implemented a PRO/10GbE virtual device based on the same chip and obeyed the 
spec the same way, I suspect Intel's driver would break.

I actually have in my possession real PRO/1000 silicon going all the way back 
to the 82543 and have tested every single one of them, and they all work like 
this. I've also asked the Intel LAN access division people about it and they 
said that in spite of it not being documented as readable, there's nothing 
particularly wrong with doing this.

The problem with using ICR is that the auto-clear behavior can sometimes 
result in some awkward interrupt handling code. You need to test ICR in 
interrupt context to see if there are events pending, and then schedule some 
other thread to handle them. But since reading ICR clears the bits, you need 
to save the events somewhere so that the other thread knows what events need 
servicing. Keeping the saved copy of pending events properly synchronized so 
that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver 
used to have some very ugly code in it to deal with it, all of which became 
much simpler when using the ICS register instead.

I know what the spec says. But this is a case where the spec only says that 
because Intel decided not to disclose what the hardware actually does. They 
also don't tell you about all the hidden debug registers in the hardware 
either, but that doesn't mean they don't exist.

So pretty please, with sugar on top, leave this code alone.

-Bill

> Tested with windows and linux guests.
> 
> Cc: Bill Paul <wpaul@windriver.com>
> Reported-by: Yan Vugenfirer <yan@daynix.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/e1000.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 92fb00a..928d804 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t
> val) val |= E1000_ICR_INT_ASSERTED;
>      }
>      s->mac_reg[ICR] = val;
> -    s->mac_reg[ICS] = val;
>      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>  }
Michael S. Tsirkin Jan. 9, 2013, 5:51 p.m. UTC | #8
On Wed, Jan 09, 2013 at 09:30:43AM -0800, Bill Paul wrote:
> Of all the gin joints in all the towns in all the world, Michael S. Tsirkin 
> had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say:
> 
> > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> > qemu started updating ICS register when interrupt
> > is sent, with the intent to match spec better
> > (guests do not actually read this register).
> > However, the function set_interrupt_cause where ICS
> > is updated is often called internally by
> > device emulation so reading it does not produce the last value
> > written by driver.  Looking closer at the spec,
> > it documents ICS as write-only, so there's no need
> > to update it at all. I conclude that while harmless this line is useless
> > code so removing it is a bit cleaner than keeping it in.
> 
> You are wrong.
> 
> I know what the spec says. The spec lies (or at the very least, it doesn't 
> agree with reality). With actual PRO/1000 hardware, the ICS register is 
> readable. It provides a way to obtain the currently pending interrupt bits 
> without the auto-clear behavior of the ICR register (which in some cases is 
> actually detrimental).
> 
> The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very 
> similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and 
> later devices). The spes for the 10GbE devices _also_ say that ICS is read 
> only. But if you look at the Intel drivers for those chips, you'll see they 
> have actually implemented a workaround for a device errata (I think the for 
> the 82598) that actually requires reading the ICS register. If you had 
> implemented a PRO/10GbE virtual device based on the same chip and obeyed the 
> spec the same way, I suspect Intel's driver would break.
> 
> I actually have in my possession real PRO/1000 silicon going all the way back 
> to the 82543 and have tested every single one of them, and they all work like 
> this. I've also asked the Intel LAN access division people about it and they 
> said that in spite of it not being documented as readable, there's nothing 
> particularly wrong with doing this.
> 
> The problem with using ICR is that the auto-clear behavior can sometimes 
> result in some awkward interrupt handling code. You need to test ICR in 
> interrupt context to see if there are events pending, and then schedule some 
> other thread to handle them. But since reading ICR clears the bits, you need 
> to save the events somewhere so that the other thread knows what events need 
> servicing. Keeping the saved copy of pending events properly synchronized so 
> that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver 
> used to have some very ugly code in it to deal with it, all of which became 
> much simpler when using the ICS register instead.
> 
> I know what the spec says. But this is a case where the spec only says that 
> because Intel decided not to disclose what the hardware actually does. They 
> also don't tell you about all the hidden debug registers in the hardware 
> either, but that doesn't mean they don't exist.
> 
> So pretty please, with sugar on top, leave this code alone.
> 
> -Bill

OKay, so it's an undocumented feature that some drivers use, instead of
(as commit log implies) being a documented feature that drivers don't
use. Thanks for the clarification.
I will put info in code comments.

> > Tested with windows and linux guests.
> > 
> > Cc: Bill Paul <wpaul@windriver.com>
> > Reported-by: Yan Vugenfirer <yan@daynix.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/e1000.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/hw/e1000.c b/hw/e1000.c
> > index 92fb00a..928d804 100644
> > --- a/hw/e1000.c
> > +++ b/hw/e1000.c
> > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t
> > val) val |= E1000_ICR_INT_ASSERTED;
> >      }
> >      s->mac_reg[ICR] = val;
> > -    s->mac_reg[ICS] = val;
> >      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> >  }
> 
> -- 
> =============================================================================
> -Bill Paul            (510) 749-2329 | Member of Technical Staff,
>                  wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
> =============================================================================
>    "I put a dollar in a change machine. Nothing changed." - George Carlin
> =============================================================================
Anthony Liguori Jan. 9, 2013, 7:50 p.m. UTC | #9
Bill Paul <wpaul@windriver.com> writes:

> Of all the gin joints in all the towns in all the world, Michael S. Tsirkin 
> had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say:
>
>> Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
>> qemu started updating ICS register when interrupt
>> is sent, with the intent to match spec better
>> (guests do not actually read this register).
>> However, the function set_interrupt_cause where ICS
>> is updated is often called internally by
>> device emulation so reading it does not produce the last value
>> written by driver.  Looking closer at the spec,
>> it documents ICS as write-only, so there's no need
>> to update it at all. I conclude that while harmless this line is useless
>> code so removing it is a bit cleaner than keeping it in.
>
> You are wrong.
>
> I know what the spec says. The spec lies (or at the very least, it doesn't 
> agree with reality). With actual PRO/1000 hardware, the ICS register is 
> readable. It provides a way to obtain the currently pending interrupt bits 
> without the auto-clear behavior of the ICR register (which in some cases is 
> actually detrimental).
>
> The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very 
> similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and 
> later devices). The spes for the 10GbE devices _also_ say that ICS is read 
> only. But if you look at the Intel drivers for those chips, you'll see they 
> have actually implemented a workaround for a device errata (I think the for 
> the 82598) that actually requires reading the ICS register. If you had 
> implemented a PRO/10GbE virtual device based on the same chip and obeyed the 
> spec the same way, I suspect Intel's driver would break.
>
> I actually have in my possession real PRO/1000 silicon going all the way back 
> to the 82543 and have tested every single one of them, and they all work like 
> this. I've also asked the Intel LAN access division people about it and they 
> said that in spite of it not being documented as readable, there's nothing 
> particularly wrong with doing this.
>
> The problem with using ICR is that the auto-clear behavior can sometimes 
> result in some awkward interrupt handling code. You need to test ICR in 
> interrupt context to see if there are events pending, and then schedule some 
> other thread to handle them. But since reading ICR clears the bits, you need 
> to save the events somewhere so that the other thread knows what events need 
> servicing. Keeping the saved copy of pending events properly synchronized so 
> that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver 
> used to have some very ugly code in it to deal with it, all of which became 
> much simpler when using the ICS register instead.
>
> I know what the spec says. But this is a case where the spec only says that 
> because Intel decided not to disclose what the hardware actually does. They 
> also don't tell you about all the hidden debug registers in the hardware 
> either, but that doesn't mean they don't exist.
>
> So pretty please, with sugar on top, leave this code alone.

I figured this would be the case but this is where a comment in the code
or commit message would have helped a lot in avoiding future confusion.

Regards,

Anthony Liguori

>
> -Bill
>
>> Tested with windows and linux guests.
>> 
>> Cc: Bill Paul <wpaul@windriver.com>
>> Reported-by: Yan Vugenfirer <yan@daynix.com>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  hw/e1000.c | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/hw/e1000.c b/hw/e1000.c
>> index 92fb00a..928d804 100644
>> --- a/hw/e1000.c
>> +++ b/hw/e1000.c
>> @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t
>> val) val |= E1000_ICR_INT_ASSERTED;
>>      }
>>      s->mac_reg[ICR] = val;
>> -    s->mac_reg[ICS] = val;
>>      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
>>  }
>
> -- 
> =============================================================================
> -Bill Paul            (510) 749-2329 | Member of Technical Staff,
>                  wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
> =============================================================================
>    "I put a dollar in a change machine. Nothing changed." - George Carlin
> =============================================================================
Michael S. Tsirkin Jan. 9, 2013, 9:44 p.m. UTC | #10
On Wed, Jan 09, 2013 at 09:30:43AM -0800, Bill Paul wrote:
> Of all the gin joints in all the towns in all the world, Michael S. Tsirkin 
> had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say:
> 
> > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> > qemu started updating ICS register when interrupt
> > is sent, with the intent to match spec better
> > (guests do not actually read this register).
> > However, the function set_interrupt_cause where ICS
> > is updated is often called internally by
> > device emulation so reading it does not produce the last value
> > written by driver.  Looking closer at the spec,
> > it documents ICS as write-only, so there's no need
> > to update it at all. I conclude that while harmless this line is useless
> > code so removing it is a bit cleaner than keeping it in.
> 
> You are wrong.
> 
> I know what the spec says. The spec lies (or at the very least, it doesn't 
> agree with reality). With actual PRO/1000 hardware, the ICS register is 
> readable. It provides a way to obtain the currently pending interrupt bits 
> without the auto-clear behavior of the ICR register (which in some cases is 
> actually detrimental).
> 
> The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very 
> similar in design to the PRO/1000s, particularly the 82575, 82576, 82580 and 
> later devices). The spes for the 10GbE devices _also_ say that ICS is read 
> only. But if you look at the Intel drivers for those chips, you'll see they 
> have actually implemented a workaround for a device errata (I think the for 
> the 82598) that actually requires reading the ICS register. If you had 
> implemented a PRO/10GbE virtual device based on the same chip and obeyed the 
> spec the same way, I suspect Intel's driver would break.
> 
> I actually have in my possession real PRO/1000 silicon going all the way back 
> to the 82543 and have tested every single one of them, and they all work like 
> this. I've also asked the Intel LAN access division people about it and they 
> said that in spite of it not being documented as readable, there's nothing 
> particularly wrong with doing this.
> 
> The problem with using ICR is that the auto-clear behavior can sometimes 
> result in some awkward interrupt handling code. You need to test ICR in 
> interrupt context to see if there are events pending, and then schedule some 
> other thread to handle them. But since reading ICR clears the bits, you need 
> to save the events somewhere so that the other thread knows what events need 
> servicing. Keeping the saved copy of pending events properly synchronized so 
> that you don't lose any is actually big challenge. The VxWorks PRO/1000 driver 
> used to have some very ugly code in it to deal with it, all of which became 
> much simpler when using the ICS register instead.
> 
> I know what the spec says. But this is a case where the spec only says that 
> because Intel decided not to disclose what the hardware actually does. They 
> also don't tell you about all the hidden debug registers in the hardware 
> either, but that doesn't mean they don't exist.
> 
> So pretty please, with sugar on top, leave this code alone.
> 
> -Bill

OK now since there's no spec, I'd like to find out how does the
register behave. Let's assume I read ICR. This clears ICR - does it
also clear ICS?

Thanks,
MST





> > Tested with windows and linux guests.
> > 
> > Cc: Bill Paul <wpaul@windriver.com>
> > Reported-by: Yan Vugenfirer <yan@daynix.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/e1000.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/hw/e1000.c b/hw/e1000.c
> > index 92fb00a..928d804 100644
> > --- a/hw/e1000.c
> > +++ b/hw/e1000.c
> > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index, uint32_t
> > val) val |= E1000_ICR_INT_ASSERTED;
> >      }
> >      s->mac_reg[ICR] = val;
> > -    s->mac_reg[ICS] = val;
> >      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
> >  }
> 
> -- 
> =============================================================================
> -Bill Paul            (510) 749-2329 | Member of Technical Staff,
>                  wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
> =============================================================================
>    "I put a dollar in a change machine. Nothing changed." - George Carlin
> =============================================================================
Bill Paul Jan. 9, 2013, 10:05 p.m. UTC | #11
Of all the gin joints in all the towns in all the world, Michael S. Tsirkin 
had to walk into mine at 13:44:38 on Wednesday 09 January 2013 and say:

> On Wed, Jan 09, 2013 at 09:30:43AM -0800, Bill Paul wrote:
> > Of all the gin joints in all the towns in all the world, Michael S.
> > Tsirkin
> > 
> > had to walk into mine at 02:51:00 on Wednesday 09 January 2013 and say:
> > > Since commit b1332393cdd7d023de8f1f8aa136ee7866a18968,
> > > qemu started updating ICS register when interrupt
> > > is sent, with the intent to match spec better
> > > (guests do not actually read this register).
> > > However, the function set_interrupt_cause where ICS
> > > is updated is often called internally by
> > > device emulation so reading it does not produce the last value
> > > written by driver.  Looking closer at the spec,
> > > it documents ICS as write-only, so there's no need
> > > to update it at all. I conclude that while harmless this line is
> > > useless code so removing it is a bit cleaner than keeping it in.
> > 
> > You are wrong.
> > 
> > I know what the spec says. The spec lies (or at the very least, it
> > doesn't agree with reality). With actual PRO/1000 hardware, the ICS
> > register is readable. It provides a way to obtain the currently pending
> > interrupt bits without the auto-clear behavior of the ICR register
> > (which in some cases is actually detrimental).
> > 
> > The Intel 10GbE NICs (82598, 82599, x540) are the same way (they're very
> > similar in design to the PRO/1000s, particularly the 82575, 82576, 82580
> > and later devices). The spes for the 10GbE devices _also_ say that ICS
> > is read only. But if you look at the Intel drivers for those chips,
> > you'll see they have actually implemented a workaround for a device
> > errata (I think the for the 82598) that actually requires reading the
> > ICS register. If you had implemented a PRO/10GbE virtual device based on
> > the same chip and obeyed the spec the same way, I suspect Intel's driver
> > would break.
> > 
> > I actually have in my possession real PRO/1000 silicon going all the way
> > back to the 82543 and have tested every single one of them, and they all
> > work like this. I've also asked the Intel LAN access division people
> > about it and they said that in spite of it not being documented as
> > readable, there's nothing particularly wrong with doing this.
> > 
> > The problem with using ICR is that the auto-clear behavior can sometimes
> > result in some awkward interrupt handling code. You need to test ICR in
> > interrupt context to see if there are events pending, and then schedule
> > some other thread to handle them. But since reading ICR clears the bits,
> > you need to save the events somewhere so that the other thread knows
> > what events need servicing. Keeping the saved copy of pending events
> > properly synchronized so that you don't lose any is actually big
> > challenge. The VxWorks PRO/1000 driver used to have some very ugly code
> > in it to deal with it, all of which became much simpler when using the
> > ICS register instead.
> > 
> > I know what the spec says. But this is a case where the spec only says
> > that because Intel decided not to disclose what the hardware actually
> > does. They also don't tell you about all the hidden debug registers in
> > the hardware either, but that doesn't mean they don't exist.
> > 
> > So pretty please, with sugar on top, leave this code alone.
> > 
> > -Bill
> 
> OK now since there's no spec, I'd like to find out how does the
> register behave. Let's assume I read ICR. This clears ICR - does it
> also clear ICS?

Yes.

ICS mirrors ICR: reading ICS lets you peek at the contents of ICR without 
auto-clearing them. If you read ICR, you're acknowledging any events that may 
be pending.

Let's say the LSC (link state change) becomes set, because you unplugged the 
cable. The LSC bit becomes set in both ICR and in ICS.

If you read ICS first, then you'll see the LSC bit is set, and it'll stay set 
(the event stays unacked).

If you read ICR first, then you'll see the LSC bit is set, and you'll clear it 
(the event is acked).

If you read ICS _after_ you read ICR, then LSC will be clear (because reading 
ICR first acked it).

-Bill


> Thanks,
> MST
> 
> > > Tested with windows and linux guests.
> > > 
> > > Cc: Bill Paul <wpaul@windriver.com>
> > > Reported-by: Yan Vugenfirer <yan@daynix.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > >  hw/e1000.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/hw/e1000.c b/hw/e1000.c
> > > index 92fb00a..928d804 100644
> > > --- a/hw/e1000.c
> > > +++ b/hw/e1000.c
> > > @@ -230,7 +230,6 @@ set_interrupt_cause(E1000State *s, int index,
> > > uint32_t val) val |= E1000_ICR_INT_ASSERTED;
> > > 
> > >      }
> > >      s->mac_reg[ICR] = val;
> > > 
> > > -    s->mac_reg[ICS] = val;
> > > 
> > >      qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) !=
> > >      0);
> > >  
> > >  }
Michael S. Tsirkin Jan. 9, 2013, 10:07 p.m. UTC | #12
On Wed, Jan 09, 2013 at 01:50:39PM -0600, Anthony Liguori wrote:
> I figured this would be the case but this is where a comment in the code
> or commit message would have helped a lot in avoiding future confusion.
> 
> Regards,
> 
> Anthony Liguori

I just sent a documentation patch, please take a look.
Thanks!
Bill Paul Jan. 9, 2013, 10:21 p.m. UTC | #13
Of all the gin joints in all the towns in all the world, Michael S. Tsirkin 
had to walk into mine at 14:07:53 on Wednesday 09 January 2013 and say:

> On Wed, Jan 09, 2013 at 01:50:39PM -0600, Anthony Liguori wrote:
> > I figured this would be the case but this is where a comment in the code
> > or commit message would have helped a lot in avoiding future confusion.
> > 
> > Regards,
> > 
> > Anthony Liguori
> 
> I just sent a documentation patch, please take a look.
> Thanks!

Well, strictly speaking, VxWorks doesn't rely on the ICS behavior anymore, but 
it prefers it. I ran into problems with VMware and Simics too, as they also 
don't emulate the ICS behavior, and getting that fixed was more of an uphill 
battle. So as a compromise I added a check to the driver to see if the ICS 
register has 'real hardware behavior' or 'emulated behavior' and made it use 
an alternate interrupt handling scheme in the emulated case. The alternate 
scheme is a little less efficient than the ICS scheme in some cases (mainly 
when the PRO/1000 device's interrupt is shared), but it still handles 
interrupts reliably.

That's a minor nit though. I'm fine with the comments as is. I just didn't 
want you do think VxWorks was completely brain damaged. :)

Note that I don't think VMware emulates the 'flexible RX mode' descriptor 
mechanism for the PRO/100 either, because the non-NDAed PRO/100 manual doesn't 
bother to mention it exists. I think this is something that was fixed in QEMU 
but I haven't had a chance to test it in a while. There's a similar problem 
with simulated the AMD PCnet/PCI devices (they only support the 'configuration 
block' setup method -- for the older non-PCI LANCE chips that was the only way 
to configure them but PCI devices starting with the am97c970 can be configured 
just by setting up registers).

Honestly I'm surprised I still have all my hair and that it's still the same 
color.

-Bill
Michael S. Tsirkin Jan. 9, 2013, 10:48 p.m. UTC | #14
On Wed, Jan 09, 2013 at 02:21:50PM -0800, Bill Paul wrote:
> Of all the gin joints in all the towns in all the world, Michael S. Tsirkin 
> had to walk into mine at 14:07:53 on Wednesday 09 January 2013 and say:
> 
> > On Wed, Jan 09, 2013 at 01:50:39PM -0600, Anthony Liguori wrote:
> > > I figured this would be the case but this is where a comment in the code
> > > or commit message would have helped a lot in avoiding future confusion.
> > > 
> > > Regards,
> > > 
> > > Anthony Liguori
> > 
> > I just sent a documentation patch, please take a look.
> > Thanks!
> 
> Well, strictly speaking, VxWorks doesn't rely on the ICS behavior anymore, but 
> it prefers it. I ran into problems with VMware and Simics too, as they also 
> don't emulate the ICS behavior, and getting that fixed was more of an uphill 
> battle. So as a compromise I added a check to the driver to see if the ICS 
> register has 'real hardware behavior' or 'emulated behavior' and made it use 
> an alternate interrupt handling scheme in the emulated case. The alternate 
> scheme is a little less efficient than the ICS scheme in some cases (mainly 
> when the PRO/1000 device's interrupt is shared), but it still handles 
> interrupts reliably.
> 
> That's a minor nit though. I'm fine with the comments as is. I just didn't 
> want you do think VxWorks was completely brain damaged. :)

I guess I'll change it to
"The VxWorks PRO/1000 driver uses this behaviour."

> Note that I don't think VMware emulates the 'flexible RX mode' descriptor 
> mechanism for the PRO/100 either, because the non-NDAed PRO/100 manual doesn't 
> bother to mention it exists. I think this is something that was fixed in QEMU 
> but I haven't had a chance to test it in a while. There's a similar problem 
> with simulated the AMD PCnet/PCI devices (they only support the 'configuration 
> block' setup method -- for the older non-PCI LANCE chips that was the only way 
> to configure them but PCI devices starting with the am97c970 can be configured 
> just by setting up registers).
> 
> Honestly I'm surprised I still have all my hair and that it's still the same 
> color.
> 
> -Bill

Yes virtio is much easier. All bugs there are our own.


> -- 
> =============================================================================
> -Bill Paul            (510) 749-2329 | Member of Technical Staff,
>                  wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
> =============================================================================
>    "I put a dollar in a change machine. Nothing changed." - George Carlin
> =============================================================================
diff mbox

Patch

diff --git a/hw/e1000.c b/hw/e1000.c
index 92fb00a..928d804 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -230,7 +230,6 @@  set_interrupt_cause(E1000State *s, int index, uint32_t val)
         val |= E1000_ICR_INT_ASSERTED;
     }
     s->mac_reg[ICR] = val;
-    s->mac_reg[ICS] = val;
     qemu_set_irq(s->dev.irq[0], (s->mac_reg[IMS] & s->mac_reg[ICR]) != 0);
 }