Patchwork [V3] e1000: Handle IO Port.

login
register
mail settings
Submitter Anthony PERARD
Date June 30, 2011, 9:35 p.m.
Message ID <1309469716-19400-1-git-send-email-anthony.perard@citrix.com>
Download mbox | patch
Permalink /patch/102841/
State New
Headers show

Comments

Anthony PERARD - June 30, 2011, 9:35 p.m.
This patch introduces the two IOPorts on e1000, IOADDR and IODATA. The
IOADDR is used to specify which register we want to access when we read
or write on IODATA.

This patch fixes some weird behavior that I see when I use e1000 with
QEMU/Xen, the guest memory can be corrupted by this NIC because it will
write on memory that it doesn't own anymore after a reset. It's because
the kernel Linux use the IOPort to reset the network card instead of the
MMIO.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---

New:
  - Do not set anymore CTRL register after a reset.

Change v1->v2:
  - remove ioport_reg[2], and use only ioport_addr.
  - append ioport_addr to VMState structure.
  - Reuse e1000_mmio_{readl,writel} in e1000_ioport_{readl,writel} to avoid
    duplication of code.

 hw/e1000.c |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 72 insertions(+), 11 deletions(-)
Andreas Färber - July 2, 2011, 5:35 p.m.
Am 30.06.2011 um 23:35 schrieb Anthony PERARD:

> This patch introduces the two IOPorts on e1000, IOADDR and IODATA. The
> IOADDR is used to specify which register we want to access when we  
> read
> or write on IODATA.
>
> This patch fixes some weird behavior that I see when I use e1000 with
> QEMU/Xen, the guest memory can be corrupted by this NIC because it  
> will
> write on memory that it doesn't own anymore after a reset. It's  
> because
> the kernel Linux use the IOPort to reset the network card instead of  
> the
> MMIO.
>
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> ---

> diff --git a/hw/e1000.c b/hw/e1000.c
> index 96d84f9..bd39d80 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c

> @@ -971,7 +1006,7 @@ static bool is_version_1(void *opaque, int  
> version_id)
>
> static const VMStateDescription vmstate_e1000 = {
>     .name = "e1000",
> -    .version_id = 2,
> +    .version_id = 3,
>     .minimum_version_id = 1,
>     .minimum_version_id_old = 1,
>     .fields      = (VMStateField []) {
> @@ -1043,6 +1078,7 @@ static const VMStateDescription vmstate_e1000  
> = {
>         VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, RA, 32),
>         VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, MTA, 128),
>         VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, VFTA, 128),
> +        VMSTATE_UINT32_V(ioport_addr, E1000State, 3),
>         VMSTATE_END_OF_LIST()
>     }
> };

Juan, shouldn't this use a subsection instead of bumping the version?

Andreas
Anthony PERARD - July 18, 2011, 1:21 p.m.
On Sat, Jul 2, 2011 at 18:35, Andreas Färber <andreas.faerber@web.de> wrote:
> Am 30.06.2011 um 23:35 schrieb Anthony PERARD:
>
>> This patch introduces the two IOPorts on e1000, IOADDR and IODATA. The
>> IOADDR is used to specify which register we want to access when we read
>> or write on IODATA.
>>
>> This patch fixes some weird behavior that I see when I use e1000 with
>> QEMU/Xen, the guest memory can be corrupted by this NIC because it will
>> write on memory that it doesn't own anymore after a reset. It's because
>> the kernel Linux use the IOPort to reset the network card instead of the
>> MMIO.
>>
>> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
>> ---
>
>> diff --git a/hw/e1000.c b/hw/e1000.c
>> index 96d84f9..bd39d80 100644
>> --- a/hw/e1000.c
>> +++ b/hw/e1000.c
>
>> @@ -971,7 +1006,7 @@ static bool is_version_1(void *opaque, int
>> version_id)
>>
>> static const VMStateDescription vmstate_e1000 = {
>>    .name = "e1000",
>> -    .version_id = 2,
>> +    .version_id = 3,
>>    .minimum_version_id = 1,
>>    .minimum_version_id_old = 1,
>>    .fields      = (VMStateField []) {
>> @@ -1043,6 +1078,7 @@ static const VMStateDescription vmstate_e1000 = {
>>        VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, RA, 32),
>>        VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, MTA, 128),
>>        VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, VFTA, 128),
>> +        VMSTATE_UINT32_V(ioport_addr, E1000State, 3),
>>        VMSTATE_END_OF_LIST()
>>    }
>> };
>
> Juan, shouldn't this use a subsection instead of bumping the version?
>
> Andreas
>
>

Ping?
Juan Quintela - July 19, 2011, 1:41 p.m.
Anthony PERARD <anthony.perard@citrix.com> wrote:
> This patch introduces the two IOPorts on e1000, IOADDR and IODATA. The
> IOADDR is used to specify which register we want to access when we read
> or write on IODATA.
>
> This patch fixes some weird behavior that I see when I use e1000 with
> QEMU/Xen, the guest memory can be corrupted by this NIC because it will
> write on memory that it doesn't own anymore after a reset. It's because
> the kernel Linux use the IOPort to reset the network card instead of the
> MMIO.
>
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

This "used" to work, so the question is:
- do ioport_addr normally has a value of 0, and then migration works?
- is very rare that we are in the middle of an io cycle?

To be able to use a subsection, we have to had a way to decide that the
old default value is going to go.  My understanding is that testing for
->ioport_addr == 0 should be the test for a subsection, but the code
never looks to put ioport_addr back to zero.

I am missing anything obvious?  Or is there any easy way to now if we
are in the middle of a couple of io operations?  For my reading of 
e100_ioport_read/writel() it looks like it should be used as:

write(base+IOADDR)
write(base+IODATA)

but, should this always be paired, and we can reset ioport_addr after
the second?  Then just setting ioport_addr to zero after the second
would made the subsection work in the normal case.

Any other clue about _when_ we should send ioport_addr?

Thanks, Juan.
> @@ -202,8 +201,12 @@ rxbufsize(uint32_t v)
>  static void
>  set_ctrl(E1000State *s, int index, uint32_t val)
>  {
> -    /* RST is self clearing */
> -    s->mac_reg[CTRL] = val & ~E1000_CTRL_RST;
> +    DBGOUT(IO, "set ctrl = %08x\n", val);
> +    if (val & E1000_CTRL_RST) {
> +        e1000_reset(s);
> +        return;
> +    }
> +    s->mac_reg[CTRL] = val;
>  }


This looks to me as a different fix that can go in a different patch.

> +    /* Writes that are less than 32 bits are ignored on IOADDR.
> +     * For the Flash access, a write can be less than 32 bits for
> +     * IODATA register, but is not handled.
> +     */

Code to implement it is almost the same lenght that this O:-)

> +
> +    register_ioport_read(addr, size, 1, e1000_ioport_readl, d);
> +
> +    register_ioport_read(addr, size, 2, e1000_ioport_readl, d);
> +
> +    register_ioport_write(addr, size, 4, e1000_ioport_writel, d);
> +    register_ioport_read(addr, size, 4, e1000_ioport_readl, d);

This is curiosity on my part.  Are we returinng 32bits reads for 1,2 and
4 bytes reads, or there is code at some other level that drops the bits
that we are not interested into?  My understanding of iport.c is that
this is not checked done (it is more, but I don't claim to fully
understand it, or if it mattres at all).

Later, Juan.
Anthony PERARD - July 22, 2011, 1:44 p.m.
On Tue, Jul 19, 2011 at 14:41, Juan Quintela <quintela@redhat.com> wrote:
> Anthony PERARD <anthony.perard@citrix.com> wrote:
>> This patch introduces the two IOPorts on e1000, IOADDR and IODATA. The
>> IOADDR is used to specify which register we want to access when we read
>> or write on IODATA.
>>
>> This patch fixes some weird behavior that I see when I use e1000 with
>> QEMU/Xen, the guest memory can be corrupted by this NIC because it will
>> write on memory that it doesn't own anymore after a reset. It's because
>> the kernel Linux use the IOPort to reset the network card instead of the
>> MMIO.
>>
>> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
>
> This "used" to work, so the question is:
> - do ioport_addr normally has a value of 0, and then migration works?

0 is the reset value. Otherwise, it's the last value write to the register.

> - is very rare that we are in the middle of an io cycle?

I'm not sure this will answer to your question, but I have only see
the use of these IO port in the kernel module once. And it's used to
reset the device. (So just 2 write, together).

> To be able to use a subsection, we have to had a way to decide that the
> old default value is going to go.  My understanding is that testing for
> ->ioport_addr == 0 should be the test for a subsection, but the code
> never looks to put ioport_addr back to zero.
>
> I am missing anything obvious?  Or is there any easy way to now if we
> are in the middle of a couple of io operations?  For my reading of
> e100_ioport_read/writel() it looks like it should be used as:
>
> write(base+IOADDR)
> write(base+IODATA)
>
> but, should this always be paired, and we can reset ioport_addr after
> the second?  Then just setting ioport_addr to zero after the second
> would made the subsection work in the normal case.

The value in ioport_addr is "retained until the next write or reset",
like said the documentation. So I suppose we do not have to pair the
two IO writes, and we can not reset the ioport_addr after a read/write
to ioport_data.

> Any other clue about _when_ we should send ioport_addr?
>
> Thanks, Juan.
>> @@ -202,8 +201,12 @@ rxbufsize(uint32_t v)
>>  static void
>>  set_ctrl(E1000State *s, int index, uint32_t val)
>>  {
>> -    /* RST is self clearing */
>> -    s->mac_reg[CTRL] = val & ~E1000_CTRL_RST;
>> +    DBGOUT(IO, "set ctrl = %08x\n", val);
>> +    if (val & E1000_CTRL_RST) {
>> +        e1000_reset(s);
>> +        return;
>> +    }
>> +    s->mac_reg[CTRL] = val;
>>  }
>
>
> This looks to me as a different fix that can go in a different patch.

OK, I will create another patch with this fix.

>> +    /* Writes that are less than 32 bits are ignored on IOADDR.
>> +     * For the Flash access, a write can be less than 32 bits for
>> +     * IODATA register, but is not handled.
>> +     */
>
> Code to implement it is almost the same lenght that this O:-)

:), fine, will add the code.

>> +
>> +    register_ioport_read(addr, size, 1, e1000_ioport_readl, d);
>> +
>> +    register_ioport_read(addr, size, 2, e1000_ioport_readl, d);
>> +
>> +    register_ioport_write(addr, size, 4, e1000_ioport_writel, d);
>> +    register_ioport_read(addr, size, 4, e1000_ioport_readl, d);
>
> This is curiosity on my part.  Are we returinng 32bits reads for 1,2 and
> 4 bytes reads, or there is code at some other level that drops the bits
> that we are not interested into?  My understanding of iport.c is that
> this is not checked done (it is more, but I don't claim to fully
> understand it, or if it mattres at all).

Well, the e1000 doc said to return a DWORD for a read of all size, and
the CPU/chipset will return only a subset of that dword. So I'm not
sure if I have to handle it here or not. But as the ioport code in
qemu will not be aware of what it will be returned by the function,
maybe I have to actually do the "convertion" here and return only the
subset of the register.

Thanks for the review,
Regards,

Patch

diff --git a/hw/e1000.c b/hw/e1000.c
index 96d84f9..bd39d80 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -58,6 +58,9 @@  static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 #define PNPMMIO_SIZE      0x20000
 #define MIN_BUF_SIZE      60 /* Min. octets in an ethernet frame sans FCS */
 
+#define REG_IOADDR 0x0
+#define REG_IODATA 0x4
+
 /*
  * HW models:
  *  E1000_DEV_ID_82540EM works with Windows and Linux
@@ -83,6 +86,8 @@  typedef struct E1000State_st {
     NICState *nic;
     NICConf conf;
     int mmio_index;
+    int ioport_base;
+    uint32_t ioport_addr;
 
     uint32_t mac_reg[0x8000];
     uint16_t phy_reg[0x20];
@@ -150,13 +155,7 @@  static const char phy_regcap[0x20] = {
     [PHY_ID2] = PHY_R,		[M88E1000_PHY_SPEC_STATUS] = PHY_R
 };
 
-static void
-ioport_map(PCIDevice *pci_dev, int region_num, pcibus_t addr,
-           pcibus_t size, int type)
-{
-    DBGOUT(IO, "e1000_ioport_map addr=0x%04"FMT_PCIBUS
-           " size=0x%08"FMT_PCIBUS"\n", addr, size);
-}
+static void e1000_reset(void *opaque);
 
 static void
 set_interrupt_cause(E1000State *s, int index, uint32_t val)
@@ -202,8 +201,12 @@  rxbufsize(uint32_t v)
 static void
 set_ctrl(E1000State *s, int index, uint32_t val)
 {
-    /* RST is self clearing */
-    s->mac_reg[CTRL] = val & ~E1000_CTRL_RST;
+    DBGOUT(IO, "set ctrl = %08x\n", val);
+    if (val & E1000_CTRL_RST) {
+        e1000_reset(s);
+        return;
+    }
+    s->mac_reg[CTRL] = val;
 }
 
 static void
@@ -964,6 +967,38 @@  e1000_mmio_readw(void *opaque, target_phys_addr_t addr)
             (8 * (addr & 3))) & 0xffff;
 }
 
+static void
+e1000_ioport_writel(void *opaque, uint32_t addr, uint32_t val)
+{
+    E1000State *s = opaque;
+
+    if (addr == s->ioport_base + REG_IOADDR) {
+        DBGOUT(IO, "e1000_ioport_writel write base: 0x%04x\n", val);
+        s->ioport_addr = val & 0xfffff;
+    } else if (addr == (s->ioport_base + REG_IODATA)) {
+        DBGOUT(IO, "e1000_ioport_writel %x: 0x%04x\n", s->ioport_addr, val);
+        e1000_mmio_writel(s, s->ioport_addr, val);
+    } else {
+        DBGOUT(UNKNOWN, "IO unknown write addr=0x%08x,val=0x%08x\n",
+               addr, val);
+    }
+}
+
+static uint32_t
+e1000_ioport_readl(void *opaque, uint32_t addr)
+{
+    E1000State *s = opaque;
+
+    if (addr == s->ioport_base + REG_IOADDR) {
+        return s->ioport_addr & 0xfffff;
+    } else if (addr == (s->ioport_base + REG_IODATA)) {
+        return e1000_mmio_readl(s, s->ioport_addr);
+    } else {
+        DBGOUT(UNKNOWN, "IO unknown read addr=0x%08x\n", addr);
+    }
+    return 0;
+}
+
 static bool is_version_1(void *opaque, int version_id)
 {
     return version_id == 1;
@@ -971,7 +1006,7 @@  static bool is_version_1(void *opaque, int version_id)
 
 static const VMStateDescription vmstate_e1000 = {
     .name = "e1000",
-    .version_id = 2,
+    .version_id = 3,
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .fields      = (VMStateField []) {
@@ -1043,6 +1078,7 @@  static const VMStateDescription vmstate_e1000 = {
         VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, RA, 32),
         VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, MTA, 128),
         VMSTATE_UINT32_SUB_ARRAY(mac_reg, E1000State, VFTA, 128),
+        VMSTATE_UINT32_V(ioport_addr, E1000State, 3),
         VMSTATE_END_OF_LIST()
     }
 };
@@ -1083,6 +1119,30 @@  static const uint32_t mac_reg_init[] = {
 
 /* PCI interface */
 
+static void
+e1000_ioport_map(PCIDevice *pci_dev, int region_num, pcibus_t addr,
+                 pcibus_t size, int type)
+{
+    E1000State *d = DO_UPCAST(E1000State, dev, pci_dev);
+
+    DBGOUT(IO, "e1000_ioport_map addr=0x%04" FMT_PCIBUS
+           " size=0x%08" FMT_PCIBUS "\n", addr, size);
+
+    d->ioport_base = addr;
+
+    /* Writes that are less than 32 bits are ignored on IOADDR.
+     * For the Flash access, a write can be less than 32 bits for
+     * IODATA register, but is not handled.
+     */
+
+    register_ioport_read(addr, size, 1, e1000_ioport_readl, d);
+
+    register_ioport_read(addr, size, 2, e1000_ioport_readl, d);
+
+    register_ioport_write(addr, size, 4, e1000_ioport_writel, d);
+    register_ioport_read(addr, size, 4, e1000_ioport_readl, d);
+}
+
 static CPUWriteMemoryFunc * const e1000_mmio_write[] = {
     e1000_mmio_writeb,	e1000_mmio_writew,	e1000_mmio_writel
 };
@@ -1137,6 +1197,7 @@  static void e1000_reset(void *opaque)
 {
     E1000State *d = opaque;
 
+    d->ioport_addr = 0;
     memset(d->phy_reg, 0, sizeof d->phy_reg);
     memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
     memset(d->mac_reg, 0, sizeof d->mac_reg);
@@ -1179,7 +1240,7 @@  static int pci_e1000_init(PCIDevice *pci_dev)
                            PCI_BASE_ADDRESS_SPACE_MEMORY, e1000_mmio_map);
 
     pci_register_bar(&d->dev, 1, IOPORT_SIZE,
-                           PCI_BASE_ADDRESS_SPACE_IO, ioport_map);
+                           PCI_BASE_ADDRESS_SPACE_IO, e1000_ioport_map);
 
     memmove(d->eeprom_data, e1000_eeprom_template,
         sizeof e1000_eeprom_template);