Patchwork [1/2] Ignore RX tail kicks when RX disabled.

login
register
mail settings
Submitter Dmitry Fleytman
Date Oct. 17, 2012, 6:31 p.m.
Message ID <1350498707-6749-2-git-send-email-dmitry@daynix.com>
Download mbox | patch
Permalink /patch/192118/
State New
Headers show

Comments

Dmitry Fleytman - Oct. 17, 2012, 6:31 p.m.
Device RX initization from driver's side consists of following steps:
  1. Initialize head and tail of RX ring to 0
  2. Enable Rx (set bit in RCTL register)
  3. Allocate buffers, fill descriptors
  4. Write ring tail

Forth operation signals hardware that RX buffers available
and it may start packets indication.

Current implementation treats first operation (write 0 to ring tail)
as signal of buffers availability and starts data transfers as soon
as RX enable indicaton arrives.

This is not correct because there is a chance that ring is still
empty (third action not performed yet) and then memory corruption
occures.

Device has to ignore RX tail kicks unless RX enabled.

Reported-by: Chris Webb <chris.webb@elastichosts.com>
Reported-by: Richard Davies <richard.davies@elastichosts.com>
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
---
 hw/e1000.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)
Stefan Hajnoczi - Oct. 18, 2012, 7:31 a.m.
On Wed, Oct 17, 2012 at 08:31:46PM +0200, Dmitry Fleytman wrote:
> Device RX initization from driver's side consists of following steps:
>   1. Initialize head and tail of RX ring to 0
>   2. Enable Rx (set bit in RCTL register)
>   3. Allocate buffers, fill descriptors
>   4. Write ring tail
> 
> Forth operation signals hardware that RX buffers available
> and it may start packets indication.
> 
> Current implementation treats first operation (write 0 to ring tail)
> as signal of buffers availability and starts data transfers as soon
> as RX enable indicaton arrives.
> 
> This is not correct because there is a chance that ring is still
> empty (third action not performed yet) and then memory corruption
> occures.

The existing code tries to prevent this:

e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
{
    [...]

    if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))
        return -1;

    [...]
    total_size = size + fcs_len(s);
    if (!e1000_has_rxbufs(s, total_size)) {
            set_ics(s, 0, E1000_ICS_RXO);
            return -1;
    }

Why are these checks not enough?

Which memory gets corrupted?

Stefan
Dmitry Fleytman - Oct. 18, 2012, 8:08 a.m.
Hello, Stefan

The problem occurs between steps 2 and 3.
Let's say packet arrives after step 2 is done by driver.

Head and tail are 0 because of step 1

Check_rxov is 0 because of two reasons:
    1. On startup it is 0 by default
    2. It is zeroed by setting ring tail to 0 on first step

Then first check ( __if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))__ )
passes because RX enabled on step 2.
e1000_has_rxbufs() returs true because it treats equal head and tail
as fully filled ring when check_rxov is 0:

    static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
    {
        [...]

        if (total_size <= s->rxbuf_size) {
            return s->mac_reg[RDH] != s->mac_reg[RDT] || !s->check_rxov;

        [...]

        } else if (s->mac_reg[RDH] > s->mac_reg[RDT] || !s->check_rxov) {

       [...]
    }

So QEMU reads uninitialized descriptor and tries to perform "DMA" to
arbitrary address from descriptor.
Depending on address value it corrupts guest memory or abort()'s here:

void *qemu_get_ram_ptr(ram_addr_t addr)
{
    [...]

    fprintf(stderr, "Bad ram offset %" PRIx64 "\n", (uint64_t)addr);
    abort();

    [...]
}

Thanks for review,
Dmitry.

On Thu, Oct 18, 2012 at 9:31 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> On Wed, Oct 17, 2012 at 08:31:46PM +0200, Dmitry Fleytman wrote:
> > Device RX initization from driver's side consists of following steps:
> >   1. Initialize head and tail of RX ring to 0
> >   2. Enable Rx (set bit in RCTL register)
> >   3. Allocate buffers, fill descriptors
> >   4. Write ring tail
> >
> > Forth operation signals hardware that RX buffers available
> > and it may start packets indication.
> >
> > Current implementation treats first operation (write 0 to ring tail)
> > as signal of buffers availability and starts data transfers as soon
> > as RX enable indicaton arrives.
> >
> > This is not correct because there is a chance that ring is still
> > empty (third action not performed yet) and then memory corruption
> > occures.
>
> The existing code tries to prevent this:
>
> e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> {
>     [...]
>
>     if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))
>         return -1;
>
>     [...]
>     total_size = size + fcs_len(s);
>     if (!e1000_has_rxbufs(s, total_size)) {
>             set_ics(s, 0, E1000_ICS_RXO);
>             return -1;
>     }
>
> Why are these checks not enough?
>
> Which memory gets corrupted?
>
> Stefan




--
Dmitry Fleytman
Technology Expert and Consultant,

Daynix Computing Ltd.

Cell: +972-54-2819481

Skype: dmitry.fleytman
Stefan Hajnoczi - Oct. 18, 2012, 8:09 a.m.
On Wed, Oct 17, 2012 at 08:31:46PM +0200, Dmitry Fleytman wrote:
> Device RX initization from driver's side consists of following steps:
>   1. Initialize head and tail of RX ring to 0
>   2. Enable Rx (set bit in RCTL register)
>   3. Allocate buffers, fill descriptors
>   4. Write ring tail
> 
> Forth operation signals hardware that RX buffers available
> and it may start packets indication.
> 
> Current implementation treats first operation (write 0 to ring tail)
> as signal of buffers availability and starts data transfers as soon
> as RX enable indicaton arrives.
> 
> This is not correct because there is a chance that ring is still
> empty (third action not performed yet) and then memory corruption
> occures.

Any idea what the point of hw/e1000.c check_rxov is?  I see nothing in
the datasheet that requires these semantics.

The Linux e1000 driver never enables the RXO (rx fifo overflow)
interrupt, only RXDMT0 (receive descriptor minimum threshold).  This
means hw/e1000.c will not upset the Linux e1000 driver when
e1000_receive() gets called with check_rxov == 1 and RDH == RDT == 0.

BTW the Linux e1000 driver does not follow the sequence recommended in
the datasheet 14.4 Receive Initialization, which would avoid the weird
window of time where RDH == RDT == 0.

If we get rid of check_rxov and always check rxbuf space then we have
the correct behavior.  I'm a little nervous of simply dropping it
because its purpose is unclear to me :(.

Stefan
Dmitry Fleytman - Oct. 18, 2012, 8:34 a.m.
Stefan,

The real purpose of check_rxov it a bit confusing indeed, mainly
because of unclear name (rename?),
however it works as following:

There are 2 possible when RDT == RDH for RX ring:
    1. Device used all the buffers from ring, no empty buffers available
    2. Driver fully refilled the ring and all buffers are empty and ready to use

check_rxov is used to distinguish these 2 cases:
    1. It must be 1 initially (init, reset, etc.)
    2. It must be set to one when device uses buffer
    3. It must be set to 0 when driver adds buffer to the ring
check_rxov == 1 - ring is empty
check_rxov == 0 - ring is full

Indeed, RX init sequence doesn't look logical, however this is the way
all Intel driver behave from e1000 and up to ixgbe.
Also see some explanation here:
http://permalink.gmane.org/gmane.linux.kernel/1375917

If we drop check_rxov and always treat RDH == RDT as empty ring we'll
probably get correct behavior for current Linux driver's code (needs
testing of course),
however we have no idea how Windows drivers work.

Also drivers tend to change...

Dmitry.

On Thu, Oct 18, 2012 at 10:09 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Oct 17, 2012 at 08:31:46PM +0200, Dmitry Fleytman wrote:
>> Device RX initization from driver's side consists of following steps:
>>   1. Initialize head and tail of RX ring to 0
>>   2. Enable Rx (set bit in RCTL register)
>>   3. Allocate buffers, fill descriptors
>>   4. Write ring tail
>>
>> Forth operation signals hardware that RX buffers available
>> and it may start packets indication.
>>
>> Current implementation treats first operation (write 0 to ring tail)
>> as signal of buffers availability and starts data transfers as soon
>> as RX enable indicaton arrives.
>>
>> This is not correct because there is a chance that ring is still
>> empty (third action not performed yet) and then memory corruption
>> occures.
>
> Any idea what the point of hw/e1000.c check_rxov is?  I see nothing in
> the datasheet that requires these semantics.
>
> The Linux e1000 driver never enables the RXO (rx fifo overflow)
> interrupt, only RXDMT0 (receive descriptor minimum threshold).  This
> means hw/e1000.c will not upset the Linux e1000 driver when
> e1000_receive() gets called with check_rxov == 1 and RDH == RDT == 0.
>
> BTW the Linux e1000 driver does not follow the sequence recommended in
> the datasheet 14.4 Receive Initialization, which would avoid the weird
> window of time where RDH == RDT == 0.
>
> If we get rid of check_rxov and always check rxbuf space then we have
> the correct behavior.  I'm a little nervous of simply dropping it
> because its purpose is unclear to me :(.
>
> Stefan
Stefan Hajnoczi - Oct. 18, 2012, 2:31 p.m.
On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <dmitry@daynix.com> wrote:
> The real purpose of check_rxov it a bit confusing indeed, mainly
> because of unclear name (rename?),
> however it works as following:
>
> There are 2 possible when RDT == RDH for RX ring:
>     1. Device used all the buffers from ring, no empty buffers available
>     2. Driver fully refilled the ring and all buffers are empty and ready to use
>
> check_rxov is used to distinguish these 2 cases:
>     1. It must be 1 initially (init, reset, etc.)
>     2. It must be set to one when device uses buffer
>     3. It must be set to 0 when driver adds buffer to the ring
> check_rxov == 1 - ring is empty
> check_rxov == 0 - ring is full
>
> Indeed, RX init sequence doesn't look logical, however this is the way
> all Intel driver behave from e1000 and up to ixgbe.
> Also see some explanation here:
> http://permalink.gmane.org/gmane.linux.kernel/1375917
>
> If we drop check_rxov and always treat RDH == RDT as empty ring we'll
> probably get correct behavior for current Linux driver's code (needs
> testing of course),
> however we have no idea how Windows drivers work.

Thanks, for the great explanation, Dmitry.

Alexander: I CCed you because I hope you might be able to explain what
the 82540EM card does when a driver sets RDT to the value of RDH.  The
QEMU NIC emulation code treats this as a full ring (i.e. the
descriptors are valid and will be filled in by the hardware).  Does
the real hardware act like this or will it treat this condition as
ring empty (i.e. if the driver sets RDT to the value of RDH then the
hardware stops receive because there are no descriptors available)?

I can't find a statement in the Intel datasheet about what happens
when the driver sets RDT = RDH.  The QEMU check_rxov variable is
trying to distinguish between ring empty (RDH has moved to RDT) and
ring full (driver has set RDH = RDT because the full descriptor ring
is available).

Dmitry: At this point we'd need to test what happens on real hardware
when RDH = RDT in order to be able to remove check_rxov.  As you
mentioned, with the Linux e1000 driver we don't see ring full RDH =
RDT:

        /* call E1000_DESC_UNUSED which always leaves
         * at least 1 descriptor unused to make sure
         * next_to_use != next_to_clean */
        for (i = 0; i < adapter->num_rx_queues; i++) {
                struct e1000_rx_ring *ring = &adapter->rx_ring[i];
                adapter->alloc_rx_buf(adapter, ring,
                                      E1000_DESC_UNUSED(ring));
        }

Here some sample output from a QEMU printf, notice how RDH is never
the same as RDT once rx begins:

set_rdt rdh=0 rdt_old=0 rdt_new=0
set_rdt rdh=0 rdt_old=0 rdt_new=254
set_rdt rdh=1 rdt_old=254 rdt_new=255
set_rdt rdh=2 rdt_old=255 rdt_new=0
set_rdt rdh=3 rdt_old=0 rdt_new=1
set_rdt rdh=4 rdt_old=1 rdt_new=2
set_rdt rdh=5 rdt_old=2 rdt_new=3
set_rdt rdh=6 rdt_old=3 rdt_new=4
set_rdt rdh=7 rdt_old=4 rdt_new=5
set_rdt rdh=9 rdt_old=5 rdt_new=7
set_rdt rdh=10 rdt_old=7 rdt_new=8
set_rdt rdh=11 rdt_old=8 rdt_new=9
set_rdt rdh=12 rdt_old=9 rdt_new=10
set_rdt rdh=13 rdt_old=10 rdt_new=11
set_rdt rdh=14 rdt_old=11 rdt_new=12

The iPXE 'intel' driver (supports e1000 cards) also does not set RDH =
RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a
time.

The reason I'm digging into the need for check_rxov is because it's a
dangerous piece of code to have.  If check_rxov logic is ever out of
sync we risk memory corruption.  I'd really like to drop it
completely.

Stefan
Alexander Duyck - Oct. 18, 2012, 4:06 p.m.
On 10/18/2012 07:31 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <dmitry@daynix.com> wrote:
>> The real purpose of check_rxov it a bit confusing indeed, mainly
>> because of unclear name (rename?),
>> however it works as following:
>>
>> There are 2 possible when RDT == RDH for RX ring:
>>     1. Device used all the buffers from ring, no empty buffers available
>>     2. Driver fully refilled the ring and all buffers are empty and ready to use

The 2nd case is not true.  We should only have RDT == RDH when the ring
is empty.  If RDT == RDH and the ring is full then we have a bug in the
driver.  The driver should only ever allow RDT to be one less than head,
or ring size - 1 if head is 0.

>> check_rxov is used to distinguish these 2 cases:
>>     1. It must be 1 initially (init, reset, etc.)
>>     2. It must be set to one when device uses buffer
>>     3. It must be set to 0 when driver adds buffer to the ring
>> check_rxov == 1 - ring is empty
>> check_rxov == 0 - ring is full
>>
>> Indeed, RX init sequence doesn't look logical, however this is the way
>> all Intel driver behave from e1000 and up to ixgbe.
>> Also see some explanation here:
>> http://permalink.gmane.org/gmane.linux.kernel/1375917
>>
>> If we drop check_rxov and always treat RDH == RDT as empty ring we'll
>> probably get correct behavior for current Linux driver's code (needs
>> testing of course),
>> however we have no idea how Windows drivers work.

The windows driver should work the same way.  If RDH == RDT the hardware
will treat that as a empty ring and will hang.  If there is a driver
that is setting RDH == RDT to indicate the ring is full please let us
know as that is likely a buggy driver.

> Thanks, for the great explanation, Dmitry.
>
> Alexander: I CCed you because I hope you might be able to explain what
> the 82540EM card does when a driver sets RDT to the value of RDH.  The
> QEMU NIC emulation code treats this as a full ring (i.e. the
> descriptors are valid and will be filled in by the hardware).  Does
> the real hardware act like this or will it treat this condition as
> ring empty (i.e. if the driver sets RDT to the value of RDH then the
> hardware stops receive because there are no descriptors available)?
>
> I can't find a statement in the Intel datasheet about what happens
> when the driver sets RDT = RDH.  The QEMU check_rxov variable is
> trying to distinguish between ring empty (RDH has moved to RDT) and
> ring full (driver has set RDH = RDT because the full descriptor ring
> is available).

If RDT == RDH then we should stop receiving traffic.  As far as I know
all of our e1000 hardware treat RDT == RDH as an empty ring state.  All
of the drivers should have code in place to stop it.  For example the
E1000_DESC_UNUSED macro should be returning ring size - 1 in the case of
RDT == RDH which will result in the head being 0 and the tail being ring
size - 2.

> Dmitry: At this point we'd need to test what happens on real hardware
> when RDH = RDT in order to be able to remove check_rxov.  As you
> mentioned, with the Linux e1000 driver we don't see ring full RDH =
> RDT:
>
>         /* call E1000_DESC_UNUSED which always leaves
>          * at least 1 descriptor unused to make sure
>          * next_to_use != next_to_clean */
>         for (i = 0; i < adapter->num_rx_queues; i++) {
>                 struct e1000_rx_ring *ring = &adapter->rx_ring[i];
>                 adapter->alloc_rx_buf(adapter, ring,
>                                       E1000_DESC_UNUSED(ring));
>         }
>
> Here some sample output from a QEMU printf, notice how RDH is never
> the same as RDT once rx begins:
>
> set_rdt rdh=0 rdt_old=0 rdt_new=0
> set_rdt rdh=0 rdt_old=0 rdt_new=254
> set_rdt rdh=1 rdt_old=254 rdt_new=255
> set_rdt rdh=2 rdt_old=255 rdt_new=0
> set_rdt rdh=3 rdt_old=0 rdt_new=1
> set_rdt rdh=4 rdt_old=1 rdt_new=2
> set_rdt rdh=5 rdt_old=2 rdt_new=3
> set_rdt rdh=6 rdt_old=3 rdt_new=4
> set_rdt rdh=7 rdt_old=4 rdt_new=5
> set_rdt rdh=9 rdt_old=5 rdt_new=7
> set_rdt rdh=10 rdt_old=7 rdt_new=8
> set_rdt rdh=11 rdt_old=8 rdt_new=9
> set_rdt rdh=12 rdt_old=9 rdt_new=10
> set_rdt rdh=13 rdt_old=10 rdt_new=11
> set_rdt rdh=14 rdt_old=11 rdt_new=12
>
> The iPXE 'intel' driver (supports e1000 cards) also does not set RDH =
> RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a
> time.
>
> The reason I'm digging into the need for check_rxov is because it's a
> dangerous piece of code to have.  If check_rxov logic is ever out of
> sync we risk memory corruption.  I'd really like to drop it
> completely.
>
> Stefan

There should be no need for check_rxov.  As far as I know none of our
drivers will ever set RDT == RDH if there are descriptors available on
the ring.

Thanks,

Alex
Dmitry Fleytman - Oct. 18, 2012, 4:12 p.m.
Great! Thanks, Alex.

I'll prepare a new changeset that drops check_rxov completely.
Also migration-related patch becomes unneeded with this solution.

On Thu, Oct 18, 2012 at 6:06 PM, Alexander Duyck
<alexander.h.duyck@intel.com> wrote:
> On 10/18/2012 07:31 AM, Stefan Hajnoczi wrote:
>> On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <dmitry@daynix.com> wrote:
>>> The real purpose of check_rxov it a bit confusing indeed, mainly
>>> because of unclear name (rename?),
>>> however it works as following:
>>>
>>> There are 2 possible when RDT == RDH for RX ring:
>>>     1. Device used all the buffers from ring, no empty buffers available
>>>     2. Driver fully refilled the ring and all buffers are empty and ready to use
>
> The 2nd case is not true.  We should only have RDT == RDH when the ring
> is empty.  If RDT == RDH and the ring is full then we have a bug in the
> driver.  The driver should only ever allow RDT to be one less than head,
> or ring size - 1 if head is 0.
>
>>> check_rxov is used to distinguish these 2 cases:
>>>     1. It must be 1 initially (init, reset, etc.)
>>>     2. It must be set to one when device uses buffer
>>>     3. It must be set to 0 when driver adds buffer to the ring
>>> check_rxov == 1 - ring is empty
>>> check_rxov == 0 - ring is full
>>>
>>> Indeed, RX init sequence doesn't look logical, however this is the way
>>> all Intel driver behave from e1000 and up to ixgbe.
>>> Also see some explanation here:
>>> http://permalink.gmane.org/gmane.linux.kernel/1375917
>>>
>>> If we drop check_rxov and always treat RDH == RDT as empty ring we'll
>>> probably get correct behavior for current Linux driver's code (needs
>>> testing of course),
>>> however we have no idea how Windows drivers work.
>
> The windows driver should work the same way.  If RDH == RDT the hardware
> will treat that as a empty ring and will hang.  If there is a driver
> that is setting RDH == RDT to indicate the ring is full please let us
> know as that is likely a buggy driver.
>
>> Thanks, for the great explanation, Dmitry.
>>
>> Alexander: I CCed you because I hope you might be able to explain what
>> the 82540EM card does when a driver sets RDT to the value of RDH.  The
>> QEMU NIC emulation code treats this as a full ring (i.e. the
>> descriptors are valid and will be filled in by the hardware).  Does
>> the real hardware act like this or will it treat this condition as
>> ring empty (i.e. if the driver sets RDT to the value of RDH then the
>> hardware stops receive because there are no descriptors available)?
>>
>> I can't find a statement in the Intel datasheet about what happens
>> when the driver sets RDT = RDH.  The QEMU check_rxov variable is
>> trying to distinguish between ring empty (RDH has moved to RDT) and
>> ring full (driver has set RDH = RDT because the full descriptor ring
>> is available).
>
> If RDT == RDH then we should stop receiving traffic.  As far as I know
> all of our e1000 hardware treat RDT == RDH as an empty ring state.  All
> of the drivers should have code in place to stop it.  For example the
> E1000_DESC_UNUSED macro should be returning ring size - 1 in the case of
> RDT == RDH which will result in the head being 0 and the tail being ring
> size - 2.
>
>> Dmitry: At this point we'd need to test what happens on real hardware
>> when RDH = RDT in order to be able to remove check_rxov.  As you
>> mentioned, with the Linux e1000 driver we don't see ring full RDH =
>> RDT:
>>
>>         /* call E1000_DESC_UNUSED which always leaves
>>          * at least 1 descriptor unused to make sure
>>          * next_to_use != next_to_clean */
>>         for (i = 0; i < adapter->num_rx_queues; i++) {
>>                 struct e1000_rx_ring *ring = &adapter->rx_ring[i];
>>                 adapter->alloc_rx_buf(adapter, ring,
>>                                       E1000_DESC_UNUSED(ring));
>>         }
>>
>> Here some sample output from a QEMU printf, notice how RDH is never
>> the same as RDT once rx begins:
>>
>> set_rdt rdh=0 rdt_old=0 rdt_new=0
>> set_rdt rdh=0 rdt_old=0 rdt_new=254
>> set_rdt rdh=1 rdt_old=254 rdt_new=255
>> set_rdt rdh=2 rdt_old=255 rdt_new=0
>> set_rdt rdh=3 rdt_old=0 rdt_new=1
>> set_rdt rdh=4 rdt_old=1 rdt_new=2
>> set_rdt rdh=5 rdt_old=2 rdt_new=3
>> set_rdt rdh=6 rdt_old=3 rdt_new=4
>> set_rdt rdh=7 rdt_old=4 rdt_new=5
>> set_rdt rdh=9 rdt_old=5 rdt_new=7
>> set_rdt rdh=10 rdt_old=7 rdt_new=8
>> set_rdt rdh=11 rdt_old=8 rdt_new=9
>> set_rdt rdh=12 rdt_old=9 rdt_new=10
>> set_rdt rdh=13 rdt_old=10 rdt_new=11
>> set_rdt rdh=14 rdt_old=11 rdt_new=12
>>
>> The iPXE 'intel' driver (supports e1000 cards) also does not set RDH =
>> RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a
>> time.
>>
>> The reason I'm digging into the need for check_rxov is because it's a
>> dangerous piece of code to have.  If check_rxov logic is ever out of
>> sync we risk memory corruption.  I'd really like to drop it
>> completely.
>>
>> Stefan
>
> There should be no need for check_rxov.  As far as I know none of our
> drivers will ever set RDT == RDH if there are descriptors available on
> the ring.
>
> Thanks,
>
> Alex

Patch

diff --git a/hw/e1000.c b/hw/e1000.c
index 63fee10..606bf3a 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -267,6 +267,7 @@  static void e1000_reset(void *opaque)
 {
     E1000State *d = opaque;
 
+    d->check_rxov = 1;
     qemu_del_timer(d->autoneg_timer);
     memset(d->phy_reg, 0, sizeof d->phy_reg);
     memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
@@ -285,6 +286,10 @@  set_ctrl(E1000State *s, int index, uint32_t val)
 {
     /* RST is self clearing */
     s->mac_reg[CTRL] = val & ~E1000_CTRL_RST;
+
+    if (val & E1000_CTRL_RST) {
+        s->check_rxov = 1;
+    }
 }
 
 static void
@@ -754,12 +759,18 @@  static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
     return total_size <= bufs * s->rxbuf_size;
 }
 
+static inline bool
+is_receive_enabled(E1000State *s)
+{
+    return s->mac_reg[RCTL] & E1000_RCTL_EN;
+}
+
 static int
 e1000_can_receive(NetClientState *nc)
 {
     E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
 
-    return (s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
+    return is_receive_enabled(s) && e1000_has_rxbufs(s, 1);
 }
 
 static uint64_t rx_desc_base(E1000State *s)
@@ -785,8 +796,9 @@  e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
     size_t desc_size;
     size_t total_size;
 
-    if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))
+    if (!is_receive_enabled(s)) {
         return -1;
+    }
 
     /* Pad to minimum Ethernet frame length */
     if (size < sizeof(min_buf)) {
@@ -925,8 +937,12 @@  mac_writereg(E1000State *s, int index, uint32_t val)
 static void
 set_rdt(E1000State *s, int index, uint32_t val)
 {
-    s->check_rxov = 0;
     s->mac_reg[index] = val & 0xffff;
+
+    if (is_receive_enabled(s)) {
+        s->check_rxov = 0;
+    }
+
     if (e1000_has_rxbufs(s, 1)) {
         qemu_flush_queued_packets(&s->nic->nc);
     }
@@ -1065,7 +1081,12 @@  static void e1000_io_write(void *opaque, target_phys_addr_t addr,
 {
     E1000State *s = opaque;
 
-    (void)s;
+    switch (addr) {
+    case E1000_CTRL_DUP:
+        if (val & E1000_CTRL_RST) {
+            s->check_rxov = 1;
+        }
+    }
 }
 
 static const MemoryRegionOps e1000_io_ops = {