rtl8139: flush queued packets when RxBufPtr is written

Message ID 51A36CBB.7030502@dlhnet.de
State New

Commit Message

Peter Lieven May 27, 2013, 2:24 p.m. UTC
On 27.05.2013 16:07, Oliver Francke wrote:
> Well,
>
> Am 27.05.2013 um 08:15 schrieb Peter Lieven <lieven-lists@dlhnet.de>:
>
>> Hi all,
>>
>> I have occasionally seen a probably related problem in the past. It mainly happened with rtl8139 under
>> WinXP, where we most likely use rtl8139 because WinXP ships no e1000 drivers.
>>
>> My question is if you see increasing dropped packets on the tap device if this problem occurs?
>>
>> tap36     Link encap:Ethernet  HWaddr b2:84:23:c0:e2:c0
>>           inet6 addr: fe80::b084:23ff:fec0:e2c0/64 Scope:Link
>>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>           RX packets:5816096 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:3878744 errors:0 dropped:13775 overruns:0 carrier:0
>>           collisions:0 txqueuelen:500
>>           RX bytes:5161769434 (5.1 GB)  TX bytes:380415916 (380.4 MB)
>>
>> In my case as well the only option to recover without shutting down the whole vServer is Live Migration
>> to another Node.
>>
> ACK, tried it and every network-devices might have been re-created into a defined state qemu-wise.
>
>> However, I also see this problem under qemu-kvm-1.2.0 while Oliver reported it does not happen there.
>>
> Neither me nor any  affected customers have ever seen such failures in qemu-1.2.0, so this was my last-known-good ;)
I cherry-picked

net: add receive_disabled logic to iov delivery path

to my qemu-1.2.0 build. I think this might be why I see this.

Have you tried patching qemu-1.2.0 with something like this?



Peter

>
> Oliver.
>
>> Thank you,
>> Peter
>>
>> On 22.05.2013 14:50, Stefan Hajnoczi wrote:
>>> Net queues support efficient "receive disable".  For example, tap's file
>>> descriptor will not be polled while its peer has receive disabled.  This
>>> saves CPU cycles for needlessly copying and then dropping packets which
>>> the peer cannot receive.
>>>
>>> rtl8139 is missing the qemu_flush_queued_packets() call that wakes the
>>> queue up when receive becomes possible again.
>>>
>>> As a result, the Windows 7 guest driver reaches a state where the
>>> rtl8139 cannot receive packets.  The driver has actually refilled the
>>> receive buffer but we never resume reception.
>>>
>>> The bug can be reproduced by running a large FTP 'get' inside a Windows
>>> 7 guest:
>>>
>>>    $ qemu -netdev tap,id=tap0,...
>>>           -device rtl8139,netdev=tap0
>>>
>>> The Linux guest driver does not trigger the bug, probably due to a
>>> different buffer management strategy.
>>>
>>> Reported-by: Oliver Francke <oliver.francke@filoo.de>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>   hw/net/rtl8139.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
>>> index 9369507..7993f9f 100644
>>> --- a/hw/net/rtl8139.c
>>> +++ b/hw/net/rtl8139.c
>>> @@ -2575,6 +2575,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, uint32_t val)
>>>       /* this value is off by 16 */
>>>       s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);
>>>
>>> +    /* more buffer space may be available so try to receive */
>>> +    qemu_flush_queued_packets(qemu_get_queue(s->nic));
>>> +
>>>       DPRINTF(" CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n",
>>>           s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
>>>   }

Comments

Stefan Hajnoczi May 27, 2013, 3:29 p.m. UTC | #1
On Mon, May 27, 2013 at 04:24:59PM +0200, Peter Lieven wrote:
> On 27.05.2013 16:07, Oliver Francke wrote:
> >Well,
> >
> >Am 27.05.2013 um 08:15 schrieb Peter Lieven <lieven-lists@dlhnet.de>:
> >
> >>Hi all,
> >>
> >>I have occasionally seen a probably related problem in the past. It mainly happened with rtl8139 under
> >>WinXP, where we most likely use rtl8139 because WinXP ships no e1000 drivers.
> >>
> >>My question is if you see increasing dropped packets on the tap device if this problem occurs?
> >>
> >>tap36     Link encap:Ethernet  HWaddr b2:84:23:c0:e2:c0
> >>          inet6 addr: fe80::b084:23ff:fec0:e2c0/64 Scope:Link
> >>          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
> >>          RX packets:5816096 errors:0 dropped:0 overruns:0 frame:0
> >>          TX packets:3878744 errors:0 dropped:13775 overruns:0 carrier:0
> >>          collisions:0 txqueuelen:500
> >>          RX bytes:5161769434 (5.1 GB)  TX bytes:380415916 (380.4 MB)
> >>
> >>In my case as well the only option to recover without shutting down the whole vServer is Live Migration
> >>to another Node.
> >>
> >ACK, tried it and every network-devices might have been re-created into a defined state qemu-wise.
> >
> >>However, I also see this problem under qemu-kvm-1.2.0 while Oliver reported it does not happen there.
> >>
> >Neither me nor any  affected customers have ever seen such failures in qemu-1.2.0, so this was my last-known-good ;)
> I cherry-picked
> 
> net: add receive_disabled logic to iov delivery path

This one exposes the bug that Oliver reported:

commit a9d8f7b1c41a8a346f4cf5a0c6963a79fbd1249e
Author: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Date:   Mon Aug 20 13:35:23 2012 +0100

    net: do not report queued packets as sent

Peter Lieven May 28, 2013, 6:27 a.m. UTC | #2
On 27.05.2013 17:29, Stefan Hajnoczi wrote:
> On Mon, May 27, 2013 at 04:24:59PM +0200, Peter Lieven wrote:
>> On 27.05.2013 16:07, Oliver Francke wrote:
>>> Well,
>>>
>>> Am 27.05.2013 um 08:15 schrieb Peter Lieven <lieven-lists@dlhnet.de>:
>>>
>>>> Hi all,
>>>>
>>>> I have occasionally seen a probably related problem in the past. It mainly happened with rtl8139 under
>>>> WinXP, where we most likely use rtl8139 because WinXP ships no e1000 drivers.
>>>>
>>>> My question is if you see increasing dropped packets on the tap device if this problem occurs?
>>>>
>>>> tap36     Link encap:Ethernet  HWaddr b2:84:23:c0:e2:c0
>>>>           inet6 addr: fe80::b084:23ff:fec0:e2c0/64 Scope:Link
>>>>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>>>           RX packets:5816096 errors:0 dropped:0 overruns:0 frame:0
>>>>           TX packets:3878744 errors:0 dropped:13775 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:500
>>>>           RX bytes:5161769434 (5.1 GB)  TX bytes:380415916 (380.4 MB)
>>>>
>>>> In my case as well the only option to recover without shutting down the whole vServer is Live Migration
>>>> to another Node.
>>>>
>>> ACK, tried it and every network-devices might have been re-created into a defined state qemu-wise.
>>>
>>>> However, I also see this problem under qemu-kvm-1.2.0 while Oliver reported it does not happen there.
>>>>
>>> Neither me nor any  affected customers have ever seen such failures in qemu-1.2.0, so this was my last-known-good ;)
>> I cherry-picked
>>
>> net: add receive_disabled logic to iov delivery path
> This one exposes the bug that Oliver reported:
>
> commit a9d8f7b1c41a8a346f4cf5a0c6963a79fbd1249e
> Author: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> Date:   Mon Aug 20 13:35:23 2012 +0100
>
>      net: do not report queued packets as sent
This was also in the series I cherry-picked for my 1.2.0 build, so it's likely I hit the same bug.

Thank you,
Peter

Patch

--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -2575,6 +2575,9 @@  static void rtl8139_RxBufPtr_write(RTL8139State *s, uint32_t val)
      /* this value is off by 16 */
      s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);

+    /* more buffer space may be available so try to receive */
+    qemu_flush_queued_packets(&s->nic->nc);
+
      DPRINTF(" CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n",
          s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
  }
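
The commit message quoted above describes a queue that stops delivering once its peer refuses a packet and resumes only when someone flushes it. The following is a toy model of that behaviour, not QEMU's net queue code: ToyNic, ToyQueue, QUEUE_CAP and every function name here are hypothetical and exist only for illustration. It assumes the simplest possible semantics (one slot per packet, a fixed-size backlog) to show why refilling the receive buffer without a flush leaves reception stalled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8 /* hypothetical backlog limit for this sketch */

/* Toy NIC: a receive buffer with a number of free packet slots. */
typedef struct {
    int free_slots;
    int received;
} ToyNic;

/* Toy queue sitting between a backend (think: tap) and the NIC. */
typedef struct {
    ToyNic *nic;
    bool receive_disabled; /* set once the NIC has refused a packet */
    int backlog[QUEUE_CAP];
    size_t backlog_len;
    int dropped;
} ToyQueue;

/* Returns false when the NIC has no room, like a full rx buffer. */
static bool toy_nic_receive(ToyNic *nic, int pkt)
{
    (void)pkt;
    if (nic->free_slots == 0) {
        return false;
    }
    nic->free_slots--;
    nic->received++;
    return true;
}

/* Deliver directly if possible; otherwise queue and stop trying. */
static void toy_queue_send(ToyQueue *q, int pkt)
{
    if (!q->receive_disabled && toy_nic_receive(q->nic, pkt)) {
        return;
    }
    q->receive_disabled = true;
    if (q->backlog_len < QUEUE_CAP) {
        q->backlog[q->backlog_len++] = pkt;
    } else {
        q->dropped++;
    }
}

/* Wake the queue up: re-enable receive and drain what fits. */
static void toy_queue_flush(ToyQueue *q)
{
    size_t done = 0;

    q->receive_disabled = false;
    while (done < q->backlog_len &&
           toy_nic_receive(q->nic, q->backlog[done])) {
        done++;
    }
    /* Keep undelivered packets at the front of the backlog. */
    for (size_t i = done; i < q->backlog_len; i++) {
        q->backlog[i - done] = q->backlog[i];
    }
    q->backlog_len -= done;
    assert(q->backlog_len <= QUEUE_CAP);
    if (q->backlog_len > 0) {
        q->receive_disabled = true; /* NIC filled up again */
    }
}

/* Guest driver frees buffer space; the flush is the analogue of the patch. */
static void toy_guest_refill(ToyQueue *q, int slots, bool flush_on_refill)
{
    q->nic->free_slots += slots;
    if (flush_on_refill) {
        toy_queue_flush(q);
    }
}
```

In this model, calling toy_guest_refill() with flush_on_refill set to false reproduces the stall: the NIC has free slots again, yet every later toy_queue_send() lands in the backlog because receive_disabled is never cleared. Passing true drains the backlog and lets direct delivery resume, which is the role the added qemu_flush_queued_packets() call plays in the patch.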