Patchwork [Bug,1179731,NEW] is networking broken on windows hosts?

login
register
mail settings
Submitter Paolo Bonzini
Date May 14, 2013, 11:48 a.m.
Message ID <519224A4.4000800@redhat.com>
Download mbox | patch
Permalink /patch/243685/
State New
Headers show

Comments

Paolo Bonzini - May 14, 2013, 11:48 a.m.
Il 14/05/2013 13:39, TeLeMan ha scritto:
> On Tue, May 14, 2013 at 6:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 14/05/2013 12:24, TeLeMan ha scritto:
>>> On Tue, May 14, 2013 at 3:51 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>> On Tue, May 14, 2013 at 12:02:24AM -0000, therock247uk wrote:
>>>>> just wondering as i just compiled the latest git and qemu goes into none
>>>>> responding mode when i try to do any networking stuff on guests (both
>>>>> linux and windows)
>>>>
>>>> Works for me on qemu.git/master on Linux:
>>>>
>>>>   $ git rev-parse HEAD
>>>>   b087143b4d010451208264b7c841436aafe1cbb1
>>>>   $ x86_64-softmmu/qemu-system-x86_64 -m 1024 -enable-kvm -cpu host \
>>>>           -drive if=virtio,cache=none,file=test.img
>>>>
>>>> Please include more information, like the QEMU command-line and commit
>>>> ID.
>>>>
>>>> Stefan
>>>>
>>>
>>> This regression occurs on the Windows host. SLIRP hangs in sorecvfrom().
>>
>> Can you bisect it?
>>
>> Paolo
>>
> The first break is the commit
> 5e3bc735d93dd23f074b5116fd11e1ad8cd4962f. But it need more packets
> than HEAD to reproduce this regression.

Please check if this partial revert of that commit fixes it:
Paolo Bonzini - May 14, 2013, 11:55 a.m.
Il 14/05/2013 13:48, Paolo Bonzini ha scritto:
> Il 14/05/2013 13:39, TeLeMan ha scritto:
>> On Tue, May 14, 2013 at 6:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 14/05/2013 12:24, TeLeMan ha scritto:
>>>> On Tue, May 14, 2013 at 3:51 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>> On Tue, May 14, 2013 at 12:02:24AM -0000, therock247uk wrote:
>>>>>> just wondering as i just compiled the latest git and qemu goes into none
>>>>>> responding mode when i try to do any networking stuff on guests (both
>>>>>> linux and windows)
>>>>>
>>>>> Works for me on qemu.git/master on Linux:
>>>>>
>>>>>   $ git rev-parse HEAD
>>>>>   b087143b4d010451208264b7c841436aafe1cbb1
>>>>>   $ x86_64-softmmu/qemu-system-x86_64 -m 1024 -enable-kvm -cpu host \
>>>>>           -drive if=virtio,cache=none,file=test.img
>>>>>
>>>>> Please include more information, like the QEMU command-line and commit
>>>>> ID.
>>>>>
>>>>> Stefan
>>>>>
>>>>
>>>> This regression occurs on the Windows host. SLIRP hangs in sorecvfrom().
>>>
>>> Can you bisect it?
>>>
>>> Paolo
>>>
>> The first break is the commit
>> 5e3bc735d93dd23f074b5116fd11e1ad8cd4962f. But it need more packets
>> than HEAD to reproduce this regression.
> 
> Please check if this partial revert of that commit fixes it:

Yeah, this should work...  WSAEventSelect is edge-triggered and the
event will not be signaled if the socket handler does not consume all
the data in the socket buffer.

Paolo
TeLeMan - May 15, 2013, 1:38 a.m.
On Tue, May 14, 2013 at 7:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 14/05/2013 13:48, Paolo Bonzini ha scritto:
>> Il 14/05/2013 13:39, TeLeMan ha scritto:
>>> On Tue, May 14, 2013 at 6:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> Il 14/05/2013 12:24, TeLeMan ha scritto:
>>>>> On Tue, May 14, 2013 at 3:51 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>> On Tue, May 14, 2013 at 12:02:24AM -0000, therock247uk wrote:
>>>>>>> just wondering as i just compiled the latest git and qemu goes into none
>>>>>>> responding mode when i try to do any networking stuff on guests (both
>>>>>>> linux and windows)
>>>>>>
>>>>>> Works for me on qemu.git/master on Linux:
>>>>>>
>>>>>>   $ git rev-parse HEAD
>>>>>>   b087143b4d010451208264b7c841436aafe1cbb1
>>>>>>   $ x86_64-softmmu/qemu-system-x86_64 -m 1024 -enable-kvm -cpu host \
>>>>>>           -drive if=virtio,cache=none,file=test.img
>>>>>>
>>>>>> Please include more information, like the QEMU command-line and commit
>>>>>> ID.
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>
>>>>> This regression occurs on the Windows host. SLIRP hangs in sorecvfrom().
>>>>
>>>> Can you bisect it?
>>>>
>>>> Paolo
>>>>
>>> The first break is the commit
>>> 5e3bc735d93dd23f074b5116fd11e1ad8cd4962f. But it need more packets
>>> than HEAD to reproduce this regression.
>>
>> Please check if this partial revert of that commit fixes it:
>
> Yeah, this should work...  WSAEventSelect is edge-triggered and the
> event will not be signaled if the socket handler does not consume all
> the data in the socket buffer.
>
> Paolo
Unfortunately, it does not work.
Paolo Bonzini - May 15, 2013, 8:37 a.m.
Il 15/05/2013 03:38, TeLeMan ha scritto:
> On Tue, May 14, 2013 at 7:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 14/05/2013 13:48, Paolo Bonzini ha scritto:
>>> Il 14/05/2013 13:39, TeLeMan ha scritto:
>>>> On Tue, May 14, 2013 at 6:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> Il 14/05/2013 12:24, TeLeMan ha scritto:
>>>>>> On Tue, May 14, 2013 at 3:51 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>>> On Tue, May 14, 2013 at 12:02:24AM -0000, therock247uk wrote:
>>>>>>>> just wondering as i just compiled the latest git and qemu goes into none
>>>>>>>> responding mode when i try to do any networking stuff on guests (both
>>>>>>>> linux and windows)
>>>>>>>
>>>>>>> Works for me on qemu.git/master on Linux:
>>>>>>>
>>>>>>>   $ git rev-parse HEAD
>>>>>>>   b087143b4d010451208264b7c841436aafe1cbb1
>>>>>>>   $ x86_64-softmmu/qemu-system-x86_64 -m 1024 -enable-kvm -cpu host \
>>>>>>>           -drive if=virtio,cache=none,file=test.img
>>>>>>>
>>>>>>> Please include more information, like the QEMU command-line and commit
>>>>>>> ID.
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>
>>>>>> This regression occurs on the Windows host. SLIRP hangs in sorecvfrom().
>>>>>
>>>>> Can you bisect it?
>>>>>
>>>>> Paolo
>>>>>
>>>> The first break is the commit
>>>> 5e3bc735d93dd23f074b5116fd11e1ad8cd4962f. But it need more packets
>>>> than HEAD to reproduce this regression.
>>>
>>> Please check if this partial revert of that commit fixes it:
>>
>> Yeah, this should work...  WSAEventSelect is edge-triggered and the
>> event will not be signaled if the socket handler does not consume all
>> the data in the socket buffer.
>
> Unfortunately, it does not work.

Ok... as you can see the patch is just moving a block of code just
before g_main_context_prepare(context, &max_priority).

Can you please try doing the same on top of these six commits:

134a03e0b3d34b01b68107104c525c3bff1211d4
cbff4b342b000a7642125dbdabf61113e05eee44
48ce11ff972c733afaed3e2a2613a2e56081ec92
8917c3bdba37d6fe4393db0fad3fabbde9530d6b
a3e4b4a8091cc4fcf7cb619570c72c54c2d6a6e9
9cbaacf999b01b27dc3a22502705178057af66de

Paolo
TeLeMan - May 16, 2013, 5:52 a.m.
The patch is working on 134a03e0b3d34b01b68107104c525c3bff1211d4 and
is not working from cbff4b342b000a7642125dbdabf61113e05eee44.
--
SUN OF A BEACH


On Wed, May 15, 2013 at 4:37 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 15/05/2013 03:38, TeLeMan ha scritto:
>> On Tue, May 14, 2013 at 7:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 14/05/2013 13:48, Paolo Bonzini ha scritto:
>>>> Il 14/05/2013 13:39, TeLeMan ha scritto:
>>>>> On Tue, May 14, 2013 at 6:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>> Il 14/05/2013 12:24, TeLeMan ha scritto:
>>>>>>> On Tue, May 14, 2013 at 3:51 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>>>> On Tue, May 14, 2013 at 12:02:24AM -0000, therock247uk wrote:
>>>>>>>>> just wondering as i just compiled the latest git and qemu goes into none
>>>>>>>>> responding mode when i try to do any networking stuff on guests (both
>>>>>>>>> linux and windows)
>>>>>>>>
>>>>>>>> Works for me on qemu.git/master on Linux:
>>>>>>>>
>>>>>>>>   $ git rev-parse HEAD
>>>>>>>>   b087143b4d010451208264b7c841436aafe1cbb1
>>>>>>>>   $ x86_64-softmmu/qemu-system-x86_64 -m 1024 -enable-kvm -cpu host \
>>>>>>>>           -drive if=virtio,cache=none,file=test.img
>>>>>>>>
>>>>>>>> Please include more information, like the QEMU command-line and commit
>>>>>>>> ID.
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>
>>>>>>> This regression occurs on the Windows host. SLIRP hangs in sorecvfrom().
>>>>>>
>>>>>> Can you bisect it?
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>> The first break is the commit
>>>>> 5e3bc735d93dd23f074b5116fd11e1ad8cd4962f. But it need more packets
>>>>> than HEAD to reproduce this regression.
>>>>
>>>> Please check if this partial revert of that commit fixes it:
>>>
>>> Yeah, this should work...  WSAEventSelect is edge-triggered and the
>>> event will not be signaled if the socket handler does not consume all
>>> the data in the socket buffer.
>>
>> Unfortunately, it does not work.
>
> Ok... as you can see the patch is just moving a block of code just
> before g_main_context_prepare(context, &max_priority).
>
> Can you please try doing the same on top of these six commits:
>
> 134a03e0b3d34b01b68107104c525c3bff1211d4
> cbff4b342b000a7642125dbdabf61113e05eee44
> 48ce11ff972c733afaed3e2a2613a2e56081ec92
> 8917c3bdba37d6fe4393db0fad3fabbde9530d6b
> a3e4b4a8091cc4fcf7cb619570c72c54c2d6a6e9
> 9cbaacf999b01b27dc3a22502705178057af66de
>
> Paolo
Paolo Bonzini - May 16, 2013, 8:59 a.m.
Il 16/05/2013 07:52, TeLeMan ha scritto:
> The patch is working on 134a03e0b3d34b01b68107104c525c3bff1211d4 and
> is not working from cbff4b342b000a7642125dbdabf61113e05eee44.

Thanks.

Fabien or Stefan, can you take a look?

TeLeMan, can you post the exact patches that you tested on those two
commits?  (To recap, there are at least two bugs.  The first is fixed by
the patch at
http://article.gmane.org/gmane.comp.emulators.qemu/211333/raw, the
second is introduced by patch cbff4b3, main-loop: switch to g_poll() on
POSIX hosts, 2013-02-20).

Paolo

> --
> SUN OF A BEACH
> 
> 
> On Wed, May 15, 2013 at 4:37 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 15/05/2013 03:38, TeLeMan ha scritto:
>>> On Tue, May 14, 2013 at 7:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> Il 14/05/2013 13:48, Paolo Bonzini ha scritto:
>>>>> Il 14/05/2013 13:39, TeLeMan ha scritto:
>>>>>> On Tue, May 14, 2013 at 6:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>> Il 14/05/2013 12:24, TeLeMan ha scritto:
>>>>>>>> On Tue, May 14, 2013 at 3:51 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>>>>> On Tue, May 14, 2013 at 12:02:24AM -0000, therock247uk wrote:
>>>>>>>>>> just wondering as i just compiled the latest git and qemu goes into none
>>>>>>>>>> responding mode when i try to do any networking stuff on guests (both
>>>>>>>>>> linux and windows)
>>>>>>>>>
>>>>>>>>> Works for me on qemu.git/master on Linux:
>>>>>>>>>
>>>>>>>>>   $ git rev-parse HEAD
>>>>>>>>>   b087143b4d010451208264b7c841436aafe1cbb1
>>>>>>>>>   $ x86_64-softmmu/qemu-system-x86_64 -m 1024 -enable-kvm -cpu host \
>>>>>>>>>           -drive if=virtio,cache=none,file=test.img
>>>>>>>>>
>>>>>>>>> Please include more information, like the QEMU command-line and commit
>>>>>>>>> ID.
>>>>>>>>>
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>
>>>>>>>> This regression occurs on the Windows host. SLIRP hangs in sorecvfrom().
>>>>>>>
>>>>>>> Can you bisect it?
>>>>>>>
>>>>>>> Paolo
>>>>>>>
>>>>>> The first break is the commit
>>>>>> 5e3bc735d93dd23f074b5116fd11e1ad8cd4962f. But it need more packets
>>>>>> than HEAD to reproduce this regression.
>>>>>
>>>>> Please check if this partial revert of that commit fixes it:
>>>>
>>>> Yeah, this should work...  WSAEventSelect is edge-triggered and the
>>>> event will not be signaled if the socket handler does not consume all
>>>> the data in the socket buffer.
>>>
>>> Unfortunately, it does not work.
>>
>> Ok... as you can see the patch is just moving a block of code just
>> before g_main_context_prepare(context, &max_priority).
>>
>> Can you please try doing the same on top of these six commits:
>>
>> 134a03e0b3d34b01b68107104c525c3bff1211d4
>> cbff4b342b000a7642125dbdabf61113e05eee44
>> 48ce11ff972c733afaed3e2a2613a2e56081ec92
>> 8917c3bdba37d6fe4393db0fad3fabbde9530d6b
>> a3e4b4a8091cc4fcf7cb619570c72c54c2d6a6e9
>> 9cbaacf999b01b27dc3a22502705178057af66de
>>
>> Paolo
> 
>
Fabien Chouteau - May 16, 2013, 10:33 a.m.
On 05/16/2013 10:59 AM, Paolo Bonzini wrote:
> Il 16/05/2013 07:52, TeLeMan ha scritto:
>> The patch is working on 134a03e0b3d34b01b68107104c525c3bff1211d4 and
>> is not working from cbff4b342b000a7642125dbdabf61113e05eee44.
> 
> Thanks.
> 
> Fabien or Stefan, can you take a look?
> 

Unfortunately I don't have time to investigate these days.

Patch

diff --git a/main-loop.c b/main-loop.c
index f46aece..79c45b8 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -394,6 +394,20 @@  static int os_host_main_loop_wait(uint32_t timeout)
         return ret;
     }
 
+    FD_ZERO(&rfds);
+    FD_ZERO(&wfds);
+    FD_ZERO(&xfds);
+    nfds = pollfds_fill(gpollfds, &rfds, &wfds, &xfds);
+    if (nfds >= 0) {
+        select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
+        if (select_ret != 0) {
+            timeout = 0;
+        }
+        if (select_ret > 0) {
+            pollfds_poll(gpollfds, nfds, &rfds, &wfds, &xfds);
+        }
+    }
+
     g_main_context_prepare(context, &max_priority);
     n_poll_fds = g_main_context_query(context, max_priority, &poll_timeout,
                                       poll_fds, ARRAY_SIZE(poll_fds));
@@ -426,24 +440,11 @@  static int os_host_main_loop_wait(uint32_t timeout)
         g_main_context_dispatch(context);
     }
 
-    /* Call select after g_poll to avoid a useless iteration and therefore
-     * improve socket latency.
+    /* If an edge-triggered socket event occurred, select will return a
+     * positive result on the next iteration.  We do not need to do anything
+     * here.
      */
 
-    FD_ZERO(&rfds);
-    FD_ZERO(&wfds);
-    FD_ZERO(&xfds);
-    nfds = pollfds_fill(gpollfds, &rfds, &wfds, &xfds);
-    if (nfds >= 0) {
-        select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
-        if (select_ret != 0) {
-            timeout = 0;
-        }
-        if (select_ret > 0) {
-            pollfds_poll(gpollfds, nfds, &rfds, &wfds, &xfds);
-        }
-    }
-
     return select_ret || g_poll_ret;
 }
 #endif