Message ID | 6389e86d0d68cc476c04c1e9b944208e.squirrel@clownix.net |
---|---|
State | New |
On Sun, Feb 24, 2013 at 02:14:43PM -0600, clownix@clownix.net wrote:
> I use qemu inside a gplv3 software called cloonix, I have patched qemu to
> have unix sockets instead of inet ones but the bug I have with unix
> sockets may also happen with inet ones.
>
> The bug can be reproduced in cloonix context by using iperf, it occurs
> randomly in a virtual cloonix network but occurs within seconds using
> iperf in nested virtualisation (cloonix inside cloonix), the problem
> begins when a lot of packets must be transmitted and the socket (inet in
> the classical qemu, unix in cloonix) gets full and
> qemu_net_queue_append_iov is called, then tx never restarts.
>
> See under in the patch attached, the way I avoided queuing anything, it
> works, even if it is not a correction to the bug...
>
> The patch is for version 21.3 of cloonix which uses qemu-1.4.0-rc1, but
> I now use qemu-1.4.0 and the bug is still there.

Thanks for the bug report. This sounds like a problem with
net/socket.c - it's supposed to restart the queue when the socket
becomes writable again.

Can you share a way to reproduce this tx stall with vanilla QEMU?

Stefan

On Sun, Feb 24, 2013 at 02:14:43PM -0600, clownix@clownix.net wrote:
> The patch is for version 21.3 of cloonix which uses qemu-1.4.0-rc1, but
> I now use qemu-1.4.0 and the bug is still there.

Please post the QEMU command-line so we can see how the socket netdev
was configured.

Stefan

Hello Stefan,

I coded a socket-based cable between 2 vanilla kvm, here are the commands
to do:

tar xvf qemu_test_sock.tar.gz
cd qemu_test_sock
make
./qemu_test_sock

kvm \
  -nodefaults \
  -nographic \
  -serial stdio \
  -drive file=guest1,media=disk,if=virtio \
  -device virtio-net-pci,tx=bh,vlan=1,mac=02:01:01:01:01:01 \
  -net socket,vlan=1,connect=127.0.0.1:47654

kvm \
  -nodefaults \
  -nographic \
  -serial stdio \
  -drive file=guest2,media=disk,if=virtio \
  -device virtio-net-pci,tx=bh,vlan=1,mac=02:02:02:02:02:02 \
  -net socket,vlan=1,connect=127.0.0.1:47655

ifconfig eth0 1.1.1.1
iperf -s -u

ifconfig eth0 1.1.1.2
iperf -c 1.1.1.1 -u -b 100M

then when you feel something is not right:

ping 1.1.1.1
From 1.1.1.2 icmp_seq=24 Destination Host Unreachable
From 1.1.1.2 icmp_seq=25 Destination Host Unreachable
From 1.1.1.2 icmp_seq=26 Destination Host Unreachable
64 bytes from 1.1.1.1: icmp_req=1 ttl=64 time=29128 ms
64 bytes from 1.1.1.1: icmp_req=2 ttl=64 time=28121 ms

The principle: the process between the two kvm instances just relays
messages from one side to the other, then it stops working for 5 seconds
every 30 seconds to create a socket-full condition.

I do not have the same trouble as with my unix socket: in the inet case the
ethernet access has an empty moment (of 30 sec in the above case) and
starts again...

I hope that helps, but I could not reproduce the total ethernet stop I had
in cloonix...

> On Sun, Feb 24, 2013 at 02:14:43PM -0600, clownix@clownix.net wrote:
>> The patch is for version 21.3 of cloonix which uses qemu-1.4.0-rc1, but
>> I now use qemu-1.4.0 and the bug is still there.
>
> Please post the QEMU command-line so we can see how the socket netdev
> was configured.
>
> Stefan
>

On Mon, Feb 25, 2013 at 11:38 PM, <clownix@clownix.net> wrote:
> I coded a socket-based cable between 2 vanilla kvm, here are the commands
> to do:

Please try:

kvm \
  -nodefaults \
  -nographic \
  -serial stdio \
  -drive file=guest1,media=disk,if=virtio \
  -netdev socket,id=socket0,connect=127.0.0.1:47654 \
  -device virtio-net-pci,tx=bh,netdev=socket0,mac=02:01:01:01:01:01

kvm \
  -nodefaults \
  -nographic \
  -serial stdio \
  -drive file=guest2,media=disk,if=virtio \
  -netdev socket,id=socket0,connect=127.0.0.1:47655 \
  -device virtio-net-pci,tx=bh,netdev=socket0,mac=02:02:02:02:02:02

Notice that -netdev socket is used instead of -net socket,vlan=1.

Luigi Rizzo recently fixed a bug where traffic could stall when using
the QEMU "vlan" feature:
http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg00679.html

If you want to try this fix, use the
git://github.com/stefanha/qemu.git net branch.

Stefan

On 2013-02-26 10:21, Stefan Hajnoczi wrote:
> On Mon, Feb 25, 2013 at 11:38 PM, <clownix@clownix.net> wrote:
>> I coded a socket-based cable between 2 vanilla kvm, here are the commands
>> to do:
>
> Please try:
>
> kvm \
>   -nodefaults \
>   -nographic \
>   -serial stdio \
>   -drive file=guest1,media=disk,if=virtio \
>   -netdev socket,id=socket0,connect=127.0.0.1:47654 \
>   -device virtio-net-pci,tx=bh,netdev=socket0,mac=02:01:01:01:01:01
>
> kvm \
>   -nodefaults \
>   -nographic \
>   -serial stdio \
>   -drive file=guest2,media=disk,if=virtio \
>   -netdev socket,id=socket0,connect=127.0.0.1:47655 \
>   -device virtio-net-pci,tx=bh,netdev=socket0,mac=02:02:02:02:02:02
>
> Notice that -netdev socket is used instead of -net socket,vlan=1.

That's pointless. -netdev socket is still broken, only -net works.

Jan

diff -Naur qemu-1.4.0-rc1/net/net.c new_qemu-1.4.0-rc1/net/net.c
--- qemu-1.4.0-rc1/net/net.c	2013-02-07 01:40:56.000000000 +0100
+++ new_qemu-1.4.0-rc1/net/net.c	2013-02-24 16:03:45.139853349 +0100
@@ -388,10 +388,14 @@
     }
 
     if (sender->peer->receive_disabled) {
-        return 0;
+//cloonix DROP
+//        return 0;
+        return 1;
     } else if (sender->peer->info->can_receive &&
                !sender->peer->info->can_receive(sender->peer)) {
-        return 0;
+//cloonix DROP
+//        return 0;
+        return 1;
     }
     return 1;
 }

Hello,

I use qemu inside a gplv3 software called cloonix, I have patched qemu to
have unix sockets instead of inet ones but the bug I have with unix
sockets may also happen with inet ones.

The bug can be reproduced in cloonix context by using iperf, it occurs
randomly in a virtual cloonix network but occurs within seconds using
iperf in nested virtualisation (cloonix inside cloonix), the problem
begins when a lot of packets must be transmitted and the socket (inet in
the classical qemu, unix in cloonix) gets full and
qemu_net_queue_append_iov is called, then tx never restarts.

See under in the patch attached, the way I avoided queuing anything, it
works, even if it is not a correction to the bug...

The patch is for version 21.3 of cloonix which uses qemu-1.4.0-rc1, but
I now use qemu-1.4.0 and the bug is still there.

Regards
Vincent Perrier

---------------------------- Original Message ----------------------------
Subject: qemu tx stop in cloonix
From: clownix@clownix.net
Date: Sun, February 24, 2013 9:14 am
To: "list" <cloonix-list@clownix.net>
--------------------------------------------------------------------------

There is a bug visible more particularly when doing nested cloonix and
iperf inside the second level nested machines.
The ethernet interface emitting a big load stops working, this has been
corrected in my version but I will not deliver the correction outside the
regular deliveries (every 2 or 3 months).

If you have a stopping of your ethernet access after a burst of traffic,
here is the cause:

From the kernel virtio_net driver inside the guest, piles of messages are
sent into a virtio queue to the qemu user process.
The qemu user process does what it can to give the messages to a unix
socket (to cloonix).
When too much traffic arrives, the unix socket writes 0 bytes as it gets
full.
Then qemu, instead of dropping the packet (too much is too much, no need to
try harder), does not want to drop: it tries to enqueue packets until the
unix socket clears...
The mechanics of this unusual case management are too complex, I did not
get into them to repair it, I just dropped the packets, the simplest
solution, and the layers above know that low-level packets can just
disappear...

Here is the solution: in "sources/Cloonix-Net-Lab/qemu", in file cmd, add
the following qemu_drop_burst.patch line after having put the
qemu_drop_burst.patch in the qemu directory.

patch -p1 < ../cloonix_qemu.patch
patch -p1 < ../qemu_drop_burst.patch

The qemu_drop_burst.patch should be with this mail...
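
The socket-level condition discussed in this thread can be looked at in isolation with a small C sketch. This is not QEMU code and not the qemu_test_sock program; it only assumes a POSIX system (the behaviour described is what Linux is expected to show). It fills a non-blocking AF_UNIX socket pair until send() fails with EAGAIN, checks that poll() no longer reports the socket as writable, then drains the peer and checks that the socket becomes writable again, which is the moment a queuing sender would be expected to restart. All function names below are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Send chunks until the kernel buffer is full (EAGAIN).
 * Returns the number of successful send() calls, or -1 on another error. */
static long fill_socket(int fd, size_t pkt_len)
{
    char pkt[2048] = {0};
    long sent = 0;

    assert(pkt_len > 0 && pkt_len <= sizeof(pkt));
    for (;;) {
        ssize_t n = send(fd, pkt, pkt_len, 0);
        if (n >= 0) {
            sent++;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return sent;            /* socket is full */
        } else if (errno != EINTR) {
            return -1;
        }
    }
}

/* Non-blocking check: does poll() report the socket as writable? */
static int is_writable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
    return poll(&p, 1, 0) > 0 && (p.revents & POLLOUT) != 0;
}

/* Read everything currently pending; returns the number of bytes read. */
static long drain_socket(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            total += n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            return total;           /* empty (EAGAIN), EOF or error */
        }
    }
}

/* Returns 0 if the full -> not writable -> drained -> writable sequence
 * is observed, -1 otherwise. */
int demo_full_then_restart(void)
{
    int sv[2];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (set_nonblocking(sv[0]) == 0 && set_nonblocking(sv[1]) == 0) {
        long sent = fill_socket(sv[0], 1024);
        int stalled = !is_writable(sv[0]);      /* sender must wait or drop */
        long drained = drain_socket(sv[1]);     /* peer catches up */
        int restarted = is_writable(sv[0]);     /* sender may resume */

        if (sent > 0 && stalled && drained > 0 && restarted) {
            ok = 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

To try it, call demo_full_then_restart() from a small main() and check that it returns 0. A sender that drops on EAGAIN, as the workaround patch effectively does, never depends on the "writable again" step; a sender that queues does, and stalls if that notification is missed.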