Message ID | 1522316251-16399-1-git-send-email-thuth@redhat.com |
---|---|
State | New |
Headers | show |
Series | [for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds | expand |
On 03/29/2018 11:37 AM, Thomas Huth wrote: > The current timeout is set to only three seconds - and considering that > vring_wait_reply() or rather get_second() is not doing any rounding, > the real timeout is likely rather 2 seconds in most cases. When the > host is really badly loaded and we run the guest in TCG mode instead > of KVM, it's possible that we hit this timeout by mistake. So let's > increase the timeout to 30 seconds instead to ease this situation (30 > seconds is also the timeout that is used by the Linux SCSI subsystem > for example, so this seems to be a sane value for block IO timeout). I have never seen this, but cant this also with KVM (e.g. host paging and the swap disk is busy for some seconds). Wouldnt it also qualify for 2.12? > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > Signed-off-by: Thomas Huth <thuth@redhat.com> > --- > pc-bios/s390-ccw/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c > index 817e7f5..cdb66f4 100644 > --- a/pc-bios/s390-ccw/virtio.c > +++ b/pc-bios/s390-ccw/virtio.c > @@ -14,7 +14,7 @@ > #include "virtio-scsi.h" > #include "bswap.h" > > -#define VRING_WAIT_REPLY_TIMEOUT 3 > +#define VRING_WAIT_REPLY_TIMEOUT 30 > > static VRing block[VIRTIO_MAX_VQS]; > static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS] >
On Thu, 29 Mar 2018 12:03:42 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 03/29/2018 11:37 AM, Thomas Huth wrote: > > The current timeout is set to only three seconds - and considering that > > vring_wait_reply() or rather get_second() is not doing any rounding, > > the real timeout is likely rather 2 seconds in most cases. When the > > host is really badly loaded and we run the guest in TCG mode instead > > of KVM, it's possible that we hit this timeout by mistake. So let's > > increase the timeout to 30 seconds instead to ease this situation (30 > > seconds is also the timeout that is used by the Linux SCSI subsystem > > for example, so this seems to be a sane value for block IO timeout). > > I have never seen this, but cant this also with KVM (e.g. host paging > and the swap disk is busy for some seconds). It's probably less likely, but I don't see why not. 30s sounds like a more forgiving value. > > Wouldnt it also qualify for 2.12? I can queue it (and a bios rebuild) to s390-fixes, it's certainly reasonable. Else, Reviewed-by: Cornelia Huck <cohuck@redhat.com> > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > > Signed-off-by: Thomas Huth <thuth@redhat.com> > > --- > > pc-bios/s390-ccw/virtio.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c > > index 817e7f5..cdb66f4 100644 > > --- a/pc-bios/s390-ccw/virtio.c > > +++ b/pc-bios/s390-ccw/virtio.c > > @@ -14,7 +14,7 @@ > > #include "virtio-scsi.h" > > #include "bswap.h" > > > > -#define VRING_WAIT_REPLY_TIMEOUT 3 > > +#define VRING_WAIT_REPLY_TIMEOUT 30 > > > > static VRing block[VIRTIO_MAX_VQS]; > > static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS] > > >
On 29.03.2018 12:03, Christian Borntraeger wrote: > > > On 03/29/2018 11:37 AM, Thomas Huth wrote: >> The current timeout is set to only three seconds - and considering that >> vring_wait_reply() or rather get_second() is not doing any rounding, >> the real timeout is likely rather 2 seconds in most cases. When the >> host is really badly loaded and we run the guest in TCG mode instead >> of KVM, it's possible that we hit this timeout by mistake. So let's >> increase the timeout to 30 seconds instead to ease this situation (30 >> seconds is also the timeout that is used by the Linux SCSI subsystem >> for example, so this seems to be a sane value for block IO timeout). > > I have never seen this, but cant this also with KVM (e.g. host paging > and the swap disk is busy for some seconds). Sure, I even assume that Lukáš might have hit this with KVM, and not with TCG. It's just a little bit more unlikely there. > Wouldnt it also qualify for 2.12? Yes, I was just not sure whether we want to rebuild the s390-ccw.img just because of this ... I'd like to leave that decision to you and Cornelia. Thomas
On 03/29/2018 11:37 AM, Thomas Huth wrote: > The current timeout is set to only three seconds - and considering that > vring_wait_reply() or rather get_second() is not doing any rounding, > the real timeout is likely rather 2 seconds in most cases. When the > host is really badly loaded and we run the guest in TCG mode instead > of KVM, it's possible that we hit this timeout by mistake. So let's > increase the timeout to 30 seconds instead to ease this situation (30 > seconds is also the timeout that is used by the Linux SCSI subsystem > for example, so this seems to be a sane value for block IO timeout). > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> I think this should go into 2.12 with a rebuild > --- > pc-bios/s390-ccw/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c > index 817e7f5..cdb66f4 100644 > --- a/pc-bios/s390-ccw/virtio.c > +++ b/pc-bios/s390-ccw/virtio.c > @@ -14,7 +14,7 @@ > #include "virtio-scsi.h" > #include "bswap.h" > > -#define VRING_WAIT_REPLY_TIMEOUT 3 > +#define VRING_WAIT_REPLY_TIMEOUT 30 > > static VRing block[VIRTIO_MAX_VQS]; > static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS] >
On Thu, 29 Mar 2018 11:37:31 +0200 Thomas Huth <thuth@redhat.com> wrote: > The current timeout is set to only three seconds - and considering that > vring_wait_reply() or rather get_second() is not doing any rounding, > the real timeout is likely rather 2 seconds in most cases. When the > host is really badly loaded and we run the guest in TCG mode instead > of KVM, it's possible that we hit this timeout by mistake. Let's tweak this sentence to "When the host is really badly loaded, it's possible that we hit this timeout by mistake; it's even more likely if we run the guest in TCG mode instead of KVM." ? > So let's > increase the timeout to 30 seconds instead to ease this situation (30 > seconds is also the timeout that is used by the Linux SCSI subsystem > for example, so this seems to be a sane value for block IO timeout). > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > Signed-off-by: Thomas Huth <thuth@redhat.com> > --- > pc-bios/s390-ccw/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-)
On 29.03.2018 15:15, Cornelia Huck wrote: > On Thu, 29 Mar 2018 11:37:31 +0200 > Thomas Huth <thuth@redhat.com> wrote: > >> The current timeout is set to only three seconds - and considering that >> vring_wait_reply() or rather get_second() is not doing any rounding, >> the real timeout is likely rather 2 seconds in most cases. When the >> host is really badly loaded and we run the guest in TCG mode instead >> of KVM, it's possible that we hit this timeout by mistake. > > Let's tweak this sentence to > > "When the host is really badly loaded, it's possible that we hit this > timeout by mistake; it's even more likely if we run the guest in TCG > mode instead of KVM." > > ? Fine for me. You could even remove that TCG vs. KVM part if you like. Thomas
On Thu, 29 Mar 2018 11:37:31 +0200 Thomas Huth <thuth@redhat.com> wrote: > The current timeout is set to only three seconds - and considering that > vring_wait_reply() or rather get_second() is not doing any rounding, > the real timeout is likely rather 2 seconds in most cases. When the > host is really badly loaded and we run the guest in TCG mode instead > of KVM, it's possible that we hit this timeout by mistake. So let's > increase the timeout to 30 seconds instead to ease this situation (30 > seconds is also the timeout that is used by the Linux SCSI subsystem > for example, so this seems to be a sane value for block IO timeout). > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > Signed-off-by: Thomas Huth <thuth@redhat.com> > --- > pc-bios/s390-ccw/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c > index 817e7f5..cdb66f4 100644 > --- a/pc-bios/s390-ccw/virtio.c > +++ b/pc-bios/s390-ccw/virtio.c > @@ -14,7 +14,7 @@ > #include "virtio-scsi.h" > #include "bswap.h" > > -#define VRING_WAIT_REPLY_TIMEOUT 3 > +#define VRING_WAIT_REPLY_TIMEOUT 30 > > static VRing block[VIRTIO_MAX_VQS]; > static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS] Thanks, merged this and a rebuild into s390-fixes.
Hi, This series failed docker-quick@centos6 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. Type: series Message-id: 1522316251-16399-1-git-send-email-thuth@redhat.com Subject: [Qemu-devel] [PATCH for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds === TEST SCRIPT BEGIN === #!/bin/bash set -e git submodule update --init dtc # Let docker tests dump environment info export SHOW_ENV=1 export J=8 time make docker-test-quick@centos6 === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' eab5813eba pc-bios/s390-ccw: Increase virtio timeout to 30 seconds === OUTPUT BEGIN === Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/dtc'... Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' BUILD centos6 make[1]: Entering directory '/var/tmp/patchew-tester-tmp-btm39b6h/src' GEN /var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar.vroot'... done. Checking out files: 41% (2535/6066) Checking out files: 42% (2548/6066) Checking out files: 43% (2609/6066) Checking out files: 44% (2670/6066) Checking out files: 45% (2730/6066) Checking out files: 46% (2791/6066) Checking out files: 47% (2852/6066) Checking out files: 48% (2912/6066) Checking out files: 49% (2973/6066) Checking out files: 50% (3033/6066) Checking out files: 51% (3094/6066) Checking out files: 52% (3155/6066) Checking out files: 53% (3215/6066) Checking out files: 54% (3276/6066) Checking out files: 55% (3337/6066) Checking out files: 56% (3397/6066) Checking out files: 57% (3458/6066) Checking out files: 58% (3519/6066) Checking out files: 59% (3579/6066) Checking out files: 60% (3640/6066) Checking out files: 60% (3652/6066) Checking out files: 61% (3701/6066) Checking out files: 62% (3761/6066) Checking out files: 63% (3822/6066) Checking out files: 64% (3883/6066) Checking out files: 65% (3943/6066) Checking out files: 66% (4004/6066) Checking out files: 67% (4065/6066) Checking out files: 68% (4125/6066) Checking out files: 69% (4186/6066) Checking out files: 69% (4194/6066) Checking out files: 70% (4247/6066) Checking out files: 71% (4307/6066) Checking out files: 72% (4368/6066) Checking out files: 73% (4429/6066) Checking out files: 73% (4481/6066) Checking out files: 74% (4489/6066) Checking out files: 75% (4550/6066) Checking out files: 76% (4611/6066) Checking out files: 77% (4671/6066) Checking out files: 78% (4732/6066) Checking out files: 79% (4793/6066) Checking out files: 80% (4853/6066) Checking out files: 81% (4914/6066) Checking out files: 82% (4975/6066) Checking out files: 83% (5035/6066) Checking out files: 84% (5096/6066) Checking out files: 85% (5157/6066) Checking out files: 86% (5217/6066) Checking out files: 87% (5278/6066) Checking out files: 88% (5339/6066) Checking out files: 89% (5399/6066) Checking out files: 90% (5460/6066) Checking out files: 91% (5521/6066) Checking out files: 92% (5581/6066) Checking out files: 93% (5642/6066) Checking out files: 94% (5703/6066) Checking out files: 95% (5763/6066) Checking out files: 96% (5824/6066) Checking out files: 97% (5885/6066) Checking out files: 98% (5945/6066) Checking out files: 99% (6006/6066) Checking out files: 100% (6066/6066) Checking out files: 100% (6066/6066), done. Your branch is up-to-date with 'origin/test'. Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar.vroot/dtc'... Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb' Cloning into '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar.vroot/ui/keycodemapdb'... Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce' tar: /var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655/qemu.tar: Wrote only 4096 of 10240 bytes tar: Error is not recoverable: exiting now failed to create tar file COPY RUNNER RUN test-quick in qemu:centos6 tar: Unexpected EOF in archive tar: Unexpected EOF in archive tar: Error is not recoverable: exiting now /var/tmp/qemu/run: line 32: prep_fail: command not found Packages installed: SDL-devel-1.2.14-7.el6_7.1.x86_64 bison-2.4.1-5.el6.x86_64 bzip2-devel-1.0.5-7.el6_0.x86_64 ccache-3.1.6-2.el6.x86_64 csnappy-devel-0-6.20150729gitd7bc683.el6.x86_64 flex-2.5.35-9.el6.x86_64 gcc-4.4.7-18.el6.x86_64 gettext-0.17-18.el6.x86_64 git-1.7.1-9.el6_9.x86_64 glib2-devel-2.28.8-9.el6.x86_64 libepoxy-devel-1.2-3.el6.x86_64 libfdt-devel-1.4.0-1.el6.x86_64 librdmacm-devel-1.0.21-0.el6.x86_64 lzo-devel-2.03-3.1.el6_5.1.x86_64 make-3.81-23.el6.x86_64 mesa-libEGL-devel-11.0.7-4.el6.x86_64 mesa-libgbm-devel-11.0.7-4.el6.x86_64 package g++ is not installed pixman-devel-0.32.8-1.el6.x86_64 spice-glib-devel-0.26-8.el6.x86_64 spice-server-devel-0.12.4-16.el6.x86_64 tar-1.23-15.el6_8.x86_64 vte-devel-0.25.1-9.el6.x86_64 xen-devel-4.6.6-2.el6.x86_64 zlib-devel-1.2.3-29.el6.x86_64 Environment variables: PACKAGES=bison bzip2-devel ccache csnappy-devel flex g++ gcc gettext git glib2-devel libepoxy-devel libfdt-devel librdmacm-devel lzo-devel make mesa-libEGL-devel mesa-libgbm-devel pixman-devel SDL-devel spice-glib-devel spice-server-devel tar vte-devel xen-devel zlib-devel HOSTNAME=da49c0188004 MAKEFLAGS= -j8 J=8 CCACHE_DIR=/var/tmp/ccache EXTRA_CONFIGURE_OPTS= V= SHOW_ENV=1 PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/ TARGET_LIST= SHLVL=1 HOME=/root TEST_DIR=/tmp/qemu-test FEATURES= dtc DEBUG= _=/usr/bin/env Configure options: --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install ERROR: DTC (libfdt) version >= 1.4.2 not present. Please install the DTC (libfdt) devel package Traceback (most recent call last): File "./tests/docker/docker.py", line 407, in <module> sys.exit(main()) File "./tests/docker/docker.py", line 404, in main return args.cmdobj.run(args, argv) File "./tests/docker/docker.py", line 261, in run return Docker().run(argv, args.keep, quiet=args.quiet) File "./tests/docker/docker.py", line 229, in run quiet=quiet) File "./tests/docker/docker.py", line 147, in _do_check return subprocess.check_call(self._command + cmd, **kwargs) File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=a9e9312034ac11e8bfdf52540069c830', '-u', '0', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-btm39b6h/src/docker-src.2018-03-31-02.27.43.19655:/var/tmp/qemu:z,ro', 'qemu:centos6', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 1 make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1 make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-btm39b6h/src' make: *** [tests/docker/Makefile.include:163: docker-run-test-quick@centos6] Error 2 real 0m52.122s user 0m8.994s sys 0m6.805s === OUTPUT END === Test command exited with code: 2 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c index 817e7f5..cdb66f4 100644 --- a/pc-bios/s390-ccw/virtio.c +++ b/pc-bios/s390-ccw/virtio.c @@ -14,7 +14,7 @@ #include "virtio-scsi.h" #include "bswap.h" -#define VRING_WAIT_REPLY_TIMEOUT 3 +#define VRING_WAIT_REPLY_TIMEOUT 30 static VRing block[VIRTIO_MAX_VQS]; static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS]
The current timeout is set to only three seconds - and considering that vring_wait_reply() or rather get_second() is not doing any rounding, the real timeout is likely rather 2 seconds in most cases. When the host is really badly loaded and we run the guest in TCG mode instead of KVM, it's possible that we hit this timeout by mistake. So let's increase the timeout to 30 seconds instead to ease this situation (30 seconds is also the timeout that is used by the Linux SCSI subsystem for example, so this seems to be a sane value for block IO timeout). Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 Signed-off-by: Thomas Huth <thuth@redhat.com> --- pc-bios/s390-ccw/virtio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)