diff mbox series

[v2,6/7] iotests: amend QEMU NBD process synchronization

Message ID 1560276131-683243-7-git-send-email-andrey.shinkevich@virtuozzo.com
State New
Headers show
Series Allow Valgrind checking all QEMU processes | expand

Commit Message

Andrey Shinkevich June 11, 2019, 6:02 p.m. UTC
Processes are dying harder under the Valgring. It results in counting
the dying process as a newborn one. Make it sure that old NBD job get
finished before starting a new one.

Suggested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 tests/qemu-iotests/common.nbd | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Vladimir Sementsov-Ogievskiy June 13, 2019, 9:59 a.m. UTC | #1
11.06.2019 21:02, Andrey Shinkevich wrote:
> Processes are dying harder under the Valgring. It results in counting
> the dying process as a newborn one. Make it sure that old NBD job get
> finished before starting a new one.
> 
> Suggested-by: Roman Kagan <rkagan@virtuozzo.com>
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>   tests/qemu-iotests/common.nbd | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
> index 25fc9ff..e3dcc60 100644
> --- a/tests/qemu-iotests/common.nbd
> +++ b/tests/qemu-iotests/common.nbd
> @@ -22,6 +22,7 @@
>   nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock"
>   nbd_tcp_addr="127.0.0.1"
>   nbd_pid_file="${TEST_DIR}/qemu-nbd.pid"
> +nbd_job_pid=""
>   
>   nbd_server_stop()
>   {
> @@ -33,6 +34,9 @@ nbd_server_stop()
>               kill "$NBD_PID"
>           fi
>       fi

Honestly, I don't understand the problem from commit message, but anyway comment
should be added here, to mark that this is for valgrind... But you don't check for
VALGRIND enabled.. I it intentional?

> +    if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then
> +        wait "$nbd_job_pid"
> +    fi
>       rm -f "$nbd_unix_socket"
>   }
>   
> @@ -61,6 +65,7 @@ nbd_server_start_unix_socket()
>   {
>       nbd_server_stop
>       $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" &
> +    nbd_job_pid=$!
>       nbd_server_wait_for_unix_socket $!
>   }
>   
> @@ -105,5 +110,6 @@ nbd_server_start_tcp_socket()
>   {
>       nbd_server_stop
>       $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" &
> +    nbd_job_pid=$!
>       nbd_server_wait_for_tcp_socket $!
>   }
>
Roman Kagan June 17, 2019, 12:38 p.m. UTC | #2
On Tue, Jun 11, 2019 at 09:02:10PM +0300, Andrey Shinkevich wrote:
> Processes are dying harder under the Valgring. It results in counting
> the dying process as a newborn one. Make it sure that old NBD job get
> finished before starting a new one.

I think this log message is confusing.

The problem this patch addresses is that nbd_server_stop only sends a
signal to the nbd process and immediately returns, without waiting for
it to actually terminate.  The next operation is often starting a new
instance of nbd; this races with the termination of the old one, and may
result in various failures (like nbd_server_start_* taking the
terminating nbd as the one just started, or the starting nbd
encountering a busy listening socket).

Without valgrind the race window is very small and the problem didn't
surface for long.  However, under valgrind process termination takes
much longer so the race bites every test run.

Since nbd is run in a background job of the test, record the nbd pid at
the daemon start in a shell variable and perform a wait for it when
terminating it.

Roman.

> Suggested-by: Roman Kagan <rkagan@virtuozzo.com>
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  tests/qemu-iotests/common.nbd | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
> index 25fc9ff..e3dcc60 100644
> --- a/tests/qemu-iotests/common.nbd
> +++ b/tests/qemu-iotests/common.nbd
> @@ -22,6 +22,7 @@
>  nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock"
>  nbd_tcp_addr="127.0.0.1"
>  nbd_pid_file="${TEST_DIR}/qemu-nbd.pid"
> +nbd_job_pid=""
>  
>  nbd_server_stop()
>  {
> @@ -33,6 +34,9 @@ nbd_server_stop()
>              kill "$NBD_PID"
>          fi
>      fi
> +    if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then
> +        wait "$nbd_job_pid"
> +    fi
>      rm -f "$nbd_unix_socket"
>  }
>  
> @@ -61,6 +65,7 @@ nbd_server_start_unix_socket()
>  {
>      nbd_server_stop
>      $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" &
> +    nbd_job_pid=$!
>      nbd_server_wait_for_unix_socket $!
>  }
>  
> @@ -105,5 +110,6 @@ nbd_server_start_tcp_socket()
>  {
>      nbd_server_stop
>      $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" &
> +    nbd_job_pid=$!
>      nbd_server_wait_for_tcp_socket $!
>  }
> -- 
> 1.8.3.1
>
Roman Kagan June 17, 2019, 12:45 p.m. UTC | #3
On Thu, Jun 13, 2019 at 12:59:53PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 11.06.2019 21:02, Andrey Shinkevich wrote:
> > Processes are dying harder under the Valgring. It results in counting
> > the dying process as a newborn one. Make it sure that old NBD job get
> > finished before starting a new one.
> > 
> > Suggested-by: Roman Kagan <rkagan@virtuozzo.com>
> > Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> > ---
> >   tests/qemu-iotests/common.nbd | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
> > index 25fc9ff..e3dcc60 100644
> > --- a/tests/qemu-iotests/common.nbd
> > +++ b/tests/qemu-iotests/common.nbd
> > @@ -22,6 +22,7 @@
> >   nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock"
> >   nbd_tcp_addr="127.0.0.1"
> >   nbd_pid_file="${TEST_DIR}/qemu-nbd.pid"
> > +nbd_job_pid=""
> >   
> >   nbd_server_stop()
> >   {
> > @@ -33,6 +34,9 @@ nbd_server_stop()
> >               kill "$NBD_PID"
> >           fi
> >       fi
> 
> Honestly, I don't understand the problem from commit message, but anyway comment
> should be added here, to mark that this is for valgrind... But you don't check for
> VALGRIND enabled.. I it intentional?

It is.  The problem this patch fixes exists regardless of valgrind.
valgrind just makes it easier to notice.  See my reply to the patch
itself.

Roman.

> 
> > +    if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then
> > +        wait "$nbd_job_pid"
> > +    fi
> >       rm -f "$nbd_unix_socket"
> >   }
> >   
> > @@ -61,6 +65,7 @@ nbd_server_start_unix_socket()
> >   {
> >       nbd_server_stop
> >       $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" &
> > +    nbd_job_pid=$!
> >       nbd_server_wait_for_unix_socket $!
> >   }
> >   
> > @@ -105,5 +110,6 @@ nbd_server_start_tcp_socket()
> >   {
> >       nbd_server_stop
> >       $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" &
> > +    nbd_job_pid=$!
> >       nbd_server_wait_for_tcp_socket $!
> >   }
> > 
> 
> 
> -- 
> Best regards,
> Vladimir
diff mbox series

Patch

diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
index 25fc9ff..e3dcc60 100644
--- a/tests/qemu-iotests/common.nbd
+++ b/tests/qemu-iotests/common.nbd
@@ -22,6 +22,7 @@ 
 nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock"
 nbd_tcp_addr="127.0.0.1"
 nbd_pid_file="${TEST_DIR}/qemu-nbd.pid"
+nbd_job_pid=""
 
 nbd_server_stop()
 {
@@ -33,6 +34,9 @@  nbd_server_stop()
             kill "$NBD_PID"
         fi
     fi
+    if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then
+        wait "$nbd_job_pid"
+    fi
     rm -f "$nbd_unix_socket"
 }
 
@@ -61,6 +65,7 @@  nbd_server_start_unix_socket()
 {
     nbd_server_stop
     $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" &
+    nbd_job_pid=$!
     nbd_server_wait_for_unix_socket $!
 }
 
@@ -105,5 +110,6 @@  nbd_server_start_tcp_socket()
 {
     nbd_server_stop
     $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" &
+    nbd_job_pid=$!
     nbd_server_wait_for_tcp_socket $!
 }