diff mbox series

[1/1] qemu-nbd: fix regression with qemu-nbd --fork run over ssh

Message ID 20230706191545.130087-1-den@openvz.org
State New
Headers show
Series [1/1] qemu-nbd: fix regression with qemu-nbd --fork run over ssh | expand

Commit Message

Denis V. Lunev July 6, 2023, 7:15 p.m. UTC
Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d
    Author: Hanna Reitz <hreitz@redhat.com>
    Date:   Wed May 8 23:18:18 2019 +0200
    qemu-nbd: Do not close stderr
has introduced an interesting regression. Original behavior of
    ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
was the following:
 * qemu-nbd was started as a daemon
 * the command execution is done and ssh exited with success

The patch has changed this behavior and 'ssh' command now hangs forever.

According to the normal specification of the daemon() call, we should
endup with STDERR pointing to /dev/null. That should be done at the
very end of the successful startup sequence when the pipe to the
bootstrap process (used for diagnostics) is no longer needed.

This could be achived in the same way as done for 'qemu-nbd -c' case.
STDOUT copying to STDERR does the trick.

This also leads to proper 'ssh' connection closing which fixes my
original problem.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Hanna Reitz <hreitz@redhat.com>
---
 qemu-nbd.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Comments

Denis V. Lunev July 14, 2023, 10:32 a.m. UTC | #1
On 7/6/23 21:15, Denis V. Lunev wrote:
> Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d
>      Author: Hanna Reitz <hreitz@redhat.com>
>      Date:   Wed May 8 23:18:18 2019 +0200
>      qemu-nbd: Do not close stderr
> has introduced an interesting regression. Original behavior of
>      ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
> was the following:
>   * qemu-nbd was started as a daemon
>   * the command execution is done and ssh exited with success
>
> The patch has changed this behavior and 'ssh' command now hangs forever.
>
> According to the normal specification of the daemon() call, we should
> endup with STDERR pointing to /dev/null. That should be done at the
> very end of the successful startup sequence when the pipe to the
> bootstrap process (used for diagnostics) is no longer needed.
>
> This could be achived in the same way as done for 'qemu-nbd -c' case.
> STDOUT copying to STDERR does the trick.
>
> This also leads to proper 'ssh' connection closing which fixes my
> original problem.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Eric Blake <eblake@redhat.com>
> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> CC: Hanna Reitz <hreitz@redhat.com>
> ---
>   qemu-nbd.c | 9 +--------
>   1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 4276163564..e9e118dfdb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -575,7 +575,6 @@ int main(int argc, char **argv)
>       bool writethrough = false; /* Client will flush as needed. */
>       bool fork_process = false;
>       bool list = false;
> -    int old_stderr = -1;
>       unsigned socket_activation;
>       const char *pid_file_name = NULL;
>       const char *selinux_label = NULL;
> @@ -930,11 +929,6 @@ int main(int argc, char **argv)
>           } else if (pid == 0) {
>               close(stderr_fd[0]);
>   
> -            /* Remember parent's stderr if we will be restoring it. */
> -            if (fork_process) {
> -                old_stderr = dup(STDERR_FILENO);
> -            }
> -
>               ret = qemu_daemon(1, 0);
>   
>               /* Temporarily redirect stderr to the parent's pipe...  */
> @@ -1152,8 +1146,7 @@ int main(int argc, char **argv)
>       }
>   
>       if (fork_process) {
> -        dup2(old_stderr, STDERR_FILENO);
> -        close(old_stderr);
> +        dup2(STDOUT_FILENO, STDERR_FILENO);
>       }
>   
>       state = RUNNING;
ping
Eric Blake July 14, 2023, 3:35 p.m. UTC | #2
On Thu, Jul 06, 2023 at 09:15:45PM +0200, Denis V. Lunev wrote:
> Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d
>     Author: Hanna Reitz <hreitz@redhat.com>
>     Date:   Wed May 8 23:18:18 2019 +0200
>     qemu-nbd: Do not close stderr
> has introduced an interesting regression. Original behavior of
>     ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
> was the following:
>  * qemu-nbd was started as a daemon
>  * the command execution is done and ssh exited with success
> 
> The patch has changed this behavior and 'ssh' command now hangs forever.
> 
> According to the normal specification of the daemon() call, we should
> endup with STDERR pointing to /dev/null. That should be done at the
> very end of the successful startup sequence when the pipe to the
> bootstrap process (used for diagnostics) is no longer needed.
> 
> This could be achived in the same way as done for 'qemu-nbd -c' case.

That was commit 0eaf453e, also fixing up e6df58a5.

> STDOUT copying to STDERR does the trick.

It took me a while to see how this worked (the double-negative of
passing 0 to the 'noclose' parameter of daemon() doesn't make it easy
to track what is supposed to happen), but I agree that our goal is
that after daemon()izing, we temporarily set stderr to the inherited
pipe for passing back any startup error messages, then usually want to
restore it to /dev/null for the remainder of the process.

> 
> This also leads to proper 'ssh' connection closing which fixes my
> original problem.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Eric Blake <eblake@redhat.com>
> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> CC: Hanna Reitz <hreitz@redhat.com>
> ---
>  qemu-nbd.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)

I indeed see how this fixes a regression under 'fork_process'.
However, the code that calls qemu_daemon() is also reachable under the
condition 'device && !verbose'.  Does it make sense for either:

qemu-nbd -v -c /dev/nbd0
qemu-nbd -f -v -c /dev/nbd0

as a way to connect to the kernel device, but explicitly ask to remain
verbose, as a way to debug issues in connecting to the kernel via
stderr?

Going back to the emails at the time of Hanna's original commit...

https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01872.html

I don't see any consideration about that case; capturing the original
stderr was done to fix what looked like an easy bug fix to a botched
implementation of old_stderr in commit ffb31e1d without considering
the ramifications.

But seeing as how pre-existing code DID want at least 'qemu-nbd -v -c
/dev/nbd0' to log indefinitely, I think we need to squash in:

diff --git i/qemu-nbd.c w/qemu-nbd.c
index 4276163564b..4d037798b9b 100644
--- i/qemu-nbd.c
+++ w/qemu-nbd.c
@@ -313,7 +313,7 @@ static void *nbd_client_thread(void *arg)
     /* update partition table */
     pthread_create(&show_parts_thread, NULL, show_parts, device);

-    if (verbose) {
+    if (verbose && !fork_process) {
         fprintf(stderr, "NBD device %s is now connected to %s\n",
                 device, srcpath);
     } else {


With that tweak, I'm fine with adding:

Reviewed-by: Eric Blake <eblake@redhat.com>

Do you agree with my tweak?  If so, I can queue this through my NBD
tree without needing to see a v2 post.  However...

> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 4276163564..e9e118dfdb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -575,7 +575,6 @@ int main(int argc, char **argv)
>      bool writethrough = false; /* Client will flush as needed. */
>      bool fork_process = false;
>      bool list = false;
> -    int old_stderr = -1;
>      unsigned socket_activation;
>      const char *pid_file_name = NULL;
>      const char *selinux_label = NULL;
> @@ -930,11 +929,6 @@ int main(int argc, char **argv)
>          } else if (pid == 0) {
>              close(stderr_fd[0]);
>  
> -            /* Remember parent's stderr if we will be restoring it. */
> -            if (fork_process) {
> -                old_stderr = dup(STDERR_FILENO);
> -            }
> -
>              ret = qemu_daemon(1, 0);
>  
>              /* Temporarily redirect stderr to the parent's pipe...  */
> @@ -1152,8 +1146,7 @@ int main(int argc, char **argv)

Pre-existing, but your patch made me notice nearby (minor corner-case)
bugs:

            ret = qemu_daemon(1, 0);

            /* Temporarily redirect stderr to the parent's pipe...  */
            dup2(stderr_fd[1], STDERR_FILENO);

We aren't checking for dup2() failure.  But even if it succeeds (which
we expect), it could have modified the contents of errno compared to
what they were after qemu_daemon()...

            if (ret < 0) {
                error_report("Failed to daemonize: %s", strerror(errno));

...so this could be reporting garbage.  I wouldn't mind additional
patches to make the code more robust against unlikely failure
scenarios.
Denis V. Lunev July 14, 2023, 4:55 p.m. UTC | #3
On 7/14/23 17:35, Eric Blake wrote:
> On Thu, Jul 06, 2023 at 09:15:45PM +0200, Denis V. Lunev wrote:
>> Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d
>>      Author: Hanna Reitz <hreitz@redhat.com>
>>      Date:   Wed May 8 23:18:18 2019 +0200
>>      qemu-nbd: Do not close stderr
>> has introduced an interesting regression. Original behavior of
>>      ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
>> was the following:
>>   * qemu-nbd was started as a daemon
>>   * the command execution is done and ssh exited with success
>>
>> The patch has changed this behavior and 'ssh' command now hangs forever.
>>
>> According to the normal specification of the daemon() call, we should
>> endup with STDERR pointing to /dev/null. That should be done at the
>> very end of the successful startup sequence when the pipe to the
>> bootstrap process (used for diagnostics) is no longer needed.
>>
>> This could be achived in the same way as done for 'qemu-nbd -c' case.
> That was commit 0eaf453e, also fixing up e6df58a5.
>
>> STDOUT copying to STDERR does the trick.
> It took me a while to see how this worked (the double-negative of
> passing 0 to the 'noclose' parameter of daemon() doesn't make it easy
> to track what is supposed to happen), but I agree that our goal is
> that after daemon()izing, we temporarily set stderr to the inherited
> pipe for passing back any startup error messages, then usually want to
> restore it to /dev/null for the remainder of the process.
>
>> This also leads to proper 'ssh' connection closing which fixes my
>> original problem.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Eric Blake <eblake@redhat.com>
>> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> CC: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   qemu-nbd.c | 9 +--------
>>   1 file changed, 1 insertion(+), 8 deletions(-)
> I indeed see how this fixes a regression under 'fork_process'.
> However, the code that calls qemu_daemon() is also reachable under the
> condition 'device && !verbose'.  Does it make sense for either:
>
> qemu-nbd -v -c /dev/nbd0
> qemu-nbd -f -v -c /dev/nbd0
>
> as a way to connect to the kernel device, but explicitly ask to remain
> verbose, as a way to debug issues in connecting to the kernel via
> stderr?
>
> Going back to the emails at the time of Hanna's original commit...
>
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01872.html
>
> I don't see any consideration about that case; capturing the original
> stderr was done to fix what looked like an easy bug fix to a botched
> implementation of old_stderr in commit ffb31e1d without considering
> the ramifications.
>
> But seeing as how pre-existing code DID want at least 'qemu-nbd -v -c
> /dev/nbd0' to log indefinitely, I think we need to squash in:
>
> diff --git i/qemu-nbd.c w/qemu-nbd.c
> index 4276163564b..4d037798b9b 100644
> --- i/qemu-nbd.c
> +++ w/qemu-nbd.c
> @@ -313,7 +313,7 @@ static void *nbd_client_thread(void *arg)
>       /* update partition table */
>       pthread_create(&show_parts_thread, NULL, show_parts, device);
>
> -    if (verbose) {
> +    if (verbose && !fork_process) {
>           fprintf(stderr, "NBD device %s is now connected to %s\n",
>                   device, srcpath);
>       } else {
>
>
> With that tweak, I'm fine with adding:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> Do you agree with my tweak?  If so, I can queue this through my NBD
> tree without needing to see a v2 post.  However...
I like this fix. Thanks for pointing this out.


>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index 4276163564..e9e118dfdb 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -575,7 +575,6 @@ int main(int argc, char **argv)
>>       bool writethrough = false; /* Client will flush as needed. */
>>       bool fork_process = false;
>>       bool list = false;
>> -    int old_stderr = -1;
>>       unsigned socket_activation;
>>       const char *pid_file_name = NULL;
>>       const char *selinux_label = NULL;
>> @@ -930,11 +929,6 @@ int main(int argc, char **argv)
>>           } else if (pid == 0) {
>>               close(stderr_fd[0]);
>>   
>> -            /* Remember parent's stderr if we will be restoring it. */
>> -            if (fork_process) {
>> -                old_stderr = dup(STDERR_FILENO);
>> -            }
>> -
>>               ret = qemu_daemon(1, 0);
>>   
>>               /* Temporarily redirect stderr to the parent's pipe...  */
>> @@ -1152,8 +1146,7 @@ int main(int argc, char **argv)
> Pre-existing, but your patch made me notice nearby (minor corner-case)
> bugs:
>
>              ret = qemu_daemon(1, 0);
>
>              /* Temporarily redirect stderr to the parent's pipe...  */
>              dup2(stderr_fd[1], STDERR_FILENO);
>
> We aren't checking for dup2() failure.  But even if it succeeds (which
> we expect), it could have modified the contents of errno compared to
> what they were after qemu_daemon()...
>
>              if (ret < 0) {
>                  error_report("Failed to daemonize: %s", strerror(errno));
>
> ...so this could be reporting garbage.  I wouldn't mind additional
> patches to make the code more robust against unlikely failure
> scenarios.
>
OK. Will do. Hope that I'll not forget this again.

Den
Denis V. Lunev July 17, 2023, 1:44 p.m. UTC | #4
On 7/14/23 17:35, Eric Blake wrote:
> On Thu, Jul 06, 2023 at 09:15:45PM +0200, Denis V. Lunev wrote:
>> Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d
>>      Author: Hanna Reitz <hreitz@redhat.com>
>>      Date:   Wed May 8 23:18:18 2019 +0200
>>      qemu-nbd: Do not close stderr
>> has introduced an interesting regression. Original behavior of
>>      ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
>> was the following:
>>   * qemu-nbd was started as a daemon
>>   * the command execution is done and ssh exited with success
>>
>> The patch has changed this behavior and 'ssh' command now hangs forever.
>>
>> According to the normal specification of the daemon() call, we should
>> endup with STDERR pointing to /dev/null. That should be done at the
>> very end of the successful startup sequence when the pipe to the
>> bootstrap process (used for diagnostics) is no longer needed.
>>
>> This could be achived in the same way as done for 'qemu-nbd -c' case.
> That was commit 0eaf453e, also fixing up e6df58a5.
>
>> STDOUT copying to STDERR does the trick.
> It took me a while to see how this worked (the double-negative of
> passing 0 to the 'noclose' parameter of daemon() doesn't make it easy
> to track what is supposed to happen), but I agree that our goal is
> that after daemon()izing, we temporarily set stderr to the inherited
> pipe for passing back any startup error messages, then usually want to
> restore it to /dev/null for the remainder of the process.
>
>> This also leads to proper 'ssh' connection closing which fixes my
>> original problem.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Eric Blake <eblake@redhat.com>
>> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> CC: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   qemu-nbd.c | 9 +--------
>>   1 file changed, 1 insertion(+), 8 deletions(-)
> I indeed see how this fixes a regression under 'fork_process'.
> However, the code that calls qemu_daemon() is also reachable under the
> condition 'device && !verbose'.  Does it make sense for either:
>
> qemu-nbd -v -c /dev/nbd0
> qemu-nbd -f -v -c /dev/nbd0
>
> as a way to connect to the kernel device, but explicitly ask to remain
> verbose, as a way to debug issues in connecting to the kernel via
> stderr?
>
> Going back to the emails at the time of Hanna's original commit...
>
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01872.html
>
> I don't see any consideration about that case; capturing the original
> stderr was done to fix what looked like an easy bug fix to a botched
> implementation of old_stderr in commit ffb31e1d without considering
> the ramifications.
>
> But seeing as how pre-existing code DID want at least 'qemu-nbd -v -c
> /dev/nbd0' to log indefinitely, I think we need to squash in:
>
> diff --git i/qemu-nbd.c w/qemu-nbd.c
> index 4276163564b..4d037798b9b 100644
> --- i/qemu-nbd.c
> +++ w/qemu-nbd.c
> @@ -313,7 +313,7 @@ static void *nbd_client_thread(void *arg)
>       /* update partition table */
>       pthread_create(&show_parts_thread, NULL, show_parts, device);
>
> -    if (verbose) {
> +    if (verbose && !fork_process) {
>           fprintf(stderr, "NBD device %s is now connected to %s\n",
>                   device, srcpath);
>       } else {
>
>
> With that tweak, I'm fine with adding:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> Do you agree with my tweak?  If so, I can queue this through my NBD
> tree without needing to see a v2 post.  However...
>
I have started to fix error paths and it is revealed
that this tricks needs additional dances in order
to force the code to compile.

I have done that and will send corrected patch
plus error paths cleanups combined after tests today.

Thank you in advance,
     Den
diff mbox series

Patch

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 4276163564..e9e118dfdb 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -575,7 +575,6 @@  int main(int argc, char **argv)
     bool writethrough = false; /* Client will flush as needed. */
     bool fork_process = false;
     bool list = false;
-    int old_stderr = -1;
     unsigned socket_activation;
     const char *pid_file_name = NULL;
     const char *selinux_label = NULL;
@@ -930,11 +929,6 @@  int main(int argc, char **argv)
         } else if (pid == 0) {
             close(stderr_fd[0]);
 
-            /* Remember parent's stderr if we will be restoring it. */
-            if (fork_process) {
-                old_stderr = dup(STDERR_FILENO);
-            }
-
             ret = qemu_daemon(1, 0);
 
             /* Temporarily redirect stderr to the parent's pipe...  */
@@ -1152,8 +1146,7 @@  int main(int argc, char **argv)
     }
 
     if (fork_process) {
-        dup2(old_stderr, STDERR_FILENO);
-        close(old_stderr);
+        dup2(STDOUT_FILENO, STDERR_FILENO);
     }
 
     state = RUNNING;