diff mbox series

[LEDE-DEV] procd: Restore respawn on SIGTERM timeout

Message ID 20171019130215.31354-1-kristian.evensen@gmail.com
State New
Delegated to: John Crispin
Headers show
Series [LEDE-DEV] procd: Restore respawn on SIGTERM timeout | expand

Commit Message

Kristian Evensen Oct. 19, 2017, 1:02 p.m. UTC
When SIGTERM times out, procd sends SIGKILL and then restarts the
process once SIGCHLD has been received. This all works fine, with one
exception - respawn is not restored when instance_start() is called from
instance_exit(). The reason is that respawn is always set to false in
instance_stop(), and the same service_instance struct is used for the
instance_start()-call.

The consequence is that if the process is killed/crashes again, it will
not respawn. Solve this issue by adding a variable used to store the
original value of respawn in instance_stop(), and then restore the
original respawn-value in instance_exit().

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
---
 service/instance.c | 6 ++++--
 service/instance.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

Comments

Karl Palsson Oct. 19, 2017, 4:24 p.m. UTC | #1
Kristian Evensen <kristian.evensen@gmail.com> wrote:
> When SIGTERM times out, procd sends SIGKILL and then restarts
> the process once SIGCHLD has been received. This all works
> fine, with one exception - respawn is not restored when
> instance_start() is called from instance_exit(). The reason is
> that respawn is always set to false in instance_stop(), and the
> same service_instance struct is used for the
> instance_start()-call.
> 
> The consequence is that if the process is killed/crashes again,
> it will not respawn. Solve this issue by adding a variable used
> to store the original value of respawn in instance_stop(), and
> then restore the original respawn-value in instance_exit().

It smells like this likely applies to many other fields. Is there
a path here that's not using the copy/compare routines for a
service/instance? Should they be? Does your path even restore all
the parameters of respawn?

Cheers,
Karl P


> 
> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
> ---
>  service/instance.c | 6 ++++--
>  service/instance.h | 1 +
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/service/instance.c b/service/instance.c
> index b7cb523..76c74ed 100644
> --- a/service/instance.c
> +++ b/service/instance.c
> @@ -532,9 +532,10 @@ instance_exit(struct uloop_process *p, int ret)
>  
>  	if (in->halt) {
>  		instance_removepid(in);
> -		if (in->restart)
> +		if (in->restart) {
> +			in->respawn = in->respawn_org;
>  			instance_start(in);
> -		else {
> +		} else {
>  			struct service *s = in->srv;
>  
>  			avl_delete(&s->instances.avl, &in->node.avl);
> @@ -567,6 +568,7 @@ instance_stop(struct service_instance *in, bool halt)
>  	if (!in->proc.pending)
>  		return;
>  	in->halt = halt;
> +	in->respawn_org = in->respawn;
>  	in->restart = in->respawn = false;
>  	kill(in->proc.pid, SIGTERM);
>  	uloop_timeout_set(&in->timeout, in->term_timeout * 1000);
> diff --git a/service/instance.h b/service/instance.h
> index bdd14de..a0ac302 100644
> --- a/service/instance.h
> +++ b/service/instance.h
> @@ -48,6 +48,7 @@ struct service_instance {
>  	bool halt;
>  	bool restart;
>  	bool respawn;
> +	bool respawn_org;
>  	int respawn_count;
>  	int reload_signal;
>  	struct timespec start;
> -- 
> 2.11.0
> 
> 
> _______________________________________________
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev
Kristian Evensen Nov. 28, 2017, 7:17 p.m. UTC | #2
Hi Karl,

Sorry for my extremely late reply. For some reason, it was not picked
up by Gmail and I did not see it before today. Since it was not picked
up by Gmail, I had to do some creative c&p, sorry in advance for weird
formatting.

>>Kristian Evensen <kristian.evensen@gmail.com> wrote:
>> When SIGTERM times out, procd sends SIGKILL and then restarts
>> the process once SIGCHLD has been received. This all works
>> fine, with one exception - respawn is not restored when
>> instance_start() is called from instance_exit(). The reason is
>> that respawn is always set to false in instance_stop(), and the
>> same service_instance struct is used for the
>> instance_start()-call.
>>
>> The consequence is that if the process is killed/crashes again,
>> it will not respawn. Solve this issue by adding a variable used
>> to store the original value of respawn in instance_stop(), and
>> then restore the original respawn-value in instance_exit().
>
> It smells like this likely applies to many other fields. Is there
> a path here that's not using the copy/compare routines for a
> service/instance? Should they be? Does your path even restore all
> the parameters of respawn?

As far as I can tell (and based on my tests), only in->respawn is
overwritten and not recovered. All other fields keep their value. The
rc.local restart-operation seems to be a stop() and then a
start()-call, so in->restart is correctly set to false.

BR,
Kristian
Kristian Evensen Jan. 4, 2018, 12:17 p.m. UTC | #3
On Thu, Oct 19, 2017 at 3:02 PM, Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> When SIGTERM times out, procd sends SIGKILL and then restarts the
> process once SIGCHLD has been received. This all works fine, with one
> exception - respawn is not restored when instance_start() is called from
> instance_exit(). The reason is that respawn is always set to false in
> instance_stop(), and the same service_instance struct is used for the
> instance_start()-call.
>
> The consequence is that if the process is killed/crashes again, it will
> not respawn. Solve this issue by adding a variable used to store the
> original value of respawn in instance_stop(), and then restore the
> original respawn-value in instance_exit().
>
> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
> ---

Ping on this patch :)

-Kristian
John Crispin Jan. 4, 2018, 12:49 p.m. UTC | #4
On 04/01/18 13:17, Kristian Evensen wrote:
> On Thu, Oct 19, 2017 at 3:02 PM, Kristian Evensen
> <kristian.evensen@gmail.com> wrote:
>> When SIGTERM times out, procd sends SIGKILL and then restarts the
>> process once SIGCHLD has been received. This all works fine, with one
>> exception - respawn is not restored when instance_start() is called from
>> instance_exit(). The reason is that respawn is always set to false in
>> instance_stop(), and the same service_instance struct is used for the
>> instance_start()-call.
>>
>> The consequence is that if the process is killed/crashes again, it will
>> not respawn. Solve this issue by adding a variable used to store the
>> original value of respawn in instance_stop(), and then restore the
>> original respawn-value in instance_exit().
>>
>> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
>> ---
> Ping on this patch :)
>
> -Kristian
>
Hi Kristian,

its still on my todo list. the bug is confirmed i just dont like the 
solution. however i have so far failed to come up with a better more 
generic one. i'll give it another shot the next few days

     John
diff mbox series

Patch

diff --git a/service/instance.c b/service/instance.c
index b7cb523..76c74ed 100644
--- a/service/instance.c
+++ b/service/instance.c
@@ -532,9 +532,10 @@  instance_exit(struct uloop_process *p, int ret)
 
 	if (in->halt) {
 		instance_removepid(in);
-		if (in->restart)
+		if (in->restart) {
+			in->respawn = in->respawn_org;
 			instance_start(in);
-		else {
+		} else {
 			struct service *s = in->srv;
 
 			avl_delete(&s->instances.avl, &in->node.avl);
@@ -567,6 +568,7 @@  instance_stop(struct service_instance *in, bool halt)
 	if (!in->proc.pending)
 		return;
 	in->halt = halt;
+	in->respawn_org = in->respawn;
 	in->restart = in->respawn = false;
 	kill(in->proc.pid, SIGTERM);
 	uloop_timeout_set(&in->timeout, in->term_timeout * 1000);
diff --git a/service/instance.h b/service/instance.h
index bdd14de..a0ac302 100644
--- a/service/instance.h
+++ b/service/instance.h
@@ -48,6 +48,7 @@  struct service_instance {
 	bool halt;
 	bool restart;
 	bool respawn;
+	bool respawn_org;
 	int respawn_count;
 	int reload_signal;
 	struct timespec start;