Message ID | 20240321054146.2303245-2-dominique.martinet@atmark-techno.com |
---|---|
State | Accepted |
Headers | show |
Series | [1/2] network_thread: block SIGPIPE in network thread | expand |
Hi Dominique, On 21.03.24 06:41, Dominique Martinet wrote: > The process notification thread gets messages from swupdate to submit > them to the hawkbit server. > > Before this patch, ipc_get_status_timeout() was called in a tight loop, > probably with the assumption that it would wait until the timeout > happens if no message was available. Correct. > Unfortunately, the server just replies immediately that there is no > message, so this would just hammer the server in a tight loop. > > Furthermore, on slow devices if the server takes more than 100ms to > reply we close our socket so the server would crash to sigpipe without > the previous fix. Thanks for the previous fix, I agree there is a race and the patch fixes it. > > Adjust this loop to: > - since ipc_get_status() always returns immediately the timeout is > useless, use the version without timeout. > - wait 100ms before calling ipc_get_status() again if no message was > available. It looks to me safe. Acked-by: Stefano Babic <stefano.babic@swupdate.org> Best regards, Stefano > > Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> > --- > suricatta/server_hawkbit.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/suricatta/server_hawkbit.c b/suricatta/server_hawkbit.c > index 1633c5cca65b..7ceb1dc038fd 100644 > --- a/suricatta/server_hawkbit.c > +++ b/suricatta/server_hawkbit.c > @@ -997,7 +997,7 @@ static void *process_notification_thread(void *data) > for (;;) { > ipc_message msg; > bool data_avail = false; > - int ret = ipc_get_status_timeout(&msg, 100); > + int ret = ipc_get_status(&msg); > > data_avail = ret > 0 && (strlen(msg.data.status.desc) != 0); > > @@ -1060,6 +1060,10 @@ static void *process_notification_thread(void *data) > > if (stop && !data_avail) > break; > + > + // wait a bit for next message... > + if (!data_avail) > + usleep(100000); > } > > pthread_mutex_unlock(¬ifylock);
Stefano Babic wrote on Thu, Mar 21, 2024 at 04:23:33PM +0100: > Hi Dominique, Thanks for the prompt review! > > Adjust this loop to: > > - since ipc_get_status() always returns immediately the timeout is > > useless, use the version without timeout. Just an extra comment here: I got pointed out during internal review that this was the last user of ipc_get_status_timeout() in tree. Given that function is user-facing (network_ipc) I don't think we should remove it in this patch, but we might want to consider if it's useful to keep and mark it deprecated (__attribute__((deprecated)) or equivalent?) in a follow-up commit as we don't have any waiting-style ipc as far as I can see, so leaving it might cause more misunderstandings like we had here.
Hi Dominique, On 25.03.24 02:32, Dominique Martinet wrote: > Stefano Babic wrote on Thu, Mar 21, 2024 at 04:23:33PM +0100: >> Hi Dominique, > > Thanks for the prompt review! > >>> Adjust this loop to: >>> - since ipc_get_status() always returns immediately the timeout is >>> useless, use the version without timeout. > > Just an extra comment here: I got pointed out during internal review > that this was the last user of ipc_get_status_timeout() in tree. > > Given that function is user-facing (network_ipc) I don't think we should > remove it in this patch, but we might want to consider if it's useful to > keep and mark it deprecated (__attribute__((deprecated)) or equivalent?) Well, the whole interface is mostly deprecated (for external usage) and replaced by the progress interface. But last one requires more for the integrator, as it should runs it as background task, it is event driven, etc. So older application should be ported to use this interface, and it is not what many want. I won't remove function from the API to avoid breakages and to remain backward compatible, everyone should decide the most suitable way for the own project. Best regards, Stefano > in a follow-up commit as we don't have any waiting-style ipc as far as I > can see, so leaving it might cause more misunderstandings like we had > here. >
diff --git a/suricatta/server_hawkbit.c b/suricatta/server_hawkbit.c index 1633c5cca65b..7ceb1dc038fd 100644 --- a/suricatta/server_hawkbit.c +++ b/suricatta/server_hawkbit.c @@ -997,7 +997,7 @@ static void *process_notification_thread(void *data) for (;;) { ipc_message msg; bool data_avail = false; - int ret = ipc_get_status_timeout(&msg, 100); + int ret = ipc_get_status(&msg); data_avail = ret > 0 && (strlen(msg.data.status.desc) != 0); @@ -1060,6 +1060,10 @@ static void *process_notification_thread(void *data) if (stop && !data_avail) break; + + // wait a bit for next message... + if (!data_avail) + usleep(100000); } pthread_mutex_unlock(¬ifylock);
The process notification thread gets messages from swupdate to submit them to the hawkbit server. Before this patch, ipc_get_status_timeout() was called in a tight loop, probably with the assumption that it would wait until the timeout happens if no message was available. Unfortunately, the server just replies immediately that there is no message, so this would just hammer the server in a tight loop. Furthermore, on slow devices if the server takes more than 100ms to reply we close our socket so the server would crash to sigpipe without the previous fix. Adjust this loop to: - since ipc_get_status() always returns immediately the timeout is useless, use the version without timeout. - wait 100ms before calling ipc_get_status() again if no message was available. Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> --- suricatta/server_hawkbit.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)