diff mbox series

[RFC] ipc/swupdate_async_thread: check before writing to socket

Message ID 20220307092709.3306065-1-dominique.martinet@atmark-techno.com
State Changes Requested
Headers show
Series [RFC] ipc/swupdate_async_thread: check before writing to socket

Commit Message

Dominique Martinet March 7, 2022, 9:27 a.m. UTC
swupdate-client could get stuck writing swu content to swupdate when
swupdate was itself busy sending other notifications and not reading its
own socket.

Use select() to make sure the socket is writable before attempting to
write, and keep waiting for more input instead.

Link: https://groups.google.com/g/swupdate/c/iWEgNaZ46uw/m/wQRTa9MsAwAJ
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---
Notes:
 - I've seen your comment about using exit() in here; this is just a
quick version so you can test. We should be consistent about that, so
I'm waiting for your fix on this
 - It's still possible that the 'wr' function reads too much of the swu,
so that the socket is in a writable state but we cannot write the full
buffer content... Async IO is complicated, and since we always check
that the write was full we cannot just make the socket non-blocking here
for continued writes.
Honestly, after looking at this I'm tempted to agree this can be fully
rewritten and this is just a stopgap fix. I'm leaving the decision on
this to you.

Thanks!

 ipc/network_ipc-if.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Comments

Stefano Babic March 7, 2022, 10:07 a.m. UTC | #1
Hi Dominique,

On 07.03.22 10:27, Dominique Martinet wrote:
> swupdate-client could get stuck writing swu content to swupdate when
> swupdate was itself busy sending other notifications and not reading its
> own socket.
> 
> Use select() to make sure the socket is writable before attempting to
> write, and keep waiting for more input instead.
> 
> Link: https://groups.google.com/g/swupdate/c/iWEgNaZ46uw/m/wQRTa9MsAwAJ
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
> Notes:
>   - I've seen your comment about using exit() in here, this is just a
> quick version so you can test, we should be consistent with that so I'm
> waiting for your fix on this
>   - It's still possible the 'wr' function will read too much of the swu,
> that the socket is in writable state but we cannot write the full buffer
> content... Async IO is complicated, and since we always check if the
> write was full we cannot just make the socket non-blocking here for
> continued writes.
> Honestly after looking at this I'm tempted to agree this can be fully
> rewritten and this is just a stopgap fix. I'm leaving the decision for
> this to you.
> 
> Thanks!
> 
>   ipc/network_ipc-if.c | 26 ++++++++++++++++++++++----
>   1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/ipc/network_ipc-if.c b/ipc/network_ipc-if.c
> index 90c49709b0fb..202ecbf09ef3 100644
> --- a/ipc/network_ipc-if.c
> +++ b/ipc/network_ipc-if.c
> @@ -10,6 +10,7 @@
>   #include <string.h>
>   #include <errno.h>
>   #include <signal.h>
> +#include <sys/select.h>
>   #include <pthread.h>
>   #include "network_ipc.h"
>   
> @@ -37,7 +38,8 @@ static void *swupdate_async_thread(void *data)
>   	sigset_t saved_mask;
>   	struct timespec zerotime = {0, 0};
>   	struct async_lib *rq = (struct async_lib *)data;
> -	int notify_fd, ret;
> +	int notify_fd, ret, maxfd;
> +	fd_set rfds, wfds;
>   	ipc_message msg;
>   	msg.data.notify.status = RUN;
>   
> @@ -55,14 +57,30 @@ static void *swupdate_async_thread(void *data)
>   		exit(1);
>   	}
>   
> +	FD_ZERO(&rfds);
> +	FD_ZERO(&wfds);
> +	maxfd = (notify_fd > rq->connfd ? notify_fd : rq->connfd) + 1;
> +
>   	/* Start writing the image until end */
>   	do {
>   		if (!rq->wr)
>   			break;
>   
> -		rq->wr(&pbuf, &size);
> -		if (size)
> -			swupdate_image_write(pbuf, size);
> +		FD_SET(notify_fd, &rfds);
> +		FD_SET(rq->connfd, &wfds);
> +		ret = select(maxfd, &rfds, &wfds, NULL, NULL);
> +		if (ret < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			perror("select");
> +			exit(1);
> +		}
> +
> +		if (FD_ISSET(rq->connfd, &wfds)) {
> +			rq->wr(&pbuf, &size);
> +			if (size)
> +				swupdate_image_write(pbuf, size);
> +		}

This would be nice if it worked, but it cannot. In fact, select()
returns with success if something can be written, without any knowledge
of how many bytes can be written. If just one byte can be written,
select() returns with success, but of course writing a bigger buffer can
block again, and we will see the deadlock again.

Best regards,
Stefano

>   
>   		/* handle any notification coming */
>   		while ((ret = ipc_notify_receive(&notify_fd, &msg, 0))
Dominique Martinet March 7, 2022, 12:15 p.m. UTC | #2
Stefano Babic wrote on Mon, Mar 07, 2022 at 11:07:12AM +0100:
> This would be nice if it works, but it cannot. In fact, select returns with
> success if something can be written, without any knowledge how many bytes
> can be written. If just one byte can be written, select() returns with
> success, but of course writing a bigger buffer can block again, and we will
> see again the deadlock.

Yes and no: this is what I was referring to in my note, this will stop
working if swupdate-client's readimage is updated to use a bigger
buffer, but with 256 bytes there is no risk and it will always work, at
least for linux.

I'd need to spend more time than I'm willing to spend outside of working
hours to check, but with linux, if select() returns the fd as writable,
you are guaranteed to be able to write at least that much data
(experimentally a bit over 200k), so with such tiny chunks there really
is no risk.
I can check tomorrow if this is properly defined somewhere, and if this
holds true for BSDs as well.
(I think it'd make sense, but I'd prefer to have a proof)


That being said, I agree it's not something we want to rely on for the
long term, this is at most a stop-gap measure, that is more convenient
for me than restoring the race condition.
Well, I guess I can keep the patches locally so it doesn't really matter
at the end of the day, but I can guarantee it works on linux :)
Stefano Babic March 7, 2022, 1:01 p.m. UTC | #3
Hi Dominique,

On 07.03.22 13:15, Dominique Martinet wrote:
> Stefano Babic wrote on Mon, Mar 07, 2022 at 11:07:12AM +0100:
>> This would be nice if it works, but it cannot. In fact, select returns with
>> success if something can be written, without any knowledge how many bytes
>> can be written. If just one byte can be written, select() returns with
>> success, but of course writing a bigger buffer can block again, and we will
>> see again the deadlock.
> 
> Yes and no: this is what I was referring to in my note, this will stop
> working if swupdate-client's readimage is updated to use a bigger
> buffer, but with 256 bytes there is no risk and it will always work, at
> least for linux.

Well, swupdate-client is just a way to reproduce the deadlock. It is
part of the library, and the user decides himself how big the buffer can
be. We cannot force the buffer size (the user can have good reasons to
set it). It should work independently of the buffer size.

> 
> I'd need to spend more time that I'm willing to spend outside of working
> hours to check, but with linux if select returns the fd as writable you
> are guaranteed to be able to write at least so much data (experimentally
> a bit over 200k), so with such tiny chunks there really is no risk.

As far as I know, the size of the unix domain socket buffer is
architecture-dependent, and differs between 32 and 64 bit. Even if true,
I do not want to depend on a behavior that can change in the future.

> I can check tomorrow if this is properly defined somewhere, and if this
> holds true for BSDs as well.
> (I think it'd make sense, but I'd prefer to have a proof)
> 
> 
> That being said, I agree it's not something we want to rely on for the
> long term,

Right

> this is at most a stop-gap measure, that is more convenient
> for me than restoring the race condition.
> Well, I guess I can keep the patches locally so it doesn't really matter
> at the end of the day, but I can guarantee it works on linux :)
> 

Ok - so let's say I will first revert the two patches, also fix the
issue I found during debugging (I will send a series), and then take a
look at the race condition you reported to check if there is another or
similar way to fix it.

Best regards,
Stefano
Dominique Martinet March 7, 2022, 10:50 p.m. UTC | #4
Stefano Babic wrote on Mon, Mar 07, 2022 at 02:01:42PM +0100:
> > Yes and no: this is what I was referring to in my note, this will stop
> > working if swupdate-client's readimage is updated to use a bigger
> > buffer, but with 256 bytes there is no risk and it will always work, at
> > least for linux.
> 
> Well, swupdate-client is just a way to reproduce the deadlock. It is part of
> the library, and user will decide himself how big a buffer can be. We cannot
> force the buffer (user can have good reason to set it). It should work
> independently from buffer size.

Right.
The only proper solution I can think of for this would be to use
non-blocking writes there (possible by using send() with the
MSG_DONTWAIT flag, for example). It wouldn't really be hard to do in
this particular case since we have no other writer on this socket, but
it sure is ugly, and since no other part of the code behaves like this I
didn't want to do that.


> > I'd need to spend more time that I'm willing to spend outside of working
> > hours to check, but with linux if select returns the fd as writable you
> > are guaranteed to be able to write at least so much data (experimentally
> > a bit over 200k), so with such tiny chunks there really is no risk.
> 
> As far as I know, the size of unix domain socket buffer is architecture
> dependent, and different between 32 and 64 bit. Even if true, I do not want
> to depend on a behavior that can change in future.

I was curious so I checked yesterday anyway -- on linux it is driven by
sock_writable():
        return refcount_read(&sk->sk_wmem_alloc) < (READ_ONCE(sk->sk_sndbuf) >> 1);

which I understand as "the send buffer is at least half free":
sk_wmem_alloc is the amount of data currently committed and pending in
memory waiting for a reader, and sndbuf is the currently allocated send
buffer (which can grow up to the net.core.wmem_max sysctl, I think).

So this is very variable, there is no hard figure we can rely on, and I
didn't check the BSDs.

Anyway, this point is moot because we both agree this isn't correct to
rely on this behaviour.

> > this is at most a stop-gap measure, that is more convenient
> > for me than restoring the race condition.
> > Well, I guess I can keep the patches locally so it doesn't really matter
> > at the end of the day, but I can guarantee it works on linux :)
> > 
> 
> Ok - so let's say I will revert first the two patches, I fix, too, the issue
> I have found during debugging (I will send a series), and then I will take a
> look at the race condition you reported to check if there is another or
> similar way to fix it.

Ok, let's go with that for now.
If you can't come up with a better solution we can either spawn yet
another thread (which I agree is overengineering...) or I can rewrite
the select loop to use non-blocking sends instead, but I'll give you
time to have a look first.

Thanks!
Michael Adler March 8, 2022, 7:11 a.m. UTC | #5
Hi,

> The only proper solution I can think of for this would be to use non-blocking writes there (it's possible by using
> send() with MSG_DONTWAIT flag for example), and wouldn't really be hard to do in this particular case since we have no
> other writer on this socket, but it sure is ugly and since no other part of the code behaves like this I didn't want
> to do that.

I've been curiously following this thread since I've had a similar issue a couple of days ago: if there is a buggy IPC
progress client which does not read its incoming progress messages, SWUpdate hangs forever due to a blocking write
(send) operation. One could argue that this is good behavior because you have no other choice than to fix the faulty IPC
client. However, personally I would prefer robustness of the server and either skip the message for the client if its
queue is full or fail the update after a certain timeout (which makes finding the root cause easier).
So here's another potential candidate for non-blocking writes.

Kind regards,
  Michael
Dominique Martinet March 8, 2022, 8:32 a.m. UTC | #6
Hi Michael,

Thanks for the feedback!

Michael Adler wrote on Tue, Mar 08, 2022 at 08:11:16AM +0100:
> I've been curiously following this thread since I've had a similar issue a couple of days ago: if there is a buggy IPC
> progress client which does not read its incoming progress messages, SWUpdate hangs forever due to a blocking write
> (send) operation. One could argue that this is good behavior because you have no other choice than to fix the faulty IPC
> client. However, personally I would prefer robustness of the server and either skip the message for the client if its
> queue is full or fail the update after a certain timeout (which makes finding the root cause easier).
> So here's another potential candidate for non-blocking writes.

That's even harder: the server can send multiple different IPCs to a
given socket, so we can't just interrupt a notify IPC in the middle and
hope we'll be able to finish it before another IPC is started.

We'd need to keep a queue per client to at least finish the current IPC
if any is pending and a can't-be-dropped IPC has to be sent, or
something like that...


(I do agree that on principle bad clients shouldn't hang the server and
we should protect ourselves from bad clients; it's just not easy the way
swupdate is currently designed... I can try to find some time next month
to take a look at making the server more asynchronous in general, but I
would understand if Stefano doesn't want me to break this too much :))
Stefano Babic March 8, 2022, 9:08 a.m. UTC | #7
Hi Michael,

On 08.03.22 08:11, Michael Adler wrote:
> Hi,
> 
>> The only proper solution I can think of for this would be to use non-blocking writes there (it's possible by using
>> send() with MSG_DONTWAIT flag for example), and wouldn't really be hard to do in this particular case since we have no
>> other writer on this socket, but it sure is ugly and since no other part of the code behaves like this I didn't want
>> to do that.
> 
> I've been curiously following this thread since I've had a similar issue a couple of days ago: if there is a buggy IPC
> progress client which does not read its incoming progress messages, SWUpdate hangs forever due to a blocking write
> (send) operation. One could argue that this is good behavior because you have no other choice than to fix the faulty IPC
> client.

This "one" is me ;-)


> However, personally I would prefer robustness of the server and either skip the message for the client if its
> queue is full

One point to this is that IPC is meant to run locally on the device
itself, and clients should be well tested before delivery. Connecting
clients to SWUpdate makes them part of the update process, and they must
be designed with care. The update process can depend on decisions taken
by the client; SWUpdate cannot take those decisions itself. A case you
had, too, is that notifications are introduced in sw-description, and
SWUpdate simply forwards them to the listeners without being aware of
them. A notification can carry information that blocks or changes the
update process, and it cannot be lost.


> or fail the update after a certain timeout (which makes finding the root cause easier).

Failing can be better than skipping.

> So here's another potential candidate for non-blocking writes.

Best regards,
Stefano
Stefano Babic March 8, 2022, 9:11 a.m. UTC | #8
Hi Dominique,

On 08.03.22 09:32, Dominique Martinet wrote:
> Hi Michael,
> 
> Thanks for the feedback!
> 
> Michael Adler wrote on Tue, Mar 08, 2022 at 08:11:16AM +0100:
>> I've been curiously following this thread since I've had a similar issue a couple of days ago: if there is a buggy IPC
>> progress client which does not read its incoming progress messages, SWUpdate hangs forever due to a blocking write
>> (send) operation. One could argue that this is good behavior because you have no other choice than to fix the faulty IPC
>> client. However, personally I would prefer robustness of the server and either skip the message for the client if its
>> queue is full or fail the update after a certain timeout (which makes finding the root cause easier).
>> So here's another potential candidate for non-blocking writes.
> 
> That's even harder: the server can send multiple different IPCs to a
> given socket, so we can't just interrupt a notify IPC in the middle and
> hope we'll be able to finish it before another IPC is started.
> 

Correct.

> We'd need to keep a queue per client to at least finish the current IPC
> if any is pending and a can't-be-dropped IPC has to be sent, or
> something like that...

Right.

> 
> 
> (I do agree on principle bad clients shouldn't hang the server, we
> should protect ourselves from bad clients, it's just not easy the
> current way swupdate is designed

Yes, and the principle here is to keep things as easy as possible. More
complexity is bad in an update agent, because reliability is everything.

>... I'm can try to find some time to
> take a look next month at making the server more asynchronous in
> general but I would understand if Stefano doesn't want me to break this
> too much :))
> 

Best regards,
Stefano
Storm, Christian March 8, 2022, 9:18 a.m. UTC | #9
> > The only proper solution I can think of for this would be to use
> > non-blocking writes there (it's possible by using send() with
> > MSG_DONTWAIT flag for example), and wouldn't really be hard to do in
> > this particular case since we have no other writer on this socket,
> > but it sure is ugly and since no other part of the code behaves like
> > this I didn't want to do that.
> 
> I've been curiously following this thread since I've had a similar
> issue a couple of days ago: if there is a buggy IPC progress client
> which does not read its incoming progress messages, SWUpdate hangs
> forever due to a blocking write (send) operation. One could argue that
> this is good behavior because you have no other choice than to fix the
> faulty IPC client. However, personally I would prefer robustness of
> the server and either skip the message for the client if its queue is
> full or fail the update after a certain timeout (which makes finding
> the root cause easier). So here's another potential candidate for
> non-blocking writes.

"Doesn't read" may also mean the client crashed. I found this behavior
as well and came up with this patch (not yet published due to lack of
time for testing so far):

----------------------------<>----------------------------
Author: Christian Storm <christian.storm@siemens.com>
Date:   Fr 2022-02-11 17:32

    network_thread: Don't SIGPIPE-die on IPC client crash

    If an IPC message is sent to, e.g., suricatta/hawkBit and
    the IPC initiating client crashes before reading the result,
    SWUpdate dies with SIGPIPE.

    Hence, block SIGPIPE for the whole thread as opposed to only
    for send_subprocess_reply()'s write() operation since there's
    also an opportunity window while other socket operations.

    Signed-off-by: Christian Storm <christian.storm@siemens.com>

diff --git a/core/network_thread.c b/core/network_thread.c
index 4694fef..88042f1 100644
--- a/core/network_thread.c
+++ b/core/network_thread.c
@@ -21,6 +21,7 @@
 #include <arpa/inet.h>
 #include <netinet/in.h>
 #include <pthread.h>
+#include <signal.h>

 #include "bsdqueue.h"
 #include "util.h"
@@ -325,7 +326,7 @@ static void send_subprocess_reply(
 {
    if (write(subprocess_msg->client, &subprocess_msg->message,
            sizeof(subprocess_msg->message)) < 0)
-       ERROR("Error write on socket ctrl");
+       ERROR("Error writing on ctrl socket: %s", strerror(errno));
 }

 static void handle_subprocess_ipc(struct subprocess_msg_elem *subprocess_msg)
@@ -402,6 +403,12 @@ static void *subprocess_thread (void *data)
 {
    (void)data;
    thread_ready();
+
+   sigset_t sigpipe_mask;
+   sigemptyset(&sigpipe_mask);
+   sigaddset(&sigpipe_mask, SIGPIPE);
+   pthread_sigmask(SIG_BLOCK, &sigpipe_mask, NULL);
+
    pthread_mutex_lock(&subprocess_msg_lock);

    while(1) {
----------------------------<>----------------------------

I think this is a sensible fix regardless?

What do you think?



Kind regards,
   Christian
Stefano Babic March 8, 2022, 5:33 p.m. UTC | #10
On 08.03.22 10:18, Christian Storm wrote:
>>> The only proper solution I can think of for this would be to use
>>> non-blocking writes there (it's possible by using send() with
>>> MSG_DONTWAIT flag for example), and wouldn't really be hard to do in
>>> this particular case since we have no other writer on this socket,
>>> but it sure is ugly and since no other part of the code behaves like
>>> this I didn't want to do that.
>>
>> I've been curiously following this thread since I've had a similar
>> issue a couple of days ago: if there is a buggy IPC progress client
>> which does not read its incoming progress messages, SWUpdate hangs
>> forever due to a blocking write (send) operation. One could argue that
>> this is good behavior because you have no other choice than to fix the
>> faulty IPC client. However, personally I would prefer robustness of
>> the server and either skip the message for the client if its queue is
>> full or fail the update after a certain timeout (which makes finding
>> the root cause easier). So here's another potential candidate for
>> non-blocking writes.
> 
> Doesn't read may also include crash. I did find this behavior as well
> and came up with this patch (not published yet due to lack of time for
> testing up to now):
> 
> ----------------------------<>----------------------------
> Author: Christian Storm <christian.storm@siemens.com>
> Date:   Fr 2022-02-11 17:32
> 
>      network_thread: Don't SIGPIPE-die on IPC client crash
> 
>      If an IPC message is sent to, e.g., suricatta/hawkBit and
>      the IPC initiating client crashes before reading the result,
>      SWUpdate dies with SIGPIPE.
> 
>      Hence, block SIGPIPE for the whole thread as opposed to only
>      for send_subprocess_reply()'s write() operation since there's
>      also an opportunity window while other socket operations.
> 
>      Signed-off-by: Christian Storm <christian.storm@siemens.com>
> 
> diff --git a/core/network_thread.c b/core/network_thread.c
> index 4694fef..88042f1 100644
> --- a/core/network_thread.c
> +++ b/core/network_thread.c
> @@ -21,6 +21,7 @@
>   #include <arpa/inet.h>
>   #include <netinet/in.h>
>   #include <pthread.h>
> +#include <signal.h>
> 
>   #include "bsdqueue.h"
>   #include "util.h"
> @@ -325,7 +326,7 @@ static void send_subprocess_reply(
>   {
>      if (write(subprocess_msg->client, &subprocess_msg->message,
>              sizeof(subprocess_msg->message)) < 0)
> -       ERROR("Error write on socket ctrl");
> +       ERROR("Error writing on ctrl socket: %s", strerror(errno));
>   }
> 
>   static void handle_subprocess_ipc(struct subprocess_msg_elem *subprocess_msg)
> @@ -402,6 +403,12 @@ static void *subprocess_thread (void *data)
>   {
>      (void)data;
>      thread_ready();
> +
> +   sigset_t sigpipe_mask;
> +   sigemptyset(&sigpipe_mask);
> +   sigaddset(&sigpipe_mask, SIGPIPE);
> +   pthread_sigmask(SIG_BLOCK, &sigpipe_mask, NULL);
> +
>      pthread_mutex_lock(&subprocess_msg_lock);
> 
>      while(1) {
> ----------------------------<>----------------------------
> 
> I think this is regardless a sensible fix?

Yes, agree - post an official patch.

Best regards,
Stefano

> 
> What do you think?
> 
> 
> 
> Kind regards,
>     Christian
>
Dominique Martinet April 22, 2022, 11:47 p.m. UTC | #11
Hi Stefano,

Stefano Babic wrote on Mon, Mar 07, 2022 at 02:01:42PM +0100:
> > this is at most a stop-gap measure, that is more convenient
> > for me than restoring the race condition.
> > Well, I guess I can keep the patches locally so it doesn't really matter
> > at the end of the day, but I can guarantee it works on linux :)
> 
> Ok - so let's say I will revert first the two patches, I fix, too, the issue
> I have found during debugging (I will send a series), and then I will take a
> look at the race condition you reported to check if there is another or
> similar way to fix it.


I finally ran into the problem you reported myself (thankfully we don't
use verbose mode in the field so it hasn't been urgent...) and checked
master again today, but my original problem is still present: swupdate
-i sometimes exits 1s after the update started if the update is small
with a >1s processing time... And of these we have plenty.

Have you had time to think about a solution?

For now I think I'll just update with the patch I had suggested, as we
don't need to worry about other users of the async thread with a larger
IO size, but I can't really see any other solution... I had suggested
non-blocking IOs previously, but since we're expecting other IPC
messages we can't just split things as we want.
At best we could keep the select approach relying on the buffer logic,
and split the 'if (size) swupdate_image_write()' part of the loop into
its own loop if the buffer is large... Actually, given how IPC expects
things to be done in a single read/write, I think that's a hard
requirement, so you're probably worrying about something that wouldn't
work in the first place?



I've now gathered a growing combination of SWUs to test with:
 - small (few kb) / big (few MBs) size, as that was key to your problem
 - success / failure
 - exits immediately / waits 3s before exiting
 - install with -i, -d or swupdate-client (I've also tested some with
hawkbit manually but it's a bit more of a pain to automate)
 - run on debian (glibc) or alpine (musl)


I'm generating them with my tooling, so they're not great for sharing,
but perhaps just having the sw-description and scripts somewhere would
help add more tests run regularly?


In the meantime I have rebased on master, and your other modification
from exit to pthread_exit causes another hang: exiting async_thread
without calling rq->end leaves install_from_file waiting for endupdate's
pthread_cond_signal indefinitely.

I'll send a patch for this and other cleanups now, and we can continue
discussing my original problem here.

The rebased version is here for reference:
https://github.com/atmark-techno/swupdate/commit/09c3012ee225d6a7d0bac15a2b6978003b64b497

Patch

diff --git a/ipc/network_ipc-if.c b/ipc/network_ipc-if.c
index 90c49709b0fb..202ecbf09ef3 100644
--- a/ipc/network_ipc-if.c
+++ b/ipc/network_ipc-if.c
@@ -10,6 +10,7 @@ 
 #include <string.h>
 #include <errno.h>
 #include <signal.h>
+#include <sys/select.h>
 #include <pthread.h>
 #include "network_ipc.h"
 
@@ -37,7 +38,8 @@  static void *swupdate_async_thread(void *data)
 	sigset_t saved_mask;
 	struct timespec zerotime = {0, 0};
 	struct async_lib *rq = (struct async_lib *)data;
-	int notify_fd, ret;
+	int notify_fd, ret, maxfd;
+	fd_set rfds, wfds;
 	ipc_message msg;
 	msg.data.notify.status = RUN;
 
@@ -55,14 +57,30 @@  static void *swupdate_async_thread(void *data)
 		exit(1);
 	}
 
+	FD_ZERO(&rfds);
+	FD_ZERO(&wfds);
+	maxfd = (notify_fd > rq->connfd ? notify_fd : rq->connfd) + 1;
+
 	/* Start writing the image until end */
 	do {
 		if (!rq->wr)
 			break;
 
-		rq->wr(&pbuf, &size);
-		if (size)
-			swupdate_image_write(pbuf, size);
+		FD_SET(notify_fd, &rfds);
+		FD_SET(rq->connfd, &wfds);
+		ret = select(maxfd, &rfds, &wfds, NULL, NULL);
+		if (ret < 0) {
+			if (errno == EINTR)
+				continue;
+			perror("select");
+			exit(1);
+		}
+
+		if (FD_ISSET(rq->connfd, &wfds)) {
+			rq->wr(&pbuf, &size);
+			if (size)
+				swupdate_image_write(pbuf, size);
+		}
 
 		/* handle any notification coming */
 		while ((ret = ipc_notify_receive(&notify_fd, &msg, 0))