Message ID: 20210218195157.3175119-1-sava.jakovljev@teufel.de
State: Changes Requested
Series: [1/2] Core: Properly close socket after processing in networking thread
Hi Sava,

On 18.02.21 20:51, Sava Jakovljev wrote:
> ---
>  core/progress_thread.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
>
> diff --git a/core/progress_thread.c b/core/progress_thread.c
> index 1d4facd..8705bfd 100644
> --- a/core/progress_thread.c
> +++ b/core/progress_thread.c
> @@ -56,6 +56,43 @@ struct swupdate_progress {
>  };
>  static struct swupdate_progress progress;
>
> +static bool is_socket_closed(int sock)
> +{
> +	fd_set fdset;
> +
> +	FD_ZERO(&fdset);
> +	FD_SET(sock, &fdset);
> +
> +	struct timeval tv = {0};
> +	select(sock + 1, &fdset, 0, 0, &tv);
> +
> +	if (!FD_ISSET(sock, &fdset))
> +		return false;
> +
> +	int n = 0;
> +	ioctl(sock, FIONREAD, &n);
> +

Why is the socket closed if there is no data to be read? That looks
unrelated to me.

> +	return n == 0;
> +}
> +
> +static void remove_dead_sockets(struct swupdate_progress *pprog)
> +{
> +	struct progress_conn *conn, *tmp;
> +
> +	SIMPLEQ_FOREACH_SAFE(conn, &pprog->conns, next, tmp)
> +	{
> +		if (is_socket_closed(conn->sockfd))
> +		{
> +			TRACE("A progress client disappeared, removing it.");
> +			close(conn->sockfd);
> +
> +			SIMPLEQ_REMOVE(&pprog->conns, conn,
> +					progress_conn, next);
> +			free(conn);
> +		}
> +	}
> +}
> +
>  /*
>   * This must be called after acquiring the mutex
>   * for the progress structure
> @@ -282,6 +319,7 @@ void *progress_bar_thread (void __attribute__ ((__unused__)) *data)
>  		conn->sockfd = connfd;
>  		pthread_mutex_lock(&pprog->lock);
>  		SIMPLEQ_INSERT_TAIL(&pprog->conns, conn, next);
> +		remove_dead_sockets(pprog);

Why do we need this housekeeping here?

The connections are verified every time a message is sent in
send_progress_msg(). If the socket is closed, write() returns an error
and the connection is removed from the list. This seems arbitrary to me.

Best regards,
Stefano Babic

>  		pthread_mutex_unlock(&pprog->lock);
>  	} while(1);
>  }
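For context on the check being questioned: the select() + FIONREAD pair in is_socket_closed() effectively detects EOF. When the peer closes its end, the socket becomes readable, and FIONREAD then reports zero pending bytes. A non-blocking recv() with MSG_PEEK expresses the same condition more directly; this is a sketch of the idea, not code from the patch:

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Sketch (not SWUpdate code): on a stream socket, recv() returning 0
 * means the peer performed an orderly shutdown; EAGAIN/EWOULDBLOCK
 * means the peer is alive but has sent nothing. MSG_PEEK leaves any
 * real data in the receive queue untouched.
 */
static bool peer_has_hung_up(int sock)
{
	char c;
	ssize_t n = recv(sock, &c, 1, MSG_PEEK | MSG_DONTWAIT);

	if (n == 0)
		return true;	/* EOF: peer closed its end */
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return false;	/* alive, just no data pending */
	return n < 0;		/* any other error: treat as dead */
}
```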
Hi Stefano,

It's not really necessary, but it deals with the situation where clients
connect and disconnect and there are no progress messages in between. We
encountered that situation in a test scenario, not a real-world one, but
in my opinion this kind of resource leak should be handled as soon as
possible, not just when we want to send a message. It is also not
impossible for the problem to occur in the real world; one just has to
try hard enough.

And yes, it will be resolved on the first progress message sent, but by
then it might be a bit too late. For example, if one wants to use the
control IPC, a response of "you have to wait until we close our
connections" is a no-go, and that is what will happen if this problem
occurs. I agree it is unlikely, but it is possible.

I don't see much need to periodically check for dead connections, so
this is a middle ground: if my reasoning is right, we prevent any
resource leakage this way without much overhead.

Thank you.
Best regards,
Sava Jakovljev

Stefano Babic schrieb am Freitag, 19. Februar 2021 um 08:56:34 UTC+1:
> Why do we need this housekeeping here?
>
> The connections are verified every time a message is sent in
> send_progress_msg(). If the socket is closed, write() returns an error
> and the connection is removed from the list. This seems arbitrary to me.
>
> --
> =====================================================================
> DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-53  Fax: +49-8142-66989-80  Email: sba...@denx.de
> =====================================================================
Hi Sava,

On 19.02.21 14:04, Sava Jakovljev wrote:
> It's not really necessary, but it deals with the situation when clients
> connect and disconnect, and there are no progress messages in between.

The progress IPC is local; there is no way to connect to it from outside
the device. If a client connects and disconnects, it is simply a buggy
application that must be fixed.

> We encountered that situation in a test-scenario, not a real-world one,
> but in my opinion, having this kind of resource leak should be handled
> as soon as possible, and not just when we want to send a message.

This is not a resource leak, because SWUpdate does not really know
whether the connection is still up, and using FIONREAD to detect this is
not correct IMHO.

> Also, it is not impossible for the problem to occur in the real world -
> one just has to try hard enough.

It remains a buggy application. If we want to protect against this, we
should use what Linux already offers, for example setting access rights
on the socket.

> And yes, it will be solved on the first progress message sent, but then
> it might just be a bit too late.

I do not think so.

> For example, if one wants to use control IPC,

The control IPC does not flow into the progress interface; it has its
own connection, where you found the resource leak that you fix with the
first patch.

> having the response of "you have to wait until we close our
> connections" is a no-go

But again, this is caused by the resource leak in [1].

> - and that will happen if this problem occurs. I agree, it is unlikely
> - but it is possible.

This is out of context: you are mixing the progress interface with the
control interface. The progress interface is unidirectional, and data
flows from SWUpdate to the registered client.

> I don't see much need to periodically check for dead connections,

I do not see it at all.

> so this is a middle-ground - if my reasoning is right, we are actually
> preventing any resource leakage this way without too much overhead.

I do not think there is a resource leak in the progress interface, while
there clearly is one in the control interface.

Best regards,
Stefano Babic
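Stefano's point that the write path already reveals a dead client can be illustrated in isolation: once the peer of a Unix stream socket has closed its end, send() fails with EPIPE. This is a hypothetical helper, not SWUpdate's send_progress_msg(), using MSG_NOSIGNAL to get the error return instead of SIGPIPE:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Sketch (not SWUpdate code): writing to a peer that already closed
 * its end fails with EPIPE. This is the mechanism by which the send
 * path can notice a dead client and drop it from the connection
 * list. MSG_NOSIGNAL turns the default SIGPIPE into an EPIPE error.
 */
static int try_send(int sock, const void *buf, size_t len)
{
	if (send(sock, buf, len, MSG_NOSIGNAL) < 0)
		return -errno;	/* -EPIPE once the peer is gone */
	return 0;
}
```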
Hi Stefano,

Sure, if you want to keep it this way, no problem for me. As stated,
it's mostly there for slightly better clean-up. I would argue the
clean-up could be done better: to be precise, you are right that there
is no leak in the strict sense, but one could still get into a bad
situation anyway (not easily, granted), so the technicality doesn't
really matter. For our needs we will keep this kind of clean-up, taking
your comments into account, but feel free to ignore it if you think
that's best.

Thank you.
Best regards,
Sava Jakovljev

Stefano Babic schrieb am Freitag, 19. Februar 2021 um 14:25:11 UTC+1:
> I do not think there is a resource leak in the progress interface,
> while there clearly is one in the control interface.
Hello,

To show more concretely what the concern is, and why this is a leak in
my opinion even though the sockets are tracked in SWUpdate's code: I ran
a script that calls swupdate-progress in an infinite loop, waits a short
period of time, kills swupdate-progress, and continues the loop. After a
few minutes, SWUpdate had more than 1024 open fds. This is clearly a bug
- why shouldn't we check for dead connections when accepting a new one?

As a consequence, the command IPC no longer works: Linux forbids
SWUpdate from opening any new file. It does not matter that the progress
IPC and the command IPC are separate; they are part of the same process,
so I don't think this is out of context or that I am mixing terms. From
the Linux point of view, the process is simply exceeding its maximum
number of open files, so the kernel won't let it open any more.

Waiting for SWUpdate to send a progress message to clear them up is not
a good solution, in my opinion. Given that SWUpdate is a background
process that sits idle most of the time, and that progress messages are
only sent during an upgrade, reaping dangling fds only when a message is
actually sent - which is not often, and worse, only when the upgrade
status changes - is unacceptable, IMHO. I'm simply arguing that SWUpdate
should be robust and stable regardless of buggy clients that spawn many
re-connections. A "Too many open files" error is not what I would expect
from an upgrade service in any scenario.

Regarding my implementation of the check: we can use getsockopt() with
SOL_SOCKET if you prefer that, and if you agree with my reasoning. If I
am making an error here, please let me know.

Best regards,
Sava Jakovljev

Sava Jakovljev schrieb am Freitag, 19. Februar 2021 um 14:34:31 UTC+1:
> Sure, if you want to keep it this way, no problem for me - as stated,
> it's mostly to have a bit better clean-up.
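The failure mode described above is process-wide: once leaked fds reach RLIMIT_NOFILE, every later socket(), accept(), or open() in the same process fails with EMFILE, no matter which interface leaked them. A minimal stand-alone illustration of that state (it lowers the limit itself rather than leaking thousands of sockets; not SWUpdate code):

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>

/*
 * Sketch: shrink RLIMIT_NOFILE, open sockets until the limit is hit,
 * and return the resulting errno. The same "Too many open files"
 * (EMFILE) state then breaks unrelated fd allocations in the process,
 * e.g. accept() on a different IPC socket.
 */
static int exhaust_fds(void)
{
	struct rlimit rl = { .rlim_cur = 32, .rlim_max = 32 };

	if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
		return -1;

	for (;;) {
		if (socket(AF_UNIX, SOCK_STREAM, 0) < 0)
			return errno;	/* expected: EMFILE */
	}
}
```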
Hi Sava, On 19.02.21 17:59, Sava Jakovljev wrote: > Hello, > > Just to provide a way that represents better what is the concern and why > this is a leak in my opinion, even though sockets are saved in swupdate > code: > I ran a script which calls swupdate-progress in an infinite loop, waits > for short period of time, then exits swupdate-progress and continue the > loop. It is not a mistery what it happens... > After few minutes swupdate has more than 1024 open fd's. Nothing else expected. > This is clearly > a bug - why shouldn't we check for dead connections when accepting a new > one? You have just proofed that your application (the test script) is a mess - please consider that the progress interface is a *LOCAL* interface based on UDS, and it is not open to the world. It is duty of the integrator to build a suitable system, that is to estimate how many progress client should be started and sets accordingly rights to the sockets to avoid local exploits. > As a consequence, command IPC is not working anymore Of course > - Linux forbids > swupdate to open any new file according to limits and/or cgroups, and it is a good thing. > - it doesn't matter at all that progress > IPC and command IPC are separated, they are part of the same process - > so I don't think this is out of context or that I am mixing any terms. > From Linux point of view, a process is simply exceeding maximum number > of open files so it won't allow it to take any more. Exactly. But it is not that it is possible to generate any mess on the device and then try to fix somewhere else. I could also randomly write into /dev/mem and then complying that something crashes.. > > Waiting for swupdate to send a message to progress IPC to clear them up > is not a good solution, in my opinion. The correct way to check if a connection is still alive is to use the connection, that is sending / receiving or using the KEEP_ALIVE option. 
But as this is a UDS, keep alive is not available and it cannot be replaced by a message at the application level (in SWUpdate, then), because the progress interface is thought to be unidirectional (no ACK on the other side). > Giving that swupdate is a background process which sits idle most of the > time and that progress IPC is only sent when there is an upgrade, > clearing dangling fd's point when message is actually sent, which is > not often, and worse, when upgrade status is changed, is unacceptable, > IMHO. Agree, but what is unacceptable is to force this to a private and local interface, that is for definition unprotected. > I'm just trying to be clear here, that swupdate should be robust > and stable regardless of buggy clients which could spawn many > re-connections. Where are the buggy clients ? If we are talking about clients connecting through internet and try to force a DOS, fully agree. But if crappy clients are implemented and put on the device, this is not duty of SWUpdate. If any program can be executed, then socket's access must be protected. > Having "Too many open files" error is not what I would > expect from an upgrade service in any scenario. I can break any device if I can run what I want on it. > > Regarding my implementation of the check, we can use getsockopt with > SOL_SOCKET if you prefer that better, I guess this does not work - getsockopt just returns the last error if this happened. That means, after doing something on the socket, that is by reading (but not available here as the socket is just to send), or by writing (as it is already done). If you call getsockopt(sock, SOL_ERROR...) without any operation, there is no error at all. > and if you agree with my reasoning > - if I am making an error here, please let me know. If SWUpdate should monitor the client connections, a keep alive mechanism is necessary. 
Linux's keepalive does not work for UDS, so this should be added in SWUpdate, and the thread should send periodically messages to the client (messages that should be discarded by the client) to verify the connection Best regards, Stefano Babic > > Best regards, > Sava Jakovljev > Sava Jakovljev schrieb am Freitag, 19. Februar 2021 um 14:34:31 UTC+1: > > Hi Stefano, > > Sure, if you want to keep it this way, no problem for me - as > stated, it's mostly to have a bit better clean-up. > I would argue the clean-up could be done better (to be precise, you > are right and there is no leak, but one could get into a bad > situation anyway, (not easy, granted) so technicality doesn't really > matter). > For our needs we will keep this kind of clean-up, taking your > comments into account, but yeah, feel free to ignore it if you think > that's the best. > > Thank you. > Best regards, > Sava Jakovljev > > Stefano Babic schrieb am Freitag, 19. Februar 2021 um 14:25:11 UTC+1: > > Hi Sava, > > On 19.02.21 14:04, Sava Jakovljev wrote: > > Hi Stefano, > > > > It's not really necessary, but it deals with the situation > when clients > > connect and disconnect, and there are no progress messages in > between. > > The progress IPC is local, ther eis now way to connect it > outside the > device. If a client connects and disconnects, it is simply a buggy > application that must be fixed. > > > We encountered that situation in a test-scenario, not a > real-world one, > > but in my opinion, having this kind of resource leak should > be handled > > as soon as possible, and not just when we want to send a > message. > > This is not a resouce leak because SWUpdate does not really know > if the > connection is still on and using FIONREAD to detect this is not > correct > IMHO. > > > Also, it is not impossible for problem to occur in real-world > - one just > > has to try hard enough. 
> > It remains a buggy application - if we want to protect this, we > should > use what Linux already offers, for example setting accesses to > the socket. > > > And yes, it will be solved on first progress > > message sent, but then it might just be a bit too late. > > I do not think so. > > > For example, if > > one wants to use control IPC, > > control IPC does not flow into the progress, they have an own > connection > where you found resource leak that you fix with the first patch. > > > having the response of "you have to wait > > until we close our connections" is a no-go > > But again, this is caused by the resource leak in [1]. > > > - and that will happen if > > this problem occurs. I agree, it is unlikely - but it is > possible. > > This is out of context - you are mixing the progress interface > with the > control interface. The progress interface is unidirectional, and > data > flow from SWUpdate to the registered client. > > > I don't see much need to periodically check for dead > connections, > > I do not see it at all. > > > so > > this is a middle-ground - if my reasoning is right, we are > actually > > preventing any resource leakage this way without too much > overhead. > > I do not think there is a resource leak in progress interface, > while > there is clearly in the control interface. > > Best regards, > Stefano Babic > > > > > Thank you. > > Best regards, > > Sava Jakovljev > > Stefano Babic schrieb am Freitag, 19. 
Februar 2021 um > 08:56:34 UTC+1: > > > > Hi Sava, > > > > On 18.02.21 20:51, Sava Jakovljev wrote: > > > --- > > > core/progress_thread.c | 38 > ++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 38 insertions(+) > > > > > > diff --git a/core/progress_thread.c b/core/progress_thread.c > > > index 1d4facd..8705bfd 100644 > > > --- a/core/progress_thread.c > > > +++ b/core/progress_thread.c > > > @@ -56,6 +56,43 @@ struct swupdate_progress { > > > }; > > > static struct swupdate_progress progress; > > > > > > +static bool is_socket_closed(int sock) > > > +{ > > > + fd_set fdset; > > > + > > > + FD_ZERO(&fdset); > > > + FD_SET(sock, &fdset); > > > + > > > + struct timeval tv = {0}; > > > + select(sock + 1, &fdset, 0, 0, &tv); > > > + > > > + if (!FD_ISSET(sock, &fdset)) > > > + return false; > > > + > > > + int n = 0; > > > + ioctl(sock, FIONREAD, &n); > > > + > > > > Why the socket is closed if there are no data to be read ? It > looks to > > me unrelated. > > > > > + return n == 0; > > > +} > > > + > > > +static void remove_dead_sockets(struct swupdate_progress* > pprog) > > > +{ > > > + struct progress_conn *conn, *tmp; > > > + > > > + SIMPLEQ_FOREACH_SAFE(conn, &pprog->conns, next, tmp) > > > + { > > > + if (is_socket_closed(conn->sockfd)) > > > + { > > > + TRACE("A progress client disappeared, removing it."); > > > + close(conn->sockfd); > > > + > > > + SIMPLEQ_REMOVE(&pprog->conns, conn, > > > + progress_conn, next); > > > + free(conn); > > > + } > > > + } > > > +} > > > + > > > /* > > > * This must be called after acquiring the mutex > > > * for the progress structure > > > @@ -282,6 +319,7 @@ void *progress_bar_thread (void > __attribute__ > > ((__unused__)) *data) > > > conn->sockfd = connfd; > > > pthread_mutex_lock(&pprog->lock); > > > SIMPLEQ_INSERT_TAIL(&pprog->conns, conn, next); > > > + remove_dead_sockets(pprog); > > > > Why do we need this housekeeping here ? 
> > The connections are verified any time a message is sent in
> > send_progress_msg(). If the socket is closed, write() returns with an
> > error and the connection is removed from the list. This seems
> > arbitrary to me.
> >
> > Best regards,
> > Stefano Babic
> >
> > > pthread_mutex_unlock(&pprog->lock);
> > > } while(1);
> > > }
>
> --
> =====================================================================
> DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sba...@denx.de
> =====================================================================
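The application-level keep-alive Stefano describes above (periodic messages the client simply discards) could look roughly like the following sketch. This is not SWUpdate code: `client_alive` and the one-byte ping are invented for illustration, and real progress clients would first have to be taught to discard the ping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ping a progress client over its UDS connection.
 * A dead peer turns send() into EPIPE; MSG_NOSIGNAL suppresses SIGPIPE
 * so the error can be handled in place. */
static bool client_alive(int sock)
{
	const char ping = 0;	/* payload the client is expected to ignore */

	if (send(sock, &ping, 1, MSG_NOSIGNAL) < 0)
		return !(errno == EPIPE || errno == ECONNRESET);

	return true;
}
```

A periodic timer in the progress thread could then walk the connection list with this check and drop dead entries, independently of whether any real progress message is pending.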
Hi,

I have proved that *swupdate* can easily go into total chaos mode because it is not handling its resources appropriately - that's why I wrote that test script.
If you think that having a program which can be taken into a broken state *that easily* is acceptable, then I will drop this issue.
The test script is not part of our system - it is a demonstration of how easily I can break the software. An upgrade service must not get into that state that easily - also, a program on an embedded Linux device must take special care of its resources.
Your statement that you can break any device if you try doesn't mean you should not try to prevent obvious cases.
Upgrade software should simply work and be reliable - that means managing its own resources *at least*. It's not much to ask. Doing the check for dead connections only when actually sending a message is simply bad design.

Best regards,
Sava Jakovljev

Stefano Babic schrieb am Freitag, 19. Februar 2021 um 21:26:51 UTC+1:
> Hi Sava,
>
> On 19.02.21 17:59, Sava Jakovljev wrote:
> > Hello,
> >
> > Just to provide a way that represents better what the concern is and
> > why this is a leak in my opinion, even though sockets are saved in
> > swupdate code:
> > I ran a script which calls swupdate-progress in an infinite loop,
> > waits for a short period of time, then exits swupdate-progress and
> > continues the loop.
>
> It is no mystery what happens...
>
> > After a few minutes swupdate has more than 1024 open fd's.
>
> Nothing else expected.
>
> > This is clearly a bug - why shouldn't we check for dead connections
> > when accepting a new one?
>
> You have just proved that your application (the test script) is a mess -
> please consider that the progress interface is a *LOCAL* interface
> based on UDS, and it is not open to the world.
> It is the duty of the integrator to build a suitable system, that is,
> to estimate how many progress clients should be started, and to set
> rights on the sockets accordingly to avoid local exploits.
>
> > As a consequence, command IPC is not working anymore
>
> Of course
>
> > - Linux forbids swupdate to open any new file
>
> according to limits and/or cgroups, and it is a good thing.
>
> > - it doesn't matter at all that progress IPC and command IPC are
> > separated, they are part of the same process - so I don't think this
> > is out of context or that I am mixing any terms. From the Linux point
> > of view, a process is simply exceeding the maximum number of open
> > files, so it won't be allowed to take any more.
>
> Exactly. But it is not that it is possible to generate any mess on the
> device and then try to fix it somewhere else. I could also randomly
> write into /dev/mem and then complain that something crashes..
>
> > Waiting for swupdate to send a message to progress IPC to clear them
> > up is not a good solution, in my opinion.
>
> The correct way to check if a connection is still alive is to use the
> connection, that is, sending / receiving or using the KEEP_ALIVE option.
> But as this is a UDS, keep alive is not available, and it cannot be
> replaced by a message at the application level (in SWUpdate, then),
> because the progress interface is thought to be unidirectional (no ACK
> on the other side).
>
> > Given that swupdate is a background process which sits idle most of
> > the time and that progress IPC is only sent when there is an upgrade,
> > clearing dangling fd's only when a message is actually sent, which is
> > not often, and worse, only when the upgrade status changes, is
> > unacceptable, IMHO.
>
> Agree, but what is unacceptable is to force this onto a private and
> local interface, that is by definition unprotected.
> > I'm just trying to be clear here, that swupdate should be robust and
> > stable regardless of buggy clients which could spawn many
> > re-connections.
>
> Where are the buggy clients ? If we are talking about clients connecting
> over the Internet and trying to force a DOS, I fully agree. But if
> crappy clients are implemented and put on the device, this is not the
> duty of SWUpdate. If any program can be executed, then the socket's
> access must be protected.
>
> > Having a "Too many open files" error is not what I would expect from
> > an upgrade service in any scenario.
>
> I can break any device if I can run what I want on it.
>
> > Regarding my implementation of the check, we can use getsockopt with
> > SOL_SOCKET if you prefer that better,
>
> I guess this does not work - getsockopt just returns the last error if
> one happened. That means, after doing something on the socket, that is
> by reading (but that is not available here, as the socket is just for
> sending), or by writing (as it is already done). If you call
> getsockopt(sock, SOL_ERROR...) without any operation, there is no error
> at all.
>
> > and if you agree with my reasoning - if I am making an error here,
> > please let me know.
>
> If SWUpdate should monitor the client connections, a keep-alive
> mechanism is necessary. Linux's keepalive does not work for UDS, so this
> should be added in SWUpdate, and the thread should periodically send
> messages to the client (messages that should be discarded by the client)
> to verify the connection
>
> Best regards,
> Stefano Babic
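For what it's worth, the standard way to detect an orderly close on a stream socket without consuming data — an alternative to the select()+FIONREAD probe in the patch under discussion — is a non-blocking recv() with MSG_PEEK. A sketch only, not SWUpdate code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* recv() returns 0 only on EOF, i.e. the peer really closed its end;
 * EAGAIN/EWOULDBLOCK means the connection is alive with nothing to
 * read. MSG_PEEK leaves any pending data in place. */
static bool peer_closed(int sock)
{
	char c;
	ssize_t n = recv(sock, &c, 1, MSG_PEEK | MSG_DONTWAIT);

	if (n == 0)
		return true;	/* orderly shutdown by the peer */
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return false;	/* alive, no pending data */
	return n < 0;		/* any other error: treat as dead */
}
```

This answers the "why is no readable data a closed socket?" objection directly: here a closed peer is identified by EOF, not by the absence of data.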
Hi Sava,

On 19.02.21 22:59, Sava Jakovljev wrote:
> Hi,
>
> I have proved that *swupdate* can easily go into total chaos mode
> because it is not handling its resources appropriately - that's why I
> wrote that test script.

What is the difference between your test and another one where a client calls progress_ipc_connect() in a loop ? This is not solved by checking dead connections - there are no dead connections at all.

> If you think that having a program which can be taken into a broken
> state *that easily* is acceptable,

see my test above, it is just as easy, and your patch does not solve it.

> then I will drop this issue.
> The test script is not part of our system - it is a demonstration of
> how easily I can break the software.

I am aware of it - but the IPC interfaces in SWUpdate are not meant to be foolproof. There is neither authentication nor other protection. What I suggest is to limit the number of client connections - it is currently unlimited. I cannot think of any reason why there could be so many clients. If SWUpdate reaches the limit for progress IPC clients, it will reject new connections.

> The upgrade service must not get into that state that easily - also, a
> program on an embedded Linux device must take special care of its
> resources.

Sure, but let's say: the progress IPC is not part of the upgrade, that is, SWUpdate does not (quite) need it itself. But if this interface is used, it is the duty of the developers / integrators to make sure that the clients they write do not break the upgrade concept, because the clients *become* part of the updater.

> Your statement that you can break any device if you try doesn't mean
> you should not try to prevent obvious cases.
> Upgrade software should simply work and be reliable - that means
> managing its own resources *at least*. It's not much to ask. Doing the
> check for dead connections

It does not work, see my example above, it is just another test case. Then just limit the number of clients.
> only when actually sending a message is simply a bad design.

It is not - this is the common way to check if a connection is established; the keep-alive mechanism does not do anything different.

Best regards,
Stefano Babic
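The cap Stefano suggests could be as simple as refusing connections past a fixed limit at accept() time, so the fd table cannot grow without bound. `PROGRESS_MAX_CLIENTS` and `register_client` are invented names; this is a sketch of the idea, not the SWUpdate implementation:

```c
#include <assert.h>
#include <unistd.h>

#define PROGRESS_MAX_CLIENTS 8	/* invented value; would be configurable */

static int nr_clients;	/* in SWUpdate this would sit under pprog->lock */

/* Returns 0 when the client is accepted; past the cap, closes the fd
 * and returns -1 so the accept loop simply drops the connection. */
static int register_client(int connfd)
{
	if (nr_clients >= PROGRESS_MAX_CLIENTS) {
		close(connfd);
		return -1;
	}
	nr_clients++;
	/* real code would SIMPLEQ_INSERT_TAIL(&pprog->conns, conn, next) */
	return 0;
}
```

With such a cap, the connect/disconnect loop from the test script can no longer exhaust the process's open-file limit; at worst it occupies the fixed number of slots.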
diff --git a/core/progress_thread.c b/core/progress_thread.c
index 1d4facd..8705bfd 100644
--- a/core/progress_thread.c
+++ b/core/progress_thread.c
@@ -56,6 +56,43 @@ struct swupdate_progress {
 };
 static struct swupdate_progress progress;
 
+static bool is_socket_closed(int sock)
+{
+	fd_set fdset;
+
+	FD_ZERO(&fdset);
+	FD_SET(sock, &fdset);
+
+	struct timeval tv = {0};
+	select(sock + 1, &fdset, 0, 0, &tv);
+
+	if (!FD_ISSET(sock, &fdset))
+		return false;
+
+	int n = 0;
+	ioctl(sock, FIONREAD, &n);
+
+	return n == 0;
+}
+
+static void remove_dead_sockets(struct swupdate_progress* pprog)
+{
+	struct progress_conn *conn, *tmp;
+
+	SIMPLEQ_FOREACH_SAFE(conn, &pprog->conns, next, tmp)
+	{
+		if (is_socket_closed(conn->sockfd))
+		{
+			TRACE("A progress client disappeared, removing it.");
+			close(conn->sockfd);
+
+			SIMPLEQ_REMOVE(&pprog->conns, conn,
+					progress_conn, next);
+			free(conn);
+		}
+	}
+}
+
 /*
  * This must be called after acquiring the mutex
  * for the progress structure
@@ -282,6 +319,7 @@ void *progress_bar_thread (void __attribute__ ((__unused__)) *data)
 		conn->sockfd = connfd;
 		pthread_mutex_lock(&pprog->lock);
 		SIMPLEQ_INSERT_TAIL(&pprog->conns, conn, next);
+		remove_dead_sockets(pprog);
 		pthread_mutex_unlock(&pprog->lock);
 	} while(1);
 }
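The review question above asks why "no readable data" should mean the connection is closed. An alternative probe that expresses the intent more directly is to ask poll() for a hangup: on AF_UNIX stream sockets the kernel reports POLLHUP once the peer has closed, without the probe consuming or counting pending data. A sketch, not part of the patch; `socket_hung_up` is an invented name:

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Probe a connection without reading from it: POLLHUP and POLLERR are
 * reported even when no events are requested, so a live-but-idle peer
 * (the case the FIONREAD check cannot distinguish) yields revents == 0. */
static bool socket_hung_up(int sock)
{
	struct pollfd pfd = { .fd = sock, .events = 0 };

	if (poll(&pfd, 1, 0) < 0)
		return true;	/* poll failure: treat as dead */

	return (pfd.revents & (POLLHUP | POLLERR)) != 0;
}
```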