Message ID | 1409534896-372-1-git-send-email-willemb@google.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On So, 2014-08-31 at 21:28 -0400, Willem de Bruijn wrote: > When a socket error is pending, send()/recv() must abort their normal > operation and return the error. An error means having non-zero > sk->sk_err or having non-empty sk->sk_error_queue. > > Currently, the behavior for the second is inconsistent depending on > whether an error has previously been dequeued. In all cases, > recv()/send() test sk->sk_err. This is not modified on enqueue onto > the error queue, so may be 0. It is modified on dequeue, however, to > match the queued skb's errno. I observed the following when two errors > were queued: > > ret = poll(pollfd, 1, -1); > assert(ret == 1); > assert(pollfd.revents == POLLERR); > > ret = recv(fd, buf, size, MSG_NONBLOCK); > assert(ret == -1 && errno == EAGAIN); /* <-- A */ > > ret = recv(fd, buf, size, MSG_ERRQUEUE); > assert(ret > 0); > > ret = recv(fd, buf, size, MSG_NONBLOCK); > assert(ret == -1 && errno == ENOMSG); /* <-- B */ > > ret = recv(fd, buf, size, MSG_ERRQUEUE); > assert(ret > 0); > > The recv call in B returns the error code embedded in > SKB_EXT_ERR(skb), in this case ENOMSG, because I am working with > timestamps. The recv call in A should have returned the > same. > > Implement this behavior. This may surprise existing applications. > > Also make the wake-up signal when data is ready on the error queue > consistent between enqueue and dequeue: use sk_error_report in both > cases. > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > --- > > This approach leaves one issue: > The states of sk->sk_err and sk->sk_error_queue are related, but only > loosely. Error queue enqueue, dequeue and other code may overwrite > sk->sk_err unconditionally. For one, sock_error will reset > sk->sk_err to 0 even if sk->sk_error_queue is not empty. If socket > calls should abort on all errors, then should be change to test > sk_error_queue.qlen. But, doing so requires taking a lock in a busy > data path. > --- > net/core/skbuff.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 163b673..f7a280b 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -3485,8 +3485,11 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb) > skb_dst_force(skb); > > skb_queue_tail(&sk->sk_error_queue, skb); > + sk->sk_err = SKB_EXT_ERR(skb)->ee.ee_errno; > + > if (!sock_flag(sk, SOCK_DEAD)) > - sk->sk_data_ready(sk); > + sk->sk_error_report(sk); > + > return 0; > } > EXPORT_SYMBOL(sock_queue_err_skb); From my experience in IPv6 code, we only do sk->sk_err updates directly in protocol error handling code. In case of UDP IPv6 errors for example we now notify sk_error_report two times with this patch (before the patch we did sk_data_ready (this is what you changed) and sk_error_report). I really wonder if setting sk->sk_err in this function is the right thing to do. It also depends on socket state bits (e.g. np->recverr) if the update happens. So we still cannot get rid of the protocol dependent sk->sk_err updates. It looks like we have to check all error handling functions in the protocols. Maybe timestamp code needs to adapt? Thanks, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From my experience in IPv6 code, we only do sk->sk_err updates directly > in protocol error handling code. In case of UDP IPv6 errors for example > we now notify sk_error_report two times with this patch (before the > patch we did sk_data_ready (this is what you changed) and > sk_error_report). If the event is that an error is ready, is this not correct? The wake up key should be POLLERR in both cases. In implementation of sock_def_error_report and sock_def_readable, the difference otherwise seems slim. I haven't checked all sk_data_ready and sk_error_report implementations, though, so may have missed differences for specific protocols. If this is not as obviously a strict improvement as I thought, I'll just drop it. > I really wonder if setting sk->sk_err in this function is the right > thing to do. I agree, in that it is hard to verify that this does not overwrite an existing error. This patch only makes the behavior consistent between enqueue and dequeue, but perhaps a better way to achieve that is to change the dequeue side: remove the assignment to sk->sk_err there. If so, then all locations that currently check the state of sk->sk_err should be changed to also check the qlen of the error queue and if non-zero return the embedded error of the first skb. I'll take a look whether that is feasible without adding locks or atomics in the common path. > It also depends on socket state bits (e.g. np->recverr) if > the update happens. So we still cannot get rid of the protocol dependent > sk->sk_err updates. > > It looks like we have to check all error handling functions in the > protocols. Maybe timestamp code needs to adapt? Does the above sound okay, or did you mean something else? > > Thanks, > Hannes > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Di, 2014-09-02 at 11:20 -0400, Willem de Bruijn wrote: > > From my experience in IPv6 code, we only do sk->sk_err updates directly > > in protocol error handling code. In case of UDP IPv6 errors for example > > we now notify sk_error_report two times with this patch (before the > > patch we did sk_data_ready (this is what you changed) and > > sk_error_report). > > If the event is that an error is ready, is this not correct? The > wake up key should be POLLERR in both cases. In implementation > of sock_def_error_report and sock_def_readable, the difference > otherwise seems slim. I haven't checked all sk_data_ready and > sk_error_report implementations, though, so may have missed > differences for specific protocols. If this is not as obviously a strict > improvement as I thought, I'll just drop it. I am not sure if waking up the socket two times has unforeseen consequences, we do so e.g. in __udp6_lib_err if recverr is set. > > I really wonder if setting sk->sk_err in this function is the right > > thing to do. > > I agree, in that it is hard to verify that this does not overwrite > an existing error. This patch only makes the behavior > consistent between enqueue and dequeue, but perhaps a > better way to achieve that is to change the dequeue side: > remove the assignment to sk->sk_err there. If so, then all > locations that currently check the state of sk->sk_err should > be changed to also check the qlen of the error queue and > if non-zero return the embedded error of the first skb. I'll > take a look whether that is feasible without adding locks > or atomics in the common path. That would be great. > > > It also depends on socket state bits (e.g. np->recverr) if > > the update happens. So we still cannot get rid of the protocol dependent > > sk->sk_err updates. > > > > It looks like we have to check all error handling functions in the > > protocols. Maybe timestamp code needs to adapt? > > Does the above sound okay, or did you mean something else? Best thing would be to not keep the error status two times per socket. Maybe it would make sense to always synchronize on the error queue and don't check for sk->sk_err at all? It seems to get very hairy without taking any locks though. I even don't know what the semantics for sk_err should be. Should we leave the oldest error in place until it got fetched? Then we could use cmpxchg in slow path with 0 as the old value. They could easily become unsynchronized if the user switches off recverr setsockopt. But I don't think we need to handle that. I think best effort should would be ok, too. Not having locked instructions in fast path is much more important. Thanks, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> I agree, in that it is hard to verify that this does not overwrite >> an existing error. This patch only makes the behavior >> consistent between enqueue and dequeue, but perhaps a >> better way to achieve that is to change the dequeue side: >> remove the assignment to sk->sk_err there. If so, then all >> locations that currently check the state of sk->sk_err should >> be changed to also check the qlen of the error queue and >> if non-zero return the embedded error of the first skb. I'll >> take a look whether that is feasible without adding locks >> or atomics in the common path. > > That would be great. This looks feasible, though the patch is considerably longer. If we can be sure that in skb_dequeue_err_skb in this patch + sk->sk_err = SKB_EXT_ERR(skb)->ee.ee_errno; never overwrites an existing error, I think that this short patch is preferable. I wrote a preliminary alternative patch that modifies sock_error to check the queue and converts other code that manually tests sk->sk_err to instead call a new function sock_has_error that similarly tests both. The patch is rough and minimally tested, but I will send it out as rfc. >> >> > It also depends on socket state bits (e.g. np->recverr) if >> > the update happens. So we still cannot get rid of the protocol dependent >> > sk->sk_err updates. >> > >> > It looks like we have to check all error handling functions in the >> > protocols. Maybe timestamp code needs to adapt? >> >> Does the above sound okay, or did you mean something else? > > Best thing would be to not keep the error status two times per socket. I agree in principle, but this may be hard to achieve. > Maybe it would make sense to always synchronize on the error queue and > don't check for sk->sk_err at all? sk->sk_err is set in many locations. I don't think that we can change all those sites. >It seems to get very hairy without > taking any locks though. To fix this particular issue, I prefer to leave it sk->sk_err as is, instead separate the error queue logic from it and make sure that that has consistent semantics. > I even don't know what the semantics for sk_err should be. Neither do I. That's one reason I'd rather not touch it :) > Should we > leave the oldest error in place until it got fetched? Then we could use > cmpxchg in slow path with 0 as the old value. They could easily become > unsynchronized if the user switches off recverr setsockopt. But I don't > think we need to handle that. > > I think best effort should would be ok, too. Not having locked > instructions in fast path is much more important. We can optimistically test the qlen without a lock and only take the lock and peek into the queue in the unlikely case that qlen is non-zero. This is racy, but already done in various poll routines. It seems to be protected some other way, or the race is deemed benign. Either way, the patch does not change these callers. For recv/send, which it does change, best effort seems acceptable to me, too. Perhaps we can do better, but I'll send out the patch as is for now. On a related note, a comment in net/core/sock.c states that sk->sk_err is only changed under lock. This is not currently true. skb_dequeue_err_skb does not take the socket lock, for one. > Thanks, > Hannes Thanks for having a look at the patch! > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On So, 2014-08-31 at 21:28 -0400, Willem de Bruijn wrote: > When a socket error is pending, send()/recv() must abort their normal > operation and return the error. An error means having non-zero > sk->sk_err or having non-empty sk->sk_error_queue. > > Currently, the behavior for the second is inconsistent depending on > whether an error has previously been dequeued. In all cases, > recv()/send() test sk->sk_err. This is not modified on enqueue onto > the error queue, so may be 0. It is modified on dequeue, however, to > match the queued skb's errno. I observed the following when two errors > were queued: > > ret = poll(pollfd, 1, -1); > assert(ret == 1); > assert(pollfd.revents == POLLERR); > > ret = recv(fd, buf, size, MSG_NONBLOCK); > assert(ret == -1 && errno == EAGAIN); /* <-- A */ > > ret = recv(fd, buf, size, MSG_ERRQUEUE); > assert(ret > 0); > > ret = recv(fd, buf, size, MSG_NONBLOCK); > assert(ret == -1 && errno == ENOMSG); /* <-- B */ > > ret = recv(fd, buf, size, MSG_ERRQUEUE); > assert(ret > 0); > > The recv call in B returns the error code embedded in > SKB_EXT_ERR(skb), in this case ENOMSG, because I am working with > timestamps. The recv call in A should have returned the > same. > > Implement this behavior. This may surprise existing applications. > > Also make the wake-up signal when data is ready on the error queue > consistent between enqueue and dequeue: use sk_error_report in both > cases. > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > --- > > This approach leaves one issue: > The states of sk->sk_err and sk->sk_error_queue are related, but only > loosely. Error queue enqueue, dequeue and other code may overwrite > sk->sk_err unconditionally. For one, sock_error will reset > sk->sk_err to 0 even if sk->sk_error_queue is not empty. If socket > calls should abort on all errors, then should be change to test > sk_error_queue.qlen. But, doing so requires taking a lock in a busy > data path. > --- > net/core/skbuff.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 163b673..f7a280b 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -3485,8 +3485,11 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb) > skb_dst_force(skb); > > skb_queue_tail(&sk->sk_error_queue, skb); > + sk->sk_err = SKB_EXT_ERR(skb)->ee.ee_errno; > + > if (!sock_flag(sk, SOCK_DEAD)) > - sk->sk_data_ready(sk); > + sk->sk_error_report(sk); > + > return 0; > } > EXPORT_SYMBOL(sock_queue_err_skb); I just noticed another problem with this approach why I think we cannot use it. In case we generate an error back to the socket locally because something (e.g. the packet was too big) in output path, we must not update sk->sk_err because we return it immediately to the sender but nonetheless must update the error queue. It seems to me that this patch would report the same error two times to the program then. I'll check your other patch later today, thanks! Bye, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I just noticed another problem with this approach why I think we cannot > use it. > > In case we generate an error back to the socket locally because > something (e.g. the packet was too big) in output path, we must not > update sk->sk_err because we return it immediately to the sender but > nonetheless must update the error queue. > > It seems to me that this patch would report the same error two times to > the program then. For instance in __ip_append_data reaching ip_local_error(sk, EMSGSIZE, ..); return -EMSGSIZE; With this patch, where sk->sk_err is set, the error will continuously be returned on send/recv until the caller reads from the error queue. A recvmsg MSG_ERRQUEUE will cause the error to be reset immediately. If socket option IP_RECVERR is set, the process should immediately read the error queue when a send call fails with EMSGSIZE or any other relevant error. In that case, the error is not reported more than once. The alternative patch does not overwrite sk->sk_err, but results in this same modified user-visible behavior. Without the patch, after a send call fails in the above code, the process can continue calling send/recv normally while ignoring the error, since all "if (sk->sk_err)" checks fall through. If we have to treat this as legacy behavior, then neither patch should be applied. Then, the semantics are that queued errors are non-blocking. In that case, the only patch needed for consistent semantics is to remove the sk->sk_err assigment in skb_dequeue_err_skb. On a related note, when returning an error, returning the errno from the item on the queue is a confusing signal. Some error codes are the result of protocol processing and have nothing queued (e.g., EINVAL), others are due to an queued error (ENOMSG) and some are simply ambiguous (EMSGSIZE). It is not reasonable to expect processes to figure out the three sets. One solution would be to dedicate a special error code to denote "there is a message queued onto sk_error_queue, read the actual errno from there". This choice only relevant if we decide to return an error, at all, of course. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Wed, Sep 3, 2014, at 21:40, Willem de Bruijn wrote: > > I just noticed another problem with this approach why I think we cannot > > use it. > > > > In case we generate an error back to the socket locally because > > something (e.g. the packet was too big) in output path, we must not > > update sk->sk_err because we return it immediately to the sender but > > nonetheless must update the error queue. > > > > It seems to me that this patch would report the same error two times to > > the program then. > > For instance in __ip_append_data reaching > > ip_local_error(sk, EMSGSIZE, ..); > return -EMSGSIZE; > > With this patch, where sk->sk_err is set, the error will continuously > be returned on send/recv until the caller reads from the error queue. > A recvmsg MSG_ERRQUEUE will cause the error to be reset immediately. sk->sk_err will be reset by sock_error(), called from __skb_recv_datagram or sock_alloc_send_pskb. So a normal receive or send (even without MSG_ERRQUEUE) should clean the error code and report it. No further looks into the error queue should happen currently (I am currently talking about the patch in this thread). I was concerned that we report the error twice. Once in the sendmsg path and again on recvmsg or sendmsg, which finally clears sk_err. > If socket option IP_RECVERR is set, the process should immediately > read the error queue when a send call fails with EMSGSIZE or any other > relevant error. In that case, the error is not reported more than > once. I agree but am afraid of applications start to break. > The alternative patch does not overwrite sk->sk_err, but results in > this same modified user-visible behavior. > > Without the patch, after a send call fails in the above code, the > process can continue calling send/recv normally while ignoring the > error, since all "if (sk->sk_err)" checks fall through. If we have to > treat this as legacy behavior, then neither patch should be applied. > Then, the semantics are that queued errors are non-blocking. In that > case, the only patch needed for consistent semantics is to remove the > sk->sk_err assigment in skb_dequeue_err_skb. I am afraid that we need to go down this route. :/ I would even let the sk->sk_error_report call be handled by the protocol error reporting function, as it would need to deal with sk->sk_err, too. Most applications susceptible to breakage here are ping and tracepath/traceroute. I don't think they all check MSG_ERRQUEUE always on every error. But it shouldn't matter that much because they are only short running apps and error queue will be freed if application exits before socket accounting will shut down the socket. If we start to report one instance of an error event multiple times, I think they might become confused. > On a related note, when returning an error, returning the errno from > the item on the queue is a confusing signal. Some error codes are the > result of protocol processing and have nothing queued (e.g., EINVAL), > others are due to an queued error (ENOMSG) and some are simply > ambiguous (EMSGSIZE). It is not reasonable to expect processes to > figure out the three sets. One solution would be to dedicate a special > error code to denote "there is a message queued onto sk_error_queue, > read the actual errno from there". This choice only relevant if we > decide to return an error, at all, of course. errno messages are higly sensitive regarding backwards compatibility. I am totally with you that a special errno marker to indicate that a packet was enqueued on the error queue would be much better. Currently applications always have to query error queue on error indication, otherwise socket accounting might shut down the socket at some point. Sadly I don't see this change to happen. Bye, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I was concerned that we report the error twice. Once in the sendmsg path > and again on recvmsg or sendmsg, which finally clears sk_err. I had overlooked that messages can be queued onto the error queue during send call processing (most cases of ip_local_error) as well as from asynchronous events (ip_icmp_error, skb_complete_tx_timestamp, ..). For the second case, the patch fixes that an enqueued message does not have its error returned on the first subsequent system call. By setting sk->sk_err unconditionally, the patch indeed introduces a double notification in the other case. Both paths can be made correct if sk->sk_err is set on enqueue only when not called from inside the send() context. The easiest change for timestamping is to set it in __skb_tstamp_tx directly. Slightly more general would be to add * /* queue errors from asynchronous events: report the error on the next syscall */ + int sock_queue_err_skb_and_notify(struct sock *sk, struct sk_buff *skb) + { + int ret = sock_queue_err_skb(sk); + if (!ret) + sk->sk_err = SKB_EXT_ERR(skb)->ee.ee_errno; + } and call this from __skb_tstamp_tx, skb_complete_tx_timestamp, skb_complete_wifi_ack. It is perhaps also safe to call from ip[v6]_icmp_error, but as you point out there are many legacy expectations there, so it may be best to leave that code path as is. >> If socket option IP_RECVERR is set, the process should immediately >> read the error queue when a send call fails with EMSGSIZE or any other >> relevant error. In that case, the error is not reported more than >> once. > > I agree but am afraid of applications start to break. > >> The alternative patch does not overwrite sk->sk_err, but results in >> this same modified user-visible behavior. >> >> Without the patch, after a send call fails in the above code, the >> process can continue calling send/recv normally while ignoring the >> error, since all "if (sk->sk_err)" checks fall through. If we have to >> treat this as legacy behavior, then neither patch should be applied. >> Then, the semantics are that queued errors are non-blocking. In that >> case, the only patch needed for consistent semantics is to remove the >> sk->sk_err assigment in skb_dequeue_err_skb. > > I am afraid that we need to go down this route. :/ > > I would even let the sk->sk_error_report call be handled by the protocol > error reporting function, as it would need to deal with sk->sk_err, too. > > Most applications susceptible to breakage here are ping and > tracepath/traceroute. I don't think they all check MSG_ERRQUEUE always > on every error. But it shouldn't matter that much because they are only > short running apps and error queue will be freed if application exits > before socket accounting will shut down the socket. If we start to > report one instance of an error event multiple times, I think they might > become confused. > >> On a related note, when returning an error, returning the errno from >> the item on the queue is a confusing signal. Some error codes are the >> result of protocol processing and have nothing queued (e.g., EINVAL), >> others are due to an queued error (ENOMSG) and some are simply >> ambiguous (EMSGSIZE). It is not reasonable to expect processes to >> figure out the three sets. One solution would be to dedicate a special >> error code to denote "there is a message queued onto sk_error_queue, >> read the actual errno from there". This choice only relevant if we >> decide to return an error, at all, of course. > > errno messages are higly sensitive regarding backwards compatibility. I > am totally with you that a special errno marker to indicate that a > packet was enqueued on the error queue would be much better. Currently > applications always have to query error queue on error indication, > otherwise socket accounting might shut down the socket at some point. > Sadly I don't see this change to happen. > > Bye, > Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 4, 2014 at 2:13 PM, Willem de Bruijn <willemb@google.com> wrote: >> I was concerned that we report the error twice. Once in the sendmsg path >> and again on recvmsg or sendmsg, which finally clears sk_err. > > I had overlooked that messages can be queued onto the error queue > during send call processing (most cases of ip_local_error) as well as from > asynchronous events (ip_icmp_error, skb_complete_tx_timestamp, ..). > > For the second case, the patch fixes that an enqueued message does > not have its error returned on the first subsequent system call. By > setting sk->sk_err unconditionally, the patch indeed introduces a > double notification in the other case. > > Both paths can be made correct if sk->sk_err is set on enqueue > only when not called from inside the send() context. The easiest > change for timestamping is to set it in __skb_tstamp_tx directly. > Slightly more general would be to add > > * /* queue errors from asynchronous events: report the error on the > next syscall */ > + int sock_queue_err_skb_and_notify(struct sock *sk, struct sk_buff *skb) > + { > + int ret = sock_queue_err_skb(sk); > + if (!ret) > + sk->sk_err = SKB_EXT_ERR(skb)->ee.ee_errno; > + } This was just a quick, buggy, sketch. The errno should be read and cached locally before calling sock_queue_err_skb. + int ret, errno; + + errno = SKB_EXT_ERR(skb)->ee.ee_errno; + ret = sock_queue_err_skb(sk, skb); + if (!ret) + sk->sk_err = errno; The assignment is also not atomic and not protected by the socket lock here. Neither is it on dequeue, but this warrants looking into. > and call this from __skb_tstamp_tx, skb_complete_tx_timestamp, > skb_complete_wifi_ack. It is perhaps also safe to call from > ip[v6]_icmp_error, but as you point out there are many legacy expectations > there, so it may be best to leave that code path as is. > >>> If socket option IP_RECVERR is set, the process should immediately >>> read the error queue when a send call fails with EMSGSIZE or any other >>> relevant error. In that case, the error is not reported more than >>> once. >> >> I agree but am afraid of applications start to break. >> >>> The alternative patch does not overwrite sk->sk_err, but results in >>> this same modified user-visible behavior. >>> >>> Without the patch, after a send call fails in the above code, the >>> process can continue calling send/recv normally while ignoring the >>> error, since all "if (sk->sk_err)" checks fall through. If we have to >>> treat this as legacy behavior, then neither patch should be applied. >>> Then, the semantics are that queued errors are non-blocking. In that >>> case, the only patch needed for consistent semantics is to remove the >>> sk->sk_err assigment in skb_dequeue_err_skb. >> >> I am afraid that we need to go down this route. :/ >> >> I would even let the sk->sk_error_report call be handled by the protocol >> error reporting function, as it would need to deal with sk->sk_err, too. >> >> Most applications susceptible to breakage here are ping and >> tracepath/traceroute. I don't think they all check MSG_ERRQUEUE always >> on every error. But it shouldn't matter that much because they are only >> short running apps and error queue will be freed if application exits >> before socket accounting will shut down the socket. If we start to >> report one instance of an error event multiple times, I think they might >> become confused. >> >>> On a related note, when returning an error, returning the errno from >>> the item on the queue is a confusing signal. Some error codes are the >>> result of protocol processing and have nothing queued (e.g., EINVAL), >>> others are due to an queued error (ENOMSG) and some are simply >>> ambiguous (EMSGSIZE). It is not reasonable to expect processes to >>> figure out the three sets. One solution would be to dedicate a special >>> error code to denote "there is a message queued onto sk_error_queue, >>> read the actual errno from there". This choice only relevant if we >>> decide to return an error, at all, of course. >> >> errno messages are higly sensitive regarding backwards compatibility. I >> am totally with you that a special errno marker to indicate that a >> packet was enqueued on the error queue would be much better. Currently >> applications always have to query error queue on error indication, >> otherwise socket accounting might shut down the socket at some point. >> Sadly I don't see this change to happen. >> >> Bye, >> Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 163b673..f7a280b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3485,8 +3485,11 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb) skb_dst_force(skb); skb_queue_tail(&sk->sk_error_queue, skb); + sk->sk_err = SKB_EXT_ERR(skb)->ee.ee_errno; + if (!sock_flag(sk, SOCK_DEAD)) - sk->sk_data_ready(sk); + sk->sk_error_report(sk); + return 0; } EXPORT_SYMBOL(sock_queue_err_skb);
When a socket error is pending, send()/recv() must abort their normal operation and return the error. An error means having non-zero sk->sk_err or having non-empty sk->sk_error_queue. Currently, the behavior for the second is inconsistent depending on whether an error has previously been dequeued. In all cases, recv()/send() test sk->sk_err. This is not modified on enqueue onto the error queue, so may be 0. It is modified on dequeue, however, to match the queued skb's errno. I observed the following when two errors were queued: ret = poll(pollfd, 1, -1); assert(ret == 1); assert(pollfd.revents == POLLERR); ret = recv(fd, buf, size, MSG_NONBLOCK); assert(ret == -1 && errno == EAGAIN); /* <-- A */ ret = recv(fd, buf, size, MSG_ERRQUEUE); assert(ret > 0); ret = recv(fd, buf, size, MSG_NONBLOCK); assert(ret == -1 && errno == ENOMSG); /* <-- B */ ret = recv(fd, buf, size, MSG_ERRQUEUE); assert(ret > 0); The recv call in B returns the error code embedded in SKB_EXT_ERR(skb), in this case ENOMSG, because I am working with timestamps. The recv call in A should have returned the same. Implement this behavior. This may surprise existing applications. Also make the wake-up signal when data is ready on the error queue consistent between enqueue and dequeue: use sk_error_report in both cases. Signed-off-by: Willem de Bruijn <willemb@google.com> --- This approach leaves one issue: The states of sk->sk_err and sk->sk_error_queue are related, but only loosely. Error queue enqueue, dequeue and other code may overwrite sk->sk_err unconditionally. For one, sock_error will reset sk->sk_err to 0 even if sk->sk_error_queue is not empty. If socket calls should abort on all errors, then should be change to test sk_error_queue.qlen. But, doing so requires taking a lock in a busy data path. --- net/core/skbuff.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)