[net] sctp: fix a success return may hide an error

Message ID	31f3b581258d0458edcf30f65ef9513bdc41acc1.1470919978.git.lucien.xin@gmail.com
State	Changes Requested, archived
Delegated to:	David Miller
Return-Path: <netdev-owner@vger.kernel.org>
From: Xin Long <lucien.xin@gmail.com>
To: network dev <netdev@vger.kernel.org>, linux-sctp@vger.kernel.org
Cc: davem@davemloft.net, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>, Vlad Yasevich <vyasevich@gmail.com>, daniel@iogearbox.net
Subject: [PATCH net] sctp: fix a success return may hide an error
Date: Thu, 11 Aug 2016 20:52:58 +0800
Message-Id: <31f3b581258d0458edcf30f65ef9513bdc41acc1.1470919978.git.lucien.xin@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Xin Long Aug. 11, 2016, 12:52 p.m. UTC

At the end of sctp_outq_flush, sctp calls sctp_packet_transmit
in a loop. The return value of each sctp_packet_transmit call always
overwrites the prior one. If the last call of sctp_packet_transmit
returns success, it may hide an error returned by a prior call.

This patch fixes this by keeping the old error until a new
error is returned from sctp_packet_transmit. Ran the TAHI tests
against this fix; no regression was found.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/outqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
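
As a rough userspace illustration of the behaviour described above (not the kernel code itself), the sketch below compares plain assignment in a loop with the `? :` form. The helper names `flush_last_wins` and `flush_keep_error`, and the `results` array standing in for successive sctp_packet_transmit return values, are assumptions made for this example. `a ? : b` is the GNU C conditional with the middle operand omitted, so GCC or Clang is assumed.

```c
#include <errno.h>
#include <stddef.h>

/* Behaviour before the patch: every result overwrites the previous one,
 * so a trailing success (0) hides an earlier error. */
static int flush_last_wins(const int *results, size_t n)
{
	int error = 0;

	for (size_t i = 0; i < n; i++)
		error = results[i];
	return error;
}

/* Behaviour with the patch: a non-zero result replaces the stored error,
 * a zero result leaves it untouched.
 * "x ? : y" (GNU extension) evaluates to x if x is non-zero, else y. */
static int flush_keep_error(const int *results, size_t n)
{
	int error = 0;

	for (size_t i = 0; i < n; i++)
		error = results[i] ? : error;
	return error;
}
```

For a sequence such as { -ENOMEM, 0 }, the first helper returns 0 while the second returns -ENOMEM.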

Marcelo Ricardo Leitner Aug. 11, 2016, 1:11 p.m. UTC | #1

On Thu, Aug 11, 2016 at 08:52:58PM +0800, Xin Long wrote:
> Now in the end of sctp_outq_flush, sctp calls sctp_packet_transmit
> in a loop. The return of current sctp_packet_transmit always covers
> the prior one's. If the last call of sctp_packet_transmit return a
> success, it may hide the error that returns from the prior call.
> 
> This patch is to fix this by keeping the old error until the new
> error returns from sctp_packet_transmit. Did TAHI test against this
> fix, no regression is found.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  net/sctp/outqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 72e54a4..b97c8ad 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1193,7 +1193,7 @@ sctp_flush_out:
>  						      send_ready);
>  		packet = &t->packet;
>  		if (!sctp_packet_empty(packet))
> -			error = sctp_packet_transmit(packet, gfp);
> +			error = sctp_packet_transmit(packet, gfp) ? : error;
>  
>  		/* Clear the burst limited state, if any */
>  		sctp_transport_burst_reset(t);
> -- 
> 2.1.0
>

Neil Horman Aug. 11, 2016, 3:36 p.m. UTC | #2

On Thu, Aug 11, 2016 at 08:52:58PM +0800, Xin Long wrote:
> Now in the end of sctp_outq_flush, sctp calls sctp_packet_transmit
> in a loop. The return of current sctp_packet_transmit always covers
> the prior one's. If the last call of sctp_packet_transmit return a
> success, it may hide the error that returns from the prior call.
> 
> This patch is to fix this by keeping the old error until the new
> error returns from sctp_packet_transmit. Did TAHI test against this
> fix, no regression is found.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/outqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 72e54a4..b97c8ad 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1193,7 +1193,7 @@ sctp_flush_out:
>  						      send_ready);
>  		packet = &t->packet;
>  		if (!sctp_packet_empty(packet))
> -			error = sctp_packet_transmit(packet, gfp);
> +			error = sctp_packet_transmit(packet, gfp) ? : error;
>  
>  		/* Clear the burst limited state, if any */
>  		sctp_transport_burst_reset(t);
> -- 
> 2.1.0
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

David Miller Aug. 13, 2016, 4:11 a.m. UTC | #3

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 11 Aug 2016 20:52:58 +0800

> Now in the end of sctp_outq_flush, sctp calls sctp_packet_transmit
> in a loop. The return of current sctp_packet_transmit always covers
> the prior one's. If the last call of sctp_packet_transmit return a
> success, it may hide the error that returns from the prior call.
> 
> This patch is to fix this by keeping the old error until the new
> error returns from sctp_packet_transmit. Did TAHI test against this
> fix, no regression is found.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

This style of error handling is dangerous.  The first error can be
lost.

For example, if sctp_outq_flush_rtx() earlier in this function returns
an error, it will be lost if any invocation of the function
sctp_packet_transmit() at the end of the function signals an error.

I think you should always preserve the first error that is recorded
into 'error'.

I also wonder about why sctp_outq_flush_rtx() errors are completely
ignored and don't influence the control flow here in any way.
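
The difference between the two policies can be sketched as follows. This is a hypothetical userspace illustration: `keep_first_error` and `keep_latest_error` are names invented here, and two distinct error codes are used only to make the difference visible.

```c
#include <errno.h>

/* Policy suggested in this review: once an error has been recorded,
 * later results never replace it. */
static int keep_first_error(int error, int ret)
{
	return error ? error : ret;
}

/* Policy of the patch under review: a new non-zero result replaces
 * whatever was recorded; success leaves the recorded value untouched. */
static int keep_latest_error(int error, int ret)
{
	return ret ? ret : error;
}
```

Feeding the results -ENOMEM, 0, -EIO through each helper in turn (starting from 0) leaves -ENOMEM with the first policy and -EIO with the second.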

Xin Long Aug. 13, 2016, 7:47 a.m. UTC | #4

>
> This style of error handling is dangerous.  The first error can be
> lost.
>
> For example, if sctp_outq_flush_rtx() earlier in this function returns
> an error, it will be lost if any invocation of the function
> sctp_packet_transmit() at the end function signals an error.
>
> I think you should always preserve the first error that is recorded
> into 'error'.
>
> I also wonder about why sctp_outq_flush_rtx() errors are completely
> ignored and don't influence the control flow here in any way.

Yes, the first error can be lost.
Here we just keep the last error. We don't really have to return the
first error or return it on the first failure.

[1]
Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
sctp_packet_transmit.

[2]
The original code already doesn't return immediately when
sctp_outq_flush_rtx returns an error. I guess it just doesn't want
to stop flushing out transport_list only because it failed to flush
rtx.
Even sctp_packet_transmit_chunk in sctp_outq_flush just
puts the error into sk->sk_err, instead of returning immediately.

So we cannot return the error at the first failure because of [2], and
the error here is always -ENOMEM because of [1].
I think returning the last error here is OK, or at least not dangerous,
and it also fixes the issue "a success return may hide an error" with
clear code. :)

David Laight Aug. 16, 2016, 9:16 a.m. UTC | #5

From: Xin Long

> Sent: 13 August 2016 08:48
> >
> > This style of error handling is dangerous.  The first error can be
> > lost.
> >
> > For example, if sctp_outq_flush_rtx() earlier in this function returns
> > an error, it will be lost if any invocation of the function
> > sctp_packet_transmit() at the end function signals an error.
> >
> > I think you should always preserve the first error that is recorded
> > into 'error'.
> >
> > I also wonder about why sctp_outq_flush_rtx() errors are completely
> > ignored and don't influence the control flow here in any way.
>
> Yes, the first error can be lost.
> Here we just keep the last error. We don't really have to return the
> first error or return it on the first failure.
>
> [1]
> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
> sctp_packet_transmit.

What is the effect of the error?
If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
is freed) then the protocol will recover.
If it is anything else then the error path is probably wrong.

Also after one error is it actually worth trying to send anything else
at all? ISTM that the code should either:
1) wait for resources and retry.
2) discard the entire queue (freeing resource) and hope the protocol
   timers will recover.

> [2]
> It's the original codes that it doesn't return immediately when
> sctp_outq_flush_rtx returns error. I guess it just doesn't want
> to stop flushing out transport_list only because it fail to flush
> rtx.
> even sctp_packet_transmit_chunk in sctp_outq_flush also just
> put the error into sk->sk_err, instread of returning immediately.
>
> So we cannot return the err at the first failure as [2], the error
> here is always -ENOMEM as [1].
> I think to return the last error here is ok, at least  not dangerous,
> can also fix the issue "a success return may hide an error" with
> clear codes. :)

Which code looks at sk->sk_err?
It doesn't look right to be setting an error code on the socket due
to a transmit packet discard.

	David

Xin Long Aug. 16, 2016, 11:34 a.m. UTC | #6

>>
>> [1]
>> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
>> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
>> sctp_packet_transmit.
>
> What is the effect of the error?
> If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
> is freed) then the protocol will recover.
> If it is anything else then the error path is probably wrong.
This error is returned to sctp_sendmsg, where sctp will abort the asoc.
In this function, sctp tries to do 3 things:
1. flush rtx queue
2. transmit the packet of current transport
3. flush all the transports.
Currently sctp does them one by one, even if one of them returns an error.

>
> Also after one error is it actually worth trying to send anything else
> at all? ISTM that the code should either:
Yeah, that's the problem.
The "sctp_flush_out:" code tries to force clear all the transports before
returning, even if there are errors already.

> 1) wait for resources and retry.
> 2) discard the entire queue (freeing resource) and hope the protocol
>    timers will recover.
It's a different process, will think about it.

>
>> [2]
>> It's the original codes that it doesn't return immediately when
>> sctp_outq_flush_rtx returns error. I guess it just doesn't want
>> to stop flushing out transport_list only because it fail to flush
>> rtx.
>> even sctp_packet_transmit_chunk in sctp_outq_flush also just
>> put the error into sk->sk_err, instread of returning immediately.
>>
>> So we cannot return the err at the first failure as [2], the error
>> here is always -ENOMEM as [1].
>> I think to return the last error here is ok, at least  not dangerous,
>> can also fix the issue "a success return may hide an error" with
>> clear codes. :)
>
> Which code looks at sk->sk_err?
> It doesn't look right to be setting an error code on the socket due
> a transmit packet discard.
I guess sctp_packet_transmit_chunk's return value is used for
'status' (like PMTU_FULL, RWND_FULL... ), which is why the err was
put into sk->sk_err. This err is supposed to be checked in
sctp_sendmsg, but there sctp_error checks sk->sk_err only when
err == -EPIPE.
Yes, we need to fix this, thanks.

David Laight Aug. 16, 2016, 4:01 p.m. UTC | #7

From: Xin Long

> Sent: 16 August 2016 12:34
>
> >> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
> >> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
> >> sctp_packet_transmit.
> >
> > What is the effect of the error?
> > If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
> > is freed) then the protocol will recover.
> > If it is anything else then the error path is probably wrong.
>
> This err returns back to sctp_sendmsg, there sctp will abort asoc.

That doesn't seem a good idea.
You don't want to abort the association if there is a transient
memory allocation failure.
You also can't drop data chunks.

> in this function, sctp tries to do 3 things:
> 1. flush rtx queue
> 2. transmit the packet of current transport
> 3. flush all the transports.
> Now sctp would do them one by one, even if one of them returns err.

You probably need to explain what 'flush' means here.
I think it means 'process and send', but it might mean 'discard the
contents of'.

Last time I looked at the sctp code my head exploded.
ISTR it is a mess of timing errors waiting to happen
(and I write comms protocol stack code for a living).

	David

Marcelo Ricardo Leitner Aug. 16, 2016, 5:24 p.m. UTC | #8

On Tue, Aug 16, 2016 at 04:01:50PM +0000, David Laight wrote:
> From: Xin Long
> > Sent: 16 August 2016 12:34
> >
> > >> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
> > >> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
> > >> sctp_packet_transmit.
> > >
> > > What is the effect of the error?
> > > If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
> > > is freed) then the protocol will recover.
> > > If it is anything else then the error path is probably wrong.
> >
> > This err returns back to sctp_sendmsg, there sctp will abort asoc.

That's not right I think. sctp_sendmsg will only free the asoc if it was
created to send that specific chunk. And in this case, this change
should have no effect as it can't have sctp_outq_flush() touching
several transports in a row.

I'm basing on:
out_free:
        if (new_asoc)
                sctp_association_free(asoc);

and sctp_recvmsg will just fetch, return and clear the error via
sctp_skb_recv_datagram, but not free it.

Do you see any other place freeing it?

> 
> That doesn't seem a good idea.
> You don't want to abort the association if there is a transient
> memory allocation failure.
> You also can't drop data chunks.

From a system-wise POV, this behavior - to free the new asoc in case of
transient memory allocation failure - doesn't seem bad to me.
That's what will have to happen if any allocation before it failed and
also it helps the system to reduce the stress a little bit. I don't see
any inconsistency/problems here because we are not dropping a single
random chunk but instead we are actually refusing to initialize a new
asoc in such conditions.

Nevertheless, I agree that letting the application see ENOMEM errors when
the data actually got queued and is being fully handled, as in, it will
be retransmitted later, is not wise, as the application probably won't
be able to distinguish ENOMEMs that it should retry from those it should not.
Here I see a problem, yet it's not due to this specific change, perhaps
it just got attention because of it. In this situation, we should handle
ENOMEMs internally if possible so the application can know that if it
hits an ENOMEM, it's real and it has to retry.

Fixing this inconsistency may very well cause us to let that new asoc to
live longer, works for me too.

> 
> > in this function, sctp tries to do 3 things:
> > 1. flush rtx queue
> > 2. transmit the packet of current transport
> > 3. flush all the transports.
> > Now sctp would do them one by one, even if one of them returns err.
> 
> You probably need to explain what 'flush' means here.
> I think it means 'process and send', but it might mean 'discard the
> contents of'.

Yes, the first. He probably used the word 'flush' because the function is
called .._flush_..

> Last time I looked at the sctp code my head exploded.
> ISTR it is a mess of timing errors waiting to happen
> (and I write comms protocol stack code for a living).

Well, it may be, but we are trying to improve it.  Please continue
discussing the fixes so we can keep improving it. :)

  Marcelo

Xin Long Aug. 16, 2016, 6:24 p.m. UTC | #9

>> > This err returns back to sctp_sendmsg, there sctp will abort asoc.
>
> That's not right I think. sctp_sendmsg will only free the asoc if it was
> created to send that specific chunk. And in this case, this change
> should have no effect as it can't have sctp_outq_flush() touching
> several transports in a row.
>
> I'm basing on:
> out_free:
>         if (new_asoc)
>                 sctp_association_free(asoc);
>
> and sctp_recvmsg will just fetch, return and clear the error via
> sctp_skb_recv_datagram, but not free it.
>
> Do you see any other place freeing it?
Sorry, you are right, it frees the asoc only for new_asoc.

>
>>
>> That doesn't seem a good idea.
>> You don't want to abort the association if there is a transient
>> memory allocation failure.
>> You also can't drop data chunks.
>
> From a system-wise POV, this behavior - to free the new asoc in case of
> transient memory allocation failure - doesn't seem bad to me.
> That's what will have to happen if any allocation before it failed and
> also it helps the system to reduce the stress a little bit. I don't see
> any inconsistency/problems here because we are not dropping a single
> random chunk but instead we are actually refusing to initialize a new
> asoc in such conditions.
>
> Nevertheless, I agree that letting the application see ENOMEM errors when
> the data actually got queued and is being fully handled, as in, it will
> be retransmitted later, is not be wise, as the application probably
> won't be able to distinguish from ENOMEMs that it should retry or not.
> Here I see a problem, yet it's not due to this specific change, perhaps
> it just got attention because of it. In this situation, we should handle
> ENOMEMs internally if possible so the application can know that if it
> hits an ENOMEM, it's real and it has to retry.
If we let the application see ENOMEM errors, sctp would have to drop that
chunk instead of retransmitting it. But the chunk that hit ENOMEM may
not be a chunk from the current msg, as the whole queue is flushed.
So even if users get an ENOMEM error, they may re-send a chunk that is
the same as one that is still in the retransmit queue.

>
> Fixing this inconsistency may very well cause us to let that new asoc to
> live longer, works for me too.
>
>>
>> > in this function, sctp tries to do 3 things:
>> > 1. flush rtx queue
>> > 2. transmit the packet of current transport
>> > 3. flush all the transports.
>> > Now sctp would do them one by one, even if one of them returns err.
>>
>> You probably need to explain what 'flush' means here.
>> I think it means 'process and send', but it might mean 'discard the
>> contents of'.
>
> Yes, the first. He probably use the work 'flush' because the function is
> called .._flush_..
Yes, :D

>
>> Last time I looked at the sctp code my head exploded.
>> ISTR it is a mess of timing errors waiting to happen
>> (and I write comms protocol stack code for a living).
>
> Well, it may be, but we are trying to improve it.  Please continue
> discussing the fixes so we can keep improving it. :)
>
>   Marcelo
>

Marcelo Ricardo Leitner Aug. 16, 2016, 6:33 p.m. UTC | #10

On Wed, Aug 17, 2016 at 02:24:19AM +0800, Xin Long wrote:
> >> > This err returns back to sctp_sendmsg, there sctp will abort asoc.
> >
> > That's not right I think. sctp_sendmsg will only free the asoc if it was
> > created to send that specific chunk. And in this case, this change
> > should have no effect as it can't have sctp_outq_flush() touching
> > several transports in a row.
> >
> > I'm basing on:
> > out_free:
> >         if (new_asoc)
> >                 sctp_association_free(asoc);
> >
> > and sctp_recvmsg will just fetch, return and clear the error via
> > sctp_skb_recv_datagram, but not free it.
> >
> > Do you see any other place freeing it?
> Sorry, you are right, it free assoc just for new_asoc.
> 
> >
> >>
> >> That doesn't seem a good idea.
> >> You don't want to abort the association if there is a transient
> >> memory allocation failure.
> >> You also can't drop data chunks.
> >
> > From a system-wise POV, this behavior - to free the new asoc in case of
> > transient memory allocation failure - doesn't seem bad to me.
> > That's what will have to happen if any allocation before it failed and
> > also it helps the system to reduce the stress a little bit. I don't see
> > any inconsistency/problems here because we are not dropping a single
> > random chunk but instead we are actually refusing to initialize a new
> > asoc in such conditions.
> >
> > Nevertheless, I agree that letting the application see ENOMEM errors when
> > the data actually got queued and is being fully handled, as in, it will
> > be retransmitted later, is not be wise, as the application probably
> > won't be able to distinguish from ENOMEMs that it should retry or not.
> > Here I see a problem, yet it's not due to this specific change, perhaps
> > it just got attention because of it. In this situation, we should handle
> > ENOMEMs internally if possible so the application can know that if it
> > hits an ENOMEM, it's real and it has to retry.
> If  letting the application see ENOMEM errors, and sctp has to drop this
> chunk, instead of retransmiting the ENOMEM chunk, but the ENOMEM
> chunk may not be the chunk from current msg, as it flush all the queue.
> even if users get an ENOMEM error, they may re-send a chunk that is same
> with the one that is still in retransmit queue.

Yep, one more reason to handle those internally when safe.

Marcelo Ricardo Leitner Aug. 16, 2016, 6:45 p.m. UTC | #11

On Tue, Aug 16, 2016 at 03:33:30PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Aug 17, 2016 at 02:24:19AM +0800, Xin Long wrote:
> > >> > This err returns back to sctp_sendmsg, there sctp will abort asoc.
> > >
> > > That's not right I think. sctp_sendmsg will only free the asoc if it was
> > > created to send that specific chunk. And in this case, this change
> > > should have no effect as it can't have sctp_outq_flush() touching
> > > several transports in a row.
> > >
> > > I'm basing on:
> > > out_free:
> > >         if (new_asoc)
> > >                 sctp_association_free(asoc);
> > >
> > > and sctp_recvmsg will just fetch, return and clear the error via
> > > sctp_skb_recv_datagram, but not free it.
> > >
> > > Do you see any other place freeing it?
> > Sorry, you are right, it free assoc just for new_asoc.
> > 
> > >
> > >>
> > >> That doesn't seem a good idea.
> > >> You don't want to abort the association if there is a transient
> > >> memory allocation failure.
> > >> You also can't drop data chunks.
> > >
> > > From a system-wise POV, this behavior - to free the new asoc in case of
> > > transient memory allocation failure - doesn't seem bad to me.
> > > That's what will have to happen if any allocation before it failed and
> > > also it helps the system to reduce the stress a little bit. I don't see
> > > any inconsistency/problems here because we are not dropping a single
> > > random chunk but instead we are actually refusing to initialize a new
> > > asoc in such conditions.
> > >
> > > Nevertheless, I agree that letting the application see ENOMEM errors when
> > > the data actually got queued and is being fully handled, as in, it will
> > > be retransmitted later, is not be wise, as the application probably
> > > won't be able to distinguish from ENOMEMs that it should retry or not.
> > > Here I see a problem, yet it's not due to this specific change, perhaps
> > > it just got attention because of it. In this situation, we should handle
> > > ENOMEMs internally if possible so the application can know that if it
> > > hits an ENOMEM, it's real and it has to retry.
> > If  letting the application see ENOMEM errors, and sctp has to drop this
> > chunk, instead of retransmiting the ENOMEM chunk, but the ENOMEM
> > chunk may not be the chunk from current msg, as it flush all the queue.
> > even if users get an ENOMEM error, they may re-send a chunk that is same
> > with the one that is still in retransmit queue.
> 
> Yep, one more reason to handle those internally when safe.

Xin, maybe you can squash this patch and this ENOMEM handling? I'm
thinking that handling ENOMEM may result in similar situations in other
places, so we have a common reasoning on them.

David Laight Aug. 17, 2016, 9:01 a.m. UTC | #12

From: Marcelo Ricardo Leitner
> Sent: 16 August 2016 18:25
...
> > That doesn't seem a good idea.
> > You don't want to abort the association if there is a transient
> > memory allocation failure.
> > You also can't drop data chunks.
> 
> From a system-wise POV, this behavior - to free the new asoc in case of
> transient memory allocation failure - doesn't seem bad to me.
> That's what will have to happen if any allocation before it failed and
> also it helps the system to reduce the stress a little bit. I don't see
> any inconsistency/problems here because we are not dropping a single
> random chunk but instead we are actually refusing to initialize a new
> asoc in such conditions.

Failing a new association should be ok, whether purists will like
connect() failing ENOMEM is another matter.

> Nevertheless, I agree that letting the application see ENOMEM errors when
> the data actually got queued and is being fully handled, as in, it will
> be retransmitted later, is not be wise, as the application probably
> won't be able to distinguish from ENOMEMs that it should retry or not.

I think an application would be justified in thinking that an ENOMEM return
meant that the system call had no effect.

For send() even ENOMEM is really wrong, it should be treated as 'flow control'
and either block or return EAGAIN/EWOULDBLOCK.
Getting POLLOUT set is left as an exercise to the reader :-)
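
From the application side, the distinction drawn above might look like the sketch below. It is only an illustration under assumptions: `send_error_is_flow_control` is a hypothetical helper, not an existing API, and it encodes the semantics suggested here (EAGAIN/EWOULDBLOCK means wait for writability and retry) and not what the sctp code currently does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when a failed send() should be
 * treated as flow control, i.e. the caller waits until the socket is
 * writable again and retries the same data. Any other error, ENOMEM
 * included, is reported to the caller as a real failure. */
static bool send_error_is_flow_control(int err)
{
	return err == EAGAIN || err == EWOULDBLOCK;
}
```

An application loop could call this with the errno left by a failed send() and only retry the same data when it returns true.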

...
> Well, it may be, but we are trying to improve it.  Please continue
> discussing the fixes so we can keep improving it. :)

Indeed, we have customers who use sctp (for M3UA).
We don't do anything 'complicated', but do end up sending a lot of short
data chunks.

	David

Xin Long Aug. 17, 2016, 11:42 a.m. UTC | #13

>> > If  letting the application see ENOMEM errors, and sctp has to drop this
>> > chunk, instead of retransmiting the ENOMEM chunk, but the ENOMEM
>> > chunk may not be the chunk from current msg, as it flush all the queue.
>> > even if users get an ENOMEM error, they may re-send a chunk that is same
>> > with the one that is still in retransmit queue.
>>
>> Yep, one more reason to handle those internally when safe.
I just checked tcp_sendmsg: it doesn't return any transmit error to the
user, *NOT ONLY* ENOMEM. You can check __tcp_push_pending_frames
and tcp_push; their return type is even void. It may still get an
err from sk->sk_err:
    err = sk_stream_error(sk, flags, err);
But I didn't see it put any err into sk->sk_err in the main transmit
path.

Yes, tcp_write_xmit has a return value, as do tcp_transmit_skb and
err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl), but
all of them are only used internally and are never returned to userspace.

In tcp_write_xmit, it even uses 'unlikely':
if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
       break;


>
> Xin, maybe you can squash this patch and this ENOMEM handling? I'm
> thinking that handling ENOMEM may result in similar situations in other
> places, so we have a common reasoning on them.
>
So this reason does really matter, and not only for ENOMEM in transmit
path.

Marcelo Ricardo Leitner Aug. 18, 2016, 5:44 p.m. UTC | #14

On Wed, Aug 17, 2016 at 09:01:38AM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner
> > Sent: 16 August 2016 18:25
> ...
> > > That doesn't seem a good idea.
> > > You don't want to abort the association if there is a transient
> > > memory allocation failure.
> > > You also can't drop data chunks.
> > 
> > From a system-wise POV, this behavior - to free the new asoc in case of
> > transient memory allocation failure - doesn't seem bad to me.
> > That's what will have to happen if any allocation before it failed and
> > also it helps the system to reduce the stress a little bit. I don't see
> > any inconsistency/problems here because we are not dropping a single
> > random chunk but instead we are actually refusing to initialize a new
> > asoc in such conditions.
> 
> Failing a new association should be ok, whether purists will like
> connect() failing ENOMEM is another matter.
> 

Good point.

> > Nevertheless, I agree that letting the application see ENOMEM errors when
> > the data actually got queued and is being fully handled, as in, it will
> > be retransmitted later, is not be wise, as the application probably
> > won't be able to distinguish from ENOMEMs that it should retry or not.
> 
> I think an application would be justified in thinking that an ENOMEM return
> meant that the system call had no effect.
> 

Yep

> For send() even ENOMEM is really wrong, it should be treated as 'flow control'
> and either block or return EAGAIN/EWOULDBLOCK.

Agreed.

> Getting POLLOUT set is left as an exercise to the reader :-)
> 

:-)

> ...
> > Well, it may be, but we are trying to improve it.  Please continue
> > discussing the fixes so we can keep improving it. :)
> 
> Indeed, we have customers who use sctp (for M3UA).
> We don't do anything 'complicated', but do end up sending a lot of short
> data chunks.
> 
> 	David
> 
