diff mbox

[5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open

Message ID 1321469885-10885-5-git-send-email-ncardwell@google.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Neal Cardwell Nov. 16, 2011, 6:58 p.m. UTC
The problem: Senders were overriding cwnd values picked during an undo
by calling tcp_moderate_cwnd() in tcp_try_to_open().

The fix: Don't moderate cwnd in tcp_try_to_open() if we're in
TCP_CA_Open, since doing so is generally unnecessary and specifically
would override a DSACK-based undo of a cwnd reduction made in fast
recovery.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

Comments

Ilpo Järvinen Nov. 17, 2011, 5:14 a.m. UTC | #1
On Wed, 16 Nov 2011, Neal Cardwell wrote:

> The problem: Senders were overriding cwnd values picked during an undo
> by calling tcp_moderate_cwnd() in tcp_try_to_open().

I think it's intentional. Because of receiver lying bandwidth cheats all 
unlimited undos are bit dangerous.

> The fix: Don't moderate cwnd in tcp_try_to_open() if we're in
> TCP_CA_Open, since doing so is generally unnecessary and specifically
> would override a DSACK-based undo of a cwnd reduction made in fast
> recovery.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> ---
>  net/ipv4/tcp_input.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index a4efdd7..78dd38c 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2881,7 +2881,8 @@ static void tcp_try_to_open(struct sock *sk, int flag)
>  
>  	if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
>  		tcp_try_keep_open(sk);
> -		tcp_moderate_cwnd(tp);
> +		if (inet_csk(sk)->icsk_ca_state != TCP_CA_Open)
> +			tcp_moderate_cwnd(tp);
>  	} else {
>  		tcp_cwnd_down(sk, flag);
>  	}

Wouldn't it be enough if tcp max burst is increased to match IW (iirc we 
had 3 still there as a magic number)?
Neal Cardwell Nov. 19, 2011, 9:08 p.m. UTC | #2
On Thu, Nov 17, 2011 at 12:14 AM, Ilpo Järvinen
<ilpo.jarvinen@helsinki.fi> wrote:

> I think it's intentional. Because of receiver lying bandwidth cheats all
> unlimited undos are bit dangerous.

If your concern is receivers lying, then there are other easy ways
that a misbehaving receiver can get a sender to send too fast
(e.g. cumulatively ACKing the highest sequence number seen,
irrespective of holes). IMHO it would be a shame to penalize the vast
majority of well-behaved users to plug one potential attack vector
when there are other easy holes that an attacker would use.

> Wouldn't it be enough if tcp max burst is increased to match IW (iirc we
> had 3 still there as a magic number)?

Yes, tcp_max_burst() returns tp->reordering now. Changing it to return
max(tp->reordering, TCP_INIT_CWND) sounds good to me. I think that's
an excellent idea in any case, regardless of the outcome of this undo
discussion.

However, I don't think this is sufficient for request-response
protocols (e.g. RPC) running over long-lived connections over a path
with a large bandwidth-delay product. In such cases, the cwnd will
grow quite large (hundreds of packets). The DSACKs will often arrive
after the entire response is sent, so that cwnd moderation will
typically pull the cwnd down to 3 (or, with your proposal, 10)
packets. IMHO that seems like an unnecessarily steep price to pay.

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen Nov. 20, 2011, 8:38 a.m. UTC | #3
On Sat, 19 Nov 2011, Neal Cardwell wrote:

> On Thu, Nov 17, 2011 at 12:14 AM, Ilpo Järvinen
> <ilpo.jarvinen@helsinki.fi> wrote:
> 
> > I think it's intentional. Because of receiver lying bandwidth cheats all
> > unlimited undos are bit dangerous.
> 
> If your concern is receivers lying, then there are other easy ways
> that a misbehaving receiver can get a sender to send too fast
> (e.g. cumulatively ACKing the highest sequence number seen,
> irrespective of holes). IMHO it would be a shame to penalize the vast
> majority of well-behaved users to plug one potential attack vector
> when there are other easy holes that an attacker would use.

I disagree... there is huge difference as the receiver then has to gamble 
with the reliability of the flow. DSACK attack is nasty in that that it
allows maintaining reliability while cheating bw.

> > Wouldn't it be enough if tcp max burst is increased to match IW (iirc we
> > had 3 still there as a magic number)?
> 
> Yes, tcp_max_burst() returns tp->reordering now. Changing it to return
> max(tp->reordering, TCP_INIT_CWND) sounds good to me. I think that's
> an excellent idea in any case, regardless of the outcome of this undo
> discussion.

Feel free to provide a patch then as I'm likely to forget :).

> However, I don't think this is sufficient for request-response
> protocols (e.g. RPC) running over long-lived connections over a path
> with a large bandwidth-delay product. In such cases, the cwnd will
> grow quite large (hundreds of packets). The DSACKs will often arrive
> after the entire response is sent, so that cwnd moderation will
> typically pull the cwnd down to 3 (or, with your proposal, 10)
> packets. IMHO that seems like an unnecessarily steep price to pay.

Sounds rather hypotetical scenario that you'd happen to get those DSACKs 
in such scenario... DSACK would only be sent when receiver and sender were 
out of sync due to heavy ACK loss that exceeds SACK redundancy or heavy 
reordering... the later prevents large window in the first place (or 
reordering has already adapted). Honestly I don't believe this is likely 
scenario. ...And FRTO handles single unnecesary rexmit in case of spurious 
RTO even before DSACK could be seen.
Alexander Zimmermann Nov. 20, 2011, 10:19 a.m. UTC | #4
Hi Neal,

Am 19.11.2011 um 22:08 schrieb Neal Cardwell:

> On Thu, Nov 17, 2011 at 12:14 AM, Ilpo Järvinen
> <ilpo.jarvinen@helsinki.fi> wrote:
> 
>> I think it's intentional. Because of receiver lying bandwidth cheats all
>> unlimited undos are bit dangerous.
> 
> If your concern is receivers lying, then there are other easy ways
> that a misbehaving receiver can get a sender to send too fast
> (e.g. cumulatively ACKing the highest sequence number seen,
> irrespective of holes). IMHO it would be a shame to penalize the vast
> majority of well-behaved users to plug one potential attack vector
> when there are other easy holes that an attacker would use.
> 
>> Wouldn't it be enough if tcp max burst is increased to match IW (iirc we
>> had 3 still there as a magic number)?
> 
> Yes, tcp_max_burst() returns tp->reordering now. Changing it to return
> max(tp->reordering, TCP_INIT_CWND) sounds good to me. I think that's
> an excellent idea in any case, regardless of the outcome of this undo
> discussion.

For my Phd thesis - based on the Linux reordering detection we implemented
an adaptive Version of TCP-NCR (RFC 4653) - we change tcp_max_burst() to
return TCP_INIT_CWND. If the reordering delay is high, tp->reordering is
high as well, allowing to send a huge burst. In 2008 John Heffner* changed
tcp_max_burst() to return tp->reordering according to problem for reordering
on the reverse path. However, I never was be able to reproduce old problem...
Due to my experience I recommend to use TCP_INIT_CWND directly.

(*)http://git.kernel.org/linus/dd9e0dda66ba38a2ddd140 5ac279894260dc5c36.

> 
> However, I don't think this is sufficient for request-response
> protocols (e.g. RPC) running over long-lived connections over a path
> with a large bandwidth-delay product. In such cases, the cwnd will
> grow quite large (hundreds of packets). The DSACKs will often arrive
> after the entire response is sent, so that cwnd moderation will
> typically pull the cwnd down to 3 (or, with your proposal, 10)
> packets. IMHO that seems like an unnecessarily steep price to pay.
> 
> neal
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
Yuchung Cheng Nov. 21, 2011, 10:57 p.m. UTC | #5
On Sun, Nov 20, 2011 at 12:38 AM, Ilpo Järvinen
<ilpo.jarvinen@helsinki.fi> wrote:
> On Sat, 19 Nov 2011, Neal Cardwell wrote:
>> If your concern is receivers lying, then there are other easy ways
>> that a misbehaving receiver can get a sender to send too fast
>> (e.g. cumulatively ACKing the highest sequence number seen,
>> irrespective of holes). IMHO it would be a shame to penalize the vast
>> majority of well-behaved users to plug one potential attack vector
>> when there are other easy holes that an attacker would use.
>
> I disagree... there is huge difference as the receiver then has to gamble
> with the reliability of the flow. DSACK attack is nasty in that that it
> allows maintaining reliability while cheating bw.
>

I feel the original change Neal is proposing is getting lost. This
patch proposes when

a. ack is dubious
b. but it's not time to recover yet
c. and other checks in tcp_try_keep_open affirms the network is in
good condition, i.e, stat == Open
3. do not moderate cwnd (not 3, not 10 or any magic number).

Your concern is valid, but if that's true for majority then I argue
all undo logic on dsack or TS are causing major threats since they all
require some trusts of the remote end. Why even bother processing
DSACK if we can't trust them?

AFAIK cwnd moderation is to lower bursty drops not to discourage
(dsack) cheating. I believe the reason is the same in
tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this
logic hurts performance.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen Nov. 22, 2011, 11:47 a.m. UTC | #6
On Mon, 21 Nov 2011, Yuchung Cheng wrote:

> On Sun, Nov 20, 2011 at 12:38 AM, Ilpo Järvinen
> <ilpo.jarvinen@helsinki.fi> wrote:
> > On Sat, 19 Nov 2011, Neal Cardwell wrote:
> >> If your concern is receivers lying, then there are other easy ways
> >> that a misbehaving receiver can get a sender to send too fast
> >> (e.g. cumulatively ACKing the highest sequence number seen,
> >> irrespective of holes). IMHO it would be a shame to penalize the vast
> >> majority of well-behaved users to plug one potential attack vector
> >> when there are other easy holes that an attacker would use.
> >
> > I disagree... there is huge difference as the receiver then has to gamble
> > with the reliability of the flow. DSACK attack is nasty in that that it
> > allows maintaining reliability while cheating bw.
> 
> I feel the original change Neal is proposing is getting lost. This patch 
> proposes when
> 
> a. ack is dubious
> b. but it's not time to recover yet
> c. and other checks in tcp_try_keep_open affirms the network is in
> good condition, i.e, stat == Open

Presence of DSACK certainly does disagree with your condition c (which 
was the purpose of this change? Though it could be present in a later 
segment). Network was in such a bad condition that DSACKs were needed in 
the first place.

> 3. do not moderate cwnd (not 3, not 10 or any magic number).
> 
> Your concern is valid, but if that's true for majority then I argue
> all undo logic on dsack or TS are causing major threats since they all
> require some trusts of the remote end. Why even bother processing
> DSACK if we can't trust them?

Right, it seems that DSACK RFCs (RFC 3708 mainly) don't care too much 
about the cheating possibility, just states that it is possible and 
that a non-available nonce solution would help. So yes, I myself find 
DSACK quite useless, albeit fancy, mechanism (there are some other uses 
not related to undo mentioned though in the RFC but I've not spent too 
much thinking the details).

As regards TS, we don't trust them fully either (see e.g., some CUBIC
related change from Stephen Hemminger couple of years ago).

> AFAIK cwnd moderation is to lower bursty drops not to discourage
> (dsack) cheating. I believe the reason is the same in
> tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this
> logic hurts performance.

Quote from the patch description: "Senders were overriding cwnd values 
picked during an undo by calling tcp_moderate_cwnd()" ...so after all it 
has to do with undo being limited. IMHO only up to orig_cwnd/2+IW is safe 
due to cheating opportunities. Also FRTO uses orig_cwnd/2 due to same 
reason (it could do the +IW too but IIRC it is only /2 currently). What 
would be the safeguard there after this one is removed? I kind of see your 
point about this particular line being related to burst mitigation but on 
the same time the end result of removal is that undo becomes potentially 
much more aggressive.
David Miller Nov. 28, 2011, 12:01 a.m. UTC | #7
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Tue, 22 Nov 2011 13:47:13 +0200 (EET)

> On Mon, 21 Nov 2011, Yuchung Cheng wrote:
> 
>> AFAIK cwnd moderation is to lower bursty drops not to discourage
>> (dsack) cheating. I believe the reason is the same in
>> tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this
>> logic hurts performance.
> 
> Quote from the patch description: "Senders were overriding cwnd values 
> picked during an undo by calling tcp_moderate_cwnd()" ...so after all it 
> has to do with undo being limited. IMHO only up to orig_cwnd/2+IW is safe 
> due to cheating opportunities. Also FRTO uses orig_cwnd/2 due to same 
> reason (it could do the +IW too but IIRC it is only /2 currently). What 
> would be the safeguard there after this one is removed? I kind of see your 
> point about this particular line being related to burst mitigation but on 
> the same time the end result of removal is that undo becomes potentially 
> much more aggressive.

I'm apply this patch to net-next now, but Neil and Yucheng are on the hook
to fully look into the issues Ilpo has raised.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neal Cardwell Nov. 28, 2011, 5:35 p.m. UTC | #8
On Sun, Nov 27, 2011 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Tue, 22 Nov 2011 13:47:13 +0200 (EET)
>
>> On Mon, 21 Nov 2011, Yuchung Cheng wrote:
>>
>>> AFAIK cwnd moderation is to lower bursty drops not to discourage
>>> (dsack) cheating. I believe the reason is the same in
>>> tcp_try_to_open(). We are in common cases, e.g., loss-recovery, this
>>> logic hurts performance.
>>
>> Quote from the patch description: "Senders were overriding cwnd values
>> picked during an undo by calling tcp_moderate_cwnd()" ...so after all it
>> has to do with undo being limited. IMHO only up to orig_cwnd/2+IW is safe
>> due to cheating opportunities. Also FRTO uses orig_cwnd/2 due to same
>> reason (it could do the +IW too but IIRC it is only /2 currently). What
>> would be the safeguard there after this one is removed? I kind of see your
>> point about this particular line being related to burst mitigation but on
>> the same time the end result of removal is that undo becomes potentially
>> much more aggressive.
>
> I'm apply this patch to net-next now, but Neil and Yucheng are on the hook
> to fully look into the issues Ilpo has raised.

Thanks! We will spend some time looking into these issues.

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yuchung Cheng Nov. 28, 2011, 8:33 p.m. UTC | #9
On Mon, Nov 28, 2011 at 9:35 AM, Neal Cardwell <ncardwell@google.com> wrote:
> On Sun, Nov 27, 2011 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
>> I'm apply this patch to net-next now, but Neil and Yucheng are on the hook
>> to fully look into the issues Ilpo has raised.
>
> Thanks! We will spend some time looking into these issues.
>
Ilpo do you know any report on these types of cheating? I did some
literature search but didn't find any.

If the receiver simply responds DSACK on good (non-spurious)
retransmits, the sender may increase dupthresh (tp->reordering) and
slows down triggering fast-recovery? given the connection has real
losses, the receiver may end up shooting his feet.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen Nov. 29, 2011, 10:33 a.m. UTC | #10
On Mon, 28 Nov 2011, Yuchung Cheng wrote:

> On Mon, Nov 28, 2011 at 9:35 AM, Neal Cardwell <ncardwell@google.com> wrote:
> > On Sun, Nov 27, 2011 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
> >> I'm apply this patch to net-next now, but Neil and Yucheng are on the hook
> >> to fully look into the issues Ilpo has raised.
> >
> > Thanks! We will spend some time looking into these issues.
> >
> Ilpo do you know any report on these types of cheating? I did some
> literature search but didn't find any.

No I know of, however, I'm not too surprised as so far such attacks have 
lacked a target at least among Linux endhosts. It is only possible to gain 
something now after you have one by one removed all safeguards which 
prevent such misuse :-). This possibility is certainly known to exists 
(even RFC3708 mentions it).

Besides, DSACK adoption, not to speak of adoption of any DSACK undo 
algorithm, is not that high as with SACK alone iirc.

> If the receiver simply responds DSACK on good (non-spurious)
> retransmits, the sender may increase dupthresh (tp->reordering) and
> slows down triggering fast-recovery? given the connection has real
> losses, the receiver may end up shooting his feet.

For some attack variants, yes. However, it is always "safe" to attack 
the current Linux sender with something that is past the acknowledgment 
number, there's only dummy counting done for that (trivial to arrange). If 
the attacker plays safe, no tp->reordering effect is ever imposed but 
attack eventually succeeds to hit the right number of undo_retrans no 
matter what if one keeps trying. This is what you just asked us to 
enable, unlimited attack window! Perhaps we could add a penalty of 
misguessing the number of rexmits but knowing the right number is not that 
hard anyway (and some caveats exist in such approach though to those who 
are hit by HW/path duping).


Btw, another thing that should be auditted now that this was applied: 
check that undo_marker checks do not behave bad due to 32-bit seqno 
wrap-arounds since its needs-to-be-valid period got just extented way 
beyond a single loss event.
diff mbox

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a4efdd7..78dd38c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2881,7 +2881,8 @@  static void tcp_try_to_open(struct sock *sk, int flag)
 
 	if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
 		tcp_try_keep_open(sk);
-		tcp_moderate_cwnd(tp);
+		if (inet_csk(sk)->icsk_ca_state != TCP_CA_Open)
+			tcp_moderate_cwnd(tp);
 	} else {
 		tcp_cwnd_down(sk, flag);
 	}