Message ID | 1314229310-8074-1-git-send-email-hagen@jauu.net |
---|---
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
This should do the trick. Eric, Ilpo?

Hagen

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 24, 2011 at 4:41 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> Check if the calculated RTO is less than TCP_RTO_MIN. If this is true we
> adjust the value to TCP_RTO_MIN.
>
but tp->rttvar is already lower-bounded via tcp_rto_min()?

static inline void tcp_set_rto(struct sock *sk)
{
	...

	/* NOTE: clamping at TCP_RTO_MIN is not required, current algo
	 * guarantees that rto is higher.
	 */
	tcp_bound_rto(sk);
}
Le mercredi 24 août 2011 à 18:50 -0700, Yuchung Cheng a écrit :
> On Wed, Aug 24, 2011 at 4:41 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> > Check if the calculated RTO is less than TCP_RTO_MIN. If this is true we
> > adjust the value to TCP_RTO_MIN.
> >
> but tp->rttvar is already lower-bounded via tcp_rto_min()?
>
> static inline void tcp_set_rto(struct sock *sk)
> {
> 	...
>
> 	/* NOTE: clamping at TCP_RTO_MIN is not required, current algo
> 	 * guarantees that rto is higher.
> 	 */
> 	tcp_bound_rto(sk);
> }

Yes, and furthermore, we also limit the ICMP rate, so in my tests I reach
icsk_rto > 1 sec within a few rounds:

07:16:13.010633 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 3833540215:3833540263(48) ack 2593537670 win 305
07:16:13.221111 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:13.661151 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:14.541153 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:16.301152 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
<from this point, icsk_rto=1.76sec >
07:16:18.061158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:19.821158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:21.581018 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:23.341156 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:25.101151 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:26.861155 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:28.621158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:30.381152 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
07:16:32.141157 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305

The real question is: do we really want to process ~1000 timer interrupts
per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
requests, only to make tcp recover in ~1 sec when connectivity returns?
This just doesn't scale.
On a server handling ~1,000,000 (long-living) sessions, using
application-side keepalives (say one message sent every minute on each
session), a temporary connectivity disruption _could_ make it enter a
critical zone, burning cpu and memory.

It seems TCP-LCD (RFC6069) depends very much on ICMP being rate limited.

I'll have to check what happens with multiple sessions: we might have
cpus fighting on a single inetpeer and throttle, thus allowing backoff to
increase after all.
Hi Eric,

Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
> Le mercredi 24 août 2011 à 18:50 -0700, Yuchung Cheng a écrit :
>> On Wed, Aug 24, 2011 at 4:41 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
>>> Check if the calculated RTO is less than TCP_RTO_MIN. If this is true we
>>> adjust the value to TCP_RTO_MIN.
>>>
>> but tp->rttvar is already lower-bounded via tcp_rto_min()?
>>
>> static inline void tcp_set_rto(struct sock *sk)
>> {
>> 	...
>>
>> 	/* NOTE: clamping at TCP_RTO_MIN is not required, current algo
>> 	 * guarantees that rto is higher.
>> 	 */
>> 	tcp_bound_rto(sk);
>> }
>
> Yes, and furthermore, we also limit the ICMP rate, so in my tests I
> reach icsk_rto > 1 sec within a few rounds:
>
> 07:16:13.010633 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 3833540215:3833540263(48) ack 2593537670 win 305
> 07:16:13.221111 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:13.661151 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:14.541153 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:16.301152 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> <from this point, icsk_rto=1.76sec >
> 07:16:18.061158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:19.821158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:21.581018 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:23.341156 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:25.101151 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:26.861155 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:28.621158 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:30.381152 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
> 07:16:32.141157 IP 10.2.1.2.59352 > 10.2.1.1.ssh: P 0:48(48) ack 1 win 305
>
> The real question is: do we really want to process ~1000 timer interrupts
> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
> requests,
> only to make tcp recover in ~1 sec when connectivity returns?
> This just doesn't scale.

maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
probing time of 120s, we get 600 retransmits in a worst-case scenario
(assuming we get an ICMP for every RTO retransmission). No?

> On a server handling ~1,000,000 (long-living) sessions, using
> application-side keepalives (say one message sent every minute on each
> session), a temporary connectivity disruption _could_ make it enter a
> critical zone, burning cpu and memory.
>
> It seems TCP-LCD (RFC6069) depends very much on ICMP being rate limited.

This is right. We assume that a server/router only sends ICMPs when it
has free cycles.

> I'll have to check what happens with multiple sessions: we might have
> cpus fighting on a single inetpeer and throttle, thus allowing backoff to
> increase after all.

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22222
// email: zimmermann@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
> Hi Eric,
>
> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
>
> > The real question is: do we really want to process ~1000 timer interrupts
> > per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
> > requests, only to make tcp recover in ~1 sec when connectivity returns?
> > This just doesn't scale.
>
> maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
> probing time of 120s, we get 600 retransmits in a worst-case scenario
> (assuming we get an ICMP for every RTO retransmission). No?

Where is the "max probing time of 120s" asserted?

It is not the case on my machine: I have way more retransmits than that,
even if spaced by 1600 ms

07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)

Old kernels were performing up to 15 retries, doing exponential backoff.

Now it's kind of unlimited, according to experimental results.
Am 25.08.2011 um 10:26 schrieb Eric Dumazet:
> Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
>> Hi Eric,
>>
>> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
>
>>> The real question is: do we really want to process ~1000 timer interrupts
>>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
>>> requests, only to make tcp recover in ~1 sec when connectivity returns?
>>> This just doesn't scale.
>>
>> maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
>> probing time of 120s, we get 600 retransmits in a worst-case scenario
>> (assuming we get an ICMP for every RTO retransmission). No?
>
> Where is the "max probing time of 120s" asserted?
>
> It is not the case on my machine:
> I have way more retransmits than that, even if spaced by 1600 ms
>
> 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
> 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
> 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
>
> Old kernels were performing up to 15 retries, doing exponential backoff.

Yes, I know. And in combination with RFC6069 we had to convert this; see
Section 7.1 and
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6fa12c85031485dff38ce550c24f10da23b0adaa

Is the transformation broken? Damian?

> Now it's kind of unlimited, according to experimental results.

Ok, unlimited is not what I expect...
Hi,

Am 25.08.2011 10:26, schrieb Eric Dumazet:
> Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
>> Hi Eric,
>>
>> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
>
>>> The real question is: do we really want to process ~1000 timer interrupts
>>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
>>> requests, only to make tcp recover in ~1 sec when connectivity returns?
>>> This just doesn't scale.
>>
>> maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
>> probing time of 120s, we get 600 retransmits in a worst-case scenario
>> (assuming we get an ICMP for every RTO retransmission). No?
>
> Where is the "max probing time of 120s" asserted?
>
> It is not the case on my machine:
> I have way more retransmits than that, even if spaced by 1600 ms
>
> 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
> 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
> 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
>
> Old kernels were performing up to 15 retries, doing exponential backoff.
>
> Now it's kind of unlimited, according to experimental results.

That shouldn't be. It should stop after the same time as a TCP connection
with the minimum RTO doing 15 retries (tcp_retries2=15) with exponential
backoff, so it should be around 900s*. But it could be that this doesn't
work as expected because of the icsk_retransmits wraparound.

* 200ms + 400ms + 800ms ...

Best regards,
Arnd
Le jeudi 25 août 2011 à 10:46 +0200, Arnd Hannemann a écrit :
> Hi,
>
> Am 25.08.2011 10:26, schrieb Eric Dumazet:
> > Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
> >> Hi Eric,
> >>
> >> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
> >
> >>> The real question is: do we really want to process ~1000 timer interrupts
> >>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
> >>> requests, only to make tcp recover in ~1 sec when connectivity returns?
> >>> This just doesn't scale.
> >>
> >> maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
> >> probing time of 120s, we get 600 retransmits in a worst-case scenario
> >> (assuming we get an ICMP for every RTO retransmission). No?
> >
> > Where is the "max probing time of 120s" asserted?
> >
> > It is not the case on my machine:
> > I have way more retransmits than that, even if spaced by 1600 ms
> >
> > 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
> > 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
> > 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
> >
> > Old kernels were performing up to 15 retries, doing exponential backoff.
> >
> > Now it's kind of unlimited, according to experimental results.
>
> That shouldn't be. It should stop after the same time as a TCP connection
> with the minimum RTO doing 15 retries (tcp_retries2=15) with exponential
> backoff, so it should be around 900s*. But it could be that this doesn't
> work as expected because of the icsk_retransmits wraparound.
>
> * 200ms + 400ms + 800ms ...

It is 924 seconds with retries2=15 (the default value).

I said ~1000 probes.

If ICMPs are not rate limited, that could be about 924*5 probes, instead
of 15 probes on old kernels.

Maybe we should refine the thing a bit, to not reverse the backoff unless
rto is > some_threshold.
Say 10s were the value: that would give at most 92 tries.

I mean, what is the gain in being able to restart a frozen TCP session
with a 1 s latency instead of 10 s, if it was blocked for more than 60
seconds?
Hi Eric,

Am 25.08.2011 11:09, schrieb Eric Dumazet:
> Le jeudi 25 août 2011 à 10:46 +0200, Arnd Hannemann a écrit :
>> Am 25.08.2011 10:26, schrieb Eric Dumazet:
>>> Le jeudi 25 août 2011 à 09:28 +0200, Alexander Zimmermann a écrit :
>>>> Am 25.08.2011 um 07:28 schrieb Eric Dumazet:
>>>
>>>>> The real question is: do we really want to process ~1000 timer interrupts
>>>>> per tcp session, ~2000 skb alloc/free/build/handling, possibly ~1000 ARP
>>>>> requests, only to make tcp recover in ~1 sec when connectivity returns?
>>>>> This just doesn't scale.
>>>>
>>>> maybe a stupid question, but 1000? With a minRTO of 200ms and a maximum
>>>> probing time of 120s, we get 600 retransmits in a worst-case scenario
>>>> (assuming we get an ICMP for every RTO retransmission). No?
>>>
>>> Where is the "max probing time of 120s" asserted?
>>>
>>> It is not the case on my machine:
>>> I have way more retransmits than that, even if spaced by 1600 ms
>>>
>>> 07:16:13.389331 write(3, "\350F\235JC\357\376\363&\3\374\270R\21L\26\324{\37p\342\244i\304\356\241I:\301\332\222\26"..., 48) = 48
>>> 07:16:13.389417 select(7, [3 4], [], NULL, NULL) = 1 (in [3])
>>> 07:31:39.901311 read(3, 0xff8c4c90, 8192) = -1 EHOSTUNREACH (No route to host)
>>>
>>> Old kernels were performing up to 15 retries, doing exponential backoff.
>>>
>>> Now it's kind of unlimited, according to experimental results.
>>
>> That shouldn't be. It should stop after the same time as a TCP connection
>> with the minimum RTO doing 15 retries (tcp_retries2=15) with exponential
>> backoff, so it should be around 900s*. But it could be that this doesn't
>> work as expected because of the icsk_retransmits wraparound.
>>
>> * 200ms + 400ms + 800ms ...
>
> It is 924 seconds with retries2=15 (the default value).
>
> I said ~1000 probes.
>
> If ICMPs are not rate limited, that could be about 924*5 probes, instead
> of 15 probes on old kernels.

At a rate of 5 packets/s if RTT is zero, yes. I would like to say: so what?
But your example with millions of idle connections stands.

> Maybe we should refine the thing a bit, to not reverse the backoff unless
> rto is > some_threshold.
>
> Say 10s were the value: that would give at most 92 tries.

I personally think that 10s would be too large and would eliminate the
benefit of the algorithm, so I would prefer a different solution.

In the case of one bulk-data TCP session, which was transmitting hundreds
of packets/s before the connectivity disruption, that worst-case rate of
5 packets/s really seems conservative enough.

However, in the case of a lot of idle connections, which were transmitting
only a few packets per minute, we might increase the rate drastically for
a certain period until it throttles down. You say that we have a problem
here, correct?

Do you think it would be possible, without much hassle, to use a kind of
"global" rate limiting only for these probe packets of a TCP connection?

> I mean, what is the gain in being able to restart a frozen TCP session
> with a 1 s latency instead of 10 s, if it was blocked for more than 60
> seconds?

I'm afraid it matters a lot, especially in highly dynamic environments.
You don't just have the additional latency: you may actually miss the
full period where connectivity was there, and then just retransmit into
the next connectivity-disrupted period.

Best regards,
Arnd
Le jeudi 25 août 2011 à 11:46 +0200, Arnd Hannemann a écrit :
> Hi Eric,
>
> Am 25.08.2011 11:09, schrieb Eric Dumazet:
>
> > Maybe we should refine the thing a bit, to not reverse the backoff unless
> > rto is > some_threshold.
> >
> > Say 10s were the value: that would give at most 92 tries.
>
> I personally think that 10s would be too large and would eliminate the
> benefit of the algorithm, so I would prefer a different solution.
>
> In the case of one bulk-data TCP session, which was transmitting hundreds
> of packets/s before the connectivity disruption, that worst-case rate of
> 5 packets/s really seems conservative enough.
>
> However, in the case of a lot of idle connections, which were transmitting
> only a few packets per minute, we might increase the rate drastically for
> a certain period until it throttles down. You say that we have a problem
> here, correct?
>
> Do you think it would be possible, without much hassle, to use a kind of
> "global" rate limiting only for these probe packets of a TCP connection?
>
> > I mean, what is the gain in being able to restart a frozen TCP session
> > with a 1 s latency instead of 10 s, if it was blocked for more than 60
> > seconds?
>
> I'm afraid it matters a lot, especially in highly dynamic environments.
> You don't just have the additional latency: you may actually miss the
> full period where connectivity was there, and then just retransmit into
> the next connectivity-disrupted period.

The problem with this is that with short and synchronized timers, all
sessions will flood at the same time, and you'll get congestion this
time.

The reason for exponential backoff is also to smooth the restarts of
sessions, because timers are randomized.
On Thu, 25 Aug 2011, Eric Dumazet wrote:
> Le jeudi 25 août 2011 à 11:46 +0200, Arnd Hannemann a écrit :
> > Hi Eric,
> >
> > Am 25.08.2011 11:09, schrieb Eric Dumazet:
> >
> > > Maybe we should refine the thing a bit, to not reverse the backoff unless
> > > rto is > some_threshold.
> > >
> > > Say 10s were the value: that would give at most 92 tries.
> >
> > I personally think that 10s would be too large and would eliminate the
> > benefit of the algorithm, so I would prefer a different solution.
> >
> > In the case of one bulk-data TCP session, which was transmitting hundreds
> > of packets/s before the connectivity disruption, that worst-case rate of
> > 5 packets/s really seems conservative enough.
> >
> > However, in the case of a lot of idle connections, which were transmitting
> > only a few packets per minute, we might increase the rate drastically for
> > a certain period until it throttles down. You say that we have a problem
> > here, correct?
> >
> > Do you think it would be possible, without much hassle, to use a kind of
> > "global" rate limiting only for these probe packets of a TCP connection?
> >
> > > I mean, what is the gain in being able to restart a frozen TCP session
> > > with a 1 s latency instead of 10 s, if it was blocked for more than 60
> > > seconds?
> >
> > I'm afraid it matters a lot, especially in highly dynamic environments.
> > You don't just have the additional latency: you may actually miss the
> > full period where connectivity was there, and then just retransmit into
> > the next connectivity-disrupted period.
>
> The problem with this is that with short and synchronized timers, all
> sessions will flood at the same time, and you'll get congestion this
> time.
>
> The reason for exponential backoff is also to smooth the restarts of
> sessions, because timers are randomized.

But if you get real congestion, the system will self-regulate using
exponential backoff, due to the lack of ICMPs for some of the connections?
Hi Eric,

Am 25.08.2011 12:02, schrieb Eric Dumazet:
> Le jeudi 25 août 2011 à 11:46 +0200, Arnd Hannemann a écrit :
>> Hi Eric,
>>
>> Am 25.08.2011 11:09, schrieb Eric Dumazet:
>
>>> Maybe we should refine the thing a bit, to not reverse the backoff unless
>>> rto is > some_threshold.
>>>
>>> Say 10s were the value: that would give at most 92 tries.
>>
>> I personally think that 10s would be too large and would eliminate the
>> benefit of the algorithm, so I would prefer a different solution.
>>
>> In the case of one bulk-data TCP session, which was transmitting hundreds
>> of packets/s before the connectivity disruption, that worst-case rate of
>> 5 packets/s really seems conservative enough.
>>
>> However, in the case of a lot of idle connections, which were transmitting
>> only a few packets per minute, we might increase the rate drastically for
>> a certain period until it throttles down. You say that we have a problem
>> here, correct?
>>
>> Do you think it would be possible, without much hassle, to use a kind of
>> "global" rate limiting only for these probe packets of a TCP connection?
>>
>>> I mean, what is the gain in being able to restart a frozen TCP session
>>> with a 1 s latency instead of 10 s, if it was blocked for more than 60
>>> seconds?
>>
>> I'm afraid it matters a lot, especially in highly dynamic environments.
>> You don't just have the additional latency: you may actually miss the
>> full period where connectivity was there, and then just retransmit into
>> the next connectivity-disrupted period.
>
> The problem with this is that with short and synchronized timers, all
> sessions will flood at the same time, and you'll get congestion this
> time.

Why do you think the timers are "synchronized"? If you have congestion,
then you will do exponential backoff.

> The reason for exponential backoff is also to smooth the restarts of
> sessions, because timers are randomized.
If the RTOs of these sessions were "randomized", they keep this
randomization even if backoffs are reverted; at least they should.

Best regards
Arnd
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 149a415..9b5f4bf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -520,6 +520,8 @@ static inline void tcp_bound_rto(const struct sock *sk)
 {
 	if (inet_csk(sk)->icsk_rto > TCP_RTO_MAX)
 		inet_csk(sk)->icsk_rto = TCP_RTO_MAX;
+	else if (inet_csk(sk)->icsk_rto < TCP_RTO_MIN)
+		inet_csk(sk)->icsk_rto = TCP_RTO_MIN;
 }
 
 static inline u32 __tcp_set_rto(const struct tcp_sock *tp)
Check if the calculated RTO is less than TCP_RTO_MIN. If this is true we
adjust the value to TCP_RTO_MIN.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
---
 include/net/tcp.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)