diff mbox

sctp: avoid refreshing heartbeat timer too often

Message ID 1459258897-21607-1-git-send-email-marcelo.leitner@gmail.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Marcelo Ricardo Leitner March 29, 2016, 1:41 p.m. UTC
Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.

There are 3 places that call sctp_transport_reset_timers(), including
one that calls it for every single chunk added to a packet on
sctp_outq_flush(), even if they are bundled. As there is the possibility
of working on several transports, it's not trivial to just post-poned
this call as it would require to at least remember which transports were
used.

This change then moves the random factor to outside
sctp_transport_timeout(), allowing sctp_transport_reset_timers() to
check if a timer update is really necessary. For that, the random factor
is considered 0. If timer expires is still after it, the update is not
necessary as it's a possible random value and there is no
security/functional loss in doing so.

On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:

Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
  Overhead  Command  Shared Object      Symbol
+    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
   - _raw_write_unlock_irqrestore
      - 96,54% _raw_spin_unlock_irqrestore
         - 36,14% mod_timer
            + 97,24% sctp_transport_reset_timers
            + 2,76% sctp_do_sm
         + 33,65% __wake_up_sync_key
         + 28,77% sctp_ulpq_tail_event
         + 1,40% del_timer
      - 1,84% mod_timer
         + 99,03% sctp_transport_reset_timers
         + 0,97% sctp_do_sm
      + 1,50% sctp_ulpq_tail_event

And after this patch:

Samples: 120K of event 'cpu-clock', Event count (approx.): 30002250000
  Overhead  Command  Shared Object      Symbol
+    7,34%  netperf  [kernel.vmlinux]   [k] memcpy_erms
+    6,67%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-    5,34%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
   - _raw_write_unlock_irqrestore
      - 98,00% _raw_spin_unlock_irqrestore
         + 54,24% __wake_up_sync_key
         + 41,54% sctp_ulpq_tail_event
         - 2,61% mod_timer
            + 79,88% sctp_transport_update_pmtu
            + 20,12% sctp_do_sm
         + 1,61% del_timer
      + 1,83% sctp_ulpq_tail_event
+    2,34%  netperf  [kernel.vmlinux]   [k] pvclock_clocksource_read
+    2,31%  netperf  [sctp]             [k] sctp_sendmsg
+    1,93%  netperf  [kernel.vmlinux]   [k] __slab_free
+    1,92%  netperf  [sctp]             [k] sctp_packet_transmit
+    1,85%  netperf  [kernel.vmlinux]   [k] kfree

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/transport.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

David Laight March 30, 2016, 9:37 a.m. UTC | #1
From: Marcelo Ricardo Leitner
> Sent: 29 March 2016 14:42
>
> Currently on high rate SCTP streams the heartbeat timer refresh can
> consume quite a lot of resources as timer updates are costly and it
> contains a random factor, which a) is also costly and b) invalidates
> mod_timer() optimization for not editing a timer to the same value.
> It may even cause the timer to be slightly advanced, for no good reason.

Interesting thoughts:
1) Is it necessary to use a different 'random factor' until the timer actually
   expires?
2) It might be better to allow the heartbeat timer to expire, on expiry work
   out the new interval based on when the last 'refresh' was done.

	David
Marcelo Ricardo Leitner March 30, 2016, 12:13 p.m. UTC | #2
Em 30-03-2016 06:37, David Laight escreveu:
> From: Marcelo Ricardo Leitner
>> Sent: 29 March 2016 14:42
>>
>> Currently on high rate SCTP streams the heartbeat timer refresh can
>> consume quite a lot of resources as timer updates are costly and it
>> contains a random factor, which a) is also costly and b) invalidates
>> mod_timer() optimization for not editing a timer to the same value.
>> It may even cause the timer to be slightly advanced, for no good reason.
>
> Interesting thoughts:
> 1) Is it necessary to use a different 'random factor' until the timer actually
>     expires?

I don't understand you fully here, but we have to have a random factor 
on timer expire. As noted by Daniel Borkmann on his commit 8f61059a96c2 
("net: sctp: improve timer slack calculation for transport HBs"):

     RFC4960, section 8.3 says:

       On an idle destination address that is allowed to heartbeat,
       it is recommended that a HEARTBEAT chunk is sent once per RTO
       of that destination address plus the protocol parameter
       'HB.interval', with jittering of +/- 50% of the RTO value,
       and exponential backoff of the RTO if the previous HEARTBEAT
       is unanswered.

Previous to his commit, it was using a random factor based on jiffies.

This patch then assumes that random_A+2 is just as random as random_B as 
long as it is within the allowed range, avoiding the unnecessary updates.

> 2) It might be better to allow the heartbeat timer to expire, on expiry work
>     out the new interval based on when the last 'refresh' was done.

Cool, I thought about this too. It would introduce some extra complexity 
that is not really worth I think, specially because now we may be doing 
more timer updates even with this patch but it's not triggering any wake 
ups and we would need at least 2 wake ups then: one for the first 
timeout event, and then re-schedule the timer for the next updated one, 
and maybe again, and again.. less timer updates but more wake ups, one 
at every heartbeat interval even on a busy transport. Seems it's cheaper 
to just update the timer then.

Thanks,
Marcelo
David Laight March 31, 2016, 11:16 a.m. UTC | #3
From: Marcelo Ricardo Leitner
> Sent: 30 March 2016 13:13
> Em 30-03-2016 06:37, David Laight escreveu:
> > From: Marcelo Ricardo Leitner
> >> Sent: 29 March 2016 14:42
> >>
> >> Currently on high rate SCTP streams the heartbeat timer refresh can
> >> consume quite a lot of resources as timer updates are costly and it
> >> contains a random factor, which a) is also costly and b) invalidates
> >> mod_timer() optimization for not editing a timer to the same value.
> >> It may even cause the timer to be slightly advanced, for no good reason.
> >
> > Interesting thoughts:
> > 1) Is it necessary to use a different 'random factor' until the timer actually
> >     expires?
> 
> I don't understand you fully here, but we have to have a random factor
> on timer expire. As noted by Daniel Borkmann on his commit 8f61059a96c2
> ("net: sctp: improve timer slack calculation for transport HBs"):

When a HEARTBEAT chunk is sent determine the new interval, use that
interval until the timer actually expires when a new interval is
calculated. So the random number is only generated once per heartbeat.

>      RFC4960, section 8.3 says:
> 
>        On an idle destination address that is allowed to heartbeat,
>        it is recommended that a HEARTBEAT chunk is sent once per RTO
>        of that destination address plus the protocol parameter
>        'HB.interval', with jittering of +/- 50% of the RTO value,
>        and exponential backoff of the RTO if the previous HEARTBEAT
>        is unanswered.
> 
> Previous to his commit, it was using a random factor based on jiffies.
> 
> This patch then assumes that random_A+2 is just as random as random_B as
> long as it is within the allowed range, avoiding the unnecessary updates.
> 
> > 2) It might be better to allow the heartbeat timer to expire, on expiry work
> >     out the new interval based on when the last 'refresh' was done.
> 
> Cool, I thought about this too. It would introduce some extra complexity
> that is not really worth I think, specially because now we may be doing
> more timer updates even with this patch but it's not triggering any wake
> ups and we would need at least 2 wake ups then: one for the first
> timeout event, and then re-schedule the timer for the next updated one,
> and maybe again, and again.. less timer updates but more wake ups, one
> at every heartbeat interval even on a busy transport. Seems it's cheaper
> to just update the timer then.

One wakeup per heartbeat interval on a busy connection is probably noise.
Probably much less than the 1000s of timer updates that would otherwise happen.

A further optimisation would be to restart the timer if more than (say) 80%
of the way through the timeout period.

Similarly the HEARTBEAT could be sent if the 2nd wakeup would be almost immediate.

	David
Marcelo Ricardo Leitner March 31, 2016, 9:25 p.m. UTC | #4
On Thu, Mar 31, 2016 at 11:16:52AM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner
> > Sent: 30 March 2016 13:13
> > Em 30-03-2016 06:37, David Laight escreveu:
> > > From: Marcelo Ricardo Leitner
> > >> Sent: 29 March 2016 14:42
> > >>
> > >> Currently on high rate SCTP streams the heartbeat timer refresh can
> > >> consume quite a lot of resources as timer updates are costly and it
> > >> contains a random factor, which a) is also costly and b) invalidates
> > >> mod_timer() optimization for not editing a timer to the same value.
> > >> It may even cause the timer to be slightly advanced, for no good reason.
> > >
> > > Interesting thoughts:
> > > 1) Is it necessary to use a different 'random factor' until the timer actually
> > >     expires?
> > 
> > I don't understand you fully here, but we have to have a random factor
> > on timer expire. As noted by Daniel Borkmann on his commit 8f61059a96c2
> > ("net: sctp: improve timer slack calculation for transport HBs"):
> 
> When a HEARTBEAT chunk is sent determine the new interval, use that
> interval until the timer actually expires when a new interval is
> calculated. So the random number is only generated once per heartbeat.
> 
> >      RFC4960, section 8.3 says:
> > 
> >        On an idle destination address that is allowed to heartbeat,
> >        it is recommended that a HEARTBEAT chunk is sent once per RTO
> >        of that destination address plus the protocol parameter
> >        'HB.interval', with jittering of +/- 50% of the RTO value,
> >        and exponential backoff of the RTO if the previous HEARTBEAT
> >        is unanswered.
> > 
> > Previous to his commit, it was using a random factor based on jiffies.
> > 
> > This patch then assumes that random_A+2 is just as random as random_B as
> > long as it is within the allowed range, avoiding the unnecessary updates.
> > 
> > > 2) It might be better to allow the heartbeat timer to expire, on expiry work
> > >     out the new interval based on when the last 'refresh' was done.
> > 
> > Cool, I thought about this too. It would introduce some extra complexity
> > that is not really worth I think, specially because now we may be doing
> > more timer updates even with this patch but it's not triggering any wake
> > ups and we would need at least 2 wake ups then: one for the first
> > timeout event, and then re-schedule the timer for the next updated one,
> > and maybe again, and again.. less timer updates but more wake ups, one
> > at every heartbeat interval even on a busy transport. Seems it's cheaper
> > to just update the timer then.
> 
> One wakeup per heartbeat interval on a busy connection is probably noise.
> Probably much less than the 1000s of timer updates that would otherwise happen.

I was thinking more on near-idle systems, as the overhead for this
refresh looked rather small now even for busy transports if compared to
other points, a worth trade-off for reducing wake ups, imho.

But then I checked tcp, and it does what you're suggesting.
I'll rework the patch. Thanks

> A further optimisation would be to restart the timer if more than (say) 80%
> of the way through the timeout period.
> 
> Similarly the HEARTBEAT could be sent if the 2nd wakeup would be almost immediate.
Marcelo Ricardo Leitner April 1, 2016, 12:24 a.m. UTC | #5
On Thu, Mar 31, 2016 at 06:25:12PM -0300, 'Marcelo Ricardo Leitner' wrote:
> On Thu, Mar 31, 2016 at 11:16:52AM +0000, David Laight wrote:
> > From: Marcelo Ricardo Leitner
> > > Sent: 30 March 2016 13:13
> > > Em 30-03-2016 06:37, David Laight escreveu:
> > > > From: Marcelo Ricardo Leitner
> > > >> Sent: 29 March 2016 14:42
> > > >>
> > > >> Currently on high rate SCTP streams the heartbeat timer refresh can
> > > >> consume quite a lot of resources as timer updates are costly and it
> > > >> contains a random factor, which a) is also costly and b) invalidates
> > > >> mod_timer() optimization for not editing a timer to the same value.
> > > >> It may even cause the timer to be slightly advanced, for no good reason.
> > > >
> > > > Interesting thoughts:
> > > > 1) Is it necessary to use a different 'random factor' until the timer actually
> > > >     expires?
> > > 
> > > I don't understand you fully here, but we have to have a random factor
> > > on timer expire. As noted by Daniel Borkmann on his commit 8f61059a96c2
> > > ("net: sctp: improve timer slack calculation for transport HBs"):
> > 
> > When a HEARTBEAT chunk is sent determine the new interval, use that
> > interval until the timer actually expires when a new interval is
> > calculated. So the random number is only generated once per heartbeat.
> > 
> > >      RFC4960, section 8.3 says:
> > > 
> > >        On an idle destination address that is allowed to heartbeat,
> > >        it is recommended that a HEARTBEAT chunk is sent once per RTO
> > >        of that destination address plus the protocol parameter
> > >        'HB.interval', with jittering of +/- 50% of the RTO value,
> > >        and exponential backoff of the RTO if the previous HEARTBEAT
> > >        is unanswered.
> > > 
> > > Previous to his commit, it was using a random factor based on jiffies.
> > > 
> > > This patch then assumes that random_A+2 is just as random as random_B as
> > > long as it is within the allowed range, avoiding the unnecessary updates.
> > > 
> > > > 2) It might be better to allow the heartbeat timer to expire, on expiry work
> > > >     out the new interval based on when the last 'refresh' was done.
> > > 
> > > Cool, I thought about this too. It would introduce some extra complexity
> > > that is not really worth I think, specially because now we may be doing
> > > more timer updates even with this patch but it's not triggering any wake
> > > ups and we would need at least 2 wake ups then: one for the first
> > > timeout event, and then re-schedule the timer for the next updated one,
> > > and maybe again, and again.. less timer updates but more wake ups, one
> > > at every heartbeat interval even on a busy transport. Seems it's cheaper
> > > to just update the timer then.
> > 
> > One wakeup per heartbeat interval on a busy connection is probably noise.
> > Probably much less than the 1000s of timer updates that would otherwise happen.
> 
> I was thinking more on near-idle systems, as the overhead for this
> refresh looked rather small now even for busy transports if compared to
> other points, a worth trade-off for reducing wake ups, imho.
> 
> But then I checked tcp, and it does what you're suggesting.
> I'll rework the patch. Thanks

This is what I'm getting with the new approach. I splitted
sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
sctp_transport_reset_hb_timer, thus why sctp_transport_reset_t3_rtx in there
and it never updates the timer, only start if it's not running already
(as before). Ran netperf for 60 seconds now, to be sure that the timer
would expire twice (1st for initial path validation and 2nd for pure hb).

Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
  Overhead  Command  Shared Object      Symbol
+    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
+    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      + 49,89% __wake_up_sync_key
      + 45,68% sctp_ulpq_tail_event
      - 2,85% mod_timer
         + 76,51% sctp_transport_reset_t3_rtx
         + 23,49% sctp_do_sm
      + 1,55% del_timer
+    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
+    2,26%  netperf  [sctp]             [k] sctp_sendmsg

Doesn't seem much different from v1, but ok.

Also I could do some more cleanups on heartbeat/timer code.

  Marcelo
diff mbox

Patch

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 9b6b48c7524e4b441a151b80f0babec81f539d49..c9d6c61f5a511f96f38a0f2c275e418e158e0632 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -185,6 +185,8 @@  static void sctp_transport_destroy(struct sctp_transport *transport)
  */
 void sctp_transport_reset_timers(struct sctp_transport *transport)
 {
+	unsigned long expires;
+
 	/* RFC 2960 6.3.2 Retransmission Timer Rules
 	 *
 	 * R1) Every time a DATA chunk is sent to any address(including a
@@ -199,8 +201,10 @@  void sctp_transport_reset_timers(struct sctp_transport *transport)
 			sctp_transport_hold(transport);
 
 	/* When a data chunk is sent, reset the heartbeat interval.  */
-	if (!mod_timer(&transport->hb_timer,
-		       sctp_transport_timeout(transport)))
+	expires = sctp_transport_timeout(transport);
+	if (time_before(transport->hb_timer.expires, expires) &&
+	    !mod_timer(&transport->hb_timer,
+		       expires + prandom_u32_max(transport->rto)))
 	    sctp_transport_hold(transport);
 }
 
@@ -595,7 +599,7 @@  void sctp_transport_burst_reset(struct sctp_transport *t)
 unsigned long sctp_transport_timeout(struct sctp_transport *trans)
 {
 	/* RTO + timer slack +/- 50% of RTO */
-	unsigned long timeout = (trans->rto >> 1) + prandom_u32_max(trans->rto);
+	unsigned long timeout = trans->rto >> 1;
 
 	if (trans->state != SCTP_UNCONFIRMED &&
 	    trans->state != SCTP_PF)