diff mbox

[net] sctp: linearize early if it's not GSO

Message ID a4bc263e62ced9f5790932f11e7d829c36566808.1471386679.git.marcelo.leitner@gmail.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Marcelo Ricardo Leitner Aug. 16, 2016, 10:35 p.m. UTC
Because otherwise when crc computation is still needed it's way more
expensive than on a linear buffer to the point that it affects
performance.

It's so expensive that netperf test gives a perf output as below:

Overhead  Shared Object        Symbol
  69,44%  [kernel]             [k] gf2_matrix_square
   2,84%  [kernel]             [k] crc32_generic_combine.part.0
   2,78%  [kernel]             [k] _raw_spin_lock_bh

And performance goes from 2Gbit/s to 0.5Gbit/s on this test. Doing the
linearization before checksumming is enough to restore it.

Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---

Please consider this to -stable too as v4.7 performance is affected.
Thanks

 net/sctp/input.c   | 11 +++++++----
 net/sctp/inqueue.c | 13 -------------
 2 files changed, 7 insertions(+), 17 deletions(-)

Comments

Daniel Borkmann Aug. 16, 2016, 10:59 p.m. UTC | #1
On 08/17/2016 12:35 AM, Marcelo Ricardo Leitner wrote:
> Because otherwise when crc computation is still needed it's way more
> expensive than on a linear buffer to the point that it affects
> performance.
>
> It's so expensive that netperf test gives a perf output as below:
>
> Overhead  Shared Object        Symbol
>    69,44%  [kernel]             [k] gf2_matrix_square
>     2,84%  [kernel]             [k] crc32_generic_combine.part.0
>     2,78%  [kernel]             [k] _raw_spin_lock_bh

What kernel is this, seems not net kernel?

$ git grep -n gf2_matrix_square
$ git grep -n crc32_generic_combine
$

Maybe RHEL? Did you consider backporting 6d514b4e7737 et al?

> And performance goes from 2Gbit/s to 0.5Gbit/s on this test. Doing the
> linearization before checksumming is enough to restore it.
>
> Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize")
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Marcelo Ricardo Leitner Aug. 17, 2016, 12:49 a.m. UTC | #2
On Wed, Aug 17, 2016 at 12:59:17AM +0200, Daniel Borkmann wrote:
> On 08/17/2016 12:35 AM, Marcelo Ricardo Leitner wrote:
> > Because otherwise when crc computation is still needed it's way more
> > expensive than on a linear buffer to the point that it affects
> > performance.
> > 
> > It's so expensive that netperf test gives a perf output as below:
> > 
> > Overhead  Shared Object        Symbol
> >    69,44%  [kernel]             [k] gf2_matrix_square
> >     2,84%  [kernel]             [k] crc32_generic_combine.part.0
> >     2,78%  [kernel]             [k] _raw_spin_lock_bh
> 
> What kernel is this, seems not net kernel?
> 
> $ git grep -n gf2_matrix_square
> $ git grep -n crc32_generic_combine
> $
> 
> Maybe RHEL? Did you consider backporting 6d514b4e7737 et al?

Damn, correct. I'll post a v2 later with a proper changelog.
No I hadn't considered backporting that commit.

Now from a different environment, upstream kernel, without the patch, using
mlx4 and perf record -a -- sleep 5 during netperf (Xeon E5-2690 v3, 24 cpus):

Overhead  Command         Shared Object       Symbol
  16,85%  netserver       [kernel.vmlinux]    [k] crc32_generic_shift
   3,46%  swapper         [kernel.vmlinux]    [k] intel_idle
   2,00%  netserver       [kernel.vmlinux]    [k] __pskb_pull_tail
   1,73%  netserver       [kernel.vmlinux]    [k] copy_user_enhanced_fast_string
   1,72%  swapper         [kernel.vmlinux]    [k] crc32_generic_shift
   1,64%  swapper         [kernel.vmlinux]    [k] poll_idle
   1,59%  netserver       [kernel.vmlinux]    [k] memcpy_erms
   1,57%  netserver       [kernel.vmlinux]    [k] fib_table_lookup
   1,47%  netserver       [kernel.vmlinux]    [k] _raw_spin_lock
   1,37%  netserver       [kernel.vmlinux]    [k] __slab_free
   1,32%  netserver       [sctp]              [k] sctp_packet_transmit
   1,18%  netserver       [kernel.vmlinux]    [k] skb_copy_datagram_iter

With the patch:
Overhead  Command         Shared Object        Symbol
   4,71%  swapper         [kernel.vmlinux]     [k] intel_idle
   2,11%  netserver       [kernel.vmlinux]     [k] copy_user_enhanced_fast_string
   1,45%  netserver       [kernel.vmlinux]     [k] memcpy_erms
   1,29%  swapper         [kernel.vmlinux]     [k] memcpy_erms
   1,28%  netserver       [kernel.vmlinux]     [k] fib_table_lookup
   1,27%  netserver       [kernel.vmlinux]     [k] __slab_free
   1,27%  swapper         [kernel.vmlinux]     [k] fib_table_lookup
   1,26%  netserver       [kernel.vmlinux]     [k] kmem_cache_free
   1,14%  netserver       [kernel.vmlinux]     [k] _raw_spin_lock
   1,07%  netserver       [kernel.vmlinux]     [k] __pskb_pull_tail
   1,06%  netserver       [kernel.vmlinux]     [k] skb_copy_datagram_iter
   1,04%  netserver       [sctp]               [k] sctp_packet_transmit
   1,04%  swapper         [kernel.vmlinux]     [k] __pskb_pull_tail
   1,01%  swapper         [mlx4_en]            [k] mlx4_en_process_rx_cq
   0,99%  swapper         [kernel.vmlinux]     [k] native_queued_spin_lock_slowpath
   0,96%  swapper         [kernel.vmlinux]     [k] _raw_spin_lock
   0,89%  swapper         [sctp]               [k] sctp_packet_transmit

Without the patch:
# netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

212992 212992  12000    10.00      2896.13   3.34     3.88     2.267   2.635


With the patch:
# netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

212992 212992  12000    10.00      3444.89   3.88     3.02     2.216   1.721

And without the patch netperf fluctuates more as there are more packet
drops and netserver is constantly at 100% cpu usage.

Thanks,
Marcelo
diff mbox

Patch

diff --git a/net/sctp/input.c b/net/sctp/input.c
index c182db7d691ff44a52923fb36c9170e49c141c04..69444d32ecda6cd1a4924911172feba89c5ae976 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -119,7 +119,13 @@  int sctp_rcv(struct sk_buff *skb)
 		       skb_transport_offset(skb))
 		goto discard_it;
 
-	if (!pskb_may_pull(skb, sizeof(struct sctphdr)))
+	/* If the packet is fragmented and we need to do crc checking,
+	 * it's better to just linearize it otherwise crc computing
+	 * takes longer.
+	 */
+	if ((!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP) &&
+	     skb_linearize(skb)) ||
+	    !pskb_may_pull(skb, sizeof(struct sctphdr)))
 		goto discard_it;
 
 	/* Pull up the IP header. */
@@ -1177,9 +1183,6 @@  static struct sctp_association *__sctp_rcv_lookup_harder(struct net *net,
 	if ((skb_shinfo(skb)->gso_type & SKB_GSO_SCTP) == SKB_GSO_SCTP)
 		return NULL;
 
-	if (skb_linearize(skb))
-		return NULL;
-
 	ch = (sctp_chunkhdr_t *) skb->data;
 
 	/* The code below will attempt to walk the chunk and extract
diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index c30ddb0f31907f57c5ce85b00dbe04260ca1cb2e..6437aa97cfd79f14c633499c2b131389204c435b 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -170,19 +170,6 @@  next_chunk:
 
 		chunk = list_entry(entry, struct sctp_chunk, list);
 
-		/* Linearize if it's not GSO */
-		if ((skb_shinfo(chunk->skb)->gso_type & SKB_GSO_SCTP) != SKB_GSO_SCTP &&
-		    skb_is_nonlinear(chunk->skb)) {
-			if (skb_linearize(chunk->skb)) {
-				__SCTP_INC_STATS(dev_net(chunk->skb->dev), SCTP_MIB_IN_PKT_DISCARDS);
-				sctp_chunk_free(chunk);
-				goto next_chunk;
-			}
-
-			/* Update sctp_hdr as it probably changed */
-			chunk->sctp_hdr = sctp_hdr(chunk->skb);
-		}
-
 		if ((skb_shinfo(chunk->skb)->gso_type & SKB_GSO_SCTP) == SKB_GSO_SCTP) {
 			/* GSO-marked skbs but without frags, handle
 			 * them normally