diff mbox series

[bpf-next] xsk, net: make sock_def_readable() have external linkage

Message ID 20200120092917.13949-1-bjorn.topel@gmail.com
State Accepted
Delegated to: BPF Maintainers
Headers show
Series [bpf-next] xsk, net: make sock_def_readable() have external linkage | expand

Commit Message

Björn Töpel Jan. 20, 2020, 9:29 a.m. UTC
From: Björn Töpel <bjorn.topel@intel.com>

XDP sockets use the default implementation of struct sock's
sk_data_ready callback, which is sock_def_readable(). This function is
called in the XDP socket fast-path, and involves a retpoline. By
letting sock_def_readable() have external linkage, and being called
directly, the retpoline can be avoided.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/net/sock.h | 2 ++
 net/core/sock.c    | 2 +-
 net/xdp/xsk.c      | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

Comments

John Fastabend Jan. 21, 2020, 12:47 a.m. UTC | #1
Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> XDP sockets use the default implementation of struct sock's
> sk_data_ready callback, which is sock_def_readable(). This function is
> called in the XDP socket fast-path, and involves a retpoline. By
> letting sock_def_readable() have external linkage, and being called
> directly, the retpoline can be avoided.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  include/net/sock.h | 2 ++
>  net/core/sock.c    | 2 +-
>  net/xdp/xsk.c      | 2 +-
>  3 files changed, 4 insertions(+), 2 deletions(-)
> 

I think this is fine but curious were you able to measure the
difference with before/after pps or something?
Björn Töpel Jan. 21, 2020, 12:06 p.m. UTC | #2
On Tue, 21 Jan 2020 at 01:48, John Fastabend <john.fastabend@gmail.com> wrote:
>
> Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > XDP sockets use the default implementation of struct sock's
> > sk_data_ready callback, which is sock_def_readable(). This function is
> > called in the XDP socket fast-path, and involves a retpoline. By
> > letting sock_def_readable() have external linkage, and being called
> > directly, the retpoline can be avoided.
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> > ---
> >  include/net/sock.h | 2 ++
> >  net/core/sock.c    | 2 +-
> >  net/xdp/xsk.c      | 2 +-
> >  3 files changed, 4 insertions(+), 2 deletions(-)
> >
>
> I think this is fine but curious were you able to measure the
> difference with before/after pps or something?

Ugh, yeah, of course I've should have added that. Sorry for that! Here
goes; Benchmark is xdpsock rxdrop, NAPI running on core 20:

**Pre-patch: xdpsock rxdrop: 22.8 Mpps
 Performance counter stats for 'CPU(s) 20':

         10,000.58 msec cpu-clock                 #    1.000 CPUs
utilized
                12      context-switches          #    0.001 K/sec
                 1      cpu-migrations            #    0.000 K/sec
                 0      page-faults               #    0.000 K/sec
    29,931,407,416      cycles                    #    2.993 GHz
    82,538,852,331      instructions              #    2.76  insn per
cycle
    15,894,169,979      branches                  # 1589.324 M/sec
        30,916,486      branch-misses             #    0.19% of all
branches

      10.000636027 seconds time elapsed

**Post-patch: xdpsock rxdrop: 23.2 Mpps
         10,000.90 msec cpu-clock                 #    1.000 CPUs
utilized
                12      context-switches          #    0.001 K/sec
                 1      cpu-migrations            #    0.000 K/sec
                 0      page-faults               #    0.000 K/sec
    29,932,353,067      cycles                    #    2.993 GHz
    84,299,636,827      instructions              #    2.82  insn per
cycle
    16,228,795,437      branches                  # 1622.733 M/sec
        28,113,847      branch-misses             #    0.17% of all
branches

      10.000596454 seconds time elapsed

This could fall into the category of noise. :-) PPS and IPC is up a
bit. OTOH, maybe UDP can benefit from this as well?


Björn
Daniel Borkmann Jan. 21, 2020, 11:36 p.m. UTC | #3
On 1/21/20 1:06 PM, Björn Töpel wrote:
> On Tue, 21 Jan 2020 at 01:48, John Fastabend <john.fastabend@gmail.com> wrote:
>> Björn Töpel wrote:
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> XDP sockets use the default implementation of struct sock's
>>> sk_data_ready callback, which is sock_def_readable(). This function is
>>> called in the XDP socket fast-path, and involves a retpoline. By
>>> letting sock_def_readable() have external linkage, and being called
>>> directly, the retpoline can be avoided.
>>>
>>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>>> ---
>>>   include/net/sock.h | 2 ++
>>>   net/core/sock.c    | 2 +-
>>>   net/xdp/xsk.c      | 2 +-
>>>   3 files changed, 4 insertions(+), 2 deletions(-)
>>>
>>
>> I think this is fine but curious were you able to measure the
>> difference with before/after pps or something?
> 
> Ugh, yeah, of course I've should have added that. Sorry for that! Here
> goes; Benchmark is xdpsock rxdrop, NAPI running on core 20:
> 
> **Pre-patch: xdpsock rxdrop: 22.8 Mpps
>   Performance counter stats for 'CPU(s) 20':
> 
>           10,000.58 msec cpu-clock                 #    1.000 CPUs
> utilized
>                  12      context-switches          #    0.001 K/sec
>                   1      cpu-migrations            #    0.000 K/sec
>                   0      page-faults               #    0.000 K/sec
>      29,931,407,416      cycles                    #    2.993 GHz
>      82,538,852,331      instructions              #    2.76  insn per
> cycle
>      15,894,169,979      branches                  # 1589.324 M/sec
>          30,916,486      branch-misses             #    0.19% of all
> branches
> 
>        10.000636027 seconds time elapsed
> 
> **Post-patch: xdpsock rxdrop: 23.2 Mpps
>           10,000.90 msec cpu-clock                 #    1.000 CPUs
> utilized
>                  12      context-switches          #    0.001 K/sec
>                   1      cpu-migrations            #    0.000 K/sec
>                   0      page-faults               #    0.000 K/sec
>      29,932,353,067      cycles                    #    2.993 GHz
>      84,299,636,827      instructions              #    2.82  insn per
> cycle
>      16,228,795,437      branches                  # 1622.733 M/sec
>          28,113,847      branch-misses             #    0.17% of all
> branches
> 
>        10.000596454 seconds time elapsed
> 
> This could fall into the category of noise. :-) PPS and IPC is up a
> bit. OTOH, maybe UDP can benefit from this as well?

Yeah, might be within noise range though getting rid of retpolines in XDP
[socket] fast-path is fine and patch is tiny, so applied, thanks!
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 8dff68b4c316..0891c55f1e82 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2612,4 +2612,6 @@  static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
 	return false;
 }
 
+void sock_def_readable(struct sock *sk);
+
 #endif	/* _SOCK_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 8459ad579f73..a4c8fac781ff 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2786,7 +2786,7 @@  static void sock_def_error_report(struct sock *sk)
 	rcu_read_unlock();
 }
 
-static void sock_def_readable(struct sock *sk)
+void sock_def_readable(struct sock *sk)
 {
 	struct socket_wq *wq;
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 02ada7ab8c6e..df600487a68d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -217,7 +217,7 @@  static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 static void xsk_flush(struct xdp_sock *xs)
 {
 	xskq_prod_submit(xs->rx);
-	xs->sk.sk_data_ready(&xs->sk);
+	sock_def_readable(&xs->sk);
 }
 
 int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)