diff mbox

[net-next,v4,01/16] bpf: BPF support for sock_ops

Message ID 20170628173124.3299500-2-brakmo@fb.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Lawrence Brakmo June 28, 2017, 5:31 p.m. UTC
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connection parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.

Although there are already three mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it can
set parameters probabilistically for evaluation purposes (e.g. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, rules that make changes based on the distance
(or RTT) between the hosts are much easier to express than route metric
rules, and they can be global. Finally, unlike setsockopt, it does not
require application changes and it can be updated easily at any time.
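
As a sketch, a sock_ops program could implement the probabilistic case
with the bpf_get_prandom_u32() helper (available to this program type
via bpf_base_func_proto, see below); the value names are hypothetical:

	/* Inside a sock_ops program: apply an experimental setting to
	 * roughly 10% of flows and the default to the rest.
	 */
	if (bpf_get_prandom_u32() % 10 == 0)
		skops->reply = EXPERIMENTAL_VAL;	/* hypothetical */
	else
		skops->reply = DEFAULT_VAL;		/* hypothetical */
	return 1;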

Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called
only once during the connection's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code: for example, before sending a SYN or SYN-ACK (to
set an appropriate timeout) and when the connection is established (to
set the congestion control algorithm). As a result, it has an "op"
field to specify the type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use Facebook's internal IPv6 addresses to determine whether both
hosts of a connection are in the same datacenter, and therefore easy to
write a BPF program that chooses a small SYN RTO value when they are.
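
A minimal sketch of such a program (illustrative only: the address
prefix is hypothetical, and a real program would also check skops->op,
whose named values beyond BPF_SOCK_OPS_VOID are only added by later
patches in this series):

#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

#define SAME_DC_PREFIX	0x20010db8	/* hypothetical "same DC" /32 */

SEC("sockops")
int bpf_synrto(struct bpf_sock_ops *skops)
{
	int rv = -1;	/* default: op not supported by this program */

	if (skops->family == 10 /* AF_INET6 */ &&
	    skops->remote_ip6[0] == SAME_DC_PREFIX &&
	    skops->local_ip6[0] == SAME_DC_PREFIX)
		rv = 10;	/* e.g. a small SYN RTO value */

	skops->reply = rv;
	return 1;	/* returning anything != 1 yields -EPERM */
}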

This patch contains only the framework to support the new BPF program
type; the following patches add the functionality to set various
connection parameters.

This patch defines the new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and
a new bpf attach type, BPF_CGROUP_SOCK_OPS, used to attach a program of
this type to a cgroup.
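
A user space sketch of attaching such a program to a cgroup (error
handling trimmed; cg_fd is assumed to be an open fd on the target
cgroup directory and prog_fd the fd returned by BPF_PROG_LOAD):

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int attach_sock_ops(int cg_fd, int prog_fd)
{
	union bpf_attr attr = {};

	attr.target_fd	   = cg_fd;	/* cgroup to attach to */
	attr.attach_bpf_fd = prog_fd;	/* BPF_PROG_TYPE_SOCK_OPS prog */
	attr.attach_type   = BPF_CGROUP_SOCK_OPS;

	return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}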

Two new corresponding structs are added (one for the kernel, one for
the user/BPF program):

/* kernel version */
struct bpf_sock_ops_kern {
        struct sock *sk;
        bool   is_req_sock:1;
        __u32  op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
};

/* user version */
struct bpf_sock_ops {
        __u32 op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
        __u32 family;
        __u32 remote_ip4;
        __u32 local_ip4;
        __u32 remote_ip6[4];
        __u32 local_ip6[4];
        __u32 remote_port;
        __u32 local_port;
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and its return value is
ignored.

The reply fields of the bpf_sock_ops struct are there in case a bpf
program needs to return a value that does not fit in a single 32-bit
integer (via replylong).
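
For instance (a sketch; no op defined in this patch uses replylong yet):

	/* Return a 64-bit quantity through the first two replylong slots */
	__u64 val = get_result();		/* hypothetical */

	skops->replylong[0] = (__u32)val;	  /* low 32 bits */
	skops->replylong[1] = (__u32)(val >> 32); /* high 32 bits */
	return 1;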

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/linux/bpf-cgroup.h |  18 +++++
 include/linux/bpf_types.h  |   1 +
 include/linux/filter.h     |  10 +++
 include/net/tcp.h          |  37 ++++++++++
 include/uapi/linux/bpf.h   |  28 ++++++++
 kernel/bpf/cgroup.c        |  37 ++++++++++
 kernel/bpf/syscall.c       |   5 ++
 net/core/filter.c          | 170 +++++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/bpf_load.c     |  13 +++-
 9 files changed, 316 insertions(+), 3 deletions(-)

Comments

Alexei Starovoitov June 28, 2017, 7:53 p.m. UTC | #1
On 6/28/17 10:31 AM, Lawrence Brakmo wrote:
> +#ifdef CONFIG_BPF
> +static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
> +{
> +	struct bpf_sock_ops_kern sock_ops;
> +	int ret;
> +
> +	if (!is_req_sock)
> +		sock_owned_by_me(sk);
> +
> +	memset(&sock_ops, 0, sizeof(sock_ops));
> +	sock_ops.sk = sk;
> +	sock_ops.is_req_sock = is_req_sock;
> +	sock_ops.op = op;
> +
> +	ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
> +	if (ret == 0)
> +		ret = sock_ops.reply;
> +	else
> +		ret = -1;
> +	return ret;
> +}

the switch to cgroup-attached only made it really nice and clean.
No global state to worry about.
I haven't looked through the minor patch details, but overall
it all looks good to me. I don't have any architectural concerns.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann June 29, 2017, 9:46 a.m. UTC | #2
On 06/28/2017 07:31 PM, Lawrence Brakmo wrote:
> Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
> struct that allows BPF programs of this type to access some of the
> socket's fields (such as IP addresses, ports, etc.). It uses the
> existing bpf cgroups infrastructure so the programs can be attached per
> cgroup with full inheritance support. The program will be called at
> appropriate times to set relevant connection parameters such as buffer
> sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
> as IP addresses, port numbers, etc.
[...]
> Currently there are two types of ops. The first type expects the BPF
> program to return a value which is then used by the caller (or a
> negative value to indicate the operation is not supported). The second
> type expects state changes to be done by the BPF program, for example
> through a setsockopt BPF helper function, and they ignore the return
> value.
>
> The reply fields of the bpf_sock_ops struct are there in case a bpf
> program needs to return a value larger than an integer.
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>

For BPF bits:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

> @@ -3379,6 +3409,140 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>   	return insn - insn_buf;
>   }
>
> +static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
> +				       const struct bpf_insn *si,
> +				       struct bpf_insn *insn_buf,
> +				       struct bpf_prog *prog)
> +{
> +	struct bpf_insn *insn = insn_buf;
> +	int off;
> +
> +	switch (si->off) {
[...]
> +	case offsetof(struct bpf_sock_ops, remote_ip4):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_daddr) != 4);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_sock_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common, skc_daddr));
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +		break;
> +
> +	case offsetof(struct bpf_sock_ops, local_ip4):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_rcv_saddr) != 4);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +					      struct bpf_sock_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common,
> +					       skc_rcv_saddr));
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +		break;
> +
> +	case offsetof(struct bpf_sock_ops, remote_ip6[0]) ...
> +	     offsetof(struct bpf_sock_ops, remote_ip6[3]):
> +#if IS_ENABLED(CONFIG_IPV6)
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
> +					  skc_v6_daddr.s6_addr32[0]) != 4);
> +
> +		off = si->off;
> +		off -= offsetof(struct bpf_sock_ops, remote_ip6[0]);
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_sock_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common,
> +					       skc_v6_daddr.s6_addr32[0]) +
> +				      off);
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +#else
> +		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
> +#endif
> +		break;
> +
> +	case offsetof(struct bpf_sock_ops, local_ip6[0]) ...
> +	     offsetof(struct bpf_sock_ops, local_ip6[3]):
> +#if IS_ENABLED(CONFIG_IPV6)
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
> +					  skc_v6_rcv_saddr.s6_addr32[0]) != 4);
> +
> +		off = si->off;
> +		off -= offsetof(struct bpf_sock_ops, local_ip6[0]);
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_sock_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common,
> +					       skc_v6_rcv_saddr.s6_addr32[0]) +
> +				      off);
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +#else
> +		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
> +#endif
> +		break;
> +
> +	case offsetof(struct bpf_sock_ops, remote_port):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_dport) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_sock_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common, skc_dport));
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 16);
> +		break;
> +
> +	case offsetof(struct bpf_sock_ops, local_port):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_sock_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common, skc_num));

That one is indeed in host endianness. Makes sense to have remote_port
and local_port in a consistent representation.

I was wondering though whether we should do all the conversion of
BPF_ENDIAN(BPF_FROM_BE, ...) or just leave it to the user whether
he needs the BPF_ENDIAN(BPF_FROM_BE, ...) or process it in network
byte order as-is. In case the user needs to go and undo again via
BPF_ENDIAN(BPF_TO_BE, ...), e.g., to reconstruct a full v6 addr,
then we have two unneeded insns for each of the remote_ip6[X] /
local_ip6[X]. So, not providing it in host byte order, the user can
still always choose to do a BPF_ENDIAN(BPF_FROM_BE, ...) by himself,
if this representation is preferred. Wdyt?

> +		break;
> +	}
> +	return insn - insn_buf;
> +}
> +
>   const struct bpf_verifier_ops sk_filter_prog_ops = {
>   	.get_func_proto		= sk_filter_func_proto,
[...]
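
To make the cost Daniel mentions concrete: if a program wants the raw
network-byte-order address back, it has to undo the conversion word by
word. A sketch (bpf_htonl() stands in for a BPF_ENDIAN(BPF_TO_BE, ...)
instruction, like the macro in the BPF selftests' bpf_endian.h):

	__be32 daddr[4];

	/* Each line re-adds the endian flip that the ctx rewrite
	 * removed: the rewrite's BPF_FROM_BE plus this BPF_TO_BE are
	 * the two unneeded insns per word referred to above.
	 */
	daddr[0] = bpf_htonl(skops->remote_ip6[0]);
	daddr[1] = bpf_htonl(skops->remote_ip6[1]);
	daddr[2] = bpf_htonl(skops->remote_ip6[2]);
	daddr[3] = bpf_htonl(skops->remote_ip6[3]);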
kernel test robot June 29, 2017, 3:57 p.m. UTC | #3
Hi Lawrence,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Lawrence-Brakmo/bpf-BPF-cgroup-support-for-sock_ops/20170629-203719
config: tile-allyesconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=tile 

All warnings (new ones prefixed by >>):

   In file included from include/linux/netfilter/ipset/pfxlen.h:6:0,
                    from net/netfilter/ipset/pfxlen.c:2:
   include/net/tcp.h: In function 'tcp_call_bpf':
>> include/net/tcp.h:2047:8: warning: the address of 'sock_ops' will always evaluate as 'true' [-Waddress]

vim +2047 include/net/tcp.h

  2031	 * program loaded).
  2032	 */
  2033	#ifdef CONFIG_BPF
  2034	static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
  2035	{
  2036		struct bpf_sock_ops_kern sock_ops;
  2037		int ret;
  2038	
  2039		if (!is_req_sock)
  2040			sock_owned_by_me(sk);
  2041	
  2042		memset(&sock_ops, 0, sizeof(sock_ops));
  2043		sock_ops.sk = sk;
  2044		sock_ops.is_req_sock = is_req_sock;
  2045		sock_ops.op = op;
  2046	
> 2047		ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
  2048		if (ret == 0)
  2049			ret = sock_ops.reply;
  2050		else
  2051			ret = -1;
  2052		return ret;
  2053	}
  2054	#else
  2055	static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
kernel test robot June 29, 2017, 4:21 p.m. UTC | #4
Hi Lawrence,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Lawrence-Brakmo/bpf-BPF-cgroup-support-for-sock_ops/20170629-203719
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   In file included from include/linux/cgroup-defs.h:20:0,
                    from include/linux/cgroup.h:26,
                    from include/net/netprio_cgroup.h:17,
                    from include/linux/netdevice.h:47,
                    from include/net/sock.h:51,
                    from include/linux/tcp.h:23,
                    from include/net/tcp.h:24,
                    from net//ipv6/netfilter/nf_socket_ipv6.c:13:
   include/net/tcp.h: In function 'tcp_call_bpf':
>> include/linux/bpf-cgroup.h:86:25: warning: the address of 'sock_ops' will always evaluate as 'true' [-Waddress]
     if (cgroup_bpf_enabled && (sock_ops) && (sock_ops)->sk) {        \
                            ^
>> include/net/tcp.h:2047:8: note: in expansion of macro 'BPF_CGROUP_RUN_PROG_SOCK_OPS'
     ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
           ^
--
   In file included from include/linux/cgroup-defs.h:20:0,
                    from include/linux/cgroup.h:26,
                    from include/net/netprio_cgroup.h:17,
                    from include/linux/netdevice.h:47,
                    from include/net/sock.h:51,
                    from include/linux/tcp.h:23,
                    from net//netfilter/ipvs/ip_vs_core.c:33:
   include/net/tcp.h: In function 'tcp_call_bpf':
>> include/linux/bpf-cgroup.h:86:25: warning: the address of 'sock_ops' will always evaluate as 'true' [-Waddress]
     if (cgroup_bpf_enabled && (sock_ops) && (sock_ops)->sk) {        \
                            ^
>> include/net/tcp.h:2047:8: note: in expansion of macro 'BPF_CGROUP_RUN_PROG_SOCK_OPS'
     ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
           ^
   net//netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_sched_persist':
   net//netfilter/ipvs/ip_vs_core.c:399:1: warning: the frame size of 1072 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^
   net//netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_new_conn_out':
   net//netfilter/ipvs/ip_vs_core.c:1199:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^
--
   In file included from include/linux/cgroup-defs.h:20:0,
                    from include/linux/cgroup.h:26,
                    from include/net/netprio_cgroup.h:17,
                    from include/linux/netdevice.h:47,
                    from include/net/sock.h:51,
                    from include/linux/tcp.h:23,
                    from net/netfilter/ipvs/ip_vs_core.c:33:
   include/net/tcp.h: In function 'tcp_call_bpf':
>> include/linux/bpf-cgroup.h:86:25: warning: the address of 'sock_ops' will always evaluate as 'true' [-Waddress]
     if (cgroup_bpf_enabled && (sock_ops) && (sock_ops)->sk) {        \
                            ^
>> include/net/tcp.h:2047:8: note: in expansion of macro 'BPF_CGROUP_RUN_PROG_SOCK_OPS'
     ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
           ^
   net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_sched_persist':
   net/netfilter/ipvs/ip_vs_core.c:399:1: warning: the frame size of 1072 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^
   net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_new_conn_out':
   net/netfilter/ipvs/ip_vs_core.c:1199:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^

vim +86 include/linux/bpf-cgroup.h

    70		__ret;								       \
    71	})
    72	
    73	#define BPF_CGROUP_RUN_PROG_INET_SOCK(sk)				       \
    74	({									       \
    75		int __ret = 0;							       \
    76		if (cgroup_bpf_enabled && sk) {					       \
    77			__ret = __cgroup_bpf_run_filter_sk(sk,			       \
    78							 BPF_CGROUP_INET_SOCK_CREATE); \
    79		}								       \
    80		__ret;								       \
    81	})
    82	
    83	#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				       \
    84	({									       \
    85		int __ret = 0;							       \
  > 86		if (cgroup_bpf_enabled && (sock_ops) && (sock_ops)->sk) {	       \
    87			typeof(sk) __sk = sk_to_full_sk((sock_ops)->sk);	       \
    88			if (sk_fullsock(__sk))					       \
    89				__ret = __cgroup_bpf_run_filter_sock_ops(__sk,	       \
    90									 sock_ops,     \
    91								 BPF_CGROUP_SOCK_OPS); \
    92		}								       \
    93		__ret;								       \
    94	})

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
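
The -Waddress warnings above come from tcp_call_bpf() passing &sock_ops,
the address of a stack variable, which can never be NULL, so the
(sock_ops) test in BPF_CGROUP_RUN_PROG_SOCK_OPS always evaluates to
true. A sketch of one way to silence it (an assumption, not necessarily
the fix adopted in a later revision) drops the redundant test; using
struct sock * directly also avoids the typeof(sk) reference to a
variable that only exists in the caller's scope:

#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				       \
({									       \
	int __ret = 0;							       \
	/* &sock_ops is never NULL, so only check the sk member */	       \
	if (cgroup_bpf_enabled && (sock_ops)->sk) {			       \
		struct sock *__sk = sk_to_full_sk((sock_ops)->sk);	       \
		if (sk_fullsock(__sk))					       \
			__ret = __cgroup_bpf_run_filter_sock_ops(__sk,	       \
								 sock_ops,     \
							 BPF_CGROUP_SOCK_OPS); \
	}								       \
	__ret;								       \
})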
Lawrence Brakmo June 30, 2017, 7:27 a.m. UTC | #5
On 6/29/17, 2:46 AM, "netdev-owner@vger.kernel.org on behalf of Daniel Borkmann" <netdev-owner@vger.kernel.org on behalf of daniel@iogearbox.net> wrote:

    On 06/28/2017 07:31 PM, Lawrence Brakmo wrote:
    > Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
    > struct that allows BPF programs of this type to access some of the
    > socket's fields (such as IP addresses, ports, etc.).
    [...]

    For BPF bits:

    Acked-by: Daniel Borkmann <daniel@iogearbox.net>

    [...]
    > +	case offsetof(struct bpf_sock_ops, local_port):
    > +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
    > +
    > +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
    > +						struct bpf_sock_ops_kern, sk),
    > +				      si->dst_reg, si->src_reg,
    > +				      offsetof(struct bpf_sock_ops_kern, sk));
    > +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
    > +				      offsetof(struct sock_common, skc_num));

    That one is indeed in host endianness. Makes sense to have remote_port
    and local_port in a consistent representation.

    I was wondering though whether we should do all the conversion of
    BPF_ENDIAN(BPF_FROM_BE, ...) or just leave it to the user whether
    he needs the BPF_ENDIAN(BPF_FROM_BE, ...) or process it in network
    byte order as-is. In case the user needs to go and undo again via
    BPF_ENDIAN(BPF_TO_BE, ...), e.g., to reconstruct a full v6 addr,
    then we have two unneeded insns for each of the remote_ip6[X] /
    local_ip6[X]. So, not providing it in host byte order, the user can
    still always choose to do a BPF_ENDIAN(BPF_FROM_BE, ...) by himself,
    if this representation is preferred. Wdyt?

Good point about endianness. What I will do is present the data 
in the same endianness as it is in the kernel sock struct and document
this in the sock_ops struct.
I will submit a new patch set soon.  
    
diff mbox

Patch

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index c970a25..26449c7 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -7,6 +7,7 @@ 
 struct sock;
 struct cgroup;
 struct sk_buff;
+struct bpf_sock_ops_kern;
 
 #ifdef CONFIG_CGROUP_BPF
 
@@ -42,6 +43,10 @@  int __cgroup_bpf_run_filter_skb(struct sock *sk,
 int __cgroup_bpf_run_filter_sk(struct sock *sk,
 			       enum bpf_attach_type type);
 
+int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
+				     struct bpf_sock_ops_kern *sock_ops,
+				     enum bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)			      \
 ({									      \
@@ -75,6 +80,18 @@  int __cgroup_bpf_run_filter_sk(struct sock *sk,
 	__ret;								       \
 })
 
+#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				       \
+({									       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled && (sock_ops) && (sock_ops)->sk) {	       \
+		typeof(sk) __sk = sk_to_full_sk((sock_ops)->sk);	       \
+		if (sk_fullsock(__sk))					       \
+			__ret = __cgroup_bpf_run_filter_sock_ops(__sk,	       \
+								 sock_ops,     \
+							 BPF_CGROUP_SOCK_OPS); \
+	}								       \
+	__ret;								       \
+})
 #else
 
 struct cgroup_bpf {};
@@ -85,6 +102,7 @@  static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 03bf223..3d137c3 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -10,6 +10,7 @@  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit_prog_ops)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops_prog_ops)
 #endif
 #ifdef CONFIG_BPF_EVENTS
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe_prog_ops)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1fa26dc..bbd6429 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -898,4 +898,14 @@  static inline int bpf_tell_extensions(void)
 	return SKF_AD_MAX;
 }
 
+struct bpf_sock_ops_kern {
+	struct	sock *sk;
+	bool	is_req_sock:1;
+	u32	op;
+	union {
+		u32 reply;
+		u32 replylong[4];
+	};
+};
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index d0751b7..804c27a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -46,6 +46,10 @@ 
 #include <linux/seq_file.h>
 #include <linux/memcontrol.h>
 
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/bpf-cgroup.h>
+
 extern struct inet_hashinfo tcp_hashinfo;
 
 extern struct percpu_counter tcp_orphan_count;
@@ -2021,4 +2025,37 @@  int tcp_set_ulp(struct sock *sk, const char *name);
 void tcp_get_available_ulp(char *buf, size_t len);
 void tcp_cleanup_ulp(struct sock *sk);
 
+/* Call BPF_SOCK_OPS program that returns an int. If the return value
+ * is < 0, then the BPF op failed (for example if the loaded BPF
+ * program does not support the chosen operation or there is no BPF
+ * program loaded).
+ */
+#ifdef CONFIG_BPF
+static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
+{
+	struct bpf_sock_ops_kern sock_ops;
+	int ret;
+
+	if (!is_req_sock)
+		sock_owned_by_me(sk);
+
+	memset(&sock_ops, 0, sizeof(sock_ops));
+	sock_ops.sk = sk;
+	sock_ops.is_req_sock = is_req_sock;
+	sock_ops.op = op;
+
+	ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops);
+	if (ret == 0)
+		ret = sock_ops.reply;
+	else
+		ret = -1;
+	return ret;
+}
+#else
+static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
+{
+	return -EPERM;
+}
+#endif
+
 #endif	/* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f94b48b..617fb66 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -120,12 +120,14 @@  enum bpf_prog_type {
 	BPF_PROG_TYPE_LWT_IN,
 	BPF_PROG_TYPE_LWT_OUT,
 	BPF_PROG_TYPE_LWT_XMIT,
+	BPF_PROG_TYPE_SOCK_OPS,
 };
 
 enum bpf_attach_type {
 	BPF_CGROUP_INET_INGRESS,
 	BPF_CGROUP_INET_EGRESS,
 	BPF_CGROUP_INET_SOCK_CREATE,
+	BPF_CGROUP_SOCK_OPS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -720,4 +722,30 @@  struct bpf_map_info {
 	__u32 map_flags;
 } __attribute__((aligned(8)));
 
+/* User bpf_sock_ops struct to access socket values and specify request ops
+ * and their replies.
+ * New fields can only be added at the end of this structure
+ */
+struct bpf_sock_ops {
+	__u32 op;
+	union {
+		__u32 reply;
+		__u32 replylong[4];
+	};
+	__u32 family;
+	__u32 remote_ip4;
+	__u32 local_ip4;
+	__u32 remote_ip6[4];
+	__u32 local_ip6[4];
+	__u32 remote_port;
+	__u32 local_port;
+};
+
+/* List of known BPF sock_ops operators.
+ * New entries can only be added at the end
+ */
+enum {
+	BPF_SOCK_OPS_VOID,
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ea6033c..5461134 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -236,3 +236,40 @@  int __cgroup_bpf_run_filter_sk(struct sock *sk,
 	return ret;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
+
+/**
+ * __cgroup_bpf_run_filter_sock_ops() - Run a program on a sock
+ * @sk: socket to get cgroup from
+ * @sock_ops: bpf_sock_ops_kern struct to pass to program. Contains
+ * sk with connection information (IP addresses, etc.). May not contain
+ * cgroup info if it is a req sock.
+ * @type: The type of program to be executed
+ *
+ * socket passed is expected to be of type INET or INET6.
+ *
+ * The program type passed in via @type must be suitable for sock_ops
+ * filtering. No further check is performed to assert that.
+ *
+ * This function will return %-EPERM if an attached program was found
+ * and it returned != 1 during execution. In all other cases, 0 is returned.
+ */
+int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
+				     struct bpf_sock_ops_kern *sock_ops,
+				     enum bpf_attach_type type)
+{
+	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+	struct bpf_prog *prog;
+	int ret = 0;
+
+
+	rcu_read_lock();
+
+	prog = rcu_dereference(cgrp->bpf.effective[type]);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, sock_ops) == 1 ? 0 : -EPERM;
+
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL(__cgroup_bpf_run_filter_sock_ops);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8942c82..19905e3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1069,6 +1069,9 @@  static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET_SOCK_CREATE:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCK;
 		break;
+	case BPF_CGROUP_SOCK_OPS:
+		ptype = BPF_PROG_TYPE_SOCK_OPS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1109,6 +1112,7 @@  static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
 	case BPF_CGROUP_INET_SOCK_CREATE:
+	case BPF_CGROUP_SOCK_OPS:
 		cgrp = cgroup_get_from_fd(attr->target_fd);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
@@ -1123,6 +1127,7 @@  static int bpf_prog_detach(const union bpf_attr *attr)
 
 	return ret;
 }
+
 #endif /* CONFIG_CGROUP_BPF */
 
 #define BPF_PROG_TEST_RUN_LAST_FIELD test.duration
diff --git a/net/core/filter.c b/net/core/filter.c
index b39c869..bb54832 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3110,6 +3110,36 @@  void bpf_warn_invalid_xdp_action(u32 act)
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
+static bool __is_valid_sock_ops_access(int off, int size)
+{
+	if (off < 0 || off >= sizeof(struct bpf_sock_ops))
+		return false;
+	/* The verifier guarantees that size > 0. */
+	if (off % size != 0)
+		return false;
+	if (size != sizeof(__u32))
+		return false;
+
+	return true;
+}
+
+static bool sock_ops_is_valid_access(int off, int size,
+				     enum bpf_access_type type,
+				     struct bpf_insn_access_aux *info)
+{
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case offsetof(struct bpf_sock_ops, op) ...
+		     offsetof(struct bpf_sock_ops, replylong[3]):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	return __is_valid_sock_ops_access(off, size);
+}
+
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 				  const struct bpf_insn *si,
 				  struct bpf_insn *insn_buf,
@@ -3379,6 +3409,140 @@  static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 	return insn - insn_buf;
 }
 
+static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
+				       const struct bpf_insn *si,
+				       struct bpf_insn *insn_buf,
+				       struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+	int off;
+
+	switch (si->off) {
+	case offsetof(struct bpf_sock_ops, op) ...
+	     offsetof(struct bpf_sock_ops, replylong[3]):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct bpf_sock_ops, op) !=
+			     FIELD_SIZEOF(struct bpf_sock_ops_kern, op));
+		BUILD_BUG_ON(FIELD_SIZEOF(struct bpf_sock_ops, reply) !=
+			     FIELD_SIZEOF(struct bpf_sock_ops_kern, reply));
+		BUILD_BUG_ON(FIELD_SIZEOF(struct bpf_sock_ops, replylong) !=
+			     FIELD_SIZEOF(struct bpf_sock_ops_kern, replylong));
+		off = si->off;
+		off -= offsetof(struct bpf_sock_ops, op);
+		off += offsetof(struct bpf_sock_ops_kern, op);
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					      off);
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					      off);
+		break;
+
+	case offsetof(struct bpf_sock_ops, family):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_family) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_family));
+		break;
+
+	case offsetof(struct bpf_sock_ops, remote_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_daddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_daddr));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_sock_ops, local_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_rcv_saddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_rcv_saddr));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_sock_ops, remote_ip6[0]) ...
+	     offsetof(struct bpf_sock_ops, remote_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_daddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct bpf_sock_ops, remote_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_daddr.s6_addr32[0]) +
+				      off);
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct bpf_sock_ops, local_ip6[0]) ...
+	     offsetof(struct bpf_sock_ops, local_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_rcv_saddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct bpf_sock_ops, local_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_rcv_saddr.s6_addr32[0]) +
+				      off);
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct bpf_sock_ops, remote_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_dport) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_dport));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 16);
+		break;
+
+	case offsetof(struct bpf_sock_ops, local_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_num));
+		break;
+	}
+	return insn - insn_buf;
+}
+
 const struct bpf_verifier_ops sk_filter_prog_ops = {
 	.get_func_proto		= sk_filter_func_proto,
 	.is_valid_access	= sk_filter_is_valid_access,
@@ -3428,6 +3592,12 @@  const struct bpf_verifier_ops cg_sock_prog_ops = {
 	.convert_ctx_access	= sock_filter_convert_ctx_access,
 };
 
+const struct bpf_verifier_ops sock_ops_prog_ops = {
+	.get_func_proto		= bpf_base_func_proto,
+	.is_valid_access	= sock_ops_is_valid_access,
+	.convert_ctx_access	= sock_ops_convert_ctx_access,
+};
+
 int sk_detach_filter(struct sock *sk)
 {
 	int ret = -ENOENT;
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index a91c57d..a4be7cf 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -64,6 +64,7 @@  static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
 	bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
 	bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
+	bool is_sockops = strncmp(event, "sockops", 7) == 0;
 	size_t insns_cnt = size / sizeof(struct bpf_insn);
 	enum bpf_prog_type prog_type;
 	char buf[256];
@@ -89,6 +90,8 @@  static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_CGROUP_SKB;
 	} else if (is_cgroup_sk) {
 		prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
+	} else if (is_sockops) {
+		prog_type = BPF_PROG_TYPE_SOCK_OPS;
 	} else {
 		printf("Unknown event '%s'\n", event);
 		return -1;
@@ -106,8 +109,11 @@  static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
 		return 0;
 
-	if (is_socket) {
-		event += 6;
+	if (is_socket || is_sockops) {
+		if (is_socket)
+			event += 6;
+		else
+			event += 7;
 		if (*event != '/')
 			return 0;
 		event++;
@@ -560,7 +566,8 @@  static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		    memcmp(shname, "xdp", 3) == 0 ||
 		    memcmp(shname, "perf_event", 10) == 0 ||
 		    memcmp(shname, "socket", 6) == 0 ||
-		    memcmp(shname, "cgroup/", 7) == 0)
+		    memcmp(shname, "cgroup/", 7) == 0 ||
+		    memcmp(shname, "sockops", 7) == 0)
 			load_and_attach(shname, data->d_buf, data->d_size);
 	}