diff mbox

[RFC,net-next,v2,01/15] bpf: BPF support for socket ops

Message ID 20170615200844.2752485-2-brakmo@fb.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Lawrence Brakmo June 15, 2017, 8:08 p.m. UTC
Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). Currently there is
functionality to load one global BPF program of this type which can be
called at appropriate times to set relevant connection parameters such
as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection
information such as IP addresses, port numbers, etc.

Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopt), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user-level program. In addition, it can
set parameters probabilistically for evaluation purposes (e.g. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, rules that make changes based on the distance
(or RTT) between the hosts are much easier to express than route metric
rules and can be global. Finally, unlike setsockopt, it does not require
application changes and can be updated easily at any time.

I plan to add support for loading per-cgroup socket ops BPF programs in
the near future. One question is whether I should add this functionality
into David Ahern's BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf
type. Whereas the current cgroup_sock type expects to be called only once
during a connection's lifetime, the new socket_ops type could be called
multiple times. For example, before sending a SYN or SYN-ACK to set an
appropriate timeout, or when the connection is established to set the
congestion control algorithm. As a result it has an "op" field to
specify the type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use Facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter, and therefore easy to write
a BPF program that chooses a small SYN RTO value when both hosts are in
the same datacenter.

This patch only contains the framework to support the new BPF program
type; the following patches add the functionality to set various
connection parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.
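
For illustration only, loading a program of this type from user space
with the new command could look like the following minimal sketch (the
helper name is hypothetical; it assumes prog_fd came from a successful
BPF_PROG_LOAD of a program of the new type; the bpf_fd attribute matches
the syscall.c hunk below):

#include <string.h>
#include <unistd.h>
#include <linux/bpf.h>
#include <sys/syscall.h>

/* Sketch: install prog_fd as the global socket_ops program. */
static int load_socket_ops(int prog_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.bpf_fd = prog_fd;

	return syscall(__NR_bpf, BPF_PROG_LOAD_SOCKET_OPS, &attr,
		       sizeof(attr));
}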

Two new corresponding structs (one for the kernel, one for the user/BPF
program):

/* kernel version */
struct bpf_socket_ops_kern {
        struct sock *sk;
        __u32  is_req_sock:1;
        __u32  op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
};

/* user version */
struct bpf_socket_ops {
        __u32 op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
        __u32 family;
        __u32 remote_ip4;
        __u32 local_ip4;
        __u32 remote_ip6[4];
        __u32 local_ip6[4];
        __u32 remote_port;
        __u32 local_port;
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and its return value is
ignored.

The reply fields of the bpf_socket_ops struct are there in case a BPF
program needs to return a value larger than a single integer.
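
To make the interplay between the op field, the return value, and the
reply fields concrete, here is a minimal sketch of what a program of
this type might look like. Everything specific in it is illustrative:
BPF_SOCKET_OPS_TIMEOUT_INIT does not exist in this patch (only
BPF_SOCKET_OPS_VOID does; real ops arrive in later patches of the
series), and the SEC()/AF_INET6 boilerplate is assumed from samples/bpf:

/* Illustrative sketch only; BPF_SOCKET_OPS_TIMEOUT_INIT is hypothetical. */
SEC("sockops")
int bpf_pick_syn_rto(struct bpf_socket_ops *skops)
{
	/* Treat a shared IPv6 /64 prefix as "same datacenter". */
	if (skops->op == BPF_SOCKET_OPS_TIMEOUT_INIT &&
	    skops->family == AF_INET6 &&
	    skops->local_ip6[0] == skops->remote_ip6[0] &&
	    skops->local_ip6[1] == skops->remote_ip6[1])
		return 10;	/* small SYN RTO for a nearby peer */

	return -1;		/* op not supported by this program */
}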

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/linux/bpf.h       |   6 ++
 include/linux/bpf_types.h |   1 +
 include/linux/filter.h    |  10 +++
 include/net/tcp.h         |  27 ++++++++
 include/uapi/linux/bpf.h  |  28 +++++++++
 kernel/bpf/syscall.c      |   2 +
 net/core/Makefile         |   3 +-
 net/core/filter.c         | 157 ++++++++++++++++++++++++++++++++++++++++++++++
 net/core/sock_bpfops.c    |  67 ++++++++++++++++++++
 samples/bpf/bpf_load.c    |  13 +++-
 10 files changed, 310 insertions(+), 4 deletions(-)
 create mode 100644 net/core/sock_bpfops.c

Comments

Daniel Borkmann June 16, 2017, 12:07 p.m. UTC | #1
On 06/15/2017 10:08 PM, Lawrence Brakmo wrote:
> Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a corresponding
> struct that allows BPF programs of this type to access some of the
> socket's fields (such as IP addresses, ports, etc.). Currently there is
> functionality to load one global BPF program of this type which can be
> called at appropriate times to set relevant connection parameters such
> as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection
> information such as IP addresses, port numbers, etc.
>
> Although there are already 3 mechanisms to set parameters (sysctls,
> route metrics and setsockopt), this new mechanism provides some
> distinct advantages. Unlike sysctls, it can set parameters per
> connection. In contrast to route metrics, it can also use port numbers
> and information provided by a user-level program. In addition, it can
> set parameters probabilistically for evaluation purposes (e.g. do
> something different on 10% of the flows and compare results with the
> other 90% of the flows). Also, in cases where IPv6 addresses contain
> geographic information, rules that make changes based on the distance
> (or RTT) between the hosts are much easier to express than route metric
> rules and can be global. Finally, unlike setsockopt, it does not require
> application changes and can be updated easily at any time.
>
> I plan to add support for loading per-cgroup socket ops BPF programs in
> the near future. One question is whether I should add this functionality
> into David Ahern's BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf
> type. Whereas the current cgroup_sock type expects to be called only once
> during a connection's lifetime, the new socket_ops type could be called
> multiple times. For example, before sending a SYN or SYN-ACK to set an
> appropriate timeout, or when the connection is established to set the
> congestion control algorithm. As a result it has an "op" field to
> specify the type of operation requested.
>
> The purpose of this new program type is to simplify setting connection
> parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
> easy to use Facebook's internal IPv6 addresses to determine if both hosts
> of a connection are in the same datacenter, and therefore easy to write
> a BPF program that chooses a small SYN RTO value when both hosts are in
> the same datacenter.
>
> This patch only contains the framework to support the new BPF program
> type; the following patches add the functionality to set various
> connection parameters.
>
> This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
> and a new bpf syscall command to load a new program of this type:
> BPF_PROG_LOAD_SOCKET_OPS.
>
> Two new corresponding structs (one for the kernel, one for the user/BPF
> program):
>
> /* kernel version */
> struct bpf_socket_ops_kern {
>          struct sock *sk;
> 	__u32  is_req_sock:1;
>          __u32  op;
>          union {
>                  __u32 reply;
>                  __u32 replylong[4];
>          };
> };
>
> /* user version */
> struct bpf_socket_ops {
>          __u32 op;
>          union {
>                  __u32 reply;
>                  __u32 replylong[4];
>          };
>          __u32 family;
>          __u32 remote_ip4;
>          __u32 local_ip4;
>          __u32 remote_ip6[4];
>          __u32 local_ip6[4];
>          __u32 remote_port;
>          __u32 local_port;
> };

Above and ...

struct bpf_sock {
	__u32 bound_dev_if;
	__u32 family;
	__u32 type;
	__u32 protocol;
};

... would result in two BPF sock user versions. It's okayish, but
given struct bpf_sock is quite generic, couldn't we merge the members
from struct bpf_socket_ops into struct bpf_sock instead?

Idea would be that sock_filter_is_valid_access() for cgroups would
then check off < 0 || off + size > offsetofend(struct bpf_sock, protocol)
to disallow new members, and your socket_ops_is_valid_access() could
allow and xlate the full range. The family member is already duplicated,
and the others could then be accessed from these kinds of BPF progs as
well; plus we have a single user representation, similar to __sk_buff,
that multiple types will use.
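
A rough sketch of that suggested split (hypothetical code, assuming the
socket_ops members above were folded into struct bpf_sock):

/* cgroup sock progs: only the original bpf_sock members */
static bool sock_filter_is_valid_access(int off, int size,
					enum bpf_access_type type,
					enum bpf_reg_type *reg_type)
{
	if (off < 0 || off + size > offsetofend(struct bpf_sock, protocol))
		return false;
	if (off % size != 0 || size != sizeof(__u32))
		return false;
	return true;
}

/* socket_ops progs: allowed the full (merged) range */
static bool socket_ops_is_valid_access(int off, int size,
				       enum bpf_access_type type,
				       enum bpf_reg_type *reg_type)
{
	return !(off < 0 || off + size > sizeof(struct bpf_sock) ||
		 off % size != 0 || size != sizeof(__u32));
}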

> Currently there are two types of ops. The first type expects the BPF
> program to return a value which is then used by the caller (or a
> negative value to indicate the operation is not supported). The second
> type expects state changes to be done by the BPF program, for example
> through a setsockopt BPF helper function, and its return value is
> ignored.
>
> The reply fields of the bpf_socket_ops struct are there in case a BPF
> program needs to return a value larger than a single integer.
>
> Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
> ---
>   include/linux/bpf.h       |   6 ++
>   include/linux/bpf_types.h |   1 +
>   include/linux/filter.h    |  10 +++
>   include/net/tcp.h         |  27 ++++++++
>   include/uapi/linux/bpf.h  |  28 +++++++++
>   kernel/bpf/syscall.c      |   2 +
>   net/core/Makefile         |   3 +-
>   net/core/filter.c         | 157 ++++++++++++++++++++++++++++++++++++++++++++++
>   net/core/sock_bpfops.c    |  67 ++++++++++++++++++++
>   samples/bpf/bpf_load.c    |  13 +++-
>   10 files changed, 310 insertions(+), 4 deletions(-)
>   create mode 100644 net/core/sock_bpfops.c
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 1bcbf0a..e164f94 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -362,4 +362,10 @@ extern const struct bpf_func_proto bpf_get_stackid_proto;
>   void bpf_user_rnd_init_once(void);
>   u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>
> +/* socket_ops related */
> +struct sock;

Is this needed? You're using it in struct bpf_socket_ops_kern in
filter.h, where we already have struct sock;

> +struct bpf_socket_ops_kern;
> +
> +int bpf_socket_ops_set_prog(int fd);
> +int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket);
>   #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 03bf223..ca69d10 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -10,6 +10,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock_prog_ops)
>   BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout_prog_ops)
>   BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout_prog_ops)
>   BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit_prog_ops)
> +BPF_PROG_TYPE(BPF_PROG_TYPE_SOCKET_OPS, socket_ops_prog_ops)
>   #endif
>   #ifdef CONFIG_BPF_EVENTS
>   BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe_prog_ops)
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 1fa26dc..102e881 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -898,4 +898,14 @@ static inline int bpf_tell_extensions(void)
>   	return SKF_AD_MAX;
>   }
>
> +struct bpf_socket_ops_kern {
> +	struct	sock *sk;
> +	u32	is_req_sock:1;
> +	u32	op;
> +	union {
> +		u32 reply;
> +		u32 replylong[4];
> +	};
> +};
> +
>   #endif /* __LINUX_FILTER_H__ */
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 3ab677d..9ad0d80 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -46,6 +46,9 @@
>   #include <linux/seq_file.h>
>   #include <linux/memcontrol.h>
>
> +#include <linux/bpf.h>
> +#include <linux/filter.h>
> +
>   extern struct inet_hashinfo tcp_hashinfo;
>
>   extern struct percpu_counter tcp_orphan_count;
> @@ -1991,4 +1994,28 @@ static inline void tcp_listendrop(const struct sock *sk)
>
>   enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
>
> +/* Call BPF_SOCKET_OPS program that returns an int. If the return value
> + * is < 0, then the BPF op failed (for example if the loaded BPF
> + * program does not support the chosen operation or there is no BPF
> + * program loaded).
> + */
> +#ifdef CONFIG_BPF
> +static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
> +{
> +	struct bpf_socket_ops_kern socket_ops;
> +
> +	memset(&socket_ops, 0, sizeof(socket_ops));
> +	socket_ops.sk = sk;
> +	socket_ops.is_req_sock = is_req_sock ? 1 : 0;

Is is_req_sock actually used here in this patch (apart from setting it)?
Not seeing that the BPF prog will access it; if it also shouldn't access
it, then bool would be a better type.

> +	socket_ops.op = op;
> +
> +	return bpf_socket_ops_call(&socket_ops);
> +}
> +#else
> +static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
> +{
> +	return -1;
> +}
> +#endif
> +
>   #endif	/* _TCP_H */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f94b48b..1540336 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -87,6 +87,7 @@ enum bpf_cmd {
>   	BPF_PROG_GET_FD_BY_ID,
>   	BPF_MAP_GET_FD_BY_ID,
>   	BPF_OBJ_GET_INFO_BY_FD,
> +	BPF_PROG_LOAD_SOCKET_OPS,

Why is this needed as an extra cmd, rather than extending BPF_PROG_ATTACH
and BPF_PROG_DETACH that we already have?

>   };
>
>   enum bpf_map_type {
> @@ -120,6 +121,7 @@ enum bpf_prog_type {
>   	BPF_PROG_TYPE_LWT_IN,
>   	BPF_PROG_TYPE_LWT_OUT,
>   	BPF_PROG_TYPE_LWT_XMIT,
> +	BPF_PROG_TYPE_SOCKET_OPS,

(Nit: maybe BPF_PROG_TYPE_SOCK_OPS given we already have a *_SOCK type.)

>   };
>
>   enum bpf_attach_type {
> @@ -720,4 +722,30 @@ struct bpf_map_info {
>   	__u32 map_flags;
>   } __attribute__((aligned(8)));
>
> +/* User bpf_socket_ops struct to access socket values and specify request ops
> + * and their replies.
> + * New fields can only be added at the end of this structure
> + */
> +struct bpf_socket_ops {
> +	__u32 op;
> +	union {
> +		__u32 reply;
> +		__u32 replylong[4];
> +	};
> +	__u32 family;
> +	__u32 remote_ip4;
> +	__u32 local_ip4;
> +	__u32 remote_ip6[4];
> +	__u32 local_ip6[4];
> +	__u32 remote_port;
> +	__u32 local_port;
> +};
> +
> +/* List of known BPF socket_ops operators.
> + * New entries can only be added at the end
> + */
> +enum {
> +	BPF_SOCKET_OPS_VOID,
> +};
> +
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8942c82..5024b97 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1458,6 +1458,8 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>   		break;
>   	case BPF_OBJ_GET_INFO_BY_FD:
>   		err = bpf_obj_get_info_by_fd(&attr, uattr);
> +		break;
> +	case BPF_PROG_LOAD_SOCKET_OPS:
> +		err = bpf_socket_ops_set_prog(attr.bpf_fd);
>   		break;

(Comment above.)

>   	default:
>   		err = -EINVAL;
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 79f9479..5d711c2 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -9,7 +9,8 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
>
>   obj-y		     += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
>   			neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
> -			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o
> +			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
> +			sock_bpfops.o
>
>   obj-$(CONFIG_XFRM) += flow.o
>   obj-y += net-sysfs.o
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 60ed6f3..7466f55 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3095,6 +3095,37 @@ void bpf_warn_invalid_xdp_action(u32 act)
>   }
>   EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
>
> +static bool __is_valid_socket_ops_access(int off, int size,
> +					 enum bpf_access_type type)

Nit: type unused here

> +{
> +	if (off < 0 || off >= sizeof(struct bpf_socket_ops))
> +		return false;
> +	/* The verifier guarantees that size > 0. */
> +	if (off % size != 0)
> +		return false;
> +	if (size != sizeof(__u32))
> +		return false;
> +
> +	return true;
> +}
> +
> +static bool socket_ops_is_valid_access(int off, int size,
> +				       enum bpf_access_type type,
> +				       enum bpf_reg_type *reg_type)
> +{
> +	if (type == BPF_WRITE) {
> +		switch (off) {
> +		case offsetof(struct bpf_socket_ops, op) ...
> +		     offsetof(struct bpf_socket_ops, replylong[3]):
> +			break;
> +		default:
> +			return false;
> +		}
> +	}
> +
> +	return __is_valid_socket_ops_access(off, size, type);
> +}
> +
>   static u32 bpf_convert_ctx_access(enum bpf_access_type type,
>   				  const struct bpf_insn *si,
>   				  struct bpf_insn *insn_buf,
> @@ -3364,6 +3395,126 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>   	return insn - insn_buf;
>   }
>
> +static u32 socket_ops_convert_ctx_access(enum bpf_access_type type,
> +					 const struct bpf_insn *si,
> +					 struct bpf_insn *insn_buf,
> +					 struct bpf_prog *prog)
> +{
> +	struct bpf_insn *insn = insn_buf;
> +	int off;
> +
> +	switch (si->off) {
> +	case offsetof(struct bpf_socket_ops, op) ...
> +	     offsetof(struct bpf_socket_ops, replylong[3]):

Could we also add a BUILD_BUG_ON() here asserting that the kernel
representation has the same FIELD_SIZEOF() as bpf_socket_ops?

> +		off = si->off;
> +		off -= offsetof(struct bpf_socket_ops, op);
> +		off += offsetof(struct bpf_socket_ops_kern, op);
> +		if (type == BPF_WRITE)
> +			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
> +					      off);
> +		else
> +			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
> +					      off);
> +		break;
> +
> +	case offsetof(struct bpf_socket_ops, family):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_family) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +					      struct bpf_socket_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_socket_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common, skc_family));
> +		break;
> +
> +	case offsetof(struct bpf_socket_ops, remote_ip4):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_daddr) != 4);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_socket_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_socket_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common, skc_daddr));
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +		break;
> +
> +	case offsetof(struct bpf_socket_ops, local_ip4):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_rcv_saddr) != 4);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +					      struct bpf_socket_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_socket_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common,
> +					       skc_rcv_saddr));
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +		break;
> +
> +	case offsetof(struct bpf_socket_ops, remote_ip6[0]) ...
> +		offsetof(struct bpf_socket_ops, remote_ip6[3]):

Nit: indent

> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
> +					  skc_v6_daddr.s6_addr32[0]) != 4);
> +
> +		off = si->off;
> +		off -= offsetof(struct bpf_socket_ops, remote_ip6[0]);
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_socket_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_socket_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common,
> +					       skc_v6_daddr.s6_addr32[0]) +
> +				      off);
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +		break;
> +
> +	case offsetof(struct bpf_socket_ops, local_ip6[0]) ...
> +		offsetof(struct bpf_socket_ops, local_ip6[3]):

Nit: indent

> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
> +					  skc_v6_rcv_saddr.s6_addr32[0]) != 4);
> +
> +		off = si->off;
> +		off -= offsetof(struct bpf_socket_ops, local_ip6[0]);
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_socket_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_socket_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common,
> +					       skc_v6_rcv_saddr.s6_addr32[0]) +
> +				      off);
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
> +		break;
> +
> +	case offsetof(struct bpf_socket_ops, remote_port):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_dport) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_socket_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_socket_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common, skc_dport));
> +		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 16);
> +		break;
> +
> +	case offsetof(struct bpf_socket_ops, local_port):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_socket_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_socket_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock_common, skc_num));
> +		break;
> +	}
> +	return insn - insn_buf;
> +}
> +
>   const struct bpf_verifier_ops sk_filter_prog_ops = {
>   	.get_func_proto		= sk_filter_func_proto,
>   	.is_valid_access	= sk_filter_is_valid_access,
> @@ -3413,6 +3564,12 @@ const struct bpf_verifier_ops cg_sock_prog_ops = {
>   	.convert_ctx_access	= sock_filter_convert_ctx_access,
>   };
>
> +const struct bpf_verifier_ops socket_ops_prog_ops = {
> +	.get_func_proto		= bpf_base_func_proto,
> +	.is_valid_access	= socket_ops_is_valid_access,
> +	.convert_ctx_access	= socket_ops_convert_ctx_access,
> +};
> +
>   int sk_detach_filter(struct sock *sk)
>   {
>   	int ret = -ENOENT;
> diff --git a/net/core/sock_bpfops.c b/net/core/sock_bpfops.c
> new file mode 100644
> index 0000000..8f8daa5
> --- /dev/null
> +++ b/net/core/sock_bpfops.c
> @@ -0,0 +1,67 @@
> +/*
> + * BPF support for sockets
> + *
> + * Copyright (c) 2016 Lawrence Brakmo <brakmo@fb.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + */
> +
> +#include <net/sock.h>
> +#include <linux/skbuff.h>
> +#include <linux/bpf.h>
> +#include <linux/filter.h>
> +#include <linux/errno.h>
> +#ifdef CONFIG_NET_NS
> +#include <net/net_namespace.h>
> +#include <linux/proc_ns.h>
> +#endif
> +
> +/* Global BPF program for sockets */
> +static struct bpf_prog *bpf_socket_ops_prog;
> +static DEFINE_RWLOCK(bpf_socket_ops_lock);
> +
> +int bpf_socket_ops_set_prog(int fd)
> +{
> +	int err = 0;
> +
> +	write_lock(&bpf_socket_ops_lock);
> +	if (bpf_socket_ops_prog) {
> +		bpf_prog_put(bpf_socket_ops_prog);
> +		bpf_socket_ops_prog = NULL;
> +	}
> +
> +	/* fd of zero is used as a signal to remove the current
> +	 * bpf_socket_ops_prog.
> +	 */
> +	if (fd == 0) {

Can we make the fd related semantics similar to dev_change_xdp_fd()?

> +		write_unlock(&bpf_socket_ops_lock);
> +		return 1;
> +	}
> +
> +	bpf_socket_ops_prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
> +	if (IS_ERR(bpf_socket_ops_prog)) {
> +		bpf_prog_put(bpf_socket_ops_prog);

This will crash the kernel, passing err value to bpf_prog_put().

> +		bpf_socket_ops_prog = NULL;
> +		err = 1;
> +	}
> +	write_unlock(&bpf_socket_ops_lock);

If all this gets a bit rearranged as in dev_change_xdp_fd(), we can
RCUify bpf_socket_ops_prog.

> +	return err;
> +}
> +
> +int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket)
> +{
> +	int ret;
> +
> +	read_lock(&bpf_socket_ops_lock);
> +	if (bpf_socket_ops_prog) {
> +		rcu_read_lock();
> +		ret = (int)BPF_PROG_RUN(bpf_socket_ops_prog, bpf_socket);

Cast not needed.

> +		rcu_read_unlock();
> +	} else {
> +		ret = -1;
> +	}
> +	read_unlock(&bpf_socket_ops_lock);

See comment wrt RCU for bpf_socket_ops_prog, then read_lock()
can be dropped.

> +	return ret;
> +}
Lawrence Brakmo June 16, 2017, 11:41 p.m. UTC | #2
On 6/16/17, 5:07 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:

    On 06/15/2017 10:08 PM, Lawrence Brakmo wrote:
    > Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a corresponding
    > struct that allows BPF programs of this type to access some of the
    > socket's fields (such as IP addresses, ports, etc.).
    [...]

    Above and ...

    struct bpf_sock {
    	__u32 bound_dev_if;
    	__u32 family;
    	__u32 type;
    	__u32 protocol;
    };

    ... would result in two BPF sock user versions. It's okayish, but
    given struct bpf_sock is quite generic, couldn't we merge the members
    from struct bpf_socket_ops into struct bpf_sock instead?

    Idea would be that sock_filter_is_valid_access() for cgroups would
    then check off < 0 || off + size > offsetofend(struct bpf_sock, protocol)
    to disallow new members, and your socket_ops_is_valid_access() could
    allow and xlate the full range. The family member is already duplicated,
    and the others could then be accessed from these kinds of BPF progs as
    well; plus we have a single user representation, similar to __sk_buff,
    that multiple types will use.

I see. You are saying have one struct in common but still keep the two
PROG_TYPES? That makes sense. Do we really need two different
is_valid_access functions? Both types should be able to see all
the fields (otherwise adding new fields becomes messy).

    > +/* socket_ops related */
    > +struct sock;

    Is this needed? You're using it in struct bpf_socket_ops_kern in
    filter.h, where we already have struct sock;

Done

    > +	socket_ops.is_req_sock = is_req_sock ? 1 : 0;

    Is is_req_sock actually used here in this patch (apart from setting it)?
    Not seeing that the BPF prog will access it; if it also shouldn't access
    it, then bool would be a better type.

The only reason I used a bit was in case I wanted to add more fields later on.
Does it make sense or should I just use bool?

    > +	BPF_PROG_LOAD_SOCKET_OPS,

    Why is this needed as an extra cmd, rather than extending BPF_PROG_ATTACH
    and BPF_PROG_DETACH that we already have?

I'll look into it. At the time I did it in the way I thought was easier.

    > +	BPF_PROG_TYPE_SOCKET_OPS,

    (Nit: maybe BPF_PROG_TYPE_SOCK_OPS given we already have a *_SOCK type.)

Done

    > +	case BPF_PROG_LOAD_SOCKET_OPS:
    > +		err = bpf_socket_ops_set_prog(attr.bpf_fd);

    (Comment above.)

Done

    > +static bool __is_valid_socket_ops_access(int off, int size,
    > +					 enum bpf_access_type type)

    Nit: type unused here

Done

    > +	case offsetof(struct bpf_socket_ops, op) ...
    > +	     offsetof(struct bpf_socket_ops, replylong[3]):

    Could we also add a BUILD_BUG_ON() here asserting that the kernel
    representation has the same FIELD_SIZEOF() as bpf_socket_ops?

Done

    > +	case offsetof(struct bpf_socket_ops, remote_ip6[0]) ...
    > +		offsetof(struct bpf_socket_ops, remote_ip6[3]):

    Nit: indent

Done

    > +	case offsetof(struct bpf_socket_ops, local_ip6[0]) ...
    > +		offsetof(struct bpf_socket_ops, local_ip6[3]):

    Nit: indent

Done

    > +	/* fd of zero is used as a signal to remove the current
    > +	 * bpf_socket_ops_prog.
    > +	 */
    > +	if (fd == 0) {

    Can we make the fd related semantics similar to dev_change_xdp_fd()?

Do you mean remove program is fd < 0 instead of == 0?

    > +	bpf_socket_ops_prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
    > +	if (IS_ERR(bpf_socket_ops_prog)) {
    > +		bpf_prog_put(bpf_socket_ops_prog);

    This will crash the kernel, passing err value to bpf_prog_put().

That explains some problems I saw in the past if fd was zero.
Fixed.

    > +	write_unlock(&bpf_socket_ops_lock);

    If all this gets a bit rearranged as in dev_change_xdp_fd(), we can
    RCUify bpf_socket_ops_prog.

Not sure I fully understand, but will give it a shot.

    > +		ret = (int)BPF_PROG_RUN(bpf_socket_ops_prog, bpf_socket);

    Cast not needed.

Done

    > +	read_unlock(&bpf_socket_ops_lock);

    See comment wrt RCU for bpf_socket_ops_prog, then read_lock()
    can be dropped.

Will give it a shot.

Daniel, thank you for the feedback.
Lawrence Brakmo June 17, 2017, 9:48 p.m. UTC | #3
On 6/16/17, 5:07 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:

    On 06/15/2017 10:08 PM, Lawrence Brakmo wrote:
    > Two new corresponding structs (one for the kernel, one for the user/BPF
    > program):
    >
    > /* kernel version */
    > struct bpf_socket_ops_kern {
    [...]
    > /* user version */
    > struct bpf_socket_ops {
    [...]

    Above and ...

    struct bpf_sock {
    	__u32 bound_dev_if;
    	__u32 family;
    	__u32 type;
    	__u32 protocol;
    };

    ... would result in two BPF sock user versions. It's okayish, but
    given struct bpf_sock is quite generic, couldn't we merge the members
    from struct bpf_socket_ops into struct bpf_sock instead?

    Idea would be that sock_filter_is_valid_access() for cgroups would
    then check off < 0 || off + size > offsetofend(struct bpf_sock, protocol)
    to disallow new members, and your socket_ops_is_valid_access() could
    allow and xlate the full range. The family member is already duplicated,
    and the others could then be accessed from these kinds of BPF progs as
    well; plus we have a single user representation, similar to __sk_buff,
    that multiple types will use.
    
I am concerned that it could make usage more confusing. One type of 
sock program (cgroup) could only use a subset of the fields while the
other type (socket_ops) could use all (or a different subset). Then what
happens if there is a need to add a new field to the cgroup type sock program?
In addition, in the near future I will have a patch to attach socket_ops
programs to cgroups.
I'd rather leave it as it is.
Daniel Borkmann June 19, 2017, 6:44 p.m. UTC | #4
On 06/17/2017 01:41 AM, Lawrence Brakmo wrote:
> On 6/16/17, 5:07 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
[...]
> I see. You are saying have one struct in common but still keep the two
> PROG_TYPES? That makes sense. Do we really need two different
> is_valid_access functions? Both types should be able to see all
> the fields (otherwise adding new fields becomes messy).

Would probably leave the two is_valid_access() separate initially, and
once people ask for it we could potentially open this up to some of
the other fields that are available at that time.

>      > Currently there are two types of ops. The first type expects the BPF
>      > program to return a value which is then used by the caller (or a
>      > negative value to indicate the operation is not supported). The second
>      > type expects state changes to be done by the BPF program, for example
>      > through a setsockopt BPF helper function, and they ignore the return
>      > value.
[...]
>      > +/* Call BPF_SOCKET_OPS program that returns an int. If the return value
>      > + * is < 0, then the BPF op failed (for example if the loaded BPF
>      > + * program does not support the chosen operation or there is no BPF
>      > + * program loaded).
>      > + */
>      > +#ifdef CONFIG_BPF
>      > +static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
>      > +{
>      > +	struct bpf_socket_ops_kern socket_ops;
>      > +
>      > +	memset(&socket_ops, 0, sizeof(socket_ops));
>      > +	socket_ops.sk = sk;
>      > +	socket_ops.is_req_sock = is_req_sock ? 1 : 0;
>
>      Is is_req_sock actually used here in this patch (apart from setting it)?
>      Not seeing that BPF prog will access it, if it also shouldn't access it,
>      then bool type would be better.
>
> The only reason I used a bit was in case I wanted to add more fields later on.
> Does it make sense or should I just use bool?

Didn't know that, but I think starting out with bool seems a bit
cleaner; if needed we could later still switch to a bitfield.

>      > +	socket_ops.op = op;
>      > +
>      > +	return bpf_socket_ops_call(&socket_ops);
>      > +}
[...]
>      > +/* Global BPF program for sockets */
>      > +static struct bpf_prog *bpf_socket_ops_prog;
>      > +static DEFINE_RWLOCK(bpf_socket_ops_lock);
>      > +
>      > +int bpf_socket_ops_set_prog(int fd)
>      > +{
>      > +	int err = 0;
>      > +
>      > +	write_lock(&bpf_socket_ops_lock);
>      > +	if (bpf_socket_ops_prog) {
>      > +		bpf_prog_put(bpf_socket_ops_prog);
>      > +		bpf_socket_ops_prog = NULL;
>      > +	}
>      > +
>      > +	/* fd of zero is used as a signal to remove the current
>      > +	 * bpf_socket_ops_prog.
>      > +	 */
>      > +	if (fd == 0) {
>
>      Can we make the fd related semantics similar to dev_change_xdp_fd()?
>
> Do you mean remove program is fd < 0 instead of == 0?

Yes, that and also the ordering of dropping the ref of the existing
bpf_socket_ops_prog program with setting the new one, so you can
convert bpf_socket_ops_prog to RCU more easily.
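
For reference, a minimal sketch of the dev_change_xdp_fd()-style
semantics being discussed here (not the submitted code): fd < 0
detaches, the old program's reference is dropped only after the new one
is published, bpf_socket_ops_prog becomes an RCU pointer, and the
read_lock() on the call path goes away:

static struct bpf_prog __rcu *bpf_socket_ops_prog;

int bpf_socket_ops_set_prog(int fd)
{
	struct bpf_prog *prog = NULL, *old_prog;

	if (fd >= 0) {
		prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
		if (IS_ERR(prog))
			return PTR_ERR(prog);
	}

	/* Publish the new program first, then release the old one. */
	old_prog = xchg((__force struct bpf_prog **)&bpf_socket_ops_prog,
			prog);
	if (old_prog)
		bpf_prog_put(old_prog);
	return 0;
}

int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket)
{
	struct bpf_prog *prog;
	int ret = -1;

	rcu_read_lock();
	prog = rcu_dereference(bpf_socket_ops_prog);
	if (prog)
		ret = BPF_PROG_RUN(prog, bpf_socket);
	rcu_read_unlock();
	return ret;
}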

>      > +		write_unlock(&bpf_socket_ops_lock);
>      > +		return 1;
>      > +	}
>      > +
>      > +	bpf_socket_ops_prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
>      > +	if (IS_ERR(bpf_socket_ops_prog)) {
>      > +		bpf_prog_put(bpf_socket_ops_prog);
>
>      This will crash the kernel, passing err value to bpf_prog_put().
[...]
Daniel Borkmann June 19, 2017, 6:52 p.m. UTC | #5
On 06/17/2017 11:48 PM, Lawrence Brakmo wrote:
> On 6/16/17, 5:07 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
>
>      On 06/15/2017 10:08 PM, Lawrence Brakmo wrote:
>      > Two new corresponding structs (one for the kernel one for the user/BPF
>      > program):
>      >
>      > /* kernel version */
>      > struct bpf_socket_ops_kern {
>      >          struct sock *sk;
>      > 	__u32  is_req_sock:1;
>      >          __u32  op;
>      >          union {
>      >                  __u32 reply;
>      >                  __u32 replylong[4];
>      >          };
>      > };
>      >
>      > /* user version */
>      > struct bpf_socket_ops {
>      >          __u32 op;
>      >          union {
>      >                  __u32 reply;
>      >                  __u32 replylong[4];
>      >          };
>      >          __u32 family;
>      >          __u32 remote_ip4;
>      >          __u32 local_ip4;
>      >          __u32 remote_ip6[4];
>      >          __u32 local_ip6[4];
>      >          __u32 remote_port;
>      >          __u32 local_port;
>      > };
>
>      Above and ...
>
>      struct bpf_sock {
>      	__u32 bound_dev_if;
>      	__u32 family;
>      	__u32 type;
>      	__u32 protocol;
>      };
>
>      ... would result in two BPF sock user versions. It's okayish, but
>      given struct bpf_sock is quite generic, couldn't we merge the members
>      from struct bpf_socket_ops into struct bpf_sock instead?
>
>      Idea would be that sock_filter_is_valid_access() for cgroups would
>      then check off < 0 || off + size > offsetofend(struct bpf_sock, protocol)
>      to disallow new members, and your socket_ops_is_valid_access() could
>      allow and xlate the full range. The family member is already duplicated,
>      and the others could then be accessed from these kinds of BPF progs as
>      well; plus we have a single user representation, similar to __sk_buff,
>      that multiple types will use.
>
> I am concerned that it could make usage more confusing. One type of
> sock program (cgroup) could only use a subset of the fields while the
> other type (socket_ops) could use all (or a different subset). Then what
> happens if there is a need to add a new field to cgroup type sock program?
> In addition, in the near future I will have a patch to attach socket_ops
> programs to cgroups.
> I rather leave it as it is.

Okay, I'm fine with that as well. For the __sk_buff, we also have the
case that some members are not available for all program types, like
tc_classid, so it's similar there. But if indeed the majority of members
cannot be supported for the most part (?) then having different structs
seems okay, probably easier to use, but we should try hard not to end
up with 10 different uapi socket structs that apply to program types
working on sockets in one way or another.
Lawrence Brakmo June 19, 2017, 8:49 p.m. UTC | #6
On 6/19/17, 11:44 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:

    On 06/17/2017 01:41 AM, Lawrence Brakmo wrote:
    > On 6/16/17, 5:07 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
    [...]
    > I see. You are saying have one struct in common but still keep the two
    > PROG_TYPES? That makes sense. Do we really need two different
    > is_valid_access functions? Both types should be able to see all
    > the fields (otherwise adding new fields becomes messy).

    Would probably leave the two is_valid_access() separate initially, and
    once people ask for it we could potentially open this up to some of
    the other fields that are available at that time.

As discussed in the other thread, I will keep the 2 structs.

    >      > Currently there are two types of ops. The first type expects the BPF
    >      > program to return a value which is then used by the caller (or a
    >      > negative value to indicate the operation is not supported). The second
    >      > type expects state changes to be done by the BPF program, for example
    >      > through a setsockopt BPF helper function, and they ignore the return
    >      > value.
    [...]
    >      > +/* Call BPF_SOCKET_OPS program that returns an int. If the return value
    >      > + * is < 0, then the BPF op failed (for example if the loaded BPF
    >      > + * program does not support the chosen operation or there is no BPF
    >      > + * program loaded).
    >      > + */
    >      > +#ifdef CONFIG_BPF
    >      > +static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
    >      > +{
    >      > +	struct bpf_socket_ops_kern socket_ops;
    >      > +
    >      > +	memset(&socket_ops, 0, sizeof(socket_ops));
    >      > +	socket_ops.sk = sk;
    >      > +	socket_ops.is_req_sock = is_req_sock ? 1 : 0;
    >
    >      Is is_req_sock actually used here in this patch (apart from setting it)?
    >      Not seeing that BPF prog will access it; if it also shouldn't access it,
    >      then bool type would be better.
    >
    > The only reason I used a bit was in case I wanted to add more fields later on.
    > Does it make sense or should I just use bool?

    Didn't know that, but I think starting out with bool seems a bit
    cleaner; if needed we could later still switch to a bitfield.

Done.

    >      > +	socket_ops.op = op;
    >      > +
    >      > +	return bpf_socket_ops_call(&socket_ops);
    >      > +}
    [...]
    >      > +/* Global BPF program for sockets */
    >      > +static struct bpf_prog *bpf_socket_ops_prog;
    >      > +static DEFINE_RWLOCK(bpf_socket_ops_lock);
    >      > +
    >      > +int bpf_socket_ops_set_prog(int fd)
    >      > +{
    >      > +	int err = 0;
    >      > +
    >      > +	write_lock(&bpf_socket_ops_lock);
    >      > +	if (bpf_socket_ops_prog) {
    >      > +		bpf_prog_put(bpf_socket_ops_prog);
    >      > +		bpf_socket_ops_prog = NULL;
    >      > +	}
    >      > +
    >      > +	/* fd of zero is used as a signal to remove the current
    >      > +	 * bpf_socket_ops_prog.
    >      > +	 */
    >      > +	if (fd == 0) {
    >
    >      Can we make the fd related semantics similar to dev_change_xdp_fd()?
    >
    > Do you mean that fd < 0, instead of fd == 0, removes the program?

    Yes, that and also the ordering of dropping the ref of the existing
    bpf_socket_ops_prog program with setting the new one, so you can
    convert bpf_socket_ops_prog to RCU more easily.

I made lots of changes to how we set/attach the global_sock_ops program,
affecting the files kernel/bpf/syscall.c, net/core/sock_bpfops.c and
samples/bpf/tcp_bpf.c. The patch set will be submitted later today.

    >      > +		write_unlock(&bpf_socket_ops_lock);
    >      > +		return 1;
    >      > +	}
    >      > +
    >      > +	bpf_socket_ops_prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
    >      > +	if (IS_ERR(bpf_socket_ops_prog)) {
    >      > +		bpf_prog_put(bpf_socket_ops_prog);
    >
    >      This will crash the kernel, passing an err value to bpf_prog_put().
    [...]
Thanks again for the feedback.
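
For reference, a minimal sketch of the dev_change_xdp_fd()-style semantics
agreed on above: fd < 0 detaches, and the new program reference is taken
before the old one is dropped, so the pointer can later be converted to RCU.
This is only an illustration reusing the patch's bpf_socket_ops_prog and
bpf_socket_ops_lock globals, not the reworked code Lawrence submitted:

int bpf_socket_ops_set_prog(int fd)
{
	struct bpf_prog *prog = NULL, *old_prog;

	/* fd < 0 detaches the current program, as in dev_change_xdp_fd() */
	if (fd >= 0) {
		prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
		if (IS_ERR(prog))
			return PTR_ERR(prog);
	}

	write_lock(&bpf_socket_ops_lock);
	old_prog = bpf_socket_ops_prog;
	bpf_socket_ops_prog = prog;	/* publish the new program first */
	write_unlock(&bpf_socket_ops_lock);

	/* drop the old reference only after the swap; replacing the lock
	 * with rcu_assign_pointer()/synchronize_rcu() would then let
	 * bpf_socket_ops_call() run locklessly
	 */
	if (old_prog)
		bpf_prog_put(old_prog);
	return 0;
}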
Lawrence Brakmo June 19, 2017, 8:49 p.m. UTC | #7
On 6/19/17, 11:52 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:

    On 06/17/2017 11:48 PM, Lawrence Brakmo wrote:
    > On 6/16/17, 5:07 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
    [...]
    > I am concerned that it could make usage more confusing. One type of
    > sock program (cgroup) could only use a subset of the fields while the
    > other type (socket_ops) could use all of them (or a different subset).
    > Then what happens if there is a need to add a new field to the
    > cgroup-type sock program? In addition, in the near future I will have
    > a patch to attach socket_ops programs to cgroups.
    > I'd rather leave it as it is.

    Okay, I'm fine with that as well. For the __sk_buff, we also have the
    case that some members, like tc_classid, are not available for all
    program types, so it's similar there. But if indeed the majority of
    members cannot be supported for the most part (?) then having different
    structs seems okay, and is probably easier to use, but we should try
    hard not to end up with 10 different uapi socket structs that apply to
    program types working on sockets in one way or another.

Agree 100%.
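
To make the user-visible struct concrete, a minimal sockops program sketch in
the style of samples/bpf. The op number tested is hypothetical, since this
patch only defines BPF_SOCKET_OPS_VOID, and bpf_helpers.h is the usual
samples helper header providing SEC():

#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

SEC("sockops")
int bpf_basic(struct bpf_socket_ops *skops)
{
	/* hypothetical op number; only BPF_SOCKET_OPS_VOID exists so far */
	if (skops->op != 1)
		return -1;		/* negative: op not supported */

	/* e.g. return a short value for connections on a known port */
	if (skops->local_port == 8000 || skops->remote_port == 8000)
		return 10;
	return -1;
}

char _license[] SEC("license") = "GPL";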

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1bcbf0a..e164f94 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -362,4 +362,10 @@  extern const struct bpf_func_proto bpf_get_stackid_proto;
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
+/* socket_ops related */
+struct sock;
+struct bpf_socket_ops_kern;
+
+int bpf_socket_ops_set_prog(int fd);
+int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket);
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 03bf223..ca69d10 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -10,6 +10,7 @@  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit_prog_ops)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SOCKET_OPS, socket_ops_prog_ops)
 #endif
 #ifdef CONFIG_BPF_EVENTS
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe_prog_ops)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1fa26dc..102e881 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -898,4 +898,14 @@  static inline int bpf_tell_extensions(void)
 	return SKF_AD_MAX;
 }
 
+struct bpf_socket_ops_kern {
+	struct	sock *sk;
+	u32	is_req_sock:1;
+	u32	op;
+	union {
+		u32 reply;
+		u32 replylong[4];
+	};
+};
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3ab677d..9ad0d80 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -46,6 +46,9 @@ 
 #include <linux/seq_file.h>
 #include <linux/memcontrol.h>
 
+#include <linux/bpf.h>
+#include <linux/filter.h>
+
 extern struct inet_hashinfo tcp_hashinfo;
 
 extern struct percpu_counter tcp_orphan_count;
@@ -1991,4 +1994,28 @@  static inline void tcp_listendrop(const struct sock *sk)
 
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 
+/* Call BPF_SOCKET_OPS program that returns an int. If the return value
+ * is < 0, then the BPF op failed (for example if the loaded BPF
+ * program does not support the chosen operation or there is no BPF
+ * program loaded).
+ */
+#ifdef CONFIG_BPF
+static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
+{
+	struct bpf_socket_ops_kern socket_ops;
+
+	memset(&socket_ops, 0, sizeof(socket_ops));
+	socket_ops.sk = sk;
+	socket_ops.is_req_sock = is_req_sock ? 1 : 0;
+	socket_ops.op = op;
+
+	return bpf_socket_ops_call(&socket_ops);
+}
+#else
+static inline int tcp_call_bpf(struct sock *sk, bool is_req_sock, int op)
+{
+	return -1;
+}
+#endif
+
 #endif	/* _TCP_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f94b48b..1540336 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -87,6 +87,7 @@  enum bpf_cmd {
 	BPF_PROG_GET_FD_BY_ID,
 	BPF_MAP_GET_FD_BY_ID,
 	BPF_OBJ_GET_INFO_BY_FD,
+	BPF_PROG_LOAD_SOCKET_OPS,
 };
 
 enum bpf_map_type {
@@ -120,6 +121,7 @@  enum bpf_prog_type {
 	BPF_PROG_TYPE_LWT_IN,
 	BPF_PROG_TYPE_LWT_OUT,
 	BPF_PROG_TYPE_LWT_XMIT,
+	BPF_PROG_TYPE_SOCKET_OPS,
 };
 
 enum bpf_attach_type {
@@ -720,4 +722,30 @@  struct bpf_map_info {
 	__u32 map_flags;
 } __attribute__((aligned(8)));
 
+/* User bpf_socket_ops struct to access socket values and specify request ops
+ * and their replies.
+ * New fields can only be added at the end of this structure
+ */
+struct bpf_socket_ops {
+	__u32 op;
+	union {
+		__u32 reply;
+		__u32 replylong[4];
+	};
+	__u32 family;
+	__u32 remote_ip4;
+	__u32 local_ip4;
+	__u32 remote_ip6[4];
+	__u32 local_ip6[4];
+	__u32 remote_port;
+	__u32 local_port;
+};
+
+/* List of known BPF socket_ops operators.
+ * New entries can only be added at the end
+ */
+enum {
+	BPF_SOCKET_OPS_VOID,
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8942c82..5024b97 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1458,6 +1458,9 @@  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 		break;
 	case BPF_OBJ_GET_INFO_BY_FD:
 		err = bpf_obj_get_info_by_fd(&attr, uattr);
+		break;
+	case BPF_PROG_LOAD_SOCKET_OPS:
+		err = bpf_socket_ops_set_prog(attr.bpf_fd);
 		break;
 	default:
 		err = -EINVAL;
diff --git a/net/core/Makefile b/net/core/Makefile
index 79f9479..5d711c2 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -9,7 +9,8 @@  obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
 obj-y		     += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
 			neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
-			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o
+			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
+			sock_bpfops.o
 
 obj-$(CONFIG_XFRM) += flow.o
 obj-y += net-sysfs.o
diff --git a/net/core/filter.c b/net/core/filter.c
index 60ed6f3..7466f55 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3095,6 +3095,37 @@  void bpf_warn_invalid_xdp_action(u32 act)
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
+static bool __is_valid_socket_ops_access(int off, int size,
+					 enum bpf_access_type type)
+{
+	if (off < 0 || off >= sizeof(struct bpf_socket_ops))
+		return false;
+	/* The verifier guarantees that size > 0. */
+	if (off % size != 0)
+		return false;
+	if (size != sizeof(__u32))
+		return false;
+
+	return true;
+}
+
+static bool socket_ops_is_valid_access(int off, int size,
+				       enum bpf_access_type type,
+				       enum bpf_reg_type *reg_type)
+{
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case offsetof(struct bpf_socket_ops, op) ...
+		     offsetof(struct bpf_socket_ops, replylong[3]):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	return __is_valid_socket_ops_access(off, size, type);
+}
+
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 				  const struct bpf_insn *si,
 				  struct bpf_insn *insn_buf,
@@ -3364,6 +3395,126 @@  static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 	return insn - insn_buf;
 }
 
+static u32 socket_ops_convert_ctx_access(enum bpf_access_type type,
+					 const struct bpf_insn *si,
+					 struct bpf_insn *insn_buf,
+					 struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+	int off;
+
+	switch (si->off) {
+	case offsetof(struct bpf_socket_ops, op) ...
+	     offsetof(struct bpf_socket_ops, replylong[3]):
+		off = si->off;
+		off -= offsetof(struct bpf_socket_ops, op);
+		off += offsetof(struct bpf_socket_ops_kern, op);
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					      off);
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					      off);
+		break;
+
+	case offsetof(struct bpf_socket_ops, family):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_family) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_family));
+		break;
+
+	case offsetof(struct bpf_socket_ops, remote_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_daddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_daddr));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, local_ip4):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_rcv_saddr) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+					      struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_rcv_saddr));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, remote_ip6[0]) ...
+		offsetof(struct bpf_socket_ops, remote_ip6[3]):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_daddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct bpf_socket_ops, remote_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_daddr.s6_addr32[0]) +
+				      off);
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, local_ip6[0]) ...
+		offsetof(struct bpf_socket_ops, local_ip6[3]):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+					  skc_v6_rcv_saddr.s6_addr32[0]) != 4);
+
+		off = si->off;
+		off -= offsetof(struct bpf_socket_ops, local_ip6[0]);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common,
+					       skc_v6_rcv_saddr.s6_addr32[0]) +
+				      off);
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 32);
+		break;
+
+	case offsetof(struct bpf_socket_ops, remote_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_dport) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_dport));
+		*insn++ = BPF_ENDIAN(BPF_FROM_BE, si->dst_reg, 16);
+		break;
+
+	case offsetof(struct bpf_socket_ops, local_port):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_socket_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_socket_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock_common, skc_num));
+		break;
+	}
+	return insn - insn_buf;
+}
+
 const struct bpf_verifier_ops sk_filter_prog_ops = {
 	.get_func_proto		= sk_filter_func_proto,
 	.is_valid_access	= sk_filter_is_valid_access,
@@ -3413,6 +3564,12 @@  const struct bpf_verifier_ops cg_sock_prog_ops = {
 	.convert_ctx_access	= sock_filter_convert_ctx_access,
 };
 
+const struct bpf_verifier_ops socket_ops_prog_ops = {
+	.get_func_proto		= bpf_base_func_proto,
+	.is_valid_access	= socket_ops_is_valid_access,
+	.convert_ctx_access	= socket_ops_convert_ctx_access,
+};
+
 int sk_detach_filter(struct sock *sk)
 {
 	int ret = -ENOENT;
diff --git a/net/core/sock_bpfops.c b/net/core/sock_bpfops.c
new file mode 100644
index 0000000..8f8daa5
--- /dev/null
+++ b/net/core/sock_bpfops.c
@@ -0,0 +1,67 @@ 
+/*
+ * BPF support for sockets
+ *
+ * Copyright (c) 2016 Lawrence Brakmo <brakmo@fb.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#include <net/sock.h>
+#include <linux/skbuff.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/errno.h>
+#ifdef CONFIG_NET_NS
+#include <net/net_namespace.h>
+#include <linux/proc_ns.h>
+#endif
+
+/* Global BPF program for sockets */
+static struct bpf_prog *bpf_socket_ops_prog;
+static DEFINE_RWLOCK(bpf_socket_ops_lock);
+
+int bpf_socket_ops_set_prog(int fd)
+{
+	int err = 0;
+
+	write_lock(&bpf_socket_ops_lock);
+	if (bpf_socket_ops_prog) {
+		bpf_prog_put(bpf_socket_ops_prog);
+		bpf_socket_ops_prog = NULL;
+	}
+
+	/* fd of zero is used as a signal to remove the current
+	 * bpf_socket_ops_prog.
+	 */
+	if (fd == 0) {
+		write_unlock(&bpf_socket_ops_lock);
+		return 1;
+	}
+
+	bpf_socket_ops_prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_OPS);
+	if (IS_ERR(bpf_socket_ops_prog)) {
+		/* ERR_PTR value: record the error, do not bpf_prog_put() it */
+		err = PTR_ERR(bpf_socket_ops_prog);
+		bpf_socket_ops_prog = NULL;
+	}
+	write_unlock(&bpf_socket_ops_lock);
+	return err;
+}
+
+int bpf_socket_ops_call(struct bpf_socket_ops_kern *bpf_socket)
+{
+	int ret;
+
+	read_lock(&bpf_socket_ops_lock);
+	if (bpf_socket_ops_prog) {
+		rcu_read_lock();
+		ret = (int)BPF_PROG_RUN(bpf_socket_ops_prog, bpf_socket);
+		rcu_read_unlock();
+	} else {
+		ret = -1;
+	}
+	read_unlock(&bpf_socket_ops_lock);
+	return ret;
+}
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index a91c57d..c18d713 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -64,6 +64,7 @@  static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
 	bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
 	bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
+	bool is_sockops = strncmp(event, "sockops", 7) == 0;
 	size_t insns_cnt = size / sizeof(struct bpf_insn);
 	enum bpf_prog_type prog_type;
 	char buf[256];
@@ -89,6 +90,8 @@  static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_CGROUP_SKB;
 	} else if (is_cgroup_sk) {
 		prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
+	} else if (is_sockops) {
+		prog_type = BPF_PROG_TYPE_SOCKET_OPS;
 	} else {
 		printf("Unknown event '%s'\n", event);
 		return -1;
@@ -106,8 +109,11 @@  static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
 		return 0;
 
-	if (is_socket) {
-		event += 6;
+	if (is_socket || is_sockops) {
+		if (is_socket)
+			event += 6;
+		else
+			event += 7;
 		if (*event != '/')
 			return 0;
 		event++;
@@ -560,7 +566,8 @@  static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		    memcmp(shname, "xdp", 3) == 0 ||
 		    memcmp(shname, "perf_event", 10) == 0 ||
 		    memcmp(shname, "socket", 6) == 0 ||
-		    memcmp(shname, "cgroup/", 7) == 0)
+		    memcmp(shname, "cgroup/", 7) == 0 ||
+		    memcmp(shname, "sockops", 7) == 0)
 			load_and_attach(shname, data->d_buf, data->d_size);
 	}