From patchwork Thu Jan 23 15:55:23 2020
X-Patchwork-Submitter: Jakub Sitnicki
X-Patchwork-Id: 1228199
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jakub Sitnicki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend, Lorenz Bauer, Martin Lau
Subject: [PATCH bpf-next v4 01/12] bpf, sk_msg: Don't clear saved sock proto on restore
Date: Thu, 23 Jan 2020 16:55:23 +0100
Message-Id: <20200123155534.114313-2-jakub@cloudflare.com>
In-Reply-To: <20200123155534.114313-1-jakub@cloudflare.com>
References: <20200123155534.114313-1-jakub@cloudflare.com>

There is no need to clear psock->sk_proto when restoring socket protocol
callbacks in sk->sk_prot. The psock is about to get detached from the sock
and eventually destroyed. At worst we will restore the protocol callbacks
twice.

This makes reasoning about psock state easier.
Once psock is initialized, we can count on psock->sk_proto always being
set. Also, we don't need a fallback for when the socket is not using ULP;
tcp_update_ulp already handles that case for us.

Acked-by: John Fastabend
Signed-off-by: Jakub Sitnicki
---
 include/linux/skmsg.h | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index ef7031f8a304..41ea1258d15e 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -359,17 +359,7 @@ static inline void sk_psock_restore_proto(struct sock *sk,
 					  struct sk_psock *psock)
 {
 	sk->sk_write_space = psock->saved_write_space;
-
-	if (psock->sk_proto) {
-		struct inet_connection_sock *icsk = inet_csk(sk);
-		bool has_ulp = !!icsk->icsk_ulp_data;
-
-		if (has_ulp)
-			tcp_update_ulp(sk, psock->sk_proto);
-		else
-			sk->sk_prot = psock->sk_proto;
-		psock->sk_proto = NULL;
-	}
+	tcp_update_ulp(sk, psock->sk_proto);
 }
 
 static inline void sk_psock_set_state(struct sk_psock *psock,

From patchwork Thu Jan 23 15:55:24 2020
X-Patchwork-Submitter: Jakub Sitnicki
X-Patchwork-Id: 1228203
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jakub Sitnicki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend, Lorenz Bauer, Martin Lau
Subject: [PATCH bpf-next v4 02/12] net, sk_msg: Annotate lockless access to sk_prot on clone
Date: Thu, 23 Jan 2020 16:55:24 +0100
Message-Id: <20200123155534.114313-3-jakub@cloudflare.com>
In-Reply-To: <20200123155534.114313-1-jakub@cloudflare.com>
References: <20200123155534.114313-1-jakub@cloudflare.com>

The sk_msg and ULP frameworks override the protocol callback pointer in
sk->sk_prot, while TCP accesses it locklessly when cloning the listening
socket, that is, with neither sk_lock nor sk_callback_lock held.

Once we enable use of listening sockets with sockmap (and hence sk_msg),
there will be shared access to sk->sk_prot if the socket is getting cloned
while being inserted into or deleted from the sockmap on another CPU:

Read side:

tcp_v4_rcv
  sk = __inet_lookup_skb(...)
  tcp_check_req(sk)
    inet_csk(sk)->icsk_af_ops->syn_recv_sock
      tcp_v4_syn_recv_sock
        tcp_create_openreq_child
          inet_csk_clone_lock
            sk_clone_lock
              READ_ONCE(sk->sk_prot)

Write side:

sock_map_ops->map_update_elem
  sock_map_update_elem
    sock_map_update_common
      sock_map_link_no_progs
        tcp_bpf_init
          tcp_bpf_update_sk_prot
            sk_psock_update_proto
              WRITE_ONCE(sk->sk_prot, ops)

sock_map_ops->map_delete_elem
  sock_map_delete_elem
    __sock_map_delete
      sock_map_unref
        sk_psock_put
          sk_psock_drop
            sk_psock_restore_proto
              tcp_update_ulp
                WRITE_ONCE(sk->sk_prot, proto)

Mark the shared access with READ_ONCE/WRITE_ONCE annotations.
Acked-by: Martin KaFai Lau
Signed-off-by: Jakub Sitnicki
---
 include/linux/skmsg.h | 3 ++-
 net/core/sock.c       | 5 +++--
 net/ipv4/tcp_bpf.c    | 4 +++-
 net/ipv4/tcp_ulp.c    | 3 ++-
 net/tls/tls_main.c    | 3 ++-
 5 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 41ea1258d15e..55c834a5c25e 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -352,7 +352,8 @@ static inline void sk_psock_update_proto(struct sock *sk,
 	psock->saved_write_space = sk->sk_write_space;
 
 	psock->sk_proto = sk->sk_prot;
-	sk->sk_prot = ops;
+	/* Pairs with lockless read in sk_clone_lock() */
+	WRITE_ONCE(sk->sk_prot, ops);
 }
 
 static inline void sk_psock_restore_proto(struct sock *sk,

diff --git a/net/core/sock.c b/net/core/sock.c
index a4c8fac781ff..3953bb23f4d0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1792,16 +1792,17 @@ static void sk_init_common(struct sock *sk)
  */
 struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 {
+	struct proto *prot = READ_ONCE(sk->sk_prot);
 	struct sock *newsk;
 	bool is_charged = true;
 
-	newsk = sk_prot_alloc(sk->sk_prot, priority, sk->sk_family);
+	newsk = sk_prot_alloc(prot, priority, sk->sk_family);
 	if (newsk != NULL) {
 		struct sk_filter *filter;
 
 		sock_copy(newsk, sk);
 
-		newsk->sk_prot_creator = sk->sk_prot;
+		newsk->sk_prot_creator = prot;
 
 		/* SANITY */
 		if (likely(newsk->sk_net_refcnt))

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index e38705165ac9..4f25aba44ead 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -648,8 +648,10 @@ static void tcp_bpf_reinit_sk_prot(struct sock *sk, struct sk_psock *psock)
 	/* Reinit occurs when program types change e.g. TCP_BPF_TX is removed
 	 * or added requiring sk_prot hook updates. We keep original saved
 	 * hooks in this case.
+	 *
+	 * Pairs with lockless read in sk_clone_lock().
 	 */
-	sk->sk_prot = &tcp_bpf_prots[family][config];
+	WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]);
 }
 
 static int tcp_bpf_assert_proto_ops(struct proto *ops)

diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
index 12ab5db2b71c..acd1ea0a66f7 100644
--- a/net/ipv4/tcp_ulp.c
+++ b/net/ipv4/tcp_ulp.c
@@ -104,7 +104,8 @@ void tcp_update_ulp(struct sock *sk, struct proto *proto)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
 	if (!icsk->icsk_ulp_ops) {
-		sk->sk_prot = proto;
+		/* Pairs with lockless read in sk_clone_lock() */
+		WRITE_ONCE(sk->sk_prot, proto);
 		return;
 	}

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index dac24c7aa7d4..f0748e951dea 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -740,7 +740,8 @@ static void tls_update(struct sock *sk, struct proto *p)
 	if (likely(ctx))
 		ctx->sk_proto = p;
 	else
-		sk->sk_prot = p;
+		/* Pairs with lockless read in sk_clone_lock() */
+		WRITE_ONCE(sk->sk_prot, p);
 }
 
 static int tls_get_info(const struct sock *sk, struct sk_buff *skb)

From patchwork Thu Jan 23 15:55:25 2020
X-Patchwork-Submitter: Jakub Sitnicki
X-Patchwork-Id: 1228204
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jakub Sitnicki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend, Lorenz Bauer, Martin Lau
Subject: [PATCH bpf-next v4 03/12] net, sk_msg: Clear sk_user_data pointer on clone if tagged
Date: Thu, 23 Jan 2020 16:55:25 +0100
Message-Id: <20200123155534.114313-4-jakub@cloudflare.com>
In-Reply-To: <20200123155534.114313-1-jakub@cloudflare.com>
References: <20200123155534.114313-1-jakub@cloudflare.com>

sk_user_data can hold a pointer to an object that is not intended to be
shared between the parent socket and the child that gets a pointer copy on
clone. This is the case when sk_user_data points at a reference-counted
object, like struct sk_psock.

One way to resolve it is to tag the pointer with a no-copy flag by
repurposing its lowest bit. Based on the flag value we clear the child
sk_user_data pointer after cloning the parent socket.

The no-copy flag is stored in the pointer itself, as opposed to externally,
say in socket flags, to guarantee that the pointer and the flag are copied
from parent to child socket in an atomic fashion. Parent socket state is
subject to change while copying; we don't hold any locks at that time.

This approach relies on the assumption that sk_user_data holds a pointer
to an object aligned to at least 2 bytes. A manual audit of existing users
of the rcu_dereference_sk_user_data helper confirms this assumption. Also,
an RCU-protected sk_user_data is not likely to hold a pointer to a char
value or a pathological case of "struct { char c; }". To be safe, warn
when the flag bit is set on sk_user_data assignment, to catch any future
misuses.

It is worth considering why clearing sk_user_data unconditionally is not
an option.
There exist users, DRBD, NVMe, and Xen drivers being among them, that rely
on the pointer being copied when cloning the listening socket.

Potentially we could distinguish these users by checking if the listening
socket has been created in kernel-space via sock_create_kern, and hence
has the sk_kern_sock flag set. However, this is not the case for the NVMe
and Xen drivers, which create sockets without marking them as belonging to
the kernel.

Acked-by: John Fastabend
Acked-by: Martin KaFai Lau
Signed-off-by: Jakub Sitnicki
---
 include/net/sock.h | 37 +++++++++++++++++++++++++++++++++++--
 net/core/skmsg.c   |  2 +-
 net/core/sock.c    |  6 ++++++
 3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 0891c55f1e82..93e359a03174 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -518,10 +518,43 @@ enum sk_pacing {
 	SK_PACING_FQ		= 2,
 };
 
+/* Pointer stored in sk_user_data might not be suitable for copying
+ * when cloning the socket. For instance, it can point to a reference
+ * counted object. sk_user_data bottom bit is set if pointer must not
+ * be copied.
+ */
+#define SK_USER_DATA_NOCOPY	1UL
+#define SK_USER_DATA_PTRMASK	~(SK_USER_DATA_NOCOPY)
+
+/**
+ * sk_user_data_is_nocopy - Test if sk_user_data pointer must not be copied
+ * @sk: socket
+ */
+static inline bool sk_user_data_is_nocopy(const struct sock *sk)
+{
+	return ((uintptr_t)sk->sk_user_data & SK_USER_DATA_NOCOPY);
+}
+
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
 
-#define rcu_dereference_sk_user_data(sk)	rcu_dereference(__sk_user_data((sk)))
-#define rcu_assign_sk_user_data(sk, ptr)	rcu_assign_pointer(__sk_user_data((sk)), ptr)
+#define rcu_dereference_sk_user_data(sk)				\
+({									\
+	void *__tmp = rcu_dereference(__sk_user_data((sk)));		\
+	(void *)((uintptr_t)__tmp & SK_USER_DATA_PTRMASK);		\
+})
+#define rcu_assign_sk_user_data(sk, ptr)				\
+({									\
+	uintptr_t __tmp = (uintptr_t)(ptr);				\
+	WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK);			\
+	rcu_assign_pointer(__sk_user_data((sk)), __tmp);		\
+})
+#define rcu_assign_sk_user_data_nocopy(sk, ptr)				\
+({									\
+	uintptr_t __tmp = (uintptr_t)(ptr);				\
+	WARN_ON_ONCE(__tmp & ~SK_USER_DATA_PTRMASK);			\
+	rcu_assign_pointer(__sk_user_data((sk)),			\
+			   __tmp | SK_USER_DATA_NOCOPY);		\
+})
 
 /*
  * SK_CAN_REUSE and SK_NO_REUSE on a socket mean that the socket is OK

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index ded2d5227678..eeb28cb85664 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -512,7 +512,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
 	sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED);
 	refcount_set(&psock->refcnt, 1);
 
-	rcu_assign_sk_user_data(sk, psock);
+	rcu_assign_sk_user_data_nocopy(sk, psock);
 	sock_hold(sk);
 
 	return psock;

diff --git a/net/core/sock.c b/net/core/sock.c
index 3953bb23f4d0..74662943af5c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1864,6 +1864,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 			goto out;
 		}
 
+		/* Clear sk_user_data if parent had the pointer tagged
+		 * as not suitable for copying when cloning.
+		 */
+		if (sk_user_data_is_nocopy(newsk))
+			RCU_INIT_POINTER(newsk->sk_user_data, NULL);
+
 		newsk->sk_err	   = 0;
 		newsk->sk_err_soft = 0;
 		newsk->sk_priority = 0;

From patchwork Thu Jan 23 15:55:26 2020
X-Patchwork-Submitter: Jakub Sitnicki
X-Patchwork-Id: 1228208
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jakub Sitnicki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend, Lorenz Bauer, Martin Lau
Subject: [PATCH bpf-next v4 04/12] tcp_bpf: Don't let child socket inherit parent protocol ops on copy
Date: Thu, 23 Jan 2020 16:55:26 +0100
Message-Id: <20200123155534.114313-5-jakub@cloudflare.com>
In-Reply-To: <20200123155534.114313-1-jakub@cloudflare.com>
References: <20200123155534.114313-1-jakub@cloudflare.com>

Prepare for cloning listening sockets that have their protocol callbacks
overridden by sk_msg.
Child sockets must not inherit parent callbacks that access state stored
in sk_user_data owned by the parent. Restore the child socket protocol
callbacks before it gets hashed and any of the callbacks can get invoked.

Acked-by: Martin KaFai Lau
Signed-off-by: Jakub Sitnicki
---
 include/net/tcp.h        |  7 +++++++
 net/ipv4/tcp_bpf.c       | 13 +++++++++++++
 net/ipv4/tcp_minisocks.c |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dd975be7fdf..b969d5984f97 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2181,6 +2181,13 @@ int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		    int nonblock, int flags, int *addr_len);
 int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 		      struct msghdr *msg, int len, int flags);
+#ifdef CONFIG_NET_SOCK_MSG
+void tcp_bpf_clone(const struct sock *sk, struct sock *newsk);
+#else
+static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
+{
+}
+#endif
 
 /* Call BPF_SOCK_OPS program that returns an int. If the return value
  * is < 0, then the BPF op failed (for example if the loaded BPF

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 4f25aba44ead..16060e0893a1 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -582,6 +582,19 @@ static void tcp_bpf_close(struct sock *sk, long timeout)
 	saved_close(sk, timeout);
 }
 
+/* If a child got cloned from a listening socket that had tcp_bpf
+ * protocol callbacks installed, we need to restore the callbacks to
+ * the default ones because the child does not inherit the psock state
+ * that tcp_bpf callbacks expect.
+ */
+void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
+{
+	struct proto *prot = newsk->sk_prot;
+
+	if (prot->unhash == tcp_bpf_unhash)
+		newsk->sk_prot = sk->sk_prot_creator;
+}
+
 enum {
 	TCP_BPF_IPV4,
 	TCP_BPF_IPV6,

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index ad3b56d9fa71..c8274371c3d0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -548,6 +548,8 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	newtp->fastopen_req = NULL;
 	RCU_INIT_POINTER(newtp->fastopen_rsk, NULL);
 
+	tcp_bpf_clone(sk, newsk);
+
 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
 
 	return newsk;

From patchwork Thu Jan 23 15:55:27 2020
X-Patchwork-Submitter: Jakub Sitnicki
X-Patchwork-Id: 1228207
X-Patchwork-Delegate: bpf@iogearbox.net
From: Jakub Sitnicki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend, Lorenz Bauer, Martin Lau
Subject: [PATCH bpf-next v4 05/12] bpf, sockmap: Allow inserting listening TCP sockets into sockmap
Date: Thu, 23 Jan 2020 16:55:27 +0100
Message-Id: <20200123155534.114313-6-jakub@cloudflare.com>
In-Reply-To: <20200123155534.114313-1-jakub@cloudflare.com>
References: <20200123155534.114313-1-jakub@cloudflare.com>

In order for the sockmap type to become a generic collection for storing
TCP sockets, we need to loosen the checks during map update, while
tightening the checks in the redirect helpers.

Currently sockmap requires the TCP socket to be in established state (or
transitioning out of SYN_RECV into established state when done from BPF),
which prevents inserting listening sockets.

Change the update pre-checks so the socket can also be in listening state.
Since it doesn't make sense to redirect with sockmap to listening sockets,
add appropriate socket state checks to the BPF redirect helpers too.

We leave sockhash as is for the moment, with no support for holding
listening sockets. Therefore sockhash needs its own set of checks.

Acked-by: Martin KaFai Lau
Signed-off-by: Jakub Sitnicki
---
 net/core/sock_map.c                     | 62 +++++++++++++++++++------
 tools/testing/selftests/bpf/test_maps.c |  6 +--
 2 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index eb114ee419b6..97bdceb29f09 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -385,15 +385,44 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 }
 
 static bool sock_map_op_okay(const struct bpf_sock_ops_kern *ops)
+{
+	return ops->op == BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB ||
+	       ops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB ||
+	       ops->op == BPF_SOCK_OPS_TCP_LISTEN_CB;
+}
+
+static bool sock_hash_op_okay(const struct bpf_sock_ops_kern *ops)
 {
 	return ops->op == BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB ||
 	       ops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB;
 }
 
+/* Only TCP sockets can be inserted into the map. They must be either
+ * in established or listening state. SYN_RECV is also allowed because
+ * BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB happens just before socket
+ * enters established state.
+ */
 static bool sock_map_sk_is_suitable(const struct sock *sk)
 {
 	return sk->sk_type == SOCK_STREAM &&
-	       sk->sk_protocol == IPPROTO_TCP;
+	       sk->sk_protocol == IPPROTO_TCP &&
+	       (1 << sk->sk_state) & (TCPF_ESTABLISHED |
+				      TCPF_SYN_RECV |
+				      TCPF_LISTEN);
+}
+
+static bool sock_hash_sk_is_suitable(const struct sock *sk)
+{
+	return sk->sk_type == SOCK_STREAM &&
+	       sk->sk_protocol == IPPROTO_TCP &&
+	       (1 << sk->sk_state) & (TCPF_ESTABLISHED |
+				      TCPF_SYN_RECV);
+}
+
+/* Is sock in a state that allows redirecting into it? */
+static bool sock_map_redirect_okay(const struct sock *sk)
+{
+	return sk->sk_state != TCP_LISTEN;
 }
 
 static int sock_map_update_elem(struct bpf_map *map, void *key,
@@ -413,8 +442,7 @@ static int sock_map_update_elem(struct bpf_map *map, void *key,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!sock_map_sk_is_suitable(sk) ||
-	    sk->sk_state != TCP_ESTABLISHED) {
+	if (!sock_map_sk_is_suitable(sk)) {
 		ret = -EOPNOTSUPP;
 		goto out;
 	}
@@ -454,13 +482,17 @@ BPF_CALL_4(bpf_sk_redirect_map, struct sk_buff *, skb,
 	   struct bpf_map *, map, u32, key, u64, flags)
 {
 	struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
+	struct sock *sk;
 
 	if (unlikely(flags & ~(BPF_F_INGRESS)))
 		return SK_DROP;
-	tcb->bpf.flags = flags;
-	tcb->bpf.sk_redir = __sock_map_lookup_elem(map, key);
-	if (!tcb->bpf.sk_redir)
+
+	sk = __sock_map_lookup_elem(map, key);
+	if (unlikely(!sk || !sock_map_redirect_okay(sk)))
 		return SK_DROP;
+
+	tcb->bpf.flags = flags;
+	tcb->bpf.sk_redir = sk;
 	return SK_PASS;
 }
 
@@ -477,12 +509,17 @@ const struct bpf_func_proto bpf_sk_redirect_map_proto = {
 BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *, msg,
 	   struct bpf_map *, map, u32, key, u64, flags)
 {
+	struct sock *sk;
+
 	if (unlikely(flags & ~(BPF_F_INGRESS)))
 		return SK_DROP;
-	msg->flags = flags;
-	msg->sk_redir = __sock_map_lookup_elem(map, key);
-	if (!msg->sk_redir)
+
+	sk = __sock_map_lookup_elem(map, key);
+	if (unlikely(!sk || !sock_map_redirect_okay(sk)))
 		return SK_DROP;
+
+	msg->flags = flags;
+	msg->sk_redir = sk;
 	return SK_PASS;
 }
 
@@ -736,8 +773,7 @@ static int sock_hash_update_elem(struct bpf_map *map, void *key,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!sock_map_sk_is_suitable(sk) ||
-	    sk->sk_state != TCP_ESTABLISHED) {
+	if (!sock_hash_sk_is_suitable(sk)) {
 		ret = -EOPNOTSUPP;
 		goto out;
 	}
@@ -882,8 +918,8 @@ BPF_CALL_4(bpf_sock_hash_update, struct bpf_sock_ops_kern *, sops,
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	if (likely(sock_map_sk_is_suitable(sops->sk) &&
-		   sock_map_op_okay(sops)))
+	if (likely(sock_hash_sk_is_suitable(sops->sk) &&
+		   sock_hash_op_okay(sops)))
 		return sock_hash_update_common(map, key, sops->sk, flags);
 	return -EOPNOTSUPP;
 }

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 02eae1e864c2..c6766b2cff85 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -756,11 +756,7 @@ static void test_sockmap(unsigned int tasks, void *data)
 	/* Test update without programs */
 	for (i = 0; i < 6; i++) {
 		err = bpf_map_update_elem(fd, &i, &sfd[i], BPF_ANY);
-		if (i < 2 && !err) {
-			printf("Allowed update sockmap '%i:%i' not in ESTABLISHED\n",
-			       i, sfd[i]);
-			goto out_sockmap;
-		} else if (i >= 2 && err) {
+		if (err) {
 			printf("Failed noprog update sockmap '%i:%i'\n",
 			       i, sfd[i]);
 			goto out_sockmap;
From: Jakub Sitnicki 
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend , Lorenz Bauer , Martin Lau 
Subject: [PATCH bpf-next v4 06/12] bpf, sockmap: Don't set up sockmap progs for listening sockets
Date: Thu, 23 Jan 2020 16:55:28 +0100
Message-Id: <20200123155534.114313-7-jakub@cloudflare.com>

Now that sockmap can hold listening sockets, when setting up the psock we will (1) grab references to the verdict/parser progs, and (2) override the socket upcalls sk_data_ready and sk_write_space.

We cannot redirect to listening sockets, so we don't need to link the socket to the BPF progs. More importantly, we don't want the listening socket to have overridden upcalls, because they would get inherited by child sockets cloned from it.

Introduce a separate initialization path for listening sockets that does not change the upcalls and ignores the BPF progs.
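[Editor's note] The suitability and redirect checks this series introduces reduce to a single shift-and-mask test on sk_state. A stand-alone user-space sketch of that test (TCP state numbers copied from the kernel's tcp_states.h; the function names here are illustrative, not kernel symbols):

```c
#include <assert.h>

/* TCP state numbers as defined by the kernel; TCPF_* are the
 * corresponding one-hot masks, derived exactly as the kernel does. */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT = 2, TCP_SYN_RECV = 3, TCP_LISTEN = 10 };
#define TCPF_ESTABLISHED (1 << TCP_ESTABLISHED)
#define TCPF_SYN_RECV    (1 << TCP_SYN_RECV)
#define TCPF_LISTEN      (1 << TCP_LISTEN)

/* Mirrors sock_map_sk_is_suitable()'s state test: one shift, one AND,
 * so all accepted states are checked in a single comparison. */
static int sockmap_state_ok(int sk_state)
{
	return (1 << sk_state) & (TCPF_ESTABLISHED | TCPF_SYN_RECV | TCPF_LISTEN);
}

/* Mirrors sock_map_redirect_okay(): any state but LISTEN is a valid
 * redirect target. */
static int redirect_ok(int sk_state)
{
	return sk_state != TCP_LISTEN;
}
```

Note how a listening socket passes the insert check but fails the redirect check, which is precisely the split between map update and the redirect helpers described above.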
Acked-by: Martin KaFai Lau 
Signed-off-by: Jakub Sitnicki 
---
 net/core/sock_map.c | 45 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 97bdceb29f09..2ff545e04f6e 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -228,6 +228,30 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
 	return ret;
 }
 
+static int sock_map_link_no_progs(struct bpf_map *map, struct sock *sk)
+{
+	struct sk_psock *psock;
+	int ret;
+
+	psock = sk_psock_get_checked(sk);
+	if (IS_ERR(psock))
+		return PTR_ERR(psock);
+
+	if (psock) {
+		tcp_bpf_reinit(sk);
+		return 0;
+	}
+
+	psock = sk_psock_init(sk, map->numa_node);
+	if (!psock)
+		return -ENOMEM;
+
+	ret = tcp_bpf_init(sk);
+	if (ret < 0)
+		sk_psock_put(sk, psock);
+	return ret;
+}
+
 static void sock_map_free(struct bpf_map *map)
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
@@ -330,6 +354,12 @@ static int sock_map_get_next_key(struct bpf_map *map, void *key, void *next)
 	return 0;
 }
 
+/* Is sock in a state that allows redirecting from/into it? */
+static bool sock_map_redirect_okay(const struct sock *sk)
+{
+	return sk->sk_state != TCP_LISTEN;
+}
+
 static int sock_map_update_common(struct bpf_map *map, u32 idx,
 				  struct sock *sk, u64 flags)
 {
@@ -352,7 +382,14 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 	if (!link)
 		return -ENOMEM;
 
-	ret = sock_map_link(map, &stab->progs, sk);
+	/* Only sockets we can redirect into/from in BPF need to hold
+	 * refs to parser/verdict progs and have their sk_data_ready
+	 * and sk_write_space callbacks overridden.
+	 */
+	if (sock_map_redirect_okay(sk))
+		ret = sock_map_link(map, &stab->progs, sk);
+	else
+		ret = sock_map_link_no_progs(map, sk);
 	if (ret < 0)
 		goto out_free;
 
@@ -419,12 +456,6 @@ static bool sock_hash_sk_is_suitable(const struct sock *sk)
 				      TCPF_SYN_RECV);
 }
 
-/* Is sock in a state that allows redirecting into it? */
-static bool sock_map_redirect_okay(const struct sock *sk)
-{
-	return sk->sk_state != TCP_LISTEN;
-}
-
 static int sock_map_update_elem(struct bpf_map *map, void *key,
 				void *value, u64 flags)
 {

From patchwork Thu Jan 23 15:55:29 2020
X-Patchwork-Id: 1228209
From: Jakub Sitnicki 
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend , Lorenz Bauer , Martin Lau 
Subject: [PATCH bpf-next v4 07/12] bpf, sockmap: Return socket cookie on lookup from syscall
Date: Thu, 23 Jan 2020 16:55:29 +0100
Message-Id: <20200123155534.114313-8-jakub@cloudflare.com>

Tooling that populates the SOCKMAP with sockets from user-space needs a way to inspect its contents.
Returning the struct sock * that SOCKMAP holds to user-space is neither safe nor useful. An approach established by REUSEPORT_SOCKARRAY is to return a socket cookie (a unique identifier) instead.

Since socket cookies are u64 values, SOCKMAP needs to support such a value size for lookup to be possible. This requires special handling on update, though. Attempts to do a lookup on a SOCKMAP holding u32 values will be met with an ENOSPC error.

Acked-by: John Fastabend 
Acked-by: Martin KaFai Lau 
Signed-off-by: Jakub Sitnicki 
---
 net/core/sock_map.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 2ff545e04f6e..441f213bd4c5 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct bpf_stab {
 	struct bpf_map map;
@@ -31,7 +32,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-EPERM);
 
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
-	    attr->value_size != 4 ||
+	    (attr->value_size != sizeof(u32) &&
+	     attr->value_size != sizeof(u64)) ||
 	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
@@ -298,6 +300,21 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
+static void *sock_map_lookup_sys(struct bpf_map *map, void *key)
+{
+	struct sock *sk;
+
+	if (map->value_size != sizeof(u64))
+		return ERR_PTR(-ENOSPC);
+
+	sk = __sock_map_lookup_elem(map, *(u32 *)key);
+	if (!sk)
+		return ERR_PTR(-ENOENT);
+
+	sock_gen_cookie(sk);
+	return &sk->sk_cookie;
+}
+
 static int __sock_map_delete(struct bpf_stab *stab, struct sock *sk_test,
 			     struct sock **psk)
 {
@@ -459,12 +476,19 @@ static bool sock_hash_sk_is_suitable(const struct sock *sk)
 static int sock_map_update_elem(struct bpf_map *map, void *key,
 				void *value, u64 flags)
 {
-	u32 ufd = *(u32 *)value;
 	u32 idx = *(u32 *)key;
 	struct socket *sock;
 	struct sock *sk;
+	u64 ufd;
 	int ret;
 
+	if (map->value_size == sizeof(u64))
+		ufd = *(u64 *)value;
+	else
+		ufd = *(u32 *)value;
+	if (ufd > S32_MAX)
+		return -EINVAL;
+
 	sock = sockfd_lookup(ufd, &ret);
 	if (!sock)
 		return ret;
@@ -568,6 +592,7 @@ const struct bpf_map_ops sock_map_ops = {
 	.map_alloc		= sock_map_alloc,
 	.map_free		= sock_map_free,
 	.map_get_next_key	= sock_map_get_next_key,
+	.map_lookup_elem_sys_only = sock_map_lookup_sys,
 	.map_update_elem	= sock_map_update_elem,
 	.map_delete_elem	= sock_map_delete_elem,
 	.map_lookup_elem	= sock_map_lookup,

From patchwork Thu Jan 23 15:55:30 2020
X-Patchwork-Id: 1228219
From: Jakub Sitnicki 
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend , Lorenz Bauer , Martin Lau 
Subject: [PATCH bpf-next v4 08/12] bpf, sockmap: Let all kernel-land lookup values in SOCKMAP
Date: Thu, 23 Jan 2020 16:55:30 +0100
Message-Id: <20200123155534.114313-9-jakub@cloudflare.com>

Don't require kernel code, such as BPF helpers, that needs access to SOCKMAP map contents to live in net/core/sock_map.c. Expose the SOCKMAP lookup operation to all of kernel-land. Lookup from BPF context is not whitelisted yet; syscalls have a dedicated lookup handler.

Acked-by: Martin KaFai Lau 
Signed-off-by: Jakub Sitnicki 
---
 net/core/sock_map.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 441f213bd4c5..7b17b258a3d7 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -297,7 +297,7 @@ static struct sock *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
 
 static void *sock_map_lookup(struct bpf_map *map, void *key)
 {
-	return ERR_PTR(-EOPNOTSUPP);
+	return __sock_map_lookup_elem(map, *(u32 *)key);
 }
 
 static void *sock_map_lookup_sys(struct bpf_map *map, void *key)
@@ -964,6 +964,11 @@ static void sock_hash_free(struct bpf_map *map)
 	kfree(htab);
 }
 
+static void *sock_hash_lookup(struct bpf_map *map, void *key)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static void sock_hash_release_progs(struct bpf_map *map)
 {
 	psock_progs_drop(&container_of(map, struct bpf_htab, map)->progs);
@@ -1043,7 +1048,7 @@ const struct bpf_map_ops sock_hash_ops = {
 	.map_get_next_key	= sock_hash_get_next_key,
 	.map_update_elem	= sock_hash_update_elem,
 	.map_delete_elem	= sock_hash_delete_elem,
-	.map_lookup_elem	= sock_map_lookup,
+	.map_lookup_elem	= sock_hash_lookup,
 	.map_release_uref	= sock_hash_release_progs,
 	.map_check_btf		= map_check_no_btf,
 };

From patchwork Thu Jan 23 15:55:31 2020
X-Patchwork-Id: 1228212
From: Jakub Sitnicki 
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend , Lorenz Bauer , Martin Lau 
Subject: [PATCH bpf-next v4 09/12] bpf: Allow selecting reuseport socket from a SOCKMAP
Date: Thu, 23 Jan 2020 16:55:31 +0100
Message-Id: <20200123155534.114313-10-jakub@cloudflare.com>

SOCKMAP now supports storing references to listening sockets. Nothing keeps us from using it as an array of sockets to select from in BPF reuseport programs. Whitelist the map type with the bpf_sk_select_reuseport helper.

The restriction that the socket has to be a member of a reuseport group still applies. A socket from a SOCKMAP that does not have sk_reuseport_cb set is not a valid target, and we signal it with -EINVAL.

This lifts the restriction that SOCKARRAY imposes: when SOCKMAP is used with reuseport BPF, the listening sockets can exist in more than one BPF map at the same time.
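[Editor's note] The error convention described above can be captured in a small stand-alone model (the function name and parameters below are illustrative, not kernel symbols): a missing map entry is always -ENOENT; a socket without a reuseport group maps to -ENOENT for SOCKARRAY, which guarantees its members have one, but to -EINVAL for SOCKMAP, which does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model of bpf_sk_select_reuseport()'s error paths after this change.
 * "has_reuse_cb" stands in for sk_reuseport_cb being non-NULL. */
static int select_reuseport_errno(bool found_in_map, bool has_reuse_cb,
				  bool map_is_sockarray)
{
	if (!found_in_map)
		return -ENOENT;	/* no socket stored under this key */
	if (!has_reuse_cb)
		/* SOCKARRAY members always had a reuseport group, so the
		 * socket must have been unhashed (e.g. by close()): -ENOENT.
		 * SOCKMAP gives no such guarantee; the socket may never have
		 * been in a reuseport group at all: -EINVAL. */
		return map_is_sockarray ? -ENOENT : -EINVAL;
	return 0;		/* candidate is usable */
}
```

The distinction matters to callers: -ENOENT is a transient lookup miss, while -EINVAL flags a map populated with sockets that can never be valid reuseport targets.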
Acked-by: John Fastabend 
Acked-by: Martin KaFai Lau 
Signed-off-by: Jakub Sitnicki 
---
 kernel/bpf/verifier.c |  6 ++++--
 net/core/filter.c     | 15 ++++++++++-----
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 734abaa02123..45b48f8e8b40 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3693,7 +3693,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (func_id != BPF_FUNC_sk_redirect_map &&
 		    func_id != BPF_FUNC_sock_map_update &&
 		    func_id != BPF_FUNC_map_delete_elem &&
-		    func_id != BPF_FUNC_msg_redirect_map)
+		    func_id != BPF_FUNC_msg_redirect_map &&
+		    func_id != BPF_FUNC_sk_select_reuseport)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_SOCKHASH:
@@ -3774,7 +3775,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 			goto error;
 		break;
 	case BPF_FUNC_sk_select_reuseport:
-		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY)
+		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY &&
+		    map->map_type != BPF_MAP_TYPE_SOCKMAP)
 			goto error;
 		break;
 	case BPF_FUNC_map_peek_elem:
diff --git a/net/core/filter.c b/net/core/filter.c
index 4bf3e4aa8a7a..261d33560b14 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8627,6 +8627,7 @@ struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk,
 BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
 	   struct bpf_map *, map, void *, key, u32, flags)
 {
+	bool is_sockarray = map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY;
 	struct sock_reuseport *reuse;
 	struct sock *selected_sk;
 
@@ -8635,12 +8636,16 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
 		return -ENOENT;
 
 	reuse = rcu_dereference(selected_sk->sk_reuseport_cb);
-	if (!reuse)
-		/* selected_sk is unhashed (e.g. by close()) after the
-		 * above map_lookup_elem(). Treat selected_sk has already
-		 * been removed from the map.
+	if (!reuse) {
+		/* reuseport_array has only sk with non NULL sk_reuseport_cb.
+		 * The only (!reuse) case here is - the sk has already been
+		 * unhashed (e.g. by close()), so treat it as -ENOENT.
+		 *
+		 * Other maps (e.g. sock_map) do not provide this guarantee and
+		 * the sk may never be in the reuseport group to begin with.
 		 */
-		return -ENOENT;
+		return is_sockarray ? -ENOENT : -EINVAL;
+	}
 
 	if (unlikely(reuse->reuseport_id != reuse_kern->reuseport_id)) {
 		struct sock *sk;

From patchwork Thu Jan 23 15:55:32 2020
X-Patchwork-Id: 1228210
From: Jakub Sitnicki 
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend , Lorenz Bauer , Martin Lau 
Subject: [PATCH bpf-next v4 10/12] net: Generate reuseport group ID on group creation
Date: Thu, 23 Jan 2020 16:55:32 +0100
Message-Id: <20200123155534.114313-11-jakub@cloudflare.com>
Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose reuseport_lock") has introduced lazy generation of reuseport group IDs that survive group resize.

By comparing the identifiers we check that a reuseport BPF program is not trying to select a socket from a BPF map that belongs to a different reuseport group than the one the packet is for.

Because SOCKARRAY used to be the only BPF map type that could be used with reuseport BPF, it was possible to delay the generation of the reuseport group ID until a socket from the group was inserted into a BPF map for the first time.

Now that SOCKMAP can be used with reuseport BPF we have two options: either generate the reuseport ID on map update, as SOCKARRAY does, or allocate an ID up front when the reuseport group gets created.

This patch takes the latter approach to keep SOCKMAP free of calls into reuseport code. This also streamlines access to reuseport_id, as its lifetime now matches the lifetime of the reuseport object.

The cost of this simplification, however, is that we allocate reuseport IDs for all SO_REUSEPORT users, even those that don't use SOCKARRAY in their setups. With the way identifiers are currently generated, we can have at most S32_MAX reuseport groups, which hopefully is sufficient. If we ever get close to the limit, we can switch to a u64 counter like sk_cookie.

Another change is that we now always call into SOCKARRAY logic to unlink the socket from the map when unhashing or closing the socket. Previously we did it only when at least one socket from the group was in a BPF map.

It is worth noting that this doesn't conflict with SOCKMAP tear-down in case a socket is in a SOCKMAP and belongs to a reuseport group.
SOCKMAP tear-down happens first: prot->unhash `- tcp_bpf_unhash |- tcp_bpf_remove | `- while (sk_psock_link_pop(psock)) | `- sk_psock_unlink | `- sock_map_delete_from_link | `- __sock_map_delete | `- sock_map_unref | `- sk_psock_put | `- sk_psock_drop | `- rcu_assign_sk_user_data(sk, NULL) `- inet_unhash `- reuseport_detach_sock `- bpf_sk_reuseport_detach `- WRITE_ONCE(sk->sk_user_data, NULL) Suggested-by: Martin Lau Signed-off-by: Jakub Sitnicki Acked-by: Martin KaFai Lau --- include/net/sock_reuseport.h | 2 -- kernel/bpf/reuseport_array.c | 5 ---- net/core/filter.c | 12 +-------- net/core/sock_reuseport.c | 50 +++++++++++++++--------------------- 4 files changed, 22 insertions(+), 47 deletions(-) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index 43f4a818d88f..3ecaa15d1850 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -55,6 +55,4 @@ static inline bool reuseport_has_conns(struct sock *sk, bool set) return ret; } -int reuseport_get_id(struct sock_reuseport *reuse); - #endif /* _SOCK_REUSEPORT_H */ diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c index 50c083ba978c..01badd3eda7a 100644 --- a/kernel/bpf/reuseport_array.c +++ b/kernel/bpf/reuseport_array.c @@ -305,11 +305,6 @@ int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, void *key, if (err) goto put_file_unlock; - /* Ensure reuse->reuseport_id is set */ - err = reuseport_get_id(reuse); - if (err < 0) - goto put_file_unlock; - WRITE_ONCE(nsk->sk_user_data, &array->ptrs[index]); rcu_assign_pointer(array->ptrs[index], nsk); free_osk = osk; diff --git a/net/core/filter.c b/net/core/filter.c index 261d33560b14..4a77834e0f93 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -8648,18 +8648,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern, } if (unlikely(reuse->reuseport_id != reuse_kern->reuseport_id)) { - struct sock *sk; - - if (unlikely(!reuse_kern->reuseport_id)) - /* There is a small 
race between adding the
-			 * sk to the map and setting the
-			 * reuse_kern->reuseport_id.
-			 * Treat it as the sk has not been added to
-			 * the bpf map yet.
-			 */
-			return -ENOENT;
+		struct sock *sk = reuse_kern->sk;

-		sk = reuse_kern->sk;
 		if (sk->sk_protocol != selected_sk->sk_protocol)
 			return -EPROTOTYPE;
 		else if (sk->sk_family != selected_sk->sk_family)
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index f19f179538b9..8d928d632ac5 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -16,27 +16,8 @@

 DEFINE_SPINLOCK(reuseport_lock);

-#define REUSEPORT_MIN_ID 1
 static DEFINE_IDA(reuseport_ida);

-int reuseport_get_id(struct sock_reuseport *reuse)
-{
-	int id;
-
-	if (reuse->reuseport_id)
-		return reuse->reuseport_id;
-
-	id = ida_simple_get(&reuseport_ida, REUSEPORT_MIN_ID, 0,
-			    /* Called under reuseport_lock */
-			    GFP_ATOMIC);
-	if (id < 0)
-		return id;
-
-	reuse->reuseport_id = id;
-
-	return reuse->reuseport_id;
-}
-
 static struct sock_reuseport *__reuseport_alloc(unsigned int max_socks)
 {
 	unsigned int size = sizeof(struct sock_reuseport) +
@@ -55,6 +36,7 @@ static struct sock_reuseport *__reuseport_alloc(unsigned int max_socks)
 int reuseport_alloc(struct sock *sk, bool bind_inany)
 {
 	struct sock_reuseport *reuse;
+	int id, ret = 0;

 	/* bh lock used since this function call may precede hlist lock in
 	 * soft irq of receive path or setsockopt from process context
@@ -78,10 +60,18 @@ int reuseport_alloc(struct sock *sk, bool bind_inany)

 	reuse = __reuseport_alloc(INIT_SOCKS);
 	if (!reuse) {
-		spin_unlock_bh(&reuseport_lock);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out;
 	}

+	id = ida_alloc(&reuseport_ida, GFP_ATOMIC);
+	if (id < 0) {
+		kfree(reuse);
+		ret = id;
+		goto out;
+	}
+
+	reuse->reuseport_id = id;
 	reuse->socks[0] = sk;
 	reuse->num_socks = 1;
 	reuse->bind_inany = bind_inany;
@@ -90,7 +80,7 @@ int reuseport_alloc(struct sock *sk, bool bind_inany)

 out:
 	spin_unlock_bh(&reuseport_lock);

-	return 0;
+	return ret;
 }
EXPORT_SYMBOL(reuseport_alloc);

@@ -135,8 +125,7 @@ static void reuseport_free_rcu(struct rcu_head *head)

 	reuse = container_of(head, struct sock_reuseport, rcu);
 	sk_reuseport_prog_free(rcu_dereference_protected(reuse->prog, 1));
-	if (reuse->reuseport_id)
-		ida_simple_remove(&reuseport_ida, reuse->reuseport_id);
+	ida_free(&reuseport_ida, reuse->reuseport_id);
 	kfree(reuse);
 }

@@ -200,12 +189,15 @@ void reuseport_detach_sock(struct sock *sk)
 	reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
					  lockdep_is_held(&reuseport_lock));

-	/* At least one of the sk in this reuseport group is added to
-	 * a bpf map. Notify the bpf side. The bpf map logic will
-	 * remove the sk if it is indeed added to a bpf map.
+	/* Notify the bpf side. The sk may be added to a sockarray
+	 * map. If so, sockarray logic will remove it from the map.
+	 *
+	 * Other bpf map types that work with reuseport, like sockmap,
+	 * don't need an explicit callback from here. They override sk
+	 * unhash/close ops to remove the sk from the map before we
+	 * get to this point.
	 */
-	if (reuse->reuseport_id)
-		bpf_sk_reuseport_detach(sk);
+	bpf_sk_reuseport_detach(sk);

 	rcu_assign_pointer(sk->sk_reuseport_cb, NULL);

From patchwork Thu Jan 23 15:55:33 2020
X-Patchwork-Id: 1228213
From: Jakub Sitnicki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend, Lorenz Bauer, Martin Lau
Subject: [PATCH bpf-next v4 11/12] selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP
Date: Thu, 23 Jan 2020 16:55:33 +0100
Message-Id: <20200123155534.114313-12-jakub@cloudflare.com>
In-Reply-To: <20200123155534.114313-1-jakub@cloudflare.com>
References: <20200123155534.114313-1-jakub@cloudflare.com>
X-Mailing-List: bpf@vger.kernel.org

Parametrize the SK_REUSEPORT tests so that the map type for storing sockets is not hard-coded in the test setup routine.
This, together with careful state cleaning after the tests, lets us run the test cases once with REUSEPORT_ARRAY and once with SOCKMAP (TCP only), to have test coverage for the latter as well.

Acked-by: John Fastabend
Signed-off-by: Jakub Sitnicki
---
 .../bpf/prog_tests/select_reuseport.c | 60 +++++++++++++++----
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
index 2c37ae7dc214..e7b4abfca2ab 100644
--- a/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
+++ b/tools/testing/selftests/bpf/prog_tests/select_reuseport.c
@@ -36,6 +36,7 @@ static int result_map, tmp_index_ovr_map, linum_map, data_check_map;
 static enum result expected_results[NR_RESULTS];
 static int sk_fds[REUSEPORT_ARRAY_SIZE];
 static int reuseport_array = -1, outer_map = -1;
+static enum bpf_map_type inner_map_type;
 static int select_by_skb_data_prog;
 static int saved_tcp_syncookie = -1;
 static struct bpf_object *obj;
@@ -63,13 +64,15 @@ static union sa46 {
 	}								\
 })

-static int create_maps(void)
+static int create_maps(enum bpf_map_type inner_type)
 {
 	struct bpf_create_map_attr attr = {};

+	inner_map_type = inner_type;
+
 	/* Creating reuseport_array */
 	attr.name = "reuseport_array";
-	attr.map_type = BPF_MAP_TYPE_REUSEPORT_SOCKARRAY;
+	attr.map_type = inner_type;
 	attr.key_size = sizeof(__u32);
 	attr.value_size = sizeof(__u32);
 	attr.max_entries = REUSEPORT_ARRAY_SIZE;
@@ -694,12 +697,34 @@ static void cleanup_per_test(bool no_inner_map)

 static void cleanup(void)
 {
-	if (outer_map != -1)
+	if (outer_map != -1) {
 		close(outer_map);
-	if (reuseport_array != -1)
+		outer_map = -1;
+	}
+
+	if (reuseport_array != -1) {
 		close(reuseport_array);
-	if (obj)
+		reuseport_array = -1;
+	}
+
+	if (obj) {
 		bpf_object__close(obj);
+		obj = NULL;
+	}
+
+	memset(expected_results, 0, sizeof(expected_results));
+}
+
+static const char *maptype_str(enum bpf_map_type type)
+{
+	switch (type) {
+	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
+		return "reuseport_sockarray";
+	case BPF_MAP_TYPE_SOCKMAP:
+		return "sockmap";
+	default:
+		return "unknown";
+	}
 }

 static const char *family_str(sa_family_t family)
@@ -747,13 +772,21 @@ static void test_config(int sotype, sa_family_t family, bool inany)
 	const struct test *t;

 	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
-		snprintf(s, sizeof(s), "%s/%s %s %s",
+		snprintf(s, sizeof(s), "%s %s/%s %s %s",
+			 maptype_str(inner_map_type),
 			 family_str(family), sotype_str(sotype),
 			 inany ? "INANY" : "LOOPBACK", t->name);

 		if (!test__start_subtest(s))
 			continue;

+		if (sotype == SOCK_DGRAM &&
+		    inner_map_type == BPF_MAP_TYPE_SOCKMAP) {
+			/* SOCKMAP doesn't support UDP yet */
+			test__skip();
+			continue;
+		}
+
 		setup_per_test(sotype, family, inany, t->no_inner_map);
 		t->fn(sotype, family);
 		cleanup_per_test(t->no_inner_map);
@@ -782,13 +815,20 @@ static void test_all(void)
 		test_config(c->sotype, c->family, c->inany);
 }

-void test_select_reuseport(void)
+void test_map_type(enum bpf_map_type mt)
 {
-	if (create_maps())
+	if (create_maps(mt))
 		goto out;
 	if (prepare_bpf_obj())
 		goto out;

+	test_all();
+out:
+	cleanup();
+}
+
+void test_select_reuseport(void)
+{
 	saved_tcp_fo = read_int_sysctl(TCP_FO_SYSCTL);
 	saved_tcp_syncookie = read_int_sysctl(TCP_SYNCOOKIE_SYSCTL);
 	if (saved_tcp_syncookie < 0 || saved_tcp_syncookie < 0)
		goto out;

	if (disable_syncookie())
		goto out;

-	test_all();
+	test_map_type(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
+	test_map_type(BPF_MAP_TYPE_SOCKMAP);
 out:
-	cleanup();
 	restore_sysctls();
 }

From patchwork Thu Jan 23 15:55:34 2020
X-Patchwork-Id: 1228215
From: Jakub Sitnicki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com, John Fastabend, Lorenz Bauer, Martin Lau
Subject: [PATCH bpf-next v4 12/12] selftests/bpf: Tests for SOCKMAP holding listening sockets
Date: Thu, 23 Jan 2020 16:55:34 +0100
Message-Id: <20200123155534.114313-13-jakub@cloudflare.com>
In-Reply-To: <20200123155534.114313-1-jakub@cloudflare.com>
References: <20200123155534.114313-1-jakub@cloudflare.com>
X-Mailing-List: netdev@vger.kernel.org

Now that SOCKMAP can store listening sockets, the user-space and BPF APIs are open to a new set of potential pitfalls. Exercise the map operations (with extra attention to code paths susceptible to races between map ops and socket cloning), and the BPF helpers that work with SOCKMAP, to gain confidence that everything works as expected.
Signed-off-by: Jakub Sitnicki --- .../selftests/bpf/prog_tests/sockmap_listen.c | 1455 +++++++++++++++++ .../selftests/bpf/progs/test_sockmap_listen.c | 77 + 2 files changed, 1532 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_listen.c diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c new file mode 100644 index 000000000000..93c74a528cc9 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -0,0 +1,1455 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2020 Cloudflare +/* + * Test suite for SOCKMAP holding listening sockets. Covers: + * 1. BPF map operations - bpf_map_{update,lookup delete}_elem + * 2. BPF redirect helpers - bpf_{sk,msg}_redirect_map + * 3. BPF reuseport helper - bpf_sk_select_reuseport + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "bpf_util.h" +#include "test_progs.h" +#include "test_sockmap_listen.skel.h" + +#define MAX_STRERR_LEN 256 +#define MAX_TEST_NAME 80 + +#define _FAIL(errnum, fmt...) \ + ({ \ + error_at_line(0, (errnum), __func__, __LINE__, fmt); \ + CHECK_FAIL(true); \ + }) +#define FAIL(fmt...) _FAIL(0, fmt) +#define FAIL_ERRNO(fmt...) _FAIL(errno, fmt) +#define FAIL_LIBBPF(err, msg) \ + ({ \ + char __buf[MAX_STRERR_LEN]; \ + libbpf_strerror((err), __buf, sizeof(__buf)); \ + FAIL("%s: %s", (msg), __buf); \ + }) + +/* Wrappers that fail the test on error and report it. 
*/ + +#define xaccept(fd, addr, len) \ + ({ \ + int __ret = accept((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("accept"); \ + __ret; \ + }) + +#define xbind(fd, addr, len) \ + ({ \ + int __ret = bind((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("bind"); \ + __ret; \ + }) + +#define xclose(fd) \ + ({ \ + int __ret = close((fd)); \ + if (__ret == -1) \ + FAIL_ERRNO("close"); \ + __ret; \ + }) + +#define xconnect(fd, addr, len) \ + ({ \ + int __ret = connect((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("connect"); \ + __ret; \ + }) + +#define xgetsockname(fd, addr, len) \ + ({ \ + int __ret = getsockname((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("getsockname"); \ + __ret; \ + }) + +#define xgetsockopt(fd, level, name, val, len) \ + ({ \ + int __ret = getsockopt((fd), (level), (name), (val), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("getsockopt(" #name ")"); \ + __ret; \ + }) + +#define xlisten(fd, backlog) \ + ({ \ + int __ret = listen((fd), (backlog)); \ + if (__ret == -1) \ + FAIL_ERRNO("listen"); \ + __ret; \ + }) + +#define xsetsockopt(fd, level, name, val, len) \ + ({ \ + int __ret = setsockopt((fd), (level), (name), (val), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("setsockopt(" #name ")"); \ + __ret; \ + }) + +#define xsocket(family, sotype, flags) \ + ({ \ + int __ret = socket(family, sotype, flags); \ + if (__ret == -1) \ + FAIL_ERRNO("socket"); \ + __ret; \ + }) + +#define xbpf_map_delete_elem(fd, key) \ + ({ \ + int __ret = bpf_map_delete_elem((fd), (key)); \ + if (__ret == -1) \ + FAIL_ERRNO("map_delete"); \ + __ret; \ + }) + +#define xbpf_map_lookup_elem(fd, key, val) \ + ({ \ + int __ret = bpf_map_lookup_elem((fd), (key), (val)); \ + if (__ret == -1) \ + FAIL_ERRNO("map_lookup"); \ + __ret; \ + }) + +#define xbpf_map_update_elem(fd, key, val, flags) \ + ({ \ + int __ret = bpf_map_update_elem((fd), (key), (val), (flags)); \ + if (__ret == -1) \ + FAIL_ERRNO("map_update"); \ + __ret; \ + }) + 
+#define xbpf_prog_attach(prog, target, type, flags) \ + ({ \ + int __ret = \ + bpf_prog_attach((prog), (target), (type), (flags)); \ + if (__ret == -1) \ + FAIL_ERRNO("prog_attach(" #type ")"); \ + __ret; \ + }) + +#define xbpf_prog_detach2(prog, target, type) \ + ({ \ + int __ret = bpf_prog_detach2((prog), (target), (type)); \ + if (__ret == -1) \ + FAIL_ERRNO("prog_detach2(" #type ")"); \ + __ret; \ + }) + +#define xpthread_create(thread, attr, func, arg) \ + ({ \ + int __ret = pthread_create((thread), (attr), (func), (arg)); \ + errno = __ret; \ + if (__ret) \ + FAIL_ERRNO("pthread_create"); \ + __ret; \ + }) + +#define xpthread_join(thread, retval) \ + ({ \ + int __ret = pthread_join((thread), (retval)); \ + errno = __ret; \ + if (__ret) \ + FAIL_ERRNO("pthread_join"); \ + __ret; \ + }) + +static void init_addr_loopback4(struct sockaddr_storage *ss, socklen_t *len) +{ + struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss)); + + addr4->sin_family = AF_INET; + addr4->sin_port = 0; + addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK); + *len = sizeof(*addr4); +} + +static void init_addr_loopback6(struct sockaddr_storage *ss, socklen_t *len) +{ + struct sockaddr_in6 *addr6 = memset(ss, 0, sizeof(*ss)); + + addr6->sin6_family = AF_INET6; + addr6->sin6_port = 0; + addr6->sin6_addr = in6addr_loopback; + *len = sizeof(*addr6); +} + +static void init_addr_loopback(int family, struct sockaddr_storage *ss, + socklen_t *len) +{ + switch (family) { + case AF_INET: + init_addr_loopback4(ss, len); + return; + case AF_INET6: + init_addr_loopback6(ss, len); + return; + default: + FAIL("unsupported address family %d", family); + } +} + +static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss) +{ + return (struct sockaddr *)ss; +} + +static int enable_reuseport(int s, int progfd) +{ + int err, one = 1; + + err = xsetsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); + if (err) + return -1; + err = xsetsockopt(s, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &progfd, + 
sizeof(progfd)); + if (err) + return -1; + + return 0; +} + +static int listen_loopback_reuseport(int family, int sotype, int progfd) +{ + struct sockaddr_storage addr; + socklen_t len; + int err, s; + + init_addr_loopback(family, &addr, &len); + + s = xsocket(family, sotype, 0); + if (s == -1) + return -1; + + if (progfd >= 0) + enable_reuseport(s, progfd); + + err = xbind(s, sockaddr(&addr), len); + if (err) + goto close; + + err = xlisten(s, SOMAXCONN); + if (err) + goto close; + + return s; +close: + xclose(s); + return -1; +} + +static int listen_loopback(int family, int sotype) +{ + return listen_loopback_reuseport(family, sotype, -1); +} + +static void test_sockmap_insert_invalid(int family, int sotype, int mapfd) +{ + u32 key = 0; + u64 value; + int err; + + value = -1; + err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + if (!err || errno != EINVAL) + FAIL_ERRNO("map_update: expected EINVAL"); + + value = INT_MAX; + err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + if (!err || errno != EBADF) + FAIL_ERRNO("map_update: expected EBADF"); +} + +static void test_sockmap_insert_opened(int family, int sotype, int mapfd) +{ + u32 key = 0; + u64 value; + int err, s; + + s = xsocket(family, sotype, 0); + if (s == -1) + return; + + errno = 0; + value = s; + err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + if (!err || errno != EOPNOTSUPP) + FAIL_ERRNO("map_update: expected EOPNOTSUPP"); + + xclose(s); +} + +static void test_sockmap_insert_bound(int family, int sotype, int mapfd) +{ + struct sockaddr_storage addr; + socklen_t len; + u32 key = 0; + u64 value; + int err, s; + + init_addr_loopback(family, &addr, &len); + + s = xsocket(family, sotype, 0); + if (s == -1) + return; + + err = xbind(s, sockaddr(&addr), len); + if (err) + goto close; + + errno = 0; + value = s; + err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + if (!err || errno != EOPNOTSUPP) + FAIL_ERRNO("map_update: expected EOPNOTSUPP"); +close: + 
xclose(s); +} + +static void test_sockmap_insert_listening(int family, int sotype, int mapfd) +{ + u64 value; + u32 key; + int s; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + key = 0; + value = s; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + xclose(s); +} + +static void test_sockmap_delete_after_insert(int family, int sotype, int mapfd) +{ + u64 value; + u32 key; + int s; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + key = 0; + value = s; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + xbpf_map_delete_elem(mapfd, &key); + xclose(s); +} + +static void test_sockmap_delete_after_close(int family, int sotype, int mapfd) +{ + int err, s; + u64 value; + u32 key; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + key = 0; + value = s; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + + xclose(s); + + errno = 0; + err = bpf_map_delete_elem(mapfd, &key); + if (!err || errno != EINVAL) + FAIL_ERRNO("map_delete: expected EINVAL"); +} + +static void test_sockmap_lookup_after_insert(int family, int sotype, int mapfd) +{ + u64 cookie, value; + socklen_t len; + u32 key; + int s; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + key = 0; + value = s; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + + len = sizeof(cookie); + xgetsockopt(s, SOL_SOCKET, SO_COOKIE, &cookie, &len); + + xbpf_map_lookup_elem(mapfd, &key, &value); + + if (value != cookie) { + FAIL("map_lookup: have %#llx, want %#llx", + (unsigned long long)value, (unsigned long long)cookie); + } + + xclose(s); +} + +static void test_sockmap_lookup_after_delete(int family, int sotype, int mapfd) +{ + int err, s; + u64 value; + u32 key; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + key = 0; + value = s; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + xbpf_map_delete_elem(mapfd, &key); + + errno = 0; + err = bpf_map_lookup_elem(mapfd, &key, &value); + if (!err || 
errno != ENOENT) + FAIL_ERRNO("map_lookup: expected ENOENT"); + + xclose(s); +} + +static void test_sockmap_lookup_32_bit_value(int family, int sotype, int mapfd) +{ + u32 key, value32; + int err, s; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + mapfd = bpf_create_map(BPF_MAP_TYPE_SOCKMAP, sizeof(key), + sizeof(value32), 1, 0); + if (mapfd < 0) { + FAIL_ERRNO("map_create"); + goto close; + } + + key = 0; + value32 = s; + xbpf_map_update_elem(mapfd, &key, &value32, BPF_NOEXIST); + + errno = 0; + err = bpf_map_lookup_elem(mapfd, &key, &value32); + if (!err || errno != ENOSPC) + FAIL_ERRNO("map_lookup: expected ENOSPC"); + + xclose(mapfd); +close: + xclose(s); +} + +static void test_sockmap_update_listening(int family, int sotype, int mapfd) +{ + int s1, s2; + u64 value; + u32 key; + + s1 = listen_loopback(family, sotype); + if (s1 < 0) + return; + + s2 = listen_loopback(family, sotype); + if (s2 < 0) + goto close_s1; + + key = 0; + value = s1; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + + value = s2; + xbpf_map_update_elem(mapfd, &key, &value, BPF_EXIST); + xclose(s2); +close_s1: + xclose(s1); +} + +/* Exercise the code path where we destroy child sockets that never + * got accept()'ed, aka orphans, when parent socket gets closed. + */ +static void test_sockmap_destroy_orphan_child(int family, int sotype, int mapfd) +{ + struct sockaddr_storage addr; + socklen_t len; + int err, s, c; + u64 value; + u32 key; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + len = sizeof(addr); + err = xgetsockname(s, sockaddr(&addr), &len); + if (err) + goto close_srv; + + key = 0; + value = s; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + + c = xsocket(family, sotype, 0); + if (c == -1) + goto close_srv; + + xconnect(c, sockaddr(&addr), len); + xclose(c); +close_srv: + xclose(s); +} + +/* Perform a passive open after removing listening socket from SOCKMAP + * to ensure that callbacks get restored properly. 
+ */ +static void test_sockmap_clone_after_delete(int family, int sotype, int mapfd) +{ + struct sockaddr_storage addr; + socklen_t len; + int err, s, c; + u64 value; + u32 key; + + s = listen_loopback(family, sotype); + if (s < 0) + return; + + len = sizeof(addr); + err = xgetsockname(s, sockaddr(&addr), &len); + if (err) + goto close_srv; + + key = 0; + value = s; + xbpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST); + xbpf_map_delete_elem(mapfd, &key); + + c = xsocket(family, sotype, 0); + if (c < 0) + goto close_srv; + + xconnect(c, sockaddr(&addr), len); + xclose(c); +close_srv: + xclose(s); +} + +/* Check that child socket that got created while parent was in a + * SOCKMAP, but got accept()'ed only after the parent has been removed + * from SOCKMAP, gets cloned without parent psock state or callbacks. + */ +static void test_sockmap_accept_after_delete(int family, int sotype, int mapfd) +{ + struct sockaddr_storage addr; + const u32 zero = 0; + int err, s, c, p; + socklen_t len; + u64 value; + + s = listen_loopback(family, sotype); + if (s == -1) + return; + + len = sizeof(addr); + err = xgetsockname(s, sockaddr(&addr), &len); + if (err) + goto close_srv; + + value = s; + err = xbpf_map_update_elem(mapfd, &zero, &value, BPF_NOEXIST); + if (err) + goto close_srv; + + c = xsocket(family, sotype, 0); + if (c == -1) + goto close_srv; + + /* Create child while parent is in sockmap */ + err = xconnect(c, sockaddr(&addr), len); + if (err) + goto close_cli; + + /* Remove parent from sockmap */ + err = xbpf_map_delete_elem(mapfd, &zero); + if (err) + goto close_cli; + + p = xaccept(s, NULL, NULL); + if (p == -1) + goto close_cli; + + /* Check that child sk_user_data is not set */ + value = p; + xbpf_map_update_elem(mapfd, &zero, &value, BPF_NOEXIST); + + xclose(p); +close_cli: + xclose(c); +close_srv: + xclose(s); +} + +/* Check that child socket that got created and accepted while parent + * was in a SOCKMAP is cloned without parent psock state or callbacks. 
+ */
+static void test_sockmap_accept_before_delete(int family, int sotype, int mapfd)
+{
+	struct sockaddr_storage addr;
+	const u32 zero = 0, one = 1;
+	int err, s, c, p;
+	socklen_t len;
+	u64 value;
+
+	s = listen_loopback(family, sotype);
+	if (s == -1)
+		return;
+
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto close_srv;
+
+	value = s;
+	err = xbpf_map_update_elem(mapfd, &zero, &value, BPF_NOEXIST);
+	if (err)
+		goto close_srv;
+
+	c = xsocket(family, sotype, 0);
+	if (c == -1)
+		goto close_srv;
+
+	/* Create & accept child while parent is in sockmap */
+	err = xconnect(c, sockaddr(&addr), len);
+	if (err)
+		goto close_cli;
+
+	p = xaccept(s, NULL, NULL);
+	if (p == -1)
+		goto close_cli;
+
+	/* Check that child sk_user_data is not set */
+	value = p;
+	xbpf_map_update_elem(mapfd, &one, &value, BPF_NOEXIST);
+
+	xclose(p);
+close_cli:
+	xclose(c);
+close_srv:
+	xclose(s);
+}
+
+struct connect_accept_ctx {
+	int sockfd;
+	unsigned int done;
+	unsigned int nr_iter;
+};
+
+static bool is_thread_done(struct connect_accept_ctx *ctx)
+{
+	return READ_ONCE(ctx->done);
+}
+
+static void *connect_accept_thread(void *arg)
+{
+	struct connect_accept_ctx *ctx = arg;
+	struct sockaddr_storage addr;
+	int family, socktype;
+	socklen_t len;
+	int err, i, s;
+
+	s = ctx->sockfd;
+
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto done;
+
+	len = sizeof(family);
+	err = xgetsockopt(s, SOL_SOCKET, SO_DOMAIN, &family, &len);
+	if (err)
+		goto done;
+
+	len = sizeof(socktype);
+	err = xgetsockopt(s, SOL_SOCKET, SO_TYPE, &socktype, &len);
+	if (err)
+		goto done;
+
+	for (i = 0; i < ctx->nr_iter; i++) {
+		int c, p;
+
+		c = xsocket(family, socktype, 0);
+		if (c < 0)
+			break;
+
+		err = xconnect(c, (struct sockaddr *)&addr, sizeof(addr));
+		if (err) {
+			xclose(c);
+			break;
+		}
+
+		p = xaccept(s, NULL, NULL);
+		if (p < 0) {
+			xclose(c);
+			break;
+		}
+
+		xclose(p);
+		xclose(c);
+	}
+done:
+	WRITE_ONCE(ctx->done, 1);
+	return NULL;
+}
+
+static void test_sockmap_syn_recv_insert_delete(int family, int sotype,
+						int mapfd)
+{
+	struct connect_accept_ctx ctx = { 0 };
+	struct sockaddr_storage addr;
+	socklen_t len;
+	u32 zero = 0;
+	pthread_t t;
+	int err, s;
+	u64 value;
+
+	s = listen_loopback(family, sotype | SOCK_NONBLOCK);
+	if (s < 0)
+		return;
+
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto close;
+
+	ctx.sockfd = s;
+	ctx.nr_iter = 1000;
+
+	err = xpthread_create(&t, NULL, connect_accept_thread, &ctx);
+	if (err)
+		goto close;
+
+	value = s;
+	while (!is_thread_done(&ctx)) {
+		err = xbpf_map_update_elem(mapfd, &zero, &value, BPF_NOEXIST);
+		if (err)
+			break;
+
+		err = xbpf_map_delete_elem(mapfd, &zero);
+		if (err)
+			break;
+	}
+
+	xpthread_join(t, NULL);
+close:
+	xclose(s);
+}
+
+static void *listen_thread(void *arg)
+{
+	struct sockaddr unspec = { AF_UNSPEC };
+	struct connect_accept_ctx *ctx = arg;
+	int err, i, s;
+
+	s = ctx->sockfd;
+
+	for (i = 0; i < ctx->nr_iter; i++) {
+		err = xlisten(s, 1);
+		if (err)
+			break;
+		err = xconnect(s, &unspec, sizeof(unspec));
+		if (err)
+			break;
+	}
+
+	WRITE_ONCE(ctx->done, 1);
+	return NULL;
+}
+
+static void test_sockmap_race_insert_listen(int family, int socktype, int mapfd)
+{
+	struct connect_accept_ctx ctx = { 0 };
+	const u32 zero = 0;
+	const int one = 1;
+	pthread_t t;
+	int err, s;
+	u64 value;
+
+	s = xsocket(family, socktype, 0);
+	if (s < 0)
+		return;
+
+	err = xsetsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
+	if (err)
+		goto close;
+
+	ctx.sockfd = s;
+	ctx.nr_iter = 10000;
+
+	err = pthread_create(&t, NULL, listen_thread, &ctx);
+	if (err)
+		goto close;
+
+	value = s;
+	while (!is_thread_done(&ctx)) {
+		err = bpf_map_update_elem(mapfd, &zero, &value, BPF_NOEXIST);
+		/* Expecting EOPNOTSUPP before listen() */
+		if (err && errno != EOPNOTSUPP) {
+			FAIL_ERRNO("map_update");
+			break;
+		}
+
+		err = bpf_map_delete_elem(mapfd, &zero);
+		/* Expecting EINVAL after unhash on connect(AF_UNSPEC) */
+		if (err && errno != EINVAL) {
+			FAIL_ERRNO("map_delete");
+			break;
+		}
+	}
+
+	xpthread_join(t, NULL);
+close:
+	xclose(s);
+}
+
+static void zero_verdict_count(int mapfd)
+{
+	unsigned int zero = 0;
+	int key;
+
+	key = SK_DROP;
+	xbpf_map_update_elem(mapfd, &key, &zero, BPF_ANY);
+	key = SK_PASS;
+	xbpf_map_update_elem(mapfd, &key, &zero, BPF_ANY);
+}
+
+enum redir_mode {
+	REDIR_INGRESS,
+	REDIR_EGRESS,
+};
+
+static const char *redir_mode_str(enum redir_mode mode)
+{
+	switch (mode) {
+	case REDIR_INGRESS:
+		return "ingress";
+	case REDIR_EGRESS:
+		return "egress";
+	default:
+		return "unknown";
+	}
+}
+
+static void redir_to_connected(int family, int sotype, int sock_mapfd,
+			       int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	struct sockaddr_storage addr;
+	int s, c0, c1, p0, p1;
+	unsigned int pass;
+	socklen_t len;
+	int err, n;
+	u64 value;
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	s = listen_loopback(family, sotype | SOCK_NONBLOCK);
+	if (s < 0)
+		return;
+
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto close_srv;
+
+	c0 = xsocket(family, sotype, 0);
+	if (c0 < 0)
+		goto close_srv;
+	err = xconnect(c0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+
+	p0 = xaccept(s, NULL, NULL);
+	if (p0 < 0)
+		goto close_cli0;
+
+	c1 = xsocket(family, sotype, 0);
+	if (c1 < 0)
+		goto close_peer0;
+	err = xconnect(c1, sockaddr(&addr), len);
+	if (err)
+		goto close_cli1;
+
+	p1 = xaccept(s, NULL, NULL);
+	if (p1 < 0)
+		goto close_cli1;
+
+	key = 0;
+	value = p0;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_peer1;
+
+	key = 1;
+	value = p1;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_peer1;
+
+	n = write(mode == REDIR_INGRESS ? c1 : p1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close_peer1;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close_peer1;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+	n = read(c0, &b, 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: read", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close_peer1:
+	xclose(p1);
+close_cli1:
+	xclose(c1);
+close_peer0:
+	xclose(p0);
+close_cli0:
+	xclose(c0);
+close_srv:
+	xclose(s);
+}
+
+static void
+test_sockmap_skb_redir_to_connected(struct test_sockmap_listen *skel,
+				    int family, int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int parser = bpf_program__fd(skel->progs.prog_skb_parser);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(skel->maps.sock_map);
+	int err;
+
+	err = xbpf_prog_attach(parser, sock_map, BPF_SK_SKB_STREAM_PARSER, 0);
+	if (err)
+		return;
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (err)
+		goto detach;
+
+	redir_to_connected(family, sotype, sock_map, verdict_map,
+			   REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_STREAM_VERDICT);
+detach:
+	xbpf_prog_detach2(parser, sock_map, BPF_SK_SKB_STREAM_PARSER);
+}
+
+static void
+test_sockmap_msg_redir_to_connected(struct test_sockmap_listen *skel,
+				    int family, int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_msg_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(skel->maps.sock_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_MSG_VERDICT, 0);
+	if (err)
+		return;
+
+	redir_to_connected(family, sotype, sock_map, verdict_map, REDIR_EGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_MSG_VERDICT);
+}
+
+static void redir_to_listening(int family, int sotype, int sock_mapfd,
+			       int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	struct sockaddr_storage addr;
+	int s, c, p, err, n;
+	unsigned int drop;
+	socklen_t len;
+	u64 value;
+	u32 key;
+
+	zero_verdict_count(verd_mapfd);
+
+	s = listen_loopback(family, sotype | SOCK_NONBLOCK);
+	if (s < 0)
+		return;
+
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto close_srv;
+
+	c = xsocket(family, sotype, 0);
+	if (c < 0)
+		goto close_srv;
+	err = xconnect(c, sockaddr(&addr), len);
+	if (err)
+		goto close_cli;
+
+	p = xaccept(s, NULL, NULL);
+	if (p < 0)
+		goto close_cli;
+
+	key = 0;
+	value = s;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_peer;
+
+	key = 1;
+	value = p;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_peer;
+
+	n = write(mode == REDIR_INGRESS ? c : p, "a", 1);
+	if (n < 0 && errno != EACCES)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close_peer;
+
+	key = SK_DROP;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &drop);
+	if (err)
+		goto close_peer;
+	if (drop != 1)
+		FAIL("%s: want drop count 1, have %d", log_prefix, drop);
+
+close_peer:
+	xclose(p);
+close_cli:
+	xclose(c);
+close_srv:
+	xclose(s);
+}
+
+static void
+test_sockmap_skb_redir_to_listening(struct test_sockmap_listen *skel,
+				    int family, int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int parser = bpf_program__fd(skel->progs.prog_skb_parser);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(skel->maps.sock_map);
+	int err;
+
+	err = xbpf_prog_attach(parser, sock_map, BPF_SK_SKB_STREAM_PARSER, 0);
+	if (err)
+		return;
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (err)
+		goto detach;
+
+	redir_to_listening(family, sotype, sock_map, verdict_map,
+			   REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_STREAM_VERDICT);
+detach:
+	xbpf_prog_detach2(parser, sock_map, BPF_SK_SKB_STREAM_PARSER);
+}
+
+static void
+test_sockmap_msg_redir_to_listening(struct test_sockmap_listen *skel,
+				    int family, int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_msg_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(skel->maps.sock_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_MSG_VERDICT, 0);
+	if (err)
+		return;
+
+	redir_to_listening(family, sotype, sock_map, verdict_map, REDIR_EGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_MSG_VERDICT);
+}
+
+static void test_sockmap_reuseport_select_listening(int family, int sotype,
+						    int sock_map, int verd_map,
+						    int reuseport_prog)
+{
+	struct sockaddr_storage addr;
+	unsigned int pass;
+	int s, c, p, err;
+	socklen_t len;
+	u64 value;
+	u32 key;
+
+	zero_verdict_count(verd_map);
+
+	s = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	if (s < 0)
+		return;
+
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto close_srv;
+
+	key = 0;
+	value = s;
+	err = xbpf_map_update_elem(sock_map, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_srv;
+
+	c = xsocket(family, sotype, 0);
+	if (c < 0)
+		goto close_srv;
+	err = xconnect(c, sockaddr(&addr), len);
+	if (err)
+		goto close_cli;
+
+	p = xaccept(s, NULL, NULL);
+	if (p < 0)
+		goto close_cli;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_map, &key, &pass);
+	if (err)
+		goto close_peer;
+	if (pass != 1)
+		FAIL("want pass count 1, have %d", pass);
+
+close_peer:
+	xclose(p);
+close_cli:
+	xclose(c);
+close_srv:
+	xclose(s);
+}
+
+static void test_sockmap_reuseport_select_connected(int family, int sotype,
+						    int sock_map, int verd_map,
+						    int reuseport_prog)
+{
+	struct sockaddr_storage addr;
+	int s, c0, c1, p0, err;
+	unsigned int drop;
+	socklen_t len;
+	u64 value;
+	u32 key;
+
+	zero_verdict_count(verd_map);
+
+	s = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	if (s < 0)
+		return;
+
+	/* Populate sock_map[0] to avoid ENOENT on first connection */
+	key = 0;
+	value = s;
+	err = xbpf_map_update_elem(sock_map, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_srv;
+
+	len = sizeof(addr);
+	err = xgetsockname(s, sockaddr(&addr), &len);
+	if (err)
+		goto close_srv;
+
+	c0 = xsocket(family, sotype, 0);
+	if (c0 < 0)
+		goto close_srv;
+
+	err = xconnect(c0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+
+	p0 = xaccept(s, NULL, NULL);
+	if (p0 < 0)
+		goto close_cli0;
+
+	/* Update sock_map[0] to redirect to a connected socket */
+	key = 0;
+	value = p0;
+	err = xbpf_map_update_elem(sock_map, &key, &value, BPF_EXIST);
+	if (err)
+		goto close_peer0;
+
+	c1 = xsocket(family, sotype, 0);
+	if (c1 < 0)
+		goto close_peer0;
+
+	errno = 0;
+	err = connect(c1, sockaddr(&addr), len);
+	if (!err || errno != ECONNREFUSED)
+		FAIL_ERRNO("connect: expected ECONNREFUSED");
+
+	key = SK_DROP;
+	err = xbpf_map_lookup_elem(verd_map, &key, &drop);
+	if (err)
+		goto close_cli1;
+	if (drop != 1)
+		FAIL("want drop count 1, have %d", drop);
+
+close_cli1:
+	xclose(c1);
+close_peer0:
+	xclose(p0);
+close_cli0:
+	xclose(c0);
+close_srv:
+	xclose(s);
+}
+
+/* Check that redirecting across reuseport groups is not allowed. */
+static void test_sockmap_reuseport_mixed_groups(int family, int sotype,
+						int sock_map, int verd_map,
+						int reuseport_prog)
+{
+	struct sockaddr_storage addr;
+	int s1, s2, c, err;
+	unsigned int drop;
+	socklen_t len;
+	u64 value;
+	u32 key;
+
+	zero_verdict_count(verd_map);
+
+	/* Create two listeners, each in its own reuseport group */
+	s1 = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	if (s1 < 0)
+		return;
+
+	s2 = listen_loopback_reuseport(family, sotype, reuseport_prog);
+	if (s2 < 0)
+		goto close_srv1;
+
+	key = 0;
+	value = s1;
+	err = xbpf_map_update_elem(sock_map, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_srv2;
+
+	key = 1;
+	value = s2;
+	err = xbpf_map_update_elem(sock_map, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_srv2;
+
+	/* Connect to s2, reuseport BPF selects s1 via sock_map[0] */
+	len = sizeof(addr);
+	err = xgetsockname(s2, sockaddr(&addr), &len);
+	if (err)
+		goto close_srv2;
+
+	c = xsocket(family, sotype, 0);
+	if (c < 0)
+		goto close_srv2;
+
+	err = connect(c, sockaddr(&addr), len);
+	if (err && errno != ECONNREFUSED) {
+		FAIL_ERRNO("connect: expected ECONNREFUSED");
+		goto close_cli;
+	}
+
+	/* Expect drop, can't redirect outside of reuseport group */
+	key = SK_DROP;
+	err = xbpf_map_lookup_elem(verd_map, &key, &drop);
+	if (err)
+		goto close_cli;
+	if (drop != 1)
+		FAIL("want drop count 1, have %d", drop);
+
+close_cli:
+	xclose(c);
+close_srv2:
+	xclose(s2);
+close_srv1:
+	xclose(s1);
+}
+
+#define TEST(fn)					\
+	{						\
+		fn, #fn					\
+	}
+
+static void cleanup_sockmap_ops(int mapfd)
+{
+	int err;
+	u32 key;
+
+	for (key = 0; key < 2; key++) {
+		err = bpf_map_delete_elem(mapfd, &key);
+		if (err && errno != EINVAL)
+			FAIL_ERRNO("map_delete");
+	}
+}
+
+static const char *family_str(sa_family_t family)
+{
+	switch (family) {
+	case AF_INET:
+		return "IPv4";
+	case AF_INET6:
+		return "IPv6";
+	default:
+		return "unknown";
+	}
+}
+
+static void test_sockmap_ops(struct test_sockmap_listen *skel, int family,
+			     int sotype)
+{
+	const struct op_test {
+		void (*fn)(int family, int sotype, int sock_map);
+		const char *name;
+	} tests[] = {
+		/* insert */
+		TEST(test_sockmap_insert_invalid),
+		TEST(test_sockmap_insert_opened),
+		TEST(test_sockmap_insert_bound),
+		TEST(test_sockmap_insert_listening),
+		/* delete */
+		TEST(test_sockmap_delete_after_insert),
+		TEST(test_sockmap_delete_after_close),
+		/* lookup */
+		TEST(test_sockmap_lookup_after_insert),
+		TEST(test_sockmap_lookup_after_delete),
+		TEST(test_sockmap_lookup_32_bit_value),
+		/* update */
+		TEST(test_sockmap_update_listening),
+		/* races with insert/delete */
+		TEST(test_sockmap_destroy_orphan_child),
+		TEST(test_sockmap_syn_recv_insert_delete),
+		TEST(test_sockmap_race_insert_listen),
+		/* child clone */
+		TEST(test_sockmap_clone_after_delete),
+		TEST(test_sockmap_accept_after_delete),
+		TEST(test_sockmap_accept_before_delete),
+	};
+	const struct op_test *t;
+	char s[MAX_TEST_NAME];
+	int sock_map;
+
+	sock_map = bpf_map__fd(skel->maps.sock_map);
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		snprintf(s, sizeof(s), "%s %s", family_str(family), t->name);
+
+		if (!test__start_subtest(s))
+			continue;
+
+		t->fn(family, sotype, sock_map);
+		cleanup_sockmap_ops(sock_map);
+	}
+}
+
+static void test_sockmap_redir(struct test_sockmap_listen *skel, int family,
+			       int sotype)
+{
+	const struct redir_test {
+		void (*fn)(struct test_sockmap_listen *skel, int family,
+			   int sotype);
+		const char *name;
+	} tests[] = {
+		TEST(test_sockmap_skb_redir_to_connected),
+		TEST(test_sockmap_skb_redir_to_listening),
+		TEST(test_sockmap_msg_redir_to_connected),
+		TEST(test_sockmap_msg_redir_to_listening),
+	};
+	const struct redir_test *t;
+	char s[MAX_TEST_NAME];
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		snprintf(s, sizeof(s), "%s %s", family_str(family), t->name);
+
+		if (!test__start_subtest(s))
+			continue;
+
+		t->fn(skel, family, sotype);
+	}
+}
+
+static void test_sockmap_reuseport(struct test_sockmap_listen *skel, int family,
+				   int sotype)
+{
+	const struct reuseport_test {
+		void (*fn)(int family, int sotype, int sock_map,
+			   int verdict_map, int reuseport_prog);
+		const char *name;
+	} tests[] = {
+		TEST(test_sockmap_reuseport_select_listening),
+		TEST(test_sockmap_reuseport_select_connected),
+		TEST(test_sockmap_reuseport_mixed_groups),
+	};
+	int sock_map, verdict_map, reuseport_prog;
+	const struct reuseport_test *t;
+	char s[MAX_TEST_NAME];
+
+	sock_map = bpf_map__fd(skel->maps.sock_map);
+	verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	reuseport_prog = bpf_program__fd(skel->progs.prog_reuseport);
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		snprintf(s, sizeof(s), "%s %s", family_str(family), t->name);
+
+		if (!test__start_subtest(s))
+			continue;
+
+		t->fn(family, sotype, sock_map, verdict_map, reuseport_prog);
+	}
+}
+
+static void run_tests(struct test_sockmap_listen *skel, int family)
+{
+	test_sockmap_ops(skel, family, SOCK_STREAM);
+	test_sockmap_redir(skel, family, SOCK_STREAM);
+	test_sockmap_reuseport(skel, family, SOCK_STREAM);
+}
+
+void test_sockmap_listen(void)
+{
+	struct test_sockmap_listen *skel;
+
+	skel = test_sockmap_listen__open_and_load();
+	if (!skel) {
+		FAIL("skeleton open/load failed");
+		return;
+	}
+
+	run_tests(skel, AF_INET);
+	run_tests(skel, AF_INET6);
+
+	test_sockmap_listen__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
new file mode 100644
index 000000000000..807f4857b975
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Cloudflare
+
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 2);
+	__type(key, __u32);
+	__type(value, __u64);
+} sock_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, unsigned int);
+} verdict_map SEC(".maps");
+
+SEC("sk_skb/stream_parser")
+int prog_skb_parser(struct __sk_buff *skb)
+{
+	return skb->len;
+}
+
+SEC("sk_skb/stream_verdict")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+	unsigned int *count;
+	int verdict;
+
+	verdict = bpf_sk_redirect_map(skb, &sock_map, 0, 0);
+
+	count = bpf_map_lookup_elem(&verdict_map, &verdict);
+	if (count)
+		(*count)++;
+
+	return verdict;
+}
+
+SEC("sk_msg")
+int prog_msg_verdict(struct sk_msg_md *msg)
+{
+	unsigned int *count;
+	int verdict;
+
+	verdict = bpf_msg_redirect_map(msg, &sock_map, 0, 0);
+
+	count = bpf_map_lookup_elem(&verdict_map, &verdict);
+	if (count)
+		(*count)++;
+
+	return verdict;
+}
+
+SEC("sk_reuseport")
+int prog_reuseport(struct sk_reuseport_md *reuse)
+{
+	unsigned int *count;
+	int err, verdict;
+	int key = 0;
+
+	err = bpf_sk_select_reuseport(reuse, &sock_map, &key, 0);
+	verdict = err ? SK_DROP : SK_PASS;
+
+	count = bpf_map_lookup_elem(&verdict_map, &verdict);
+	if (count)
+		(*count)++;
+
+	return verdict;
+}
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";