From patchwork Fri Mar 20 20:36:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 1259231 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.a=rsa-sha256 header.s=facebook header.b=el86rDt+; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48kbCn61nyz9sPk for ; Sat, 21 Mar 2020 07:36:37 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727402AbgCTUgh (ORCPT ); Fri, 20 Mar 2020 16:36:37 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:35928 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727362AbgCTUgf (ORCPT ); Fri, 20 Mar 2020 16:36:35 -0400 Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 02KKaMAO001718 for ; Fri, 20 Mar 2020 13:36:34 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=TIP5YKzDJ8AN2ByqCGv0FhNrLUtimv+i+/tXVxcy3Dc=; b=el86rDt+W6eqQvlyNZjgpNHaZim1fyvNB0pdmNlNl+28dTGbMzF9Y8ZW+N1AbZBeZy2S +kskJ2k8L9ja8MoQICIZ7QxuCBjDclyUI8pkAoU7tVhL5/kX5kQ0BRDHzH45vs8WQqQn Hgl4ATckBLcj7XVaJxC0QR6eULCUiz5EpvQ= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com with ESMTP id 2yvu8yu2qs-17 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 20 Mar 2020 13:36:34 -0700 Received: from intmgw002.08.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1847.3; Fri, 20 Mar 2020 13:36:25 -0700 Received: by devbig012.ftw2.facebook.com (Postfix, from userid 137359) id 95C912EC2E57; Fri, 20 Mar 2020 13:36:21 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Andrii Nakryiko Smtp-Origin-Hostname: devbig012.ftw2.facebook.com To: , , , CC: , , Andrii Nakryiko Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 1/6] bpf: factor out cgroup storages operations Date: Fri, 20 Mar 2020 13:36:09 -0700 Message-ID: <20200320203615.1519013-2-andriin@fb.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200320203615.1519013-1-andriin@fb.com> References: <20200320203615.1519013-1-andriin@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.645 definitions=2020-03-20_07:2020-03-20,2020-03-20 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 bulkscore=0 phishscore=0 clxscore=1015 spamscore=0 malwarescore=0 priorityscore=1501 mlxscore=0 adultscore=0 impostorscore=0 mlxlogscore=819 suspectscore=25 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003200082 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Refactor cgroup attach/detach code to abstract away common operations performed on all types of cgroup storages. This makes high-level logic more apparent, plus allows to reuse more code across multiple functions. Signed-off-by: Andrii Nakryiko --- kernel/bpf/cgroup.c | 118 +++++++++++++++++++++++++++----------------- 1 file changed, 72 insertions(+), 46 deletions(-) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 9a500fadbef5..9c8472823a7f 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -28,6 +28,58 @@ void cgroup_bpf_offline(struct cgroup *cgrp) percpu_ref_kill(&cgrp->bpf.refcnt); } +static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[]) +{ + enum bpf_cgroup_storage_type stype; + + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_free(storages[stype]); +} + +static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[], + struct bpf_prog *prog) +{ + enum bpf_cgroup_storage_type stype; + + for_each_cgroup_storage_type(stype) { + storages[stype] = bpf_cgroup_storage_alloc(prog, stype); + if (IS_ERR(storages[stype])) { + storages[stype] = NULL; + bpf_cgroup_storages_free(storages); + return -ENOMEM; + } + } + + return 0; +} + +static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[], + struct bpf_cgroup_storage *src[]) +{ + enum bpf_cgroup_storage_type stype; + + for_each_cgroup_storage_type(stype) + dst[stype] = src[stype]; +} + +static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[], + struct cgroup* cgrp, + enum bpf_attach_type attach_type) +{ + enum bpf_cgroup_storage_type stype; + + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_link(storages[stype], cgrp, attach_type); +} + +static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[]) +{ + enum bpf_cgroup_storage_type stype; + + for_each_cgroup_storage_type(stype) + bpf_cgroup_storage_unlink(storages[stype]); +} + /** * cgroup_bpf_release() - put references of all bpf programs and * release all cgroup bpf data @@ -37,7 +89,6 @@ static void cgroup_bpf_release(struct work_struct *work) { struct cgroup *p, *cgrp = container_of(work, struct cgroup, bpf.release_work); - enum bpf_cgroup_storage_type stype; struct bpf_prog_array *old_array; unsigned int type; @@ -50,10 +101,8 @@ static void cgroup_bpf_release(struct work_struct *work) list_for_each_entry_safe(pl, tmp, progs, node) { list_del(&pl->node); bpf_prog_put(pl->prog); - for_each_cgroup_storage_type(stype) { - bpf_cgroup_storage_unlink(pl->storage[stype]); - bpf_cgroup_storage_free(pl->storage[stype]); - } + bpf_cgroup_storages_unlink(pl->storage); + bpf_cgroup_storages_free(pl->storage); kfree(pl); static_branch_dec(&cgroup_bpf_enabled_key); } @@ -138,7 +187,7 @@ static int compute_effective_progs(struct cgroup *cgrp, enum bpf_attach_type type, struct bpf_prog_array **array) { - enum bpf_cgroup_storage_type stype; + struct bpf_prog_array_item *item; struct bpf_prog_array *progs; struct bpf_prog_list *pl; struct cgroup *p = cgrp; @@ -166,10 +215,10 @@ static int compute_effective_progs(struct cgroup *cgrp, if (!pl->prog) continue; - progs->items[cnt].prog = pl->prog; - for_each_cgroup_storage_type(stype) - progs->items[cnt].cgroup_storage[stype] = - pl->storage[stype]; + item = &progs->items[cnt]; + item->prog = pl->prog; + bpf_cgroup_storages_assign(item->cgroup_storage, + pl->storage); cnt++; } } while ((p = cgroup_parent(p))); @@ -305,7 +354,6 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE], *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL}; struct bpf_prog_list *pl, *replace_pl = NULL; - enum bpf_cgroup_storage_type stype; int err; if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) || @@ -341,37 +389,25 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, replace_pl = list_first_entry(progs, typeof(*pl), node); } - for_each_cgroup_storage_type(stype) { - storage[stype] = bpf_cgroup_storage_alloc(prog, stype); - if (IS_ERR(storage[stype])) { - storage[stype] = NULL; - for_each_cgroup_storage_type(stype) - bpf_cgroup_storage_free(storage[stype]); - return -ENOMEM; - } - } + if (bpf_cgroup_storages_alloc(storage, prog)) + return -ENOMEM; if (replace_pl) { pl = replace_pl; old_prog = pl->prog; - for_each_cgroup_storage_type(stype) { - old_storage[stype] = pl->storage[stype]; - bpf_cgroup_storage_unlink(old_storage[stype]); - } + bpf_cgroup_storages_unlink(pl->storage); + bpf_cgroup_storages_assign(old_storage, pl->storage); } else { pl = kmalloc(sizeof(*pl), GFP_KERNEL); if (!pl) { - for_each_cgroup_storage_type(stype) - bpf_cgroup_storage_free(storage[stype]); + bpf_cgroup_storages_free(storage); return -ENOMEM; } list_add_tail(&pl->node, progs); } pl->prog = prog; - for_each_cgroup_storage_type(stype) - pl->storage[stype] = storage[stype]; - + bpf_cgroup_storages_assign(pl->storage, storage); cgrp->bpf.flags[type] = saved_flags; err = update_effective_progs(cgrp, type); @@ -379,27 +415,20 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, goto cleanup; static_branch_inc(&cgroup_bpf_enabled_key); - for_each_cgroup_storage_type(stype) { - if (!old_storage[stype]) - continue; - bpf_cgroup_storage_free(old_storage[stype]); - } + bpf_cgroup_storages_free(old_storage); if (old_prog) { bpf_prog_put(old_prog); static_branch_dec(&cgroup_bpf_enabled_key); } - for_each_cgroup_storage_type(stype) - bpf_cgroup_storage_link(storage[stype], cgrp, type); + bpf_cgroup_storages_link(storage, cgrp, type); return 0; cleanup: /* and cleanup the prog list */ pl->prog = old_prog; - for_each_cgroup_storage_type(stype) { - bpf_cgroup_storage_free(pl->storage[stype]); - pl->storage[stype] = old_storage[stype]; - bpf_cgroup_storage_link(old_storage[stype], cgrp, type); - } + bpf_cgroup_storages_free(pl->storage); + bpf_cgroup_storages_assign(pl->storage, old_storage); + bpf_cgroup_storages_link(pl->storage, cgrp, type); if (!replace_pl) { list_del(&pl->node); kfree(pl); @@ -420,7 +449,6 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, enum bpf_attach_type type) { struct list_head *progs = &cgrp->bpf.progs[type]; - enum bpf_cgroup_storage_type stype; u32 flags = cgrp->bpf.flags[type]; struct bpf_prog *old_prog = NULL; struct bpf_prog_list *pl; @@ -467,10 +495,8 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, /* now can actually delete it from this cgroup list */ list_del(&pl->node); - for_each_cgroup_storage_type(stype) { - bpf_cgroup_storage_unlink(pl->storage[stype]); - bpf_cgroup_storage_free(pl->storage[stype]); - } + bpf_cgroup_storages_unlink(pl->storage); + bpf_cgroup_storages_free(pl->storage); kfree(pl); if (list_empty(progs)) /* last program was detached, reset flags to zero */ From patchwork Fri Mar 20 20:36:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 1259227 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.a=rsa-sha256 header.s=facebook header.b=dz8wz+vC; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48kbCf6pGvz9sR4 for ; Sat, 21 Mar 2020 07:36:30 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727364AbgCTUga (ORCPT ); Fri, 20 Mar 2020 16:36:30 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:43428 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726829AbgCTUg3 (ORCPT ); Fri, 20 Mar 2020 16:36:29 -0400 Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 02KKaMX5001787 for ; Fri, 20 Mar 2020 13:36:28 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=Z3SY+JmABAHgg0NODEo2+2c0yyWwsRBYA2ipAg5I3iE=; b=dz8wz+vCNrFXEO8Vd+va7N6rUdvIhBtBdamHoHjNW6MVpP4cUzurG1EniCiHL9cvQDGs iH629Nq0OA0mDuYyc0CdX3GDHIeN23fGrYMZyFczWehO1cKrAa/wwKScIPA0Nf9RmSpl yJBBmXWBq/3PoAhnRE7SyjFWNS8LufGLfeA= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com with ESMTP id 2yvu8yu2ps-10 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 20 Mar 2020 13:36:28 -0700 Received: from intmgw004.03.ash8.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1847.3; Fri, 20 Mar 2020 13:36:25 -0700 Received: by devbig012.ftw2.facebook.com (Postfix, from userid 137359) id C69302EC2E57; Fri, 20 Mar 2020 13:36:23 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Andrii Nakryiko Smtp-Origin-Hostname: devbig012.ftw2.facebook.com To: , , , CC: , , Andrii Nakryiko Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Date: Fri, 20 Mar 2020 13:36:10 -0700 Message-ID: <20200320203615.1519013-3-andriin@fb.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200320203615.1519013-1-andriin@fb.com> References: <20200320203615.1519013-1-andriin@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.645 definitions=2020-03-20_07:2020-03-20,2020-03-20 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 bulkscore=0 phishscore=0 clxscore=1015 spamscore=0 malwarescore=0 priorityscore=1501 mlxscore=0 adultscore=0 impostorscore=0 mlxlogscore=978 suspectscore=8 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003200082 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Factor out logic mapping expected program attach type to program type and subsequent handling of program attach/detach. Also list out all supported cgroup BPF program types explicitly to prevent accidental bugs once more program types are added to a mapping. Do the same for prog_query API. Signed-off-by: Andrii Nakryiko --- kernel/bpf/syscall.c | 153 +++++++++++++++++++------------------------ 1 file changed, 66 insertions(+), 87 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 85567a6ea5f9..fd4181939064 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2535,36 +2535,18 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog, } } -#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd - -#define BPF_F_ATTACH_MASK \ - (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE) - -static int bpf_prog_attach(const union bpf_attr *attr) +static enum bpf_prog_type +attach_type_to_prog_type(enum bpf_attach_type attach_type) { - enum bpf_prog_type ptype; - struct bpf_prog *prog; - int ret; - - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - - if (CHECK_ATTR(BPF_PROG_ATTACH)) - return -EINVAL; - - if (attr->attach_flags & ~BPF_F_ATTACH_MASK) - return -EINVAL; - - switch (attr->attach_type) { + switch (attach_type) { case BPF_CGROUP_INET_INGRESS: case BPF_CGROUP_INET_EGRESS: - ptype = BPF_PROG_TYPE_CGROUP_SKB; + return BPF_PROG_TYPE_CGROUP_SKB; break; case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: - ptype = BPF_PROG_TYPE_CGROUP_SOCK; - break; + return BPF_PROG_TYPE_CGROUP_SOCK; case BPF_CGROUP_INET4_BIND: case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET4_CONNECT: @@ -2573,37 +2555,53 @@ static int bpf_prog_attach(const union bpf_attr *attr) case BPF_CGROUP_UDP6_SENDMSG: case BPF_CGROUP_UDP4_RECVMSG: case BPF_CGROUP_UDP6_RECVMSG: - ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; - break; + return BPF_PROG_TYPE_CGROUP_SOCK_ADDR; case BPF_CGROUP_SOCK_OPS: - ptype = BPF_PROG_TYPE_SOCK_OPS; - break; + return BPF_PROG_TYPE_SOCK_OPS; case BPF_CGROUP_DEVICE: - ptype = BPF_PROG_TYPE_CGROUP_DEVICE; - break; + return BPF_PROG_TYPE_CGROUP_DEVICE; case BPF_SK_MSG_VERDICT: - ptype = BPF_PROG_TYPE_SK_MSG; - break; + return BPF_PROG_TYPE_SK_MSG; case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: - ptype = BPF_PROG_TYPE_SK_SKB; - break; + return BPF_PROG_TYPE_SK_SKB; case BPF_LIRC_MODE2: - ptype = BPF_PROG_TYPE_LIRC_MODE2; - break; + return BPF_PROG_TYPE_LIRC_MODE2; case BPF_FLOW_DISSECTOR: - ptype = BPF_PROG_TYPE_FLOW_DISSECTOR; - break; + return BPF_PROG_TYPE_FLOW_DISSECTOR; case BPF_CGROUP_SYSCTL: - ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; - break; + return BPF_PROG_TYPE_CGROUP_SYSCTL; case BPF_CGROUP_GETSOCKOPT: case BPF_CGROUP_SETSOCKOPT: - ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT; - break; + return BPF_PROG_TYPE_CGROUP_SOCKOPT; default: - return -EINVAL; + return BPF_PROG_TYPE_UNSPEC; } +} + +#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd + +#define BPF_F_ATTACH_MASK \ + (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE) + +static int bpf_prog_attach(const union bpf_attr *attr) +{ + enum bpf_prog_type ptype; + struct bpf_prog *prog; + int ret; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + if (CHECK_ATTR(BPF_PROG_ATTACH)) + return -EINVAL; + + if (attr->attach_flags & ~BPF_F_ATTACH_MASK) + return -EINVAL; + + ptype = attach_type_to_prog_type(attr->attach_type); + if (ptype == BPF_PROG_TYPE_UNSPEC) + return -EINVAL; prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); if (IS_ERR(prog)) @@ -2625,8 +2623,17 @@ static int bpf_prog_attach(const union bpf_attr *attr) case BPF_PROG_TYPE_FLOW_DISSECTOR: ret = skb_flow_dissector_bpf_prog_attach(attr, prog); break; - default: + case BPF_PROG_TYPE_CGROUP_DEVICE: + case BPF_PROG_TYPE_CGROUP_SKB: + case BPF_PROG_TYPE_CGROUP_SOCK: + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: + case BPF_PROG_TYPE_CGROUP_SOCKOPT: + case BPF_PROG_TYPE_CGROUP_SYSCTL: + case BPF_PROG_TYPE_SOCK_OPS: ret = cgroup_bpf_prog_attach(attr, ptype, prog); + break; + default: + ret = -EINVAL; } if (ret) @@ -2646,53 +2653,27 @@ static int bpf_prog_detach(const union bpf_attr *attr) if (CHECK_ATTR(BPF_PROG_DETACH)) return -EINVAL; - switch (attr->attach_type) { - case BPF_CGROUP_INET_INGRESS: - case BPF_CGROUP_INET_EGRESS: - ptype = BPF_PROG_TYPE_CGROUP_SKB; - break; - case BPF_CGROUP_INET_SOCK_CREATE: - case BPF_CGROUP_INET4_POST_BIND: - case BPF_CGROUP_INET6_POST_BIND: - ptype = BPF_PROG_TYPE_CGROUP_SOCK; - break; - case BPF_CGROUP_INET4_BIND: - case BPF_CGROUP_INET6_BIND: - case BPF_CGROUP_INET4_CONNECT: - case BPF_CGROUP_INET6_CONNECT: - case BPF_CGROUP_UDP4_SENDMSG: - case BPF_CGROUP_UDP6_SENDMSG: - case BPF_CGROUP_UDP4_RECVMSG: - case BPF_CGROUP_UDP6_RECVMSG: - ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; - break; - case BPF_CGROUP_SOCK_OPS: - ptype = BPF_PROG_TYPE_SOCK_OPS; - break; - case BPF_CGROUP_DEVICE: - ptype = BPF_PROG_TYPE_CGROUP_DEVICE; - break; - case BPF_SK_MSG_VERDICT: - return sock_map_get_from_fd(attr, NULL); - case BPF_SK_SKB_STREAM_PARSER: - case BPF_SK_SKB_STREAM_VERDICT: + ptype = attach_type_to_prog_type(attr->attach_type); + + switch (ptype) { + case BPF_PROG_TYPE_SK_MSG: + case BPF_PROG_TYPE_SK_SKB: return sock_map_get_from_fd(attr, NULL); - case BPF_LIRC_MODE2: + case BPF_PROG_TYPE_LIRC_MODE2: return lirc_prog_detach(attr); - case BPF_FLOW_DISSECTOR: + case BPF_PROG_TYPE_FLOW_DISSECTOR: return skb_flow_dissector_bpf_prog_detach(attr); - case BPF_CGROUP_SYSCTL: - ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; - break; - case BPF_CGROUP_GETSOCKOPT: - case BPF_CGROUP_SETSOCKOPT: - ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT; - break; + case BPF_PROG_TYPE_CGROUP_DEVICE: + case BPF_PROG_TYPE_CGROUP_SKB: + case BPF_PROG_TYPE_CGROUP_SOCK: + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: + case BPF_PROG_TYPE_CGROUP_SOCKOPT: + case BPF_PROG_TYPE_CGROUP_SYSCTL: + case BPF_PROG_TYPE_SOCK_OPS: + return cgroup_bpf_prog_detach(attr, ptype); default: return -EINVAL; } - - return cgroup_bpf_prog_detach(attr, ptype); } #define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt @@ -2726,7 +2707,7 @@ static int bpf_prog_query(const union bpf_attr *attr, case BPF_CGROUP_SYSCTL: case BPF_CGROUP_GETSOCKOPT: case BPF_CGROUP_SETSOCKOPT: - break; + return cgroup_bpf_prog_query(attr, uattr); case BPF_LIRC_MODE2: return lirc_prog_query(attr, uattr); case BPF_FLOW_DISSECTOR: @@ -2734,8 +2715,6 @@ static int bpf_prog_query(const union bpf_attr *attr, default: return -EINVAL; } - - return cgroup_bpf_prog_query(attr, uattr); } #define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out From patchwork Fri Mar 20 20:36:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 1259228 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.a=rsa-sha256 header.s=facebook header.b=Szhu6rcO; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48kbCh2NWBz9sPk for ; Sat, 21 Mar 2020 07:36:32 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727385AbgCTUgb (ORCPT ); Fri, 20 Mar 2020 16:36:31 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:57030 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727362AbgCTUga (ORCPT ); Fri, 20 Mar 2020 16:36:30 -0400 Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 02KKOWCF022037 for ; Fri, 20 Mar 2020 13:36:30 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=aDjiR8rRR7SMtcJqMdLRuBARv7SDc5iDqXZjaJCcEwU=; b=Szhu6rcO8+Fb25dq7jh6QdxA+8unk7nRSe7UMITSUt6aJHkSn/6P1TDKXC7GXGJltnUB LOISThnUEXv8kR1L3b+n6j408G3GEylHcPFb68JUepdvNvK8kjWm1pnyeW+okQQ83BQm v78VEgoHmJwXYLDNs9UtOT1nK80LPZfIP9Y= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com with ESMTP id 2yu9aw01en-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 20 Mar 2020 13:36:29 -0700 Received: from intmgw004.08.frc2.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1847.3; Fri, 20 Mar 2020 13:36:29 -0700 Received: by devbig012.ftw2.facebook.com (Postfix, from userid 137359) id 1537A2EC2E57; Fri, 20 Mar 2020 13:36:26 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Andrii Nakryiko Smtp-Origin-Hostname: devbig012.ftw2.facebook.com To: , , , CC: , , Andrii Nakryiko Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Date: Fri, 20 Mar 2020 13:36:11 -0700 Message-ID: <20200320203615.1519013-4-andriin@fb.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200320203615.1519013-1-andriin@fb.com> References: <20200320203615.1519013-1-andriin@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.645 definitions=2020-03-20_07:2020-03-20,2020-03-20 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 lowpriorityscore=0 bulkscore=0 mlxscore=0 malwarescore=0 clxscore=1015 adultscore=0 suspectscore=25 impostorscore=0 spamscore=0 priorityscore=1501 mlxlogscore=999 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003200081 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Implement new sub-command to attach cgroup BPF programs and return FD-based bpf_link back on success. bpf_link, once attached to cgroup, cannot be replaced, except by owner having its FD. cgroup bpf_link has semantics of BPF_F_ALLOW_MULTI BPF program attachments and can co-exist with non-bpf_link-based BPF cgroup attachments. To prevent bpf_cgroup_link from keeping cgroup alive past the point when no BPF program can be executed, implement auto-detachment of link. When cgroup_bpf_release() is called, all attached bpf_links are forced to release cgroup refcounts, but they leave bpf_link otherwise active and allocated, as well as still owning underlying bpf_prog. This is because user-space might still have FDs open and active, so bpf_link as a user-referenced object can't be freed yet. Once last active FD is closed, bpf_link will be freed and underlying bpf_prog refcount will be dropped. But cgroup refcount won't be touched, because cgroup is released already. The inherent race between bpf_cgroup_link release (from closing last FD) and cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So the only additional check required is when bpf_cgroup_link attempts to detach itself from cgroup. At that time we need to check whether there is still cgroup associated with that link. And if not, exit with success, because bpf_cgroup_link was already successfully detached. Acked-by: Roman Gushchin Signed-off-by: Andrii Nakryiko --- include/linux/bpf-cgroup.h | 27 ++- include/linux/bpf.h | 10 +- include/uapi/linux/bpf.h | 9 +- kernel/bpf/cgroup.c | 313 +++++++++++++++++++++++++-------- kernel/bpf/syscall.c | 62 +++++-- kernel/cgroup/cgroup.c | 14 +- tools/include/uapi/linux/bpf.h | 9 +- 7 files changed, 345 insertions(+), 99 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index a7cd5c7a2509..ab95824a1d99 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -51,9 +51,16 @@ struct bpf_cgroup_storage { struct rcu_head rcu; }; +struct bpf_cgroup_link { + struct bpf_link link; + struct cgroup *cgroup; + enum bpf_attach_type type; +}; + struct bpf_prog_list { struct list_head node; struct bpf_prog *prog; + struct bpf_cgroup_link *link; struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]; }; @@ -84,20 +91,23 @@ struct cgroup_bpf { int cgroup_bpf_inherit(struct cgroup *cgrp); void cgroup_bpf_offline(struct cgroup *cgrp); -int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, - struct bpf_prog *replace_prog, +int __cgroup_bpf_attach(struct cgroup *cgrp, + struct bpf_prog *prog, struct bpf_prog *replace_prog, + struct bpf_cgroup_link *link, enum bpf_attach_type type, u32 flags); int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, + struct bpf_cgroup_link *link, enum bpf_attach_type type); int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, union bpf_attr __user *uattr); /* Wrapper for __cgroup_bpf_*() protected by cgroup_mutex */ -int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, - struct bpf_prog *replace_prog, enum bpf_attach_type type, +int cgroup_bpf_attach(struct cgroup *cgrp, + struct bpf_prog *prog, struct bpf_prog *replace_prog, + struct bpf_cgroup_link *link, enum bpf_attach_type type, u32 flags); int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, - enum bpf_attach_type type, u32 flags); + enum bpf_attach_type type); int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, union bpf_attr __user *uattr); @@ -332,6 +342,7 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr, enum bpf_prog_type ptype, struct bpf_prog *prog); int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype); +int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); int cgroup_bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr); #else @@ -354,6 +365,12 @@ static inline int cgroup_bpf_prog_detach(const union bpf_attr *attr, return -EINVAL; } +static inline int cgroup_bpf_link_attach(const union bpf_attr *attr, + struct bpf_prog *prog) +{ + return -EINVAL; +} + static inline int cgroup_bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) { diff --git a/include/linux/bpf.h b/include/linux/bpf.h index bdb981c204fa..0f7c2f48c734 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1081,15 +1081,23 @@ extern int sysctl_unprivileged_bpf_disabled; int bpf_map_new_fd(struct bpf_map *map, int flags); int bpf_prog_new_fd(struct bpf_prog *prog); -struct bpf_link; +struct bpf_link { + atomic64_t refcnt; + const struct bpf_link_ops *ops; + struct bpf_prog *prog; + struct work_struct work; +}; struct bpf_link_ops { void (*release)(struct bpf_link *link); void (*dealloc)(struct bpf_link *link); + }; void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops, struct bpf_prog *prog); +void bpf_link_cleanup(struct bpf_link *link, struct file *link_file, + int link_fd); void bpf_link_inc(struct bpf_link *link); void bpf_link_put(struct bpf_link *link); int bpf_link_new_fd(struct bpf_link *link); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 5d01c5c7e598..fad9f79bb8f1 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -111,6 +111,7 @@ enum bpf_cmd { BPF_MAP_LOOKUP_AND_DELETE_BATCH, BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH, + BPF_LINK_CREATE, }; enum bpf_map_type { @@ -539,7 +540,7 @@ union bpf_attr { __u32 prog_cnt; } query; - struct { + struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ __u64 name; __u32 prog_fd; } raw_tracepoint; @@ -567,6 +568,12 @@ union bpf_attr { __u64 probe_offset; /* output: probe_offset */ __u64 probe_addr; /* output: probe_addr */ } task_fd_query; + + struct { /* struct used by BPF_LINK_CREATE command */ + __u32 prog_fd; /* eBPF program to attach */ + __u32 target_fd; /* object to attach to */ + __u32 attach_type; /* attach type */ + } link_create; } __attribute__((aligned(8))); /* The description below is an attempt at providing documentation to eBPF diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 9c8472823a7f..b960e8633f23 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -80,6 +80,17 @@ static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[]) bpf_cgroup_storage_unlink(storages[stype]); } +/* Called when bpf_cgroup_link is auto-detached from dying cgroup. + * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It + * doesn't free link memory, which will eventually be done by bpf_link's + * release() callback, when its last FD is closed. + */ +static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link) +{ + cgroup_put(link->cgroup); + link->cgroup = NULL; +} + /** * cgroup_bpf_release() - put references of all bpf programs and * release all cgroup bpf data @@ -100,7 +111,10 @@ static void cgroup_bpf_release(struct work_struct *work) list_for_each_entry_safe(pl, tmp, progs, node) { list_del(&pl->node); - bpf_prog_put(pl->prog); + if (pl->prog) + bpf_prog_put(pl->prog); + if (pl->link) + bpf_cgroup_link_auto_detach(pl->link); bpf_cgroup_storages_unlink(pl->storage); bpf_cgroup_storages_free(pl->storage); kfree(pl); @@ -134,6 +148,18 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref) queue_work(system_wq, &cgrp->bpf.release_work); } +/* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through + * link or direct prog. + */ +static struct bpf_prog *prog_list_prog(struct bpf_prog_list *pl) +{ + if (pl->prog) + return pl->prog; + if (pl->link) + return pl->link->link.prog; + return NULL; +} + /* count number of elements in the list. * it's slow but the list cannot be long */ @@ -143,7 +169,7 @@ static u32 prog_list_length(struct list_head *head) u32 cnt = 0; list_for_each_entry(pl, head, node) { - if (!pl->prog) + if (!prog_list_prog(pl)) continue; cnt++; } @@ -212,11 +238,11 @@ static int compute_effective_progs(struct cgroup *cgrp, continue; list_for_each_entry(pl, &p->bpf.progs[type], node) { - if (!pl->prog) + if (!prog_list_prog(pl)) continue; item = &progs->items[cnt]; - item->prog = pl->prog; + item->prog = prog_list_prog(pl); bpf_cgroup_storages_assign(item->cgroup_storage, pl->storage); cnt++; @@ -333,19 +359,62 @@ static int update_effective_progs(struct cgroup *cgrp, #define BPF_CGROUP_MAX_PROGS 64 +static struct bpf_prog_list *find_attach_entry(struct list_head *progs, + struct bpf_prog *prog, + struct bpf_cgroup_link *link, + struct bpf_prog *replace_prog, + bool allow_multi) +{ + struct bpf_prog_list *pl; + + /* legacy single-attach case */ + if (!allow_multi) { + if (list_empty(progs)) + return NULL; + return list_first_entry(progs, typeof(*pl), node); + } + + /* direct prog multi-attach case */ + if (prog) { + list_for_each_entry(pl, progs, node) { + if (pl->prog == prog) + /* disallow attaching the same prog twice */ + return ERR_PTR(-EINVAL); + if (replace_prog && pl->prog == replace_prog) + /* a match found */ + return pl; + } + if (replace_prog) + /* prog to replace not found for cgroup */ + return ERR_PTR(-ENOENT); + return NULL; + } + + /* link (multi-attach) case */ + list_for_each_entry(pl, progs, node) { + if (pl->link == link) + /* disallow attaching the same link twice */ + return ERR_PTR(-EINVAL); + } + return NULL; +} + /** - * __cgroup_bpf_attach() - Attach the program to a cgroup, and + * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and * propagate the change to descendants * @cgrp: The cgroup which descendants to traverse * @prog: A program to attach + * @link: A link to attach * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set * @type: Type of attach operation * @flags: Option flags * + * Exactly one of @prog or @link can be non-null. * Must be called with cgroup_mutex held. */ -int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, - struct bpf_prog *replace_prog, +int __cgroup_bpf_attach(struct cgroup *cgrp, + struct bpf_prog *prog, struct bpf_prog *replace_prog, + struct bpf_cgroup_link *link, enum bpf_attach_type type, u32 flags) { u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI)); @@ -353,13 +422,19 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, struct bpf_prog *old_prog = NULL; struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE], *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL}; - struct bpf_prog_list *pl, *replace_pl = NULL; + struct bpf_prog_list *pl; int err; if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) || ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI))) /* invalid combination */ return -EINVAL; + if (link && (prog || replace_prog)) + /* only either link or prog/replace_prog can be specified */ + return -EINVAL; + if (!!replace_prog != !!(flags & BPF_F_REPLACE)) + /* replace_prog implies BPF_F_REPLACE, and vice versa */ + return -EINVAL; if (!hierarchy_allows_attach(cgrp, type)) return -EPERM; @@ -374,26 +449,16 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS) return -E2BIG; - if (flags & BPF_F_ALLOW_MULTI) { - list_for_each_entry(pl, progs, node) { - if (pl->prog == prog) - /* disallow attaching the same prog twice */ - return -EINVAL; - if (pl->prog == replace_prog) - replace_pl = pl; - } - if ((flags & BPF_F_REPLACE) && !replace_pl) - /* prog to replace not found for cgroup */ - return -ENOENT; - } else if (!list_empty(progs)) { - replace_pl = list_first_entry(progs, typeof(*pl), node); - } + pl = find_attach_entry(progs, prog, link, replace_prog, + flags & BPF_F_ALLOW_MULTI); + if (IS_ERR(pl)) + return PTR_ERR(pl); - if (bpf_cgroup_storages_alloc(storage, prog)) + if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog)) return -ENOMEM; - if (replace_pl) { - pl = replace_pl; + if (pl) { + /* only non-link case is possible */ old_prog = pl->prog; bpf_cgroup_storages_unlink(pl->storage); bpf_cgroup_storages_assign(old_storage, pl->storage); @@ -407,6 +472,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, } pl->prog = prog; + pl->link = link; bpf_cgroup_storages_assign(pl->storage, storage); cgrp->bpf.flags[type] = saved_flags; @@ -414,80 +480,91 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, if (err) goto cleanup; - static_branch_inc(&cgroup_bpf_enabled_key); bpf_cgroup_storages_free(old_storage); - if (old_prog) { + if (old_prog) bpf_prog_put(old_prog); - static_branch_dec(&cgroup_bpf_enabled_key); - } - bpf_cgroup_storages_link(storage, cgrp, type); + else + static_branch_inc(&cgroup_bpf_enabled_key); + bpf_cgroup_storages_link(pl->storage, cgrp, type); return 0; cleanup: - /* and cleanup the prog list */ - pl->prog = old_prog; + if (old_prog) + pl->prog = old_prog; bpf_cgroup_storages_free(pl->storage); bpf_cgroup_storages_assign(pl->storage, old_storage); bpf_cgroup_storages_link(pl->storage, cgrp, type); - if (!replace_pl) { + if (!old_prog) { list_del(&pl->node); kfree(pl); } return err; } +static struct bpf_prog_list *find_detach_entry(struct list_head *progs, + struct bpf_prog *prog, + struct bpf_cgroup_link *link, + bool allow_multi) +{ + struct bpf_prog_list *pl; + + if (!allow_multi) { + if (list_empty(progs)) + /* report error when trying to detach and nothing is attached */ + return ERR_PTR(-ENOENT); + + /* to maintain backward compatibility NONE and OVERRIDE cgroups + * allow detaching with invalid FD (prog==NULL) + */ + return list_first_entry(progs, typeof(*pl), node); + } + + if (!prog && !link) + /* to detach MULTI prog the user has to specify valid FD + * of the program to be detached + */ + return ERR_PTR(-EINVAL); + + /* find the prog and detach it */ + list_for_each_entry(pl, progs, node) { + if (pl->prog == prog && pl->link == link) + return pl; + } + return ERR_PTR(-ENOENT); +} + /** - * __cgroup_bpf_detach() - Detach the program from a cgroup, and + * __cgroup_bpf_detach() - Detach the program or link from a cgroup, and * propagate the change to descendants * @cgrp: The cgroup which descendants to traverse * @prog: A program to detach or NULL + * @prog: A link to detach or NULL * @type: Type of detach operation * + * At most one of @prog or @link can be non-NULL. * Must be called with cgroup_mutex held. */ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, - enum bpf_attach_type type) + struct bpf_cgroup_link *link, enum bpf_attach_type type) { struct list_head *progs = &cgrp->bpf.progs[type]; u32 flags = cgrp->bpf.flags[type]; - struct bpf_prog *old_prog = NULL; struct bpf_prog_list *pl; + struct bpf_prog *old_prog; int err; - if (flags & BPF_F_ALLOW_MULTI) { - if (!prog) - /* to detach MULTI prog the user has to specify valid FD - * of the program to be detached - */ - return -EINVAL; - } else { - if (list_empty(progs)) - /* report error when trying to detach and nothing is attached */ - return -ENOENT; - } + if (prog && link) + /* only one of prog or link can be specified */ + return -EINVAL; - if (flags & BPF_F_ALLOW_MULTI) { - /* find the prog and detach it */ - list_for_each_entry(pl, progs, node) { - if (pl->prog != prog) - continue; - old_prog = prog; - /* mark it deleted, so it's ignored while - * recomputing effective - */ - pl->prog = NULL; - break; - } - if (!old_prog) - return -ENOENT; - } else { - /* to maintain backward compatibility NONE and OVERRIDE cgroups - * allow detaching with invalid FD (prog==NULL) - */ - pl = list_first_entry(progs, typeof(*pl), node); - old_prog = pl->prog; - pl->prog = NULL; - } + pl = find_detach_entry(progs, prog, link, flags & BPF_F_ALLOW_MULTI); + if (IS_ERR(pl)) + return PTR_ERR(pl); + + /* mark it deleted, so it's ignored while recomputing effective */ + old_prog = pl->prog; + pl->prog = NULL; + pl->link = NULL; err = update_effective_progs(cgrp, type); if (err) @@ -501,14 +578,15 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, if (list_empty(progs)) /* last program was detached, reset flags to zero */ cgrp->bpf.flags[type] = 0; - - bpf_prog_put(old_prog); + if (old_prog) + bpf_prog_put(old_prog); static_branch_dec(&cgroup_bpf_enabled_key); return 0; cleanup: - /* and restore back old_prog */ + /* restore back prog or link */ pl->prog = old_prog; + pl->link = link; return err; } @@ -521,6 +599,7 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, struct list_head *progs = &cgrp->bpf.progs[type]; u32 flags = cgrp->bpf.flags[type]; struct bpf_prog_array *effective; + struct bpf_prog *prog; int cnt, ret = 0, i; effective = rcu_dereference_protected(cgrp->bpf.effective[type], @@ -551,7 +630,8 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, i = 0; list_for_each_entry(pl, progs, node) { - id = pl->prog->aux->id; + prog = prog_list_prog(pl); + id = prog->aux->id; if (copy_to_user(prog_ids + i, &id, sizeof(id))) return -EFAULT; if (++i == cnt) @@ -581,8 +661,8 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr, } } - ret = cgroup_bpf_attach(cgrp, prog, replace_prog, attr->attach_type, - attr->attach_flags); + ret = cgroup_bpf_attach(cgrp, prog, replace_prog, NULL, + attr->attach_type, attr->attach_flags); if (replace_prog) bpf_prog_put(replace_prog); @@ -604,7 +684,7 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype) if (IS_ERR(prog)) prog = NULL; - ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, 0); + ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type); if (prog) bpf_prog_put(prog); @@ -612,6 +692,87 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype) return ret; } +static void bpf_cgroup_link_release(struct bpf_link *link) +{ + struct bpf_cgroup_link *cg_link = + container_of(link, struct bpf_cgroup_link, link); + + /* link might have been auto-detached by dying cgroup already, + * in that case our work is done here + */ + if (!cg_link->cgroup) + return; + + mutex_lock(&cgroup_mutex); + + /* re-check cgroup under lock again */ + if (!cg_link->cgroup) { + mutex_unlock(&cgroup_mutex); + return; + } + + WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link, + cg_link->type)); + + mutex_unlock(&cgroup_mutex); + cgroup_put(cg_link->cgroup); +} + +static void bpf_cgroup_link_dealloc(struct bpf_link *link) +{ + struct bpf_cgroup_link *cg_link = + container_of(link, struct bpf_cgroup_link, link); + + kfree(cg_link); +} + +const struct bpf_link_ops bpf_cgroup_link_lops = { + .release = bpf_cgroup_link_release, + .dealloc = bpf_cgroup_link_dealloc, +}; + +int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + struct bpf_cgroup_link *link; + struct file *link_file; + struct cgroup *cgrp; + int err, link_fd; + + cgrp = cgroup_get_from_fd(attr->link_create.target_fd); + if (IS_ERR(cgrp)) + return PTR_ERR(cgrp); + + link = kzalloc(sizeof(*link), GFP_USER); + if (!link) { + err = -ENOMEM; + goto out_put_cgroup; + } + bpf_link_init(&link->link, &bpf_cgroup_link_lops, prog); + link->cgroup = cgrp; + link->type = attr->link_create.attach_type; + + link_file = bpf_link_new_file(&link->link, &link_fd); + if (IS_ERR(link_file)) { + kfree(link); + err = PTR_ERR(link_file); + goto out_put_cgroup; + } + + err = cgroup_bpf_attach(cgrp, NULL, NULL, link, link->type, + BPF_F_ALLOW_MULTI); + if (err) { + bpf_link_cleanup(&link->link, link_file, link_fd); + goto out_put_cgroup; + } + + fd_install(link_fd, link_file); + return link_fd; + +out_put_cgroup: + cgroup_put(cgrp); + return err; +} + int cgroup_bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index fd4181939064..f6e7d32a2632 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2173,13 +2173,6 @@ static int bpf_obj_get(const union bpf_attr *attr) attr->file_flags); } -struct bpf_link { - atomic64_t refcnt; - const struct bpf_link_ops *ops; - struct bpf_prog *prog; - struct work_struct work; -}; - void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops, struct bpf_prog *prog) { @@ -2193,8 +2186,8 @@ void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops, * anon_inode's release() call. This helper manages marking bpf_link as * defunct, releases anon_inode file and puts reserved FD. */ -static void bpf_link_cleanup(struct bpf_link *link, struct file *link_file, - int link_fd) +void bpf_link_cleanup(struct bpf_link *link, struct file *link_file, + int link_fd) { link->prog = NULL; fput(link_file); @@ -2252,7 +2245,8 @@ static int bpf_link_release(struct inode *inode, struct file *filp) #ifdef CONFIG_PROC_FS static const struct bpf_link_ops bpf_raw_tp_lops; static const struct bpf_link_ops bpf_tracing_link_lops; -static const struct bpf_link_ops bpf_xdp_link_lops; + +extern const struct bpf_link_ops bpf_cgroup_link_lops; static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) { @@ -2265,6 +2259,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) link_type = "raw_tracepoint"; else if (link->ops == &bpf_tracing_link_lops) link_type = "tracing"; + else if (link->ops == &bpf_cgroup_link_lops) + link_type = "cgroup"; else link_type = "unknown"; @@ -3533,6 +3529,49 @@ static int bpf_map_do_batch(const union bpf_attr *attr, return err; } +#define BPF_LINK_CREATE_LAST_FIELD link_create.attach_type +static int link_create(union bpf_attr *attr) +{ + enum bpf_prog_type ptype; + struct bpf_prog *prog; + int ret; + + if (CHECK_ATTR(BPF_LINK_CREATE)) + return -EINVAL; + + ptype = attach_type_to_prog_type(attr->link_create.attach_type); + if (ptype == BPF_PROG_TYPE_UNSPEC) + return -EINVAL; + + prog = bpf_prog_get_type(attr->link_create.prog_fd, ptype); + if (IS_ERR(prog)) + return PTR_ERR(prog); + + ret = bpf_prog_attach_check_attach_type(prog, + attr->link_create.attach_type); + if (ret) + goto err_out; + + switch (ptype) { + case BPF_PROG_TYPE_CGROUP_SKB: + case BPF_PROG_TYPE_CGROUP_SOCK: + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: + case BPF_PROG_TYPE_SOCK_OPS: + case BPF_PROG_TYPE_CGROUP_DEVICE: + case BPF_PROG_TYPE_CGROUP_SYSCTL: + case BPF_PROG_TYPE_CGROUP_SOCKOPT: + ret = cgroup_bpf_link_attach(attr, prog); + break; + default: + ret = -EINVAL; + } + +err_out: + if (ret < 0) + bpf_prog_put(prog); + return ret; +} + SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) { union bpf_attr attr = {}; @@ -3643,6 +3682,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz case BPF_MAP_DELETE_BATCH: err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH); break; + case BPF_LINK_CREATE: + err = link_create(&attr); + break; default: err = -EINVAL; break; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 3dead0416b91..219624fba9ba 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -6303,27 +6303,31 @@ void cgroup_sk_free(struct sock_cgroup_data *skcd) #endif /* CONFIG_SOCK_CGROUP_DATA */ #ifdef CONFIG_CGROUP_BPF -int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, - struct bpf_prog *replace_prog, enum bpf_attach_type type, +int cgroup_bpf_attach(struct cgroup *cgrp, + struct bpf_prog *prog, struct bpf_prog *replace_prog, + struct bpf_cgroup_link *link, + enum bpf_attach_type type, u32 flags) { int ret; mutex_lock(&cgroup_mutex); - ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, type, flags); + ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, link, type, flags); mutex_unlock(&cgroup_mutex); return ret; } + int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, - enum bpf_attach_type type, u32 flags) + enum bpf_attach_type type) { int ret; mutex_lock(&cgroup_mutex); - ret = __cgroup_bpf_detach(cgrp, prog, type); + ret = __cgroup_bpf_detach(cgrp, prog, NULL, type); mutex_unlock(&cgroup_mutex); return ret; } + int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, union bpf_attr __user *uattr) { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 5d01c5c7e598..fad9f79bb8f1 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -111,6 +111,7 @@ enum bpf_cmd { BPF_MAP_LOOKUP_AND_DELETE_BATCH, BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH, + BPF_LINK_CREATE, }; enum bpf_map_type { @@ -539,7 +540,7 @@ union bpf_attr { __u32 prog_cnt; } query; - struct { + struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ __u64 name; __u32 prog_fd; } raw_tracepoint; @@ -567,6 +568,12 @@ union bpf_attr { __u64 probe_offset; /* output: probe_offset */ __u64 probe_addr; /* output: probe_addr */ } task_fd_query; + + struct { /* struct used by BPF_LINK_CREATE command */ + __u32 prog_fd; /* eBPF program to attach */ + __u32 target_fd; /* object to attach to */ + __u32 attach_type; /* attach type */ + } link_create; } __attribute__((aligned(8))); /* The description below is an attempt at providing documentation to eBPF From patchwork Fri Mar 20 20:36:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 1259232 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.a=rsa-sha256 header.s=facebook header.b=EQxRSEUQ; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48kbCq0Bv7z9sQt for ; Sat, 21 Mar 2020 07:36:39 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727417AbgCTUgh (ORCPT ); Fri, 20 Mar 2020 16:36:37 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:61954 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727403AbgCTUgg (ORCPT ); Fri, 20 Mar 2020 16:36:36 -0400 Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 02KKNYJC021758 for ; Fri, 20 Mar 2020 13:36:35 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=jBOZTxxyHXc9C3gt33ohnDX1FUpNLP+SITQ4TEpB8io=; b=EQxRSEUQGMLpGlM7osozrK+UzTsGpKf8eDJo0sH6d0oT+df7WBRwTU/e/xWZwKkCoqCt QPql3CGe1rv21kTNTkge+iPIKrw9vzoMhPO3PUQPs5+2kaVquVwp2qUaVCpXmSSP4mrc H2v5sjsdIfBALjkGAO9RHdIeEfUJYPJGNX4= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com with ESMTP id 2yua0x7xma-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 20 Mar 2020 13:36:35 -0700 Received: from intmgw001.03.ash8.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1847.3; Fri, 20 Mar 2020 13:36:33 -0700 Received: by devbig012.ftw2.facebook.com (Postfix, from userid 137359) id 445BA2EC2E57; Fri, 20 Mar 2020 13:36:28 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Andrii Nakryiko Smtp-Origin-Hostname: devbig012.ftw2.facebook.com To: , , , CC: , , Andrii Nakryiko Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Date: Fri, 20 Mar 2020 13:36:12 -0700 Message-ID: <20200320203615.1519013-5-andriin@fb.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200320203615.1519013-1-andriin@fb.com> References: <20200320203615.1519013-1-andriin@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.645 definitions=2020-03-20_07:2020-03-20,2020-03-20 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 mlxlogscore=999 priorityscore=1501 impostorscore=0 bulkscore=0 adultscore=0 lowpriorityscore=0 phishscore=0 malwarescore=0 spamscore=0 clxscore=1015 suspectscore=8 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003200081 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from under given bpf_link. Currently this is only supported for bpf_cgroup_link, but will be extended to other kinds of bpf_links in follow-up patches. For bpf_cgroup_link, implemented functionality matches existing semantics for direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either unconditionally set new bpf_prog regardless of which bpf_prog is currently active under given bpf_link, or, optionally, can specify expected active bpf_prog. If active bpf_prog doesn't match expected one, operation is a noop and returns a failure. cgroup_bpf_replace() operation is resolving race between auto-detachment and bpf_prog update in the same fashion as it's done for bpf_link detachment, except in this case update has no way of succeeding because of target cgroup marked as dying. So in this case error is returned. Signed-off-by: Andrii Nakryiko Reported-by: kbuild test robot Reported-by: kbuild test robot --- include/linux/bpf-cgroup.h | 4 ++ include/uapi/linux/bpf.h | 12 ++++++ kernel/bpf/cgroup.c | 77 ++++++++++++++++++++++++++++++++++++++ kernel/bpf/syscall.c | 60 +++++++++++++++++++++++++++++ kernel/cgroup/cgroup.c | 21 +++++++++++ 5 files changed, 174 insertions(+) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index ab95824a1d99..5735d8bfd69e 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -98,6 +98,8 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, struct bpf_cgroup_link *link, enum bpf_attach_type type); +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link, + struct bpf_prog *new_prog); int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, union bpf_attr __user *uattr); @@ -108,6 +110,8 @@ int cgroup_bpf_attach(struct cgroup *cgrp, u32 flags); int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, enum bpf_attach_type type); +int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link, + struct bpf_prog *old_prog, struct bpf_prog *new_prog); int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, union bpf_attr __user *uattr); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index fad9f79bb8f1..fa944093f9fc 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -112,6 +112,7 @@ enum bpf_cmd { BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH, BPF_LINK_CREATE, + BPF_LINK_UPDATE, }; enum bpf_map_type { @@ -574,6 +575,17 @@ union bpf_attr { __u32 target_fd; /* object to attach to */ __u32 attach_type; /* attach type */ } link_create; + + struct { /* struct used by BPF_LINK_UPDATE command */ + __u32 link_fd; /* link fd */ + /* new program fd to update link with */ + __u32 new_prog_fd; + __u32 flags; /* extra flags */ + /* expected link's program fd; is specified only if + * BPF_F_REPLACE flag is set in flags */ + __u32 old_prog_fd; + } link_update; + } __attribute__((aligned(8))); /* The description below is an attempt at providing documentation to eBPF diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index b960e8633f23..b9f4971336f3 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -501,6 +501,83 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, return err; } +/* Swap updated BPF program for given link in effective program arrays across + * all descendant cgroups. This function is guaranteed to succeed. + */ +static void replace_effective_prog(struct cgroup *cgrp, + enum bpf_attach_type type, + struct bpf_cgroup_link *link) +{ + struct bpf_prog_array_item *item; + struct cgroup_subsys_state *css; + struct bpf_prog_array *progs; + struct bpf_prog_list *pl; + struct list_head *head; + struct cgroup *cg; + int pos; + + css_for_each_descendant_pre(css, &cgrp->self) { + struct cgroup *desc = container_of(css, struct cgroup, self); + + if (percpu_ref_is_zero(&desc->bpf.refcnt)) + continue; + + /* found position of link in effective progs array */ + for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) { + if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI)) + continue; + + head = &cg->bpf.progs[type]; + list_for_each_entry(pl, head, node) { + if (!prog_list_prog(pl)) + continue; + if (pl->link == link) + goto found; + pos++; + } + } +found: + BUG_ON(!cg); + progs = rcu_dereference_protected( + desc->bpf.effective[type], + lockdep_is_held(&cgroup_mutex)); + item = &progs->items[pos]; + WRITE_ONCE(item->prog, link->link.prog); + } +} + +/** + * __cgroup_bpf_replace() - Replace link's program and propagate the change + * to descendants + * @cgrp: The cgroup which descendants to traverse + * @link: A link for which to replace BPF program + * @type: Type of attach operation + * + * Must be called with cgroup_mutex held. + */ +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link, + struct bpf_prog *new_prog) +{ + struct list_head *progs = &cgrp->bpf.progs[link->type]; + struct bpf_prog *old_prog; + struct bpf_prog_list *pl; + bool found = false; + + list_for_each_entry(pl, progs, node) { + if (pl->link == link) { + found = true; + break; + } + } + if (!found) + return -ENOENT; + + old_prog = xchg(&link->link.prog, new_prog); + replace_effective_prog(cgrp, link->type, link); + bpf_prog_put(old_prog); + return 0; +} + static struct bpf_prog_list *find_detach_entry(struct list_head *progs, struct bpf_prog *prog, struct bpf_cgroup_link *link, diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f6e7d32a2632..1ff7aaa2c727 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3572,6 +3572,63 @@ static int link_create(union bpf_attr *attr) return ret; } +#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd + +static int link_update(union bpf_attr *attr) +{ + struct bpf_prog *old_prog = NULL, *new_prog; + enum bpf_prog_type ptype; + struct bpf_link *link; + u32 flags; + int ret; + + if (CHECK_ATTR(BPF_LINK_UPDATE)) + return -EINVAL; + + flags = attr->link_update.flags; + if (flags & ~BPF_F_REPLACE) + return -EINVAL; + + link = bpf_link_get_from_fd(attr->link_update.link_fd); + if (IS_ERR(link)) + return PTR_ERR(link); + + new_prog = bpf_prog_get(attr->link_update.new_prog_fd); + if (IS_ERR(new_prog)) + return PTR_ERR(new_prog); + + if (flags & BPF_F_REPLACE) { + old_prog = bpf_prog_get(attr->link_update.old_prog_fd); + if (IS_ERR(old_prog)) { + ret = PTR_ERR(old_prog); + old_prog = NULL; + goto out_put_progs; + } + } + + if (link->ops == &bpf_cgroup_link_lops) { + struct bpf_cgroup_link *cg_link; + + cg_link = container_of(link, struct bpf_cgroup_link, link); + ptype = attach_type_to_prog_type(cg_link->type); + if (ptype != new_prog->type) { + ret = -EINVAL; + goto out_put_progs; + } + ret = cgroup_bpf_replace(cg_link->cgroup, cg_link, + old_prog, new_prog); + } else { + ret = -EINVAL; + } + +out_put_progs: + if (old_prog) + bpf_prog_put(old_prog); + if (ret) + bpf_prog_put(new_prog); + return ret; +} + SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) { union bpf_attr attr = {}; @@ -3685,6 +3742,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz case BPF_LINK_CREATE: err = link_create(&attr); break; + case BPF_LINK_UPDATE: + err = link_update(&attr); + break; default: err = -EINVAL; break; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 219624fba9ba..d4787fccf183 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -6317,6 +6317,27 @@ int cgroup_bpf_attach(struct cgroup *cgrp, return ret; } +int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link, + struct bpf_prog *old_prog, struct bpf_prog *new_prog) +{ + int ret; + + mutex_lock(&cgroup_mutex); + /* link might have been auto-released by dying cgroup, so fail */ + if (!link->cgroup) { + ret = -EINVAL; + goto out_unlock; + } + if (old_prog && link->link.prog != old_prog) { + ret = -EPERM; + goto out_unlock; + } + ret = __cgroup_bpf_replace(cgrp, link, new_prog); +out_unlock: + mutex_unlock(&cgroup_mutex); + return ret; +} + int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, enum bpf_attach_type type) { From patchwork Fri Mar 20 20:36:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 1259230 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.a=rsa-sha256 header.s=facebook header.b=ouSFccuc; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48kbCl1VvXz9sQt for ; Sat, 21 Mar 2020 07:36:35 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727401AbgCTUge (ORCPT ); Fri, 20 Mar 2020 16:36:34 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:14436 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727362AbgCTUgd (ORCPT ); Fri, 20 Mar 2020 16:36:33 -0400 Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 02KKOWFq022043 for ; Fri, 20 Mar 2020 13:36:32 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=8gWIrE8T78c4u0GC0cchtfxIiVPFfR1/451i/Fe6/kg=; b=ouSFccucJ5i/gbMIx1qHP7vzjxT7Ked1WXQ+NttFZwEEv7PQZT5kP4e/qq4pnn0pTD9z coz0rVYa7RVPqzbjdvujML+dANdG/yjD9zf3N137JNSDGxa0hNO1mwUHoYH7BEKjrQ9r 3RpfuQele4gYSKBb4nURJvGkCOyv5jQExsI= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com with ESMTP id 2yu9aw01ev-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 20 Mar 2020 13:36:32 -0700 Received: from intmgw003.08.frc2.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1847.3; Fri, 20 Mar 2020 13:36:32 -0700 Received: by devbig012.ftw2.facebook.com (Postfix, from userid 137359) id 752952EC2E57; Fri, 20 Mar 2020 13:36:30 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Andrii Nakryiko Smtp-Origin-Hostname: devbig012.ftw2.facebook.com To: , , , CC: , , Andrii Nakryiko Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Date: Fri, 20 Mar 2020 13:36:13 -0700 Message-ID: <20200320203615.1519013-6-andriin@fb.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200320203615.1519013-1-andriin@fb.com> References: <20200320203615.1519013-1-andriin@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.645 definitions=2020-03-20_07:2020-03-20,2020-03-20 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 lowpriorityscore=0 bulkscore=0 mlxscore=0 malwarescore=0 clxscore=1015 adultscore=0 suspectscore=25 impostorscore=0 spamscore=0 priorityscore=1501 mlxlogscore=999 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003200081 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API. If expected_attach_type is not specified explicitly with bpf_program__set_expected_attach_type(), libbpf will try to determine proper attach type from BPF program's section definition. Also add support for bpf_link's underlying BPF program replacement: - unconditional through high-level bpf_link__update_program() API; - cmpxchg-like with specifying expected current BPF program through low-level bpf_link_update() API. Signed-off-by: Andrii Nakryiko --- tools/include/uapi/linux/bpf.h | 12 +++++++++ tools/lib/bpf/bpf.c | 34 +++++++++++++++++++++++++ tools/lib/bpf/bpf.h | 19 ++++++++++++++ tools/lib/bpf/libbpf.c | 46 ++++++++++++++++++++++++++++++++++ tools/lib/bpf/libbpf.h | 8 +++++- tools/lib/bpf/libbpf.map | 4 +++ 6 files changed, 122 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index fad9f79bb8f1..fa944093f9fc 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -112,6 +112,7 @@ enum bpf_cmd { BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH, BPF_LINK_CREATE, + BPF_LINK_UPDATE, }; enum bpf_map_type { @@ -574,6 +575,17 @@ union bpf_attr { __u32 target_fd; /* object to attach to */ __u32 attach_type; /* attach type */ } link_create; + + struct { /* struct used by BPF_LINK_UPDATE command */ + __u32 link_fd; /* link fd */ + /* new program fd to update link with */ + __u32 new_prog_fd; + __u32 flags; /* extra flags */ + /* expected link's program fd; is specified only if + * BPF_F_REPLACE flag is set in flags */ + __u32 old_prog_fd; + } link_update; + } __attribute__((aligned(8))); /* The description below is an attempt at providing documentation to eBPF diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index c6dafe563176..35c34fc81bd0 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -584,6 +584,40 @@ int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type) return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr)); } +int bpf_link_create(int prog_fd, int target_fd, + enum bpf_attach_type attach_type, + const struct bpf_link_create_opts *opts) +{ + union bpf_attr attr; + + if (!OPTS_VALID(opts, bpf_link_create_opts)) + return -EINVAL; + + memset(&attr, 0, sizeof(attr)); + attr.link_create.prog_fd = prog_fd; + attr.link_create.target_fd = target_fd; + attr.link_create.attach_type = attach_type; + + return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr)); +} + +int bpf_link_update(int link_fd, int new_prog_fd, + const struct bpf_link_update_opts *opts) +{ + union bpf_attr attr; + + if (!OPTS_VALID(opts, bpf_link_update_opts)) + return -EINVAL; + + memset(&attr, 0, sizeof(attr)); + attr.link_update.link_fd = link_fd; + attr.link_update.new_prog_fd = new_prog_fd; + attr.link_update.flags = OPTS_GET(opts, flags, 0); + attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0); + + return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr)); +} + int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags, __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt) { diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index b976e77316cc..46d47afdd887 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -168,6 +168,25 @@ LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type); LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd, enum bpf_attach_type type); +struct bpf_link_create_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ +}; +#define bpf_link_create_opts__last_field sz + +LIBBPF_API int bpf_link_create(int prog_fd, int target_fd, + enum bpf_attach_type attach_type, + const struct bpf_link_create_opts *opts); + +struct bpf_link_update_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ + __u32 flags; /* extra flags */ + __u32 old_prog_fd; /* expected old program FD */ +}; +#define bpf_link_update_opts__last_field old_prog_fd + +LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd, + const struct bpf_link_update_opts *opts); + struct bpf_prog_test_run_attr { int prog_fd; int repeat; diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 085e41f9b68e..8b23c70033d3 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -6951,6 +6951,12 @@ struct bpf_link { bool disconnected; }; +/* Replace link's underlying BPF program with the new one */ +int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog) +{ + return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL); +} + /* Release "ownership" of underlying BPF resource (typically, BPF program * attached to some BPF hook, e.g., tracepoint, kprobe, etc). Disconnected * link, when destructed through bpf_link__destroy() call won't attempt to @@ -7489,6 +7495,46 @@ static struct bpf_link *attach_trace(const struct bpf_sec_def *sec, return bpf_program__attach_trace(prog); } +struct bpf_link *bpf_program__attach_cgroup(struct bpf_program *prog, + int cgroup_fd) +{ + const struct bpf_sec_def *sec_def; + enum bpf_attach_type attach_type; + char errmsg[STRERR_BUFSIZE]; + struct bpf_link *link; + int prog_fd, link_fd; + + prog_fd = bpf_program__fd(prog); + if (prog_fd < 0) { + pr_warn("program '%s': can't attach before loaded\n", + bpf_program__title(prog, false)); + return ERR_PTR(-EINVAL); + } + + link = calloc(1, sizeof(*link)); + if (!link) + return ERR_PTR(-ENOMEM); + link->detach = &bpf_link__detach_fd; + + attach_type = bpf_program__get_expected_attach_type(prog); + if (!attach_type) { + sec_def = find_sec_def(bpf_program__title(prog, false)); + if (sec_def) + attach_type = sec_def->attach_type; + } + link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL); + if (link_fd < 0) { + link_fd = -errno; + free(link); + pr_warn("program '%s': failed to attach to cgroup: %s\n", + bpf_program__title(prog, false), + libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); + return ERR_PTR(link_fd); + } + link->fd = link_fd; + return link; +} + struct bpf_link *bpf_program__attach(struct bpf_program *prog) { const struct bpf_sec_def *sec_def; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index d38d7a629417..af9ececb8ae9 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -224,6 +224,8 @@ LIBBPF_API int bpf_link__fd(const struct bpf_link *link); LIBBPF_API const char *bpf_link__pin_path(const struct bpf_link *link); LIBBPF_API int bpf_link__pin(struct bpf_link *link, const char *path); LIBBPF_API int bpf_link__unpin(struct bpf_link *link); +LIBBPF_API int bpf_link__update_program(struct bpf_link *link, + struct bpf_program *prog); LIBBPF_API void bpf_link__disconnect(struct bpf_link *link); LIBBPF_API int bpf_link__destroy(struct bpf_link *link); @@ -245,11 +247,15 @@ bpf_program__attach_tracepoint(struct bpf_program *prog, LIBBPF_API struct bpf_link * bpf_program__attach_raw_tracepoint(struct bpf_program *prog, const char *tp_name); - LIBBPF_API struct bpf_link * bpf_program__attach_trace(struct bpf_program *prog); +LIBBPF_API struct bpf_link * +bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd); + struct bpf_map; + LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map); + struct bpf_insn; /* diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 5129283c0284..793c5af07b23 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -243,5 +243,9 @@ LIBBPF_0.0.8 { bpf_link__pin; bpf_link__pin_path; bpf_link__unpin; + bpf_link__update_program; + bpf_link_create; + bpf_link_update; + bpf_program__attach_cgroup; bpf_program__set_attach_target; } LIBBPF_0.0.7; From patchwork Fri Mar 20 20:36:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrii Nakryiko X-Patchwork-Id: 1259237 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.a=rsa-sha256 header.s=facebook header.b=EO4xqcfz; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48kbCy67mbz9sQt for ; Sat, 21 Mar 2020 07:36:46 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727437AbgCTUgq (ORCPT ); Fri, 20 Mar 2020 16:36:46 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:10122 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727421AbgCTUgp (ORCPT ); Fri, 20 Mar 2020 16:36:45 -0400 Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 02KKNbft021812 for ; Fri, 20 Mar 2020 13:36:44 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=xb2zNQQ9s+9OAa5gQk3OMr9/2hfZTnETJ+AePiA2oeU=; b=EO4xqcfzoPfJAPa5hb74qsBtETfo3xQaXLHY7/7hcjuqn5FHmjBw0BNh+bUVz2VUUR+H Pd8BmfkBRJvTNCeJLc/UNDDQV7VwzIk9lbe/4NKxzB4I1nUDbsvy47lquqqm4VNvV630 lEZ8/W9s4sfA20u6OIqpVEvgcfSzPZcAITQ= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com with ESMTP id 2yua0x7xn8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 20 Mar 2020 13:36:44 -0700 Received: from intmgw001.08.frc2.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1847.3; Fri, 20 Mar 2020 13:36:43 -0700 Received: by devbig012.ftw2.facebook.com (Postfix, from userid 137359) id A541E2EC2E57; Fri, 20 Mar 2020 13:36:32 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Andrii Nakryiko Smtp-Origin-Hostname: devbig012.ftw2.facebook.com To: , , , CC: , , Andrii Nakryiko Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 6/6] selftests/bpf: test FD-based cgroup attachment Date: Fri, 20 Mar 2020 13:36:14 -0700 Message-ID: <20200320203615.1519013-7-andriin@fb.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200320203615.1519013-1-andriin@fb.com> References: <20200320203615.1519013-1-andriin@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.645 definitions=2020-03-20_07:2020-03-20,2020-03-20 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 mlxlogscore=869 priorityscore=1501 impostorscore=0 bulkscore=0 adultscore=0 lowpriorityscore=0 phishscore=0 malwarescore=0 spamscore=0 clxscore=1015 suspectscore=8 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003200081 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add selftests to exercise FD-based cgroup BPF program attachments and their intermixing with legacy stateful cgroup BPF attachments. Auto-detachment and program replacement (both unconditional and cmpxchng-like) are tested as well. Signed-off-by: Andrii Nakryiko --- .../selftests/bpf/prog_tests/cgroup_link.c | 242 ++++++++++++++++++ .../selftests/bpf/progs/test_cgroup_link.c | 24 ++ 2 files changed, 266 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_link.c create mode 100644 tools/testing/selftests/bpf/progs/test_cgroup_link.c diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_link.c b/tools/testing/selftests/bpf/prog_tests/cgroup_link.c new file mode 100644 index 000000000000..9596aa5f2c07 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/cgroup_link.c @@ -0,0 +1,242 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include "cgroup_helpers.h" +#include "test_cgroup_link.skel.h" + +static __u32 duration = 0; +#define PING_CMD "ping -q -c1 -w1 127.0.0.1 > /dev/null" + +void test_cgroup_link(void) +{ + struct test_cgroup_link *skel; + struct { + const char *path; + int fd; + } cgs[] = { + { "/cg1" }, + { "/cg1/cg2" }, + { "/cg1/cg2/cg3" }, + }; + int last_cg = ARRAY_SIZE(cgs) - 1, cg_nr = ARRAY_SIZE(cgs); + DECLARE_LIBBPF_OPTS(bpf_link_update_opts, link_upd_opts); + struct bpf_link *links[ARRAY_SIZE(cgs)] = {}; + __u32 prog_ids[5], prog_cnt = 0, attach_flags; + int i = 0, err, prog_fd; + bool detach_legacy = false; + + skel = test_cgroup_link__open_and_load(); + if (CHECK(!skel, "skel_open_load", "failed to open/load skeleton\n")) + return; + prog_fd = bpf_program__fd(skel->progs.egress); + + err = setup_cgroup_environment(); + if (CHECK(err, "cg_init", "failed: %d\n", err)) + goto cleanup; + + for (i = 0; i < cg_nr; i++) { + cgs[i].fd = create_and_get_cgroup(cgs[i].path); + if (CHECK(cgs[i].fd < 0, "cg_create", "fail: %d\n", cgs[i].fd)) + goto cleanup; + } + + err = join_cgroup(cgs[last_cg].path); + if (CHECK(err, "cg_join", "fail: %d\n", err)) + goto cleanup; + + for (i = 0; i < cg_nr; i++) { + links[i] = bpf_program__attach_cgroup(skel->progs.egress, + cgs[i].fd); + if (CHECK(IS_ERR(links[i]), "cg_attach", "i: %d, err: %ld\n", + i, PTR_ERR(links[i]))) + goto cleanup; + } + + CHECK_FAIL(system(PING_CMD)); + if (CHECK(skel->bss->calls != cg_nr, "call_cnt", "exp %d, got %d\n", + cg_nr, skel->bss->calls)) + goto cleanup; + + /* query the number of effective progs in last cg */ + CHECK_FAIL(bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS, + BPF_F_QUERY_EFFECTIVE, NULL, NULL, + &prog_cnt)); + if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n", + cg_nr, prog_cnt)) + goto cleanup; + /* query the effective prog IDs in last cg */ + CHECK_FAIL(bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS, + BPF_F_QUERY_EFFECTIVE, &attach_flags, + prog_ids, &prog_cnt)); + if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n", + cg_nr, prog_cnt)) + goto cleanup; + CHECK_FAIL(attach_flags != BPF_F_ALLOW_MULTI); + for (i = 1; i < prog_cnt; i++) + CHECK(prog_ids[i - 1] != prog_ids[i], "prod_id_check", + "idx %d, prev id %d, cur id %d\n", + i, prog_ids[i - 1], prog_ids[i]); + + /* detach bottom program and ping again */ + bpf_link__destroy(links[last_cg]); + links[last_cg] = NULL; + + skel->bss->calls = 0; + CHECK_FAIL(system(PING_CMD)); + if (CHECK(skel->bss->calls != cg_nr - 1, "call_cnt", "exp %d, got %d\n", + cg_nr - 1, skel->bss->calls)) + goto cleanup; + + /* mix in with non link-based multi-attachments */ + err = bpf_prog_attach(prog_fd, cgs[last_cg].fd, + BPF_CGROUP_INET_EGRESS, BPF_F_ALLOW_MULTI); + if (CHECK(err, "cg_attach_legacy", "errno=%d\n", errno)) + goto cleanup; + detach_legacy = true; + + links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress, + cgs[last_cg].fd); + if (CHECK(IS_ERR(links[last_cg]), "cg_attach", "err: %ld\n", + PTR_ERR(links[last_cg]))) + goto cleanup; + + skel->bss->calls = 0; + CHECK_FAIL(system(PING_CMD)); + CHECK(skel->bss->calls != cg_nr + 1, "call_cnt", "exp %d, got %d\n", + cg_nr + 1, skel->bss->calls); + + bpf_link__destroy(links[last_cg]); + links[last_cg] = NULL; + + if (CHECK(bpf_prog_detach2(prog_fd, cgs[last_cg].fd, + BPF_CGROUP_INET_EGRESS), "cg_detach_legacy", + "errno=%d\n", errno)) + goto cleanup; + detach_legacy = false; + + /* attempt to mix in with legacy exclusive prog attachment */ + err = bpf_prog_attach(prog_fd, cgs[last_cg].fd, + BPF_CGROUP_INET_EGRESS, 0); + if (CHECK(err, "cg_attach_exclusive", "errno=%d\n", errno)) + goto cleanup; + detach_legacy = true; + + links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress, + cgs[last_cg].fd); + if (CHECK(!IS_ERR(links[last_cg]), "cg_attach_fail", "unexpected success\n")) + goto cleanup; + + skel->bss->calls = 0; + CHECK_FAIL(system(PING_CMD)); + CHECK(skel->bss->calls != cg_nr, "call_cnt", "exp %d, got %d\n", + cg_nr, skel->bss->calls); + + if (CHECK(bpf_prog_detach2(prog_fd, cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS), + "cg_detach_exclusive", "errno=%d\n", errno)) + goto cleanup; + detach_legacy = false; + + /* re-attach last bpf_link to finish off test */ + links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress, + cgs[last_cg].fd); + if (CHECK(IS_ERR(links[last_cg]), "cg_attach", "err: %ld\n", + PTR_ERR(links[last_cg]))) + goto cleanup; + + skel->bss->calls = 0; + CHECK_FAIL(system(PING_CMD)); + if (CHECK(skel->bss->calls != cg_nr, "call_cnt", "exp %d, got %d\n", + cg_nr, skel->bss->calls)) + goto cleanup; + if (CHECK(skel->bss->alt_calls != 0, "alt_call_cnt", "exp %d, got %d\n", + 0, skel->bss->alt_calls)) + goto cleanup; + + /* replace BPF programs inside their links for all but first link */ + for (i = 1; i < cg_nr; i++) { + err = bpf_link__update_program(links[i], skel->progs.egress_alt); + if (CHECK(err, "prog_upd", "link #%d\n", i)) + goto cleanup; + } + + skel->bss->calls = 0; + skel->bss->alt_calls = 0; + CHECK_FAIL(system(PING_CMD)); + if (CHECK(skel->bss->calls != 1, "call_cnt", + "exp %d, got %d\n", 1, skel->bss->calls)) + goto cleanup; + if (CHECK(skel->bss->alt_calls != cg_nr - 1, "alt_call_cnt", + "exp %d, got %d\n", cg_nr - 1, skel->bss->alt_calls)) + goto cleanup; + + /* Attempt program update with wrong expected BPF program */ + link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress_alt); + link_upd_opts.flags = BPF_F_REPLACE; + err = bpf_link_update(bpf_link__fd(links[0]), + bpf_program__fd(skel->progs.egress_alt), + &link_upd_opts); + if (CHECK(err == 0 || errno != EPERM, "prog_cmpxchg1", + "unexpectedly succeeded, err %d, errno %d\n", err, -errno)) + goto cleanup; + + /* Compare-exchange single link program from egress to egress_alt */ + link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress); + link_upd_opts.flags = BPF_F_REPLACE; + err = bpf_link_update(bpf_link__fd(links[0]), + bpf_program__fd(skel->progs.egress_alt), + &link_upd_opts); + if (CHECK(err, "prog_cmpxchg2", "errno %d\n", -errno)) + goto cleanup; + + skel->bss->calls = 0; + skel->bss->alt_calls = 0; + CHECK_FAIL(system(PING_CMD)); + if (CHECK(skel->bss->calls != 0, "call_cnt", + "exp %d, got %d\n", 0, skel->bss->calls)) + goto cleanup; + if (CHECK(skel->bss->alt_calls != cg_nr, "alt_call_cnt", + "exp %d, got %d\n", cg_nr, skel->bss->alt_calls)) + goto cleanup; + + /* close cgroup FDs before detaching links */ + for (i = 0; i < cg_nr; i++) { + if (cgs[i].fd > 0) { + close(cgs[i].fd); + cgs[i].fd = -1; + } + } + + /* BPF programs should still get called */ + skel->bss->alt_calls = 0; + CHECK_FAIL(system(PING_CMD)); + if (CHECK(skel->bss->alt_calls != cg_nr, "alt_all_cnt", + "exp %d, got %d\n", cg_nr, skel->bss->alt_calls)) + goto cleanup; + + /* leave cgroup and remove them, don't detach programs */ + cleanup_cgroup_environment(); + + /* BPF programs should have been auto-detached */ + skel->bss->alt_calls = 0; + CHECK_FAIL(system(PING_CMD)); + if (CHECK(skel->bss->alt_calls, "alt_call_cnt", "exp %d, got %d\n", + cg_nr, skel->bss->alt_calls)) + goto cleanup; + +cleanup: + if (detach_legacy) + bpf_prog_detach2(prog_fd, cgs[last_cg].fd, + BPF_CGROUP_INET_EGRESS); + + for (i = 0; i < cg_nr; i++) { + if (!IS_ERR(links[i])) + bpf_link__destroy(links[i]); + } + test_cgroup_link__destroy(skel); + + for (i = 0; i < cg_nr; i++) { + if (cgs[i].fd > 0) + close(cgs[i].fd); + } + cleanup_cgroup_environment(); +} diff --git a/tools/testing/selftests/bpf/progs/test_cgroup_link.c b/tools/testing/selftests/bpf/progs/test_cgroup_link.c new file mode 100644 index 000000000000..77e47b9e4446 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_cgroup_link.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2020 Facebook +#include +#include + +int calls = 0; +int alt_calls = 0; + +SEC("cgroup_skb/egress1") +int egress(struct __sk_buff *skb) +{ + __sync_fetch_and_add(&calls, 1); + return 1; +} + +SEC("cgroup_skb/egress2") +int egress_alt(struct __sk_buff *skb) +{ + __sync_fetch_and_add(&alt_calls, 1); + return 1; +} + +char _license[] SEC("license") = "GPL"; +