From patchwork Mon Jul 20 19:54:51 2020
X-Patchwork-Submitter: YiFei Zhu
X-Patchwork-Id: 1332623
X-Patchwork-Delegate: bpf@iogearbox.net
From: YiFei Zhu
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov , Daniel Borkmann , Stanislav Fomichev , Mahesh Bandewar , Roman Gushchin , Andrii Nakryiko , Martin KaFai Lau , YiFei Zhu
Subject: [PATCH v4 bpf-next 1/5] selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches
Date: Mon, 20 Jul 2020 14:54:51 -0500
Message-Id: <3d22f5e42fbc27624f612d827ed872ca95cac500.1595274799.git.zhuyifei@google.com>

From: YiFei Zhu

This test creates a parent cgroup and a child of that cgroup. It attaches
a cgroup_skb/egress program that simply counts packets, to a global
variable (ARRAY map) and to a CGROUP_STORAGE map. The program is first
attached to the parent cgroup only, then to the parent and the child.

The test case sends a message within the child cgroup, and because the
program is inherited across parent / child cgroups, it triggers the egress
program for both the parent and the child, if they exist. When looking up
a CGROUP_STORAGE map, the program uses the cgroup and attach type of the
attachment parameters; therefore, the two attachments use different cgroup
storages. We assert that all packet counts return what we expect.

Signed-off-by: YiFei Zhu
---
 .../bpf/prog_tests/cg_storage_multi.c         | 163 ++++++++++++++++++
 .../bpf/progs/cg_storage_multi_egress_only.c  |  30 ++++
 2 files changed, 193 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
 create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c

diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
new file mode 100644
index 000000000000..6d5a2194e036
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright 2020 Google LLC.
+ */ + +#include +#include +#include + +#include "cg_storage_multi_egress_only.skel.h" + +#define PARENT_CGROUP "/cgroup_storage" +#define CHILD_CGROUP "/cgroup_storage/child" + +static int duration; + +static bool assert_storage(struct bpf_map *map, const char *cgroup_path, + __u32 expected) +{ + struct bpf_cgroup_storage_key key = {0}; + __u32 value; + int map_fd; + + map_fd = bpf_map__fd(map); + + key.cgroup_inode_id = get_cgroup_id(cgroup_path); + key.attach_type = BPF_CGROUP_INET_EGRESS; + if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) < 0, + "map-lookup", "errno %d", errno)) + return true; + if (CHECK(value != expected, + "assert-storage", "got %u expected %u", value, expected)) + return true; + + return false; +} + +static bool assert_storage_noexist(struct bpf_map *map, const char *cgroup_path) +{ + struct bpf_cgroup_storage_key key = {0}; + __u32 value; + int map_fd; + + map_fd = bpf_map__fd(map); + + key.cgroup_inode_id = get_cgroup_id(cgroup_path); + key.attach_type = BPF_CGROUP_INET_EGRESS; + if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) == 0, + "map-lookup", "succeeded, expected ENOENT")) + return true; + if (CHECK(errno != ENOENT, + "map-lookup", "errno %d, expected ENOENT", errno)) + return true; + + return false; +} + +static bool connect_send(const char *cgroup_path) +{ + bool res = true; + int server_fd = -1, client_fd = -1; + + if (join_cgroup(cgroup_path)) + goto out_clean; + + server_fd = start_server(AF_INET, SOCK_DGRAM, NULL, 0, 0); + if (server_fd < 0) + goto out_clean; + + client_fd = connect_to_fd(server_fd, 0); + if (client_fd < 0) + goto out_clean; + + if (send(client_fd, "message", strlen("message"), 0) < 0) + goto out_clean; + + res = false; + +out_clean: + close(client_fd); + close(server_fd); + return res; +} + +static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd) +{ + struct cg_storage_multi_egress_only *obj; + struct bpf_link *parent_link = NULL, *child_link = NULL; + bool err; + + obj = cg_storage_multi_egress_only__open_and_load(); + if (CHECK(!obj, "skel-load", "errno %d", errno)) + return; + + /* Attach to parent cgroup, trigger packet from child. + * Assert that there is only one run and in that run the storage is + * parent cgroup's storage. + * Also assert that child cgroup's storage does not exist + */ + parent_link = bpf_program__attach_cgroup(obj->progs.egress, + parent_cgroup_fd); + if (CHECK(IS_ERR(parent_link), "parent-cg-attach", + "err %ld", PTR_ERR(parent_link))) + goto close_bpf_object; + err = connect_send(CHILD_CGROUP); + if (CHECK(err, "first-connect-send", "errno %d", errno)) + goto close_bpf_object; + if (CHECK(obj->bss->invocations != 1, + "first-invoke", "invocations=%d", obj->bss->invocations)) + goto close_bpf_object; + if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 1)) + goto close_bpf_object; + if (assert_storage_noexist(obj->maps.cgroup_storage, CHILD_CGROUP)) + goto close_bpf_object; + + /* Attach to parent and child cgroup, trigger packet from child. + * Assert that there are two additional runs, one that run with parent + * cgroup's storage and one with child cgroup's storage. 
+ */ + child_link = bpf_program__attach_cgroup(obj->progs.egress, + child_cgroup_fd); + if (CHECK(IS_ERR(child_link), "child-cg-attach", + "err %ld", PTR_ERR(child_link))) + goto close_bpf_object; + err = connect_send(CHILD_CGROUP); + if (CHECK(err, "second-connect-send", "errno %d", errno)) + goto close_bpf_object; + if (CHECK(obj->bss->invocations != 3, + "second-invoke", "invocations=%d", obj->bss->invocations)) + goto close_bpf_object; + if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 2)) + goto close_bpf_object; + if (assert_storage(obj->maps.cgroup_storage, CHILD_CGROUP, 1)) + goto close_bpf_object; + +close_bpf_object: + if (parent_link) + bpf_link__destroy(parent_link); + if (child_link) + bpf_link__destroy(child_link); + + cg_storage_multi_egress_only__destroy(obj); +} + +void test_cg_storage_multi(void) +{ + int parent_cgroup_fd = -1, child_cgroup_fd = -1; + + parent_cgroup_fd = test__join_cgroup(PARENT_CGROUP); + if (CHECK(parent_cgroup_fd < 0, "cg-create-parent", "errno %d", errno)) + goto close_cgroup_fd; + child_cgroup_fd = create_and_get_cgroup(CHILD_CGROUP); + if (CHECK(child_cgroup_fd < 0, "cg-create-child", "errno %d", errno)) + goto close_cgroup_fd; + + if (test__start_subtest("egress_only")) + test_egress_only(parent_cgroup_fd, child_cgroup_fd); + +close_cgroup_fd: + close(child_cgroup_fd); + close(parent_cgroup_fd); +} diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c new file mode 100644 index 000000000000..ec0165d07105 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0-only + +/* + * Copyright 2020 Google LLC. + */ + +#include +#include +#include +#include +#include + +struct { + __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); + __type(key, struct bpf_cgroup_storage_key); + __type(value, __u32); +} cgroup_storage SEC(".maps"); + +__u32 invocations = 0; + +SEC("cgroup_skb/egress") +int egress(struct __sk_buff *skb) +{ + __u32 *ptr_cg_storage = bpf_get_local_storage(&cgroup_storage, 0); + + __sync_fetch_and_add(ptr_cg_storage, 1); + __sync_fetch_and_add(&invocations, 1); + + return 1; +} From patchwork Mon Jul 20 19:54:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: YiFei Zhu X-Patchwork-Id: 1332622 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=AKOumE3X; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4B9XWd2XZhz9sRf for ; Tue, 21 Jul 2020 05:55:09 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729813AbgGTTzI (ORCPT ); Mon, 20 Jul 2020 15:55:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50080 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726619AbgGTTzI (ORCPT ); Mon, 20 Jul 2020 
From: YiFei Zhu
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov , Daniel Borkmann , Stanislav Fomichev , Mahesh Bandewar , Roman Gushchin , Andrii Nakryiko , Martin KaFai Lau , YiFei Zhu
Subject: [PATCH v4 bpf-next 2/5] selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs
Date: Mon, 20 Jul 2020 14:54:52 -0500
Message-Id: <483b7058324359e1c41f7d88a6aa4a7d39e8526d.1595274799.git.zhuyifei@google.com>

From: YiFei Zhu

The current assumption is that the lifetime of a cgroup storage is tied to
the program's attachment. The storage is created in cgroup_bpf_attach and
released upon cgroup_bpf_detach and cgroup_bpf_release.

The current semantics is that each attachment gets a completely
independent cgroup storage, and multiple programs can be attached to the
same (cgroup, attach type) pair, which is also the key of the
CGROUP_STORAGE map. Looking up the map with this pair could therefore
yield multiple storages, and that is not permitted. Hence, the kernel
verifier checks that two programs cannot share the same CGROUP_STORAGE
map, even if they have different expected attach types, considering that
the actual attach type does not always have to be equal to the expected
attach type.
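
For illustration, the pattern rejected here is a single CGROUP_STORAGE map
referenced by two programs. The following is only a rough sketch mirroring
the cg_storage_multi_egress_ingress.c test added by this patch; the map and
program names are made up for the sketch:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* One CGROUP_STORAGE map referenced by two programs. Before this
     * series, the two programs cannot be loaded together: the skeleton
     * load in the test below is expected to fail.
     */
    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } shared_storage SEC(".maps");

    SEC("cgroup_skb/egress")
    int egress(struct __sk_buff *skb)
    {
            __u32 *counter = bpf_get_local_storage(&shared_storage, 0);

            __sync_fetch_and_add(counter, 1);
            return 1;
    }

    SEC("cgroup_skb/ingress")
    int ingress(struct __sk_buff *skb)
    {
            __u32 *counter = bpf_get_local_storage(&shared_storage, 0);

            __sync_fetch_and_add(counter, 1);
            return 1;
    }
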
The test creates a CGROUP_STORAGE map and make it shared across two different programs, one cgroup_skb/egress and one /ingress. It asserts that the two programs cannot be both loaded, due to verifier failure from the above reason. Signed-off-by: YiFei Zhu --- .../bpf/prog_tests/cg_storage_multi.c | 42 +++++++++++++---- .../selftests/bpf/progs/cg_storage_multi.h | 13 ++++++ .../progs/cg_storage_multi_egress_ingress.c | 45 +++++++++++++++++++ .../bpf/progs/cg_storage_multi_egress_only.c | 9 ++-- 4 files changed, 98 insertions(+), 11 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi.h create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c index 6d5a2194e036..1f4ab437ddb9 100644 --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c @@ -8,7 +8,10 @@ #include #include +#include "progs/cg_storage_multi.h" + #include "cg_storage_multi_egress_only.skel.h" +#include "cg_storage_multi_egress_ingress.skel.h" #define PARENT_CGROUP "/cgroup_storage" #define CHILD_CGROUP "/cgroup_storage/child" @@ -16,10 +19,10 @@ static int duration; static bool assert_storage(struct bpf_map *map, const char *cgroup_path, - __u32 expected) + struct cgroup_value *expected) { struct bpf_cgroup_storage_key key = {0}; - __u32 value; + struct cgroup_value value; int map_fd; map_fd = bpf_map__fd(map); @@ -29,8 +32,8 @@ static bool assert_storage(struct bpf_map *map, const char *cgroup_path, if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) < 0, "map-lookup", "errno %d", errno)) return true; - if (CHECK(value != expected, - "assert-storage", "got %u expected %u", value, expected)) + if (CHECK(memcmp(&value, expected, sizeof(struct cgroup_value)), + "assert-storage", "storages differ")) return true; return false; @@ -39,7 +42,7 @@ static bool assert_storage(struct bpf_map *map, const char *cgroup_path, static bool assert_storage_noexist(struct bpf_map *map, const char *cgroup_path) { struct bpf_cgroup_storage_key key = {0}; - __u32 value; + struct cgroup_value value; int map_fd; map_fd = bpf_map__fd(map); @@ -86,6 +89,7 @@ static bool connect_send(const char *cgroup_path) static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd) { struct cg_storage_multi_egress_only *obj; + struct cgroup_value expected_cgroup_value; struct bpf_link *parent_link = NULL, *child_link = NULL; bool err; @@ -109,7 +113,9 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd) if (CHECK(obj->bss->invocations != 1, "first-invoke", "invocations=%d", obj->bss->invocations)) goto close_bpf_object; - if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 1)) + expected_cgroup_value = (struct cgroup_value) { .egress_pkts = 1 }; + if (assert_storage(obj->maps.cgroup_storage, + PARENT_CGROUP, &expected_cgroup_value)) goto close_bpf_object; if (assert_storage_noexist(obj->maps.cgroup_storage, CHILD_CGROUP)) goto close_bpf_object; @@ -129,9 +135,13 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd) if (CHECK(obj->bss->invocations != 3, "second-invoke", "invocations=%d", obj->bss->invocations)) goto close_bpf_object; - if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 2)) + expected_cgroup_value = (struct cgroup_value) { .egress_pkts = 2 }; + if (assert_storage(obj->maps.cgroup_storage, + PARENT_CGROUP, 
&expected_cgroup_value)) goto close_bpf_object; - if (assert_storage(obj->maps.cgroup_storage, CHILD_CGROUP, 1)) + expected_cgroup_value = (struct cgroup_value) { .egress_pkts = 1 }; + if (assert_storage(obj->maps.cgroup_storage, + CHILD_CGROUP, &expected_cgroup_value)) goto close_bpf_object; close_bpf_object: @@ -143,6 +153,19 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd) cg_storage_multi_egress_only__destroy(obj); } +static void test_egress_ingress(int parent_cgroup_fd, int child_cgroup_fd) +{ + struct cg_storage_multi_egress_ingress *obj; + + /* Cannot load both programs due to verifier failure: + * "only one cgroup storage of each type is allowed" + */ + obj = cg_storage_multi_egress_ingress__open_and_load(); + if (CHECK(obj || errno != EBUSY, + "skel-load", "errno %d, expected EBUSY", errno)) + return; +} + void test_cg_storage_multi(void) { int parent_cgroup_fd = -1, child_cgroup_fd = -1; @@ -157,6 +180,9 @@ void test_cg_storage_multi(void) if (test__start_subtest("egress_only")) test_egress_only(parent_cgroup_fd, child_cgroup_fd); + if (test__start_subtest("egress_ingress")) + test_egress_ingress(parent_cgroup_fd, child_cgroup_fd); + close_cgroup_fd: close(child_cgroup_fd); close(parent_cgroup_fd); diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi.h b/tools/testing/selftests/bpf/progs/cg_storage_multi.h new file mode 100644 index 000000000000..a0778fe7857a --- /dev/null +++ b/tools/testing/selftests/bpf/progs/cg_storage_multi.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef __PROGS_CG_STORAGE_MULTI_H +#define __PROGS_CG_STORAGE_MULTI_H + +#include + +struct cgroup_value { + __u32 egress_pkts; + __u32 ingress_pkts; +}; + +#endif diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c new file mode 100644 index 000000000000..9ce386899365 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0-only + +/* + * Copyright 2020 Google LLC. 
+ */ + +#include +#include +#include +#include +#include + +#include "progs/cg_storage_multi.h" + +struct { + __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); + __type(key, struct bpf_cgroup_storage_key); + __type(value, struct cgroup_value); +} cgroup_storage SEC(".maps"); + +__u32 invocations = 0; + +SEC("cgroup_skb/egress") +int egress(struct __sk_buff *skb) +{ + struct cgroup_value *ptr_cg_storage = + bpf_get_local_storage(&cgroup_storage, 0); + + __sync_fetch_and_add(&ptr_cg_storage->egress_pkts, 1); + __sync_fetch_and_add(&invocations, 1); + + return 1; +} + +SEC("cgroup_skb/ingress") +int ingress(struct __sk_buff *skb) +{ + struct cgroup_value *ptr_cg_storage = + bpf_get_local_storage(&cgroup_storage, 0); + + __sync_fetch_and_add(&ptr_cg_storage->ingress_pkts, 1); + __sync_fetch_and_add(&invocations, 1); + + return 1; +} diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c index ec0165d07105..44ad46b33539 100644 --- a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c +++ b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c @@ -10,10 +10,12 @@ #include #include +#include "progs/cg_storage_multi.h" + struct { __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); __type(key, struct bpf_cgroup_storage_key); - __type(value, __u32); + __type(value, struct cgroup_value); } cgroup_storage SEC(".maps"); __u32 invocations = 0; @@ -21,9 +23,10 @@ __u32 invocations = 0; SEC("cgroup_skb/egress") int egress(struct __sk_buff *skb) { - __u32 *ptr_cg_storage = bpf_get_local_storage(&cgroup_storage, 0); + struct cgroup_value *ptr_cg_storage = + bpf_get_local_storage(&cgroup_storage, 0); - __sync_fetch_and_add(ptr_cg_storage, 1); + __sync_fetch_and_add(&ptr_cg_storage->egress_pkts, 1); __sync_fetch_and_add(&invocations, 1); return 1; From patchwork Mon Jul 20 19:54:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: YiFei Zhu X-Patchwork-Id: 1332625 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=NV+fbgol; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4B9XWj2bqPz9sRf for ; Tue, 21 Jul 2020 05:55:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730637AbgGTTzM (ORCPT ); Mon, 20 Jul 2020 15:55:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50092 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730304AbgGTTzL (ORCPT ); Mon, 20 Jul 2020 15:55:11 -0400 Received: from mail-il1-x141.google.com (mail-il1-x141.google.com [IPv6:2607:f8b0:4864:20::141]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 774F7C061794 for ; Mon, 20 Jul 2020 12:55:10 -0700 (PDT) Received: by mail-il1-x141.google.com with SMTP id k6so14389783ili.6 for ; Mon, 20 Jul 2020 12:55:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
From: YiFei Zhu
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov , Daniel Borkmann , Stanislav Fomichev , Mahesh Bandewar , Roman Gushchin , Andrii Nakryiko , Martin KaFai Lau , YiFei Zhu
Subject: [PATCH v4 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup
Date: Mon, 20 Jul 2020 14:54:53 -0500

From: YiFei Zhu

This change comes in several parts:

First, the restriction that a CGROUP_STORAGE map can only be used by one
program is removed. This means removing the field 'aux' in struct
bpf_cgroup_storage_map, the code associated with that field, and the
now-noop functions bpf_free_cgroup_storage and bpf_cgroup_storage_release.

Second, because there can be multiple attach types on the same cgroup, the
attach type is completely ignored when comparing map keys. Newly added
keys have it zeroed. The only value in the key that still matters is the
cgroup inode. bpftool map dump will also show an attach type of zero.

Third, because the storages are now shared, they cannot be unconditionally
freed on program detach. There are two ways to solve this issue:
* A. Reference count the usage of the storages, and free when the last
  program is detached.
* B. Free only when the storage can no longer be referred to, i.e. when
  either the cgroup_bpf it is attached to, or the map itself, is freed.
Option A has the side effect that, when the user detaches and reattaches a
program, whether the program gets a fresh storage depends on whether there
is another program attached using that storage.
This could trigger races if the user is multi-threaded, and since nondeterminism in data races is evil, go with option B. The both the map and the cgroup_bpf now tracks their associated storages, and the storage unlink and free are removed from cgroup_bpf_detach and added to cgroup_bpf_release and cgroup_storage_map_free. The latter also new holds the cgroup_mutex to prevent any races with the former. Fourth, on attach, we reuse the old storage if the key already exists in the map, via cgroup_storage_lookup. If the storage does not exist yet, we create a new one, and publish it at the last step in the attach process. This does not create a race condition because for the whole attach the cgroup_mutex is held. We keep track of an array of new storages that was allocated and if the process fails only the new storages would get freed. Signed-off-by: YiFei Zhu --- include/linux/bpf-cgroup.h | 15 ++++--- include/uapi/linux/bpf.h | 2 +- kernel/bpf/cgroup.c | 69 ++++++++++++++++++-------------- kernel/bpf/core.c | 12 ------ kernel/bpf/local_storage.c | 73 ++++++++++++---------------------- tools/include/uapi/linux/bpf.h | 2 +- 6 files changed, 76 insertions(+), 97 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 2c6f26670acc..2d5a04c0f4be 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -46,7 +46,8 @@ struct bpf_cgroup_storage { }; struct bpf_cgroup_storage_map *map; struct bpf_cgroup_storage_key key; - struct list_head list; + struct list_head list_map; + struct list_head list_cg; struct rb_node node; struct rcu_head rcu; }; @@ -78,6 +79,9 @@ struct cgroup_bpf { struct list_head progs[MAX_BPF_ATTACH_TYPE]; u32 flags[MAX_BPF_ATTACH_TYPE]; + /* list of cgroup shared storages */ + struct list_head storages; + /* temp storage for effective prog array used by prog_attach/detach */ struct bpf_prog_array *inactive; @@ -161,15 +165,16 @@ static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage this_cpu_write(bpf_cgroup_storage[stype], storage[stype]); } +struct bpf_cgroup_storage * +cgroup_storage_lookup(struct bpf_cgroup_storage_map *map, + struct bpf_cgroup_storage_key *key, bool locked); struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog, enum bpf_cgroup_storage_type stype); void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage); void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage, - struct cgroup *cgroup, - enum bpf_attach_type type); + struct cgroup *cgroup); void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage); int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map); -void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *map); int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value); int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, @@ -383,8 +388,6 @@ static inline void bpf_cgroup_storage_set( struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {} static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map) { return 0; } -static inline void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, - struct bpf_map *map) {} static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc( struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return NULL; } static inline void bpf_cgroup_storage_free( diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 54d0c886e3ba..db93a211d2b1 100644 --- a/include/uapi/linux/bpf.h +++ 
b/include/uapi/linux/bpf.h @@ -78,7 +78,7 @@ struct bpf_lpm_trie_key { struct bpf_cgroup_storage_key { __u64 cgroup_inode_id; /* cgroup inode id */ - __u32 attach_type; /* program attach type */ + __u32 attach_type; /* program attach type, unused */ }; /* BPF syscall commands, see bpf(2) man-page for details. */ diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index ac53102e244a..f91c554d6be6 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -37,17 +37,33 @@ static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[]) } static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[], - struct bpf_prog *prog) + struct bpf_cgroup_storage *new_storages[], + struct bpf_prog *prog, + struct cgroup *cgrp) { enum bpf_cgroup_storage_type stype; + struct bpf_cgroup_storage_key key; + struct bpf_map *map; + + key.cgroup_inode_id = cgroup_id(cgrp); + key.attach_type = 0; for_each_cgroup_storage_type(stype) { + map = prog->aux->cgroup_storage[stype]; + if (!map) + continue; + + storages[stype] = cgroup_storage_lookup((void *)map, &key, false); + if (storages[stype]) + continue; + storages[stype] = bpf_cgroup_storage_alloc(prog, stype); if (IS_ERR(storages[stype])) { - storages[stype] = NULL; - bpf_cgroup_storages_free(storages); + bpf_cgroup_storages_free(new_storages); return -ENOMEM; } + + new_storages[stype] = storages[stype]; } return 0; @@ -63,21 +79,12 @@ static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[], } static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[], - struct cgroup* cgrp, - enum bpf_attach_type attach_type) -{ - enum bpf_cgroup_storage_type stype; - - for_each_cgroup_storage_type(stype) - bpf_cgroup_storage_link(storages[stype], cgrp, attach_type); -} - -static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[]) + struct cgroup *cgrp) { enum bpf_cgroup_storage_type stype; for_each_cgroup_storage_type(stype) - bpf_cgroup_storage_unlink(storages[stype]); + bpf_cgroup_storage_link(storages[stype], cgrp); } /* Called when bpf_cgroup_link is auto-detached from dying cgroup. 
@@ -101,22 +108,23 @@ static void cgroup_bpf_release(struct work_struct *work) struct cgroup *p, *cgrp = container_of(work, struct cgroup, bpf.release_work); struct bpf_prog_array *old_array; + struct list_head *storages = &cgrp->bpf.storages; + struct bpf_cgroup_storage *storage, *stmp; + unsigned int type; mutex_lock(&cgroup_mutex); for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) { struct list_head *progs = &cgrp->bpf.progs[type]; - struct bpf_prog_list *pl, *tmp; + struct bpf_prog_list *pl, *pltmp; - list_for_each_entry_safe(pl, tmp, progs, node) { + list_for_each_entry_safe(pl, pltmp, progs, node) { list_del(&pl->node); if (pl->prog) bpf_prog_put(pl->prog); if (pl->link) bpf_cgroup_link_auto_detach(pl->link); - bpf_cgroup_storages_unlink(pl->storage); - bpf_cgroup_storages_free(pl->storage); kfree(pl); static_branch_dec(&cgroup_bpf_enabled_key); } @@ -126,6 +134,11 @@ static void cgroup_bpf_release(struct work_struct *work) bpf_prog_array_free(old_array); } + list_for_each_entry_safe(storage, stmp, storages, list_cg) { + bpf_cgroup_storage_unlink(storage); + bpf_cgroup_storage_free(storage); + } + mutex_unlock(&cgroup_mutex); for (p = cgroup_parent(cgrp); p; p = cgroup_parent(p)) @@ -290,6 +303,8 @@ int cgroup_bpf_inherit(struct cgroup *cgrp) for (i = 0; i < NR; i++) INIT_LIST_HEAD(&cgrp->bpf.progs[i]); + INIT_LIST_HEAD(&cgrp->bpf.storages); + for (i = 0; i < NR; i++) if (compute_effective_progs(cgrp, i, &arrays[i])) goto cleanup; @@ -422,7 +437,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct list_head *progs = &cgrp->bpf.progs[type]; struct bpf_prog *old_prog = NULL; struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {}; - struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {}; + struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {}; struct bpf_prog_list *pl; int err; @@ -455,17 +470,16 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, if (IS_ERR(pl)) return PTR_ERR(pl); - if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog)) + if (bpf_cgroup_storages_alloc(storage, new_storage, + prog ? 
: link->link.prog, cgrp)) return -ENOMEM; if (pl) { old_prog = pl->prog; - bpf_cgroup_storages_unlink(pl->storage); - bpf_cgroup_storages_assign(old_storage, pl->storage); } else { pl = kmalloc(sizeof(*pl), GFP_KERNEL); if (!pl) { - bpf_cgroup_storages_free(storage); + bpf_cgroup_storages_free(new_storage); return -ENOMEM; } list_add_tail(&pl->node, progs); @@ -480,12 +494,11 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, if (err) goto cleanup; - bpf_cgroup_storages_free(old_storage); if (old_prog) bpf_prog_put(old_prog); else static_branch_inc(&cgroup_bpf_enabled_key); - bpf_cgroup_storages_link(pl->storage, cgrp, type); + bpf_cgroup_storages_link(new_storage, cgrp); return 0; cleanup: @@ -493,9 +506,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, pl->prog = old_prog; pl->link = NULL; } - bpf_cgroup_storages_free(pl->storage); - bpf_cgroup_storages_assign(pl->storage, old_storage); - bpf_cgroup_storages_link(pl->storage, cgrp, type); + bpf_cgroup_storages_free(new_storage); if (!old_prog) { list_del(&pl->node); kfree(pl); @@ -679,8 +690,6 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, /* now can actually delete it from this cgroup list */ list_del(&pl->node); - bpf_cgroup_storages_unlink(pl->storage); - bpf_cgroup_storages_free(pl->storage); kfree(pl); if (list_empty(progs)) /* last program was detached, reset flags to zero */ diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7be02e555ab9..bde93344164d 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2097,24 +2097,12 @@ int bpf_prog_array_copy_info(struct bpf_prog_array *array, : 0; } -static void bpf_free_cgroup_storage(struct bpf_prog_aux *aux) -{ - enum bpf_cgroup_storage_type stype; - - for_each_cgroup_storage_type(stype) { - if (!aux->cgroup_storage[stype]) - continue; - bpf_cgroup_storage_release(aux, aux->cgroup_storage[stype]); - } -} - void __bpf_free_used_maps(struct bpf_prog_aux *aux, struct bpf_map **used_maps, u32 len) { struct bpf_map *map; u32 i; - bpf_free_cgroup_storage(aux); for (i = 0; i < len; i++) { map = used_maps[i]; if (map->ops->map_poke_untrack) diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c index 51bd5a8cb01b..0b94b4ba99ba 100644 --- a/kernel/bpf/local_storage.c +++ b/kernel/bpf/local_storage.c @@ -9,6 +9,8 @@ #include #include +#include "../cgroup/cgroup-internal.h" + DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]); #ifdef CONFIG_CGROUP_BPF @@ -20,7 +22,6 @@ struct bpf_cgroup_storage_map { struct bpf_map map; spinlock_t lock; - struct bpf_prog_aux *aux; struct rb_root root; struct list_head list; }; @@ -38,16 +39,12 @@ static int bpf_cgroup_storage_key_cmp( return -1; else if (key1->cgroup_inode_id > key2->cgroup_inode_id) return 1; - else if (key1->attach_type < key2->attach_type) - return -1; - else if (key1->attach_type > key2->attach_type) - return 1; return 0; } -static struct bpf_cgroup_storage *cgroup_storage_lookup( - struct bpf_cgroup_storage_map *map, struct bpf_cgroup_storage_key *key, - bool locked) +struct bpf_cgroup_storage * +cgroup_storage_lookup(struct bpf_cgroup_storage_map *map, + struct bpf_cgroup_storage_key *key, bool locked) { struct rb_root *root = &map->root; struct rb_node *node; @@ -131,10 +128,7 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *_key, struct bpf_cgroup_storage *storage; struct bpf_storage_buffer *new; - if (unlikely(flags & ~(BPF_F_LOCK | BPF_EXIST | BPF_NOEXIST))) - return -EINVAL; - - if (unlikely(flags & BPF_NOEXIST)) + if (unlikely(flags 
& ~(BPF_F_LOCK | BPF_EXIST))) return -EINVAL; if (unlikely((flags & BPF_F_LOCK) && @@ -250,16 +244,15 @@ static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key, if (!storage) goto enoent; - storage = list_next_entry(storage, list); + storage = list_next_entry(storage, list_map); if (!storage) goto enoent; } else { storage = list_first_entry(&map->list, - struct bpf_cgroup_storage, list); + struct bpf_cgroup_storage, list_map); } spin_unlock_bh(&map->lock); - next->attach_type = storage->key.attach_type; next->cgroup_inode_id = storage->key.cgroup_inode_id; return 0; @@ -318,6 +311,17 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr) static void cgroup_storage_map_free(struct bpf_map *_map) { struct bpf_cgroup_storage_map *map = map_to_storage(_map); + struct list_head *storages = &map->list; + struct bpf_cgroup_storage *storage, *stmp; + + mutex_lock(&cgroup_mutex); + + list_for_each_entry_safe(storage, stmp, storages, list_map) { + bpf_cgroup_storage_unlink(storage); + bpf_cgroup_storage_free(storage); + } + + mutex_unlock(&cgroup_mutex); WARN_ON(!RB_EMPTY_ROOT(&map->root)); WARN_ON(!list_empty(&map->list)); @@ -426,38 +430,13 @@ const struct bpf_map_ops cgroup_storage_map_ops = { int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *_map) { enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map); - struct bpf_cgroup_storage_map *map = map_to_storage(_map); - int ret = -EBUSY; - - spin_lock_bh(&map->lock); - if (map->aux && map->aux != aux) - goto unlock; if (aux->cgroup_storage[stype] && aux->cgroup_storage[stype] != _map) - goto unlock; + return -EBUSY; - map->aux = aux; aux->cgroup_storage[stype] = _map; - ret = 0; -unlock: - spin_unlock_bh(&map->lock); - - return ret; -} - -void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *_map) -{ - enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map); - struct bpf_cgroup_storage_map *map = map_to_storage(_map); - - spin_lock_bh(&map->lock); - if (map->aux == aux) { - WARN_ON(aux->cgroup_storage[stype] != _map); - map->aux = NULL; - aux->cgroup_storage[stype] = NULL; - } - spin_unlock_bh(&map->lock); + return 0; } static size_t bpf_cgroup_storage_calculate_size(struct bpf_map *map, u32 *pages) @@ -563,22 +542,21 @@ void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage) } void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage, - struct cgroup *cgroup, - enum bpf_attach_type type) + struct cgroup *cgroup) { struct bpf_cgroup_storage_map *map; if (!storage) return; - storage->key.attach_type = type; storage->key.cgroup_inode_id = cgroup_id(cgroup); map = storage->map; spin_lock_bh(&map->lock); WARN_ON(cgroup_storage_insert(map, storage)); - list_add(&storage->list, &map->list); + list_add(&storage->list_map, &map->list); + list_add(&storage->list_cg, &cgroup->bpf.storages); spin_unlock_bh(&map->lock); } @@ -596,7 +574,8 @@ void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage) root = &map->root; rb_erase(&storage->node, root); - list_del(&storage->list); + list_del(&storage->list_map); + list_del(&storage->list_cg); spin_unlock_bh(&map->lock); } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 54d0c886e3ba..db93a211d2b1 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -78,7 +78,7 @@ struct bpf_lpm_trie_key { struct bpf_cgroup_storage_key { __u64 cgroup_inode_id; /* cgroup inode id */ - __u32 attach_type; /* program attach type */ + __u32 attach_type; /* 
program attach type, unused */ }; /* BPF syscall commands, see bpf(2) man-page for details. */
From patchwork Mon Jul 20 19:54:54 2020
X-Patchwork-Submitter: YiFei Zhu
X-Patchwork-Id: 1332624
X-Patchwork-Delegate: bpf@iogearbox.net
From: YiFei Zhu
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov , Daniel Borkmann , Stanislav Fomichev , Mahesh Bandewar , Roman Gushchin , Andrii Nakryiko , Martin KaFai Lau , YiFei Zhu
Subject: [PATCH v4 bpf-next 4/5] selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress
Date: Mon, 20 Jul 2020 14:54:54 -0500
Message-Id: <3a1fea5958605766b326fff3b6e78488f155510d.1595274799.git.zhuyifei@google.com>

From: YiFei Zhu

This mirrors the original egress-only test. The cgroup_storage is now
extended to have two packet counters, one for egress and one for ingress.
The behavior of the counters is exactly the same as in the original
egress-only test, except that the total number of invocations doubles
because both egress and ingress are counted.

The field attach_type in the map key is ignored in the kernel; keeping it
here would be pointless, and since we are demonstrating the expected usage
of the map, it is removed. That said, keeping the field would not fail the
test, for backwards compatibility reasons. In other words, the original
egress-only test is not affected by the change in CGROUP_STORAGE behavior
and will pass in both cases.

Signed-off-by: YiFei Zhu
---
 .../bpf/prog_tests/cg_storage_multi.c         | 90 +++++++++++++++++--
 1 file changed, 83 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
index 1f4ab437ddb9..aa2b448c4214 100644
--- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
+++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
@@ -28,7 +28,6 @@ static bool assert_storage(struct bpf_map *map, const char *cgroup_path,
 	map_fd = bpf_map__fd(map);
 
 	key.cgroup_inode_id = get_cgroup_id(cgroup_path);
-	key.attach_type = BPF_CGROUP_INET_EGRESS;
 	if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) < 0,
 		  "map-lookup", "errno %d", errno))
 		return true;
@@ -48,7 +47,6 @@ static bool assert_storage_noexist(struct bpf_map *map, const char *cgroup_path)
 	map_fd = bpf_map__fd(map);
 
 	key.cgroup_inode_id = get_cgroup_id(cgroup_path);
-	key.attach_type = BPF_CGROUP_INET_EGRESS;
 	if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) == 0,
 		  "map-lookup", "succeeded, expected ENOENT"))
 		return true;
@@ -156,14 +154,92 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd)
 static void test_egress_ingress(int parent_cgroup_fd, int child_cgroup_fd)
 {
 	struct cg_storage_multi_egress_ingress *obj;
+	struct cgroup_value expected_cgroup_value;
+	struct bpf_link *parent_egress_link = NULL, *parent_ingress_link = NULL;
+	struct bpf_link *child_egress_link = NULL, *child_ingress_link = NULL;
+	bool err;
 
-	/* Cannot load both programs due to verifier failure:
-	 * "only one cgroup storage of each type is allowed"
-	 */
 	obj = cg_storage_multi_egress_ingress__open_and_load();
-	if (CHECK(obj || errno != EBUSY,
-		  "skel-load", "errno %d, expected EBUSY", errno))
+	if (CHECK(!obj, "skel-load", "errno %d", errno))
 		return;
+
+	/* Attach to parent cgroup, trigger packet from child.
+	 * Assert that there are two runs, one with parent cgroup egress and
+	 * one with parent cgroup ingress.
+ * Also assert that child cgroup's storage does not exist + */ + parent_egress_link = bpf_program__attach_cgroup(obj->progs.egress, + parent_cgroup_fd); + if (CHECK(IS_ERR(parent_egress_link), "parent-egress-cg-attach", + "err %ld", PTR_ERR(parent_egress_link))) + goto close_bpf_object; + parent_ingress_link = bpf_program__attach_cgroup(obj->progs.ingress, + parent_cgroup_fd); + if (CHECK(IS_ERR(parent_ingress_link), "parent-ingress-cg-attach", + "err %ld", PTR_ERR(parent_ingress_link))) + goto close_bpf_object; + err = connect_send(CHILD_CGROUP); + if (CHECK(err, "first-connect-send", "errno %d", errno)) + goto close_bpf_object; + if (CHECK(obj->bss->invocations != 2, + "first-invoke", "invocations=%d", obj->bss->invocations)) + goto close_bpf_object; + expected_cgroup_value = (struct cgroup_value) { + .egress_pkts = 1, + .ingress_pkts = 1, + }; + if (assert_storage(obj->maps.cgroup_storage, + PARENT_CGROUP, &expected_cgroup_value)) + goto close_bpf_object; + if (assert_storage_noexist(obj->maps.cgroup_storage, CHILD_CGROUP)) + goto close_bpf_object; + + /* Attach to parent and child cgroup, trigger packet from child. + * Assert that there is four additional runs, parent cgroup egress and + * ingress, child cgroup egress and ingress. + */ + child_egress_link = bpf_program__attach_cgroup(obj->progs.egress, + child_cgroup_fd); + if (CHECK(IS_ERR(child_egress_link), "child-egress-cg-attach", + "err %ld", PTR_ERR(child_egress_link))) + goto close_bpf_object; + child_ingress_link = bpf_program__attach_cgroup(obj->progs.ingress, + child_cgroup_fd); + if (CHECK(IS_ERR(child_ingress_link), "child-ingress-cg-attach", + "err %ld", PTR_ERR(child_ingress_link))) + goto close_bpf_object; + err = connect_send(CHILD_CGROUP); + if (CHECK(err, "second-connect-send", "errno %d", errno)) + goto close_bpf_object; + if (CHECK(obj->bss->invocations != 6, + "second-invoke", "invocations=%d", obj->bss->invocations)) + goto close_bpf_object; + expected_cgroup_value = (struct cgroup_value) { + .egress_pkts = 2, + .ingress_pkts = 2, + }; + if (assert_storage(obj->maps.cgroup_storage, + PARENT_CGROUP, &expected_cgroup_value)) + goto close_bpf_object; + expected_cgroup_value = (struct cgroup_value) { + .egress_pkts = 1, + .ingress_pkts = 1, + }; + if (assert_storage(obj->maps.cgroup_storage, + CHILD_CGROUP, &expected_cgroup_value)) + goto close_bpf_object; + +close_bpf_object: + if (parent_egress_link) + bpf_link__destroy(parent_egress_link); + if (parent_ingress_link) + bpf_link__destroy(parent_ingress_link); + if (child_egress_link) + bpf_link__destroy(child_egress_link); + if (child_ingress_link) + bpf_link__destroy(child_ingress_link); + + cg_storage_multi_egress_ingress__destroy(obj); } void test_cg_storage_multi(void) From patchwork Mon Jul 20 19:54:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: YiFei Zhu X-Patchwork-Id: 1332626 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=UWccam6O; 
From: YiFei Zhu
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov , Daniel Borkmann , Stanislav Fomichev , Mahesh Bandewar , Roman Gushchin , Andrii Nakryiko , Martin KaFai Lau , YiFei Zhu
Subject: [PATCH v4 bpf-next 5/5] Documentation/bpf: Document CGROUP_STORAGE map type
Date: Mon, 20 Jul 2020 14:54:55 -0500
Message-Id: <0f1ed5471cfa1fa148ac42d8fa5f44e2e1556417.1595274799.git.zhuyifei@google.com>

From: YiFei Zhu

The mechanics and usage are not very straightforward. Given the changes,
it is better to document how it works and how to use it, rather than
having to rely on the examples and the implementation to infer what is
going on.
Signed-off-by: YiFei Zhu
---
 Documentation/bpf/index.rst              |  9 +++
 Documentation/bpf/map_cgroup_storage.rst | 95 ++++++++++++++++++++++++
 2 files changed, 104 insertions(+)
 create mode 100644 Documentation/bpf/map_cgroup_storage.rst

diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index 38b4db8be7a2..26f4bb3107fc 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -48,6 +48,15 @@ Program types
 
    bpf_lsm
 
+Map types
+=========
+
+.. toctree::
+   :maxdepth: 1
+
+   map_cgroup_storage
+
+
 Testing and debugging BPF
 =========================
 
diff --git a/Documentation/bpf/map_cgroup_storage.rst b/Documentation/bpf/map_cgroup_storage.rst
new file mode 100644
index 000000000000..b7210cb3f294
--- /dev/null
+++ b/Documentation/bpf/map_cgroup_storage.rst
@@ -0,0 +1,95 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2020 Google LLC.
+
+===========================
+BPF_MAP_TYPE_CGROUP_STORAGE
+===========================
+
+The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
+storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
+attach to cgroups; the programs are made available by the same config. The
+storage is identified by the cgroup the program is attached to.
+
+This document describes the usage and semantics of the
+``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed
+in Linux 5.9, and this document describes the differences.
+
+Usage
+=====
+
+The map uses a key of type ``struct bpf_cgroup_storage_key``, declared in
+``linux/bpf.h``::
+
+    struct bpf_cgroup_storage_key {
+        __u64 cgroup_inode_id;
+        __u32 attach_type;
+    };
+
+``cgroup_inode_id`` is the inode id of the cgroup directory.
+``attach_type`` was the program's attach type prior to Linux 5.9; since 5.9 it
+is ignored and kept for backwards compatibility.
+
+To access the storage in a program, use ``bpf_get_local_storage``::
+
+    void *bpf_get_local_storage(void *map, u64 flags)
+
+``flags`` is reserved for future use and must be 0.
+
+There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
+can be accessed by multiple programs across different CPUs, and users should
+take care of synchronization themselves.
+
+Example usage::
+
+    #include
+
+    struct {
+        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+        __type(key, struct bpf_cgroup_storage_key);
+        __type(value, __u32);
+    } cgroup_storage SEC(".maps");
+
+    int program(struct __sk_buff *skb)
+    {
+        __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
+        __sync_fetch_and_add(ptr, 1);
+
+        return 0;
+    }
+
+Semantics
+=========
+
+``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
+per-CPU variant will have different memory regions for each CPU for each
+storage. The non-per-CPU variant has a single memory region for each storage.
+
+Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
+for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
+that uses the map. A program may be attached to multiple cgroups or have
+multiple attach types, and each attach creates a fresh zeroed storage. The
+storage is freed upon detach.
+
+Userspace may use the attach parameters of the cgroup and attach type pair
+in ``struct bpf_cgroup_storage_key`` as the key to the BPF map APIs to read or
+update the storage for a given attachment.
+
+Since Linux 5.9, storage can be shared by multiple programs, and the attach
+type is ignored.
+When a program is attached to a cgroup, the kernel creates a new storage only
+if the map does not already contain an entry for that cgroup; otherwise the
+old storage is reused for the new attachment. The storage is freed only when
+either the map or the cgroup it is attached to is freed. Detaching will not
+directly free the storage, but it may cause the reference to the map to reach
+zero and indirectly free all storages in the map.
+
+Userspace may use the attach parameters of the cgroup only in
+``struct bpf_cgroup_storage_key`` as the key to the BPF map APIs to read or
+update the storage for a given attachment. The struct also contains an
+``attach_type`` field; this field is ignored.
+
+In all versions, the storage is bound at attach time. Even if the program is
+attached to the parent and triggers in the child, the storage still belongs to
+the parent.
+
+Userspace cannot create a new entry in the map or delete an existing entry.
+Program test runs always use a temporary storage.
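
As a companion to the document above, here is a minimal userspace sketch (not
part of the patch; the helper name is made up, and the caller is assumed to
already have a map fd and a cgroup inode id) of reading a storage value on
Linux 5.9+, where only cgroup_inode_id in the key matters:

    #include <string.h>
    #include <linux/bpf.h>
    #include <bpf/bpf.h>

    /* Read the storage value for one cgroup. The attach_type field of the
     * key is left zeroed, since Linux 5.9 ignores it.
     */
    int read_cgroup_storage(int map_fd, __u64 cgroup_id, __u32 *value)
    {
            struct bpf_cgroup_storage_key key;

            memset(&key, 0, sizeof(key));
            key.cgroup_inode_id = cgroup_id;

            return bpf_map_lookup_elem(map_fd, &key, value);
    }

On a pre-5.9 kernel, the same lookup would additionally need key.attach_type
set to the attach type of the attachment whose storage is being read.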