From patchwork Thu Apr 18 05:28:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Ignatov X-Patchwork-Id: 1087397 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.b="gvfFYlRC"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44l72J5613z9s00 for ; Thu, 18 Apr 2019 15:29:12 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387950AbfDRF3K (ORCPT ); Thu, 18 Apr 2019 01:29:10 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:50518 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725888AbfDRF3J (ORCPT ); Thu, 18 Apr 2019 01:29:09 -0400 Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x3I5SYoP023207 for ; Wed, 17 Apr 2019 22:29:08 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : mime-version : content-type; s=facebook; bh=RVvChn7wnDtMBzH4+t7ry4WKJ3FagU6T9boaP8UCtIk=; b=gvfFYlRCZP6uC2kxM+zQZV4OiMxUqPWp8HaSo6X08tiLOf+aIZKM1bASojb0GAeZn5MH iThz8GhVuysH+SdTXdHvwCnXr/kugbkV1SATpIff4v6btEQo922cb/J/v79U9LWIEHOw q+Yg5chNVav3VrzTLsWSnmQiAkOJzOJ0WjM= Received: from mail.thefacebook.com (mailout.thefacebook.com [199.201.64.23]) by mx0a-00082601.pphosted.com with ESMTP id 2rx0t2ur27-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Wed, 17 Apr 2019 22:29:08 -0700 Received: from mx-out.facebook.com (2620:10d:c081:10::13) by mail.thefacebook.com (2620:10d:c081:35::126) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA) id 15.1.1713.5; Wed, 17 Apr 2019 22:29:07 -0700 Received: by dev082.prn2.facebook.com (Postfix, from userid 572249) id 915883702930; Wed, 17 Apr 2019 22:29:07 -0700 (PDT) Smtp-Origin-Hostprefix: dev From: Andrey Ignatov Smtp-Origin-Hostname: dev082.prn2.facebook.com To: CC: Andrey Ignatov , , , Smtp-Origin-Cluster: prn2c23 Subject: [PATCH bpf-next] bpf: Document BPF_PROG_TYPE_CGROUP_SYSCTL Date: Wed, 17 Apr 2019 22:28:57 -0700 Message-ID: <20190418052857.2405654-1-rdna@fb.com> X-Mailer: git-send-email 2.17.1 X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-04-18_03:, , signatures=0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add documentation for BPF_PROG_TYPE_CGROUP_SYSCTL, including general info, attach type, context, return code, helpers, example and usage considerations. A separate file prog_cgroup_sysctl.rst is added to Documentation/bpf/. In the future more program types can be documented in their own prog_.rst files. Another way to place program type specific documentation would be to group program types somehow (e.g. cgroup.rst for all cgroup-bpf programs), but it may not scale well since some program types may belong to different groups, e.g. BPF_PROG_TYPE_CGROUP_SKB can be documented together with either cgroup-bpf programs or programs that access skb. The new file is added to the index and verified by `make htmldocs` / sanity-check by lynx. Signed-off-by: Andrey Ignatov Acked-by: Yonghong Song --- Documentation/bpf/index.rst | 9 ++ Documentation/bpf/prog_cgroup_sysctl.rst | 125 +++++++++++++++++++++++ 2 files changed, 134 insertions(+) create mode 100644 Documentation/bpf/prog_cgroup_sysctl.rst diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index 4e77932959cc..dadcaa9a9f5f 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -36,6 +36,15 @@ Two sets of Questions and Answers (Q&A) are maintained. bpf_devel_QA +Program types +============= + +.. toctree:: + :maxdepth: 1 + + prog_cgroup_sysctl + + .. Links: .. _Documentation/networking/filter.txt: ../networking/filter.txt .. _man-pages: https://www.kernel.org/doc/man-pages/ diff --git a/Documentation/bpf/prog_cgroup_sysctl.rst b/Documentation/bpf/prog_cgroup_sysctl.rst new file mode 100644 index 000000000000..677d6c637cf3 --- /dev/null +++ b/Documentation/bpf/prog_cgroup_sysctl.rst @@ -0,0 +1,125 @@ +.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) + +=========================== +BPF_PROG_TYPE_CGROUP_SYSCTL +=========================== + +This document describes ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type that +provides cgroup-bpf hook for sysctl. + +The hook has to be attached to a cgroup and will be called every time a +process inside that cgroup tries to read from or write to sysctl knob in proc. + +1. Attach type +************** + +``BPF_CGROUP_SYSCTL`` attach type has to be used to attach +``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup. + +2. Context +********** + +``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from +BPF program:: + + struct bpf_sysctl { + __u32 write; + __u32 file_pos; + }; + +* ``write`` indicates whether sysctl value is being read (``0``) or written + (``1``). This field is read-only. + +* ``file_pos`` indicates file position sysctl is being accessed at, read + or written. This field is read-write. Writing to the field sets the starting + position in sysctl proc file ``read(2)`` will be reading from or ``write(2)`` + will be writing to. Writing zero to the field can be used e.g. to override + whole sysctl value by ``bpf_sysctl_set_new_value()`` on ``write(2)`` even + when it's called by user space on ``file_pos > 0``. Writing non-zero + value to the field can be used to access part of sysctl value starting from + specified ``file_pos``. Not all sysctl support access with ``file_pos != + 0``, e.g. writes to numeric sysctl entries must always be at file position + ``0``. See also ``kernel.sysctl_writes_strict`` sysctl. + +See `linux/bpf.h`_ for more details on how context field can be accessed. + +3. Return code +************** + +``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following +return codes: + +* ``0`` means "reject access to sysctl"; +* ``1`` means "proceed with access". + +If program returns ``0`` user space will get ``-1`` from ``read(2)`` or +``write(2)`` and ``errno`` will be set to ``EPERM``. + +4. Helpers +********** + +Since sysctl knob is represented by a name and a value, sysctl specific BPF +helpers focus on providing access to these properties: + +* ``bpf_sysctl_get_name()`` to get sysctl name as it is visible in + ``/proc/sys`` into provided by BPF program buffer; + +* ``bpf_sysctl_get_current_value()`` to get string value currently held by + sysctl into provided by BPF program buffer. This helper is available on both + ``read(2)`` from and ``write(2)`` to sysctl; + +* ``bpf_sysctl_get_new_value()`` to get new string value currently being + written to sysctl before actual write happens. This helper can be used only + on ``ctx->write == 1``; + +* ``bpf_sysctl_set_new_value()`` to override new string value currently being + written to sysctl before actual write happens. Sysctl value will be + overridden starting from the current ``ctx->file_pos``. If the whole value + has to be overridden BPF program can set ``file_pos`` to zero before calling + to the helper. This helper can be used only on ``ctx->write == 1``. New + string value set by the helper is treated and verified by kernel same way as + an equivalent string passed by user space. + +BPF program sees sysctl value same way as user space does in proc filesystem, +i.e. as a string. Since many sysctl values represent an integer or a vector +of integers, the following helpers can be used to get numeric value from the +string: + +* ``bpf_strtol()`` to convert initial part of the string to long integer + similar to user space `strtol(3)`_; +* ``bpf_strtoul()`` to convert initial part of the string to unsigned long + integer similar to user space `strtoul(3)`_; + +See `linux/bpf.h`_ for more details on helpers described here. + +5. Examples +*********** + +See `test_sysctl_prog.c`_ for an example of BPF program in C that access +sysctl name and value, parses string value to get vector of integers and uses +the result to make decision whether to allow or deny access to sysctl. + +6. Notes +******** + +``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in **trusted** root +environment, for example to monitor sysctl usage or catch unreasonable values +an application, running as root in a separate cgroup, is trying to set. + +Since `task_dfl_cgroup(current)` is called at `sys_read` / `sys_write` time it +may return results different from that at `sys_open` time, i.e. process that +opened sysctl file in proc filesystem may differ from process that is trying +to read from / write to it and two such processes may run in different +cgroups, what means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not be used as a +security mechanism to limit sysctl usage. + +As with any cgroup-bpf program additional care should be taken if an +application running as root in a cgroup should not be allowed to +detach/replace BPF program attached by administrator. + +.. Links +.. _linux/bpf.h: ../../include/uapi/linux/bpf.h +.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html +.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html +.. _test_sysctl_prog.c: + ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c