[{"id":1771369,"web_url":"http://patchwork.ozlabs.org/comment/1771369/","msgid":"<59C17F16.10503@iogearbox.net>","list_archive_url":null,"date":"2017-09-19T20:33:26","subject":"Re: [PATCH net-next v4 1/4] bpf: add helper\n\tbpf_perf_event_read_value for perf event array map","submitter":{"id":65705,"url":"http://patchwork.ozlabs.org/api/people/65705/","name":"Daniel Borkmann","email":"daniel@iogearbox.net"},"content":"On 09/19/2017 09:04 AM, Yonghong Song wrote:\n[...]\n\nJust some minor things, but a bit scattered.\n\n> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h\n> index 43ab5c4..2c68b9e 100644\n> --- a/include/uapi/linux/bpf.h\n> +++ b/include/uapi/linux/bpf.h\n> @@ -582,6 +582,14 @@ union bpf_attr {\n>    *\t@map: pointer to sockmap to update\n>    *\t@key: key to insert/update sock in map\n>    *\t@flags: same flags as map update elem\n> + *\n> + * int bpf_perf_event_read_value(map, flags, buf, buf_size)\n> + *     read perf event counter value and perf event enabled/running time\n> + *     @map: pointer to perf_event_array map\n> + *     @flags: index of event in the map or bitmask flags\n> + *     @buf: buf to fill\n> + *     @buf_size: size of the buf\n> + *     Return: 0 on success or negative error code\n>    */\n>   #define __BPF_FUNC_MAPPER(FN)\t\t\\\n>   \tFN(unspec),\t\t\t\\\n> @@ -638,6 +646,7 @@ union bpf_attr {\n>   \tFN(redirect_map),\t\t\\\n>   \tFN(sk_redirect_map),\t\t\\\n>   \tFN(sock_map_update),\t\t\\\n> +\tFN(perf_event_read_value),\t\t\\\n\nNit: can you indent the '\\' so it aligns with the rest\n\n>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper\n>    * function eBPF program intends to call\n> @@ -681,7 +690,8 @@ enum bpf_func_id {\n>   #define BPF_F_ZERO_CSUM_TX\t\t(1ULL << 1)\n>   #define BPF_F_DONT_FRAGMENT\t\t(1ULL << 2)\n>\n> -/* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */\n> +/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and\n> + * BPF_FUNC_perf_event_read_value flags. */\n\nNit: comment style\n\n>   #define BPF_F_INDEX_MASK\t\t0xffffffffULL\n>   #define BPF_F_CURRENT_CPU\t\tBPF_F_INDEX_MASK\n>   /* BPF_FUNC_perf_event_output for sk_buff input context. */\n> @@ -864,4 +874,10 @@ enum {\n>   #define TCP_BPF_IW\t\t1001\t/* Set TCP initial congestion window */\n>   #define TCP_BPF_SNDCWND_CLAMP\t1002\t/* Set sndcwnd_clamp */\n>\n> +struct bpf_perf_event_value {\n> +\t__u64 counter;\n> +\t__u64 enabled;\n> +\t__u64 running;\n> +};\n> +\n[...]\n> diff --git a/kernel/events/core.c b/kernel/events/core.c\n> index 3e691b7..2d5bbe5 100644\n> --- a/kernel/events/core.c\n> +++ b/kernel/events/core.c\n> @@ -3684,10 +3684,12 @@ static inline u64 perf_event_count(struct perf_event *event)\n>    *     will not be local and we cannot read them atomically\n>    *   - must not have a pmu::count method\n>    */\n> -int perf_event_read_local(struct perf_event *event, u64 *value)\n> +int perf_event_read_local(struct perf_event *event, u64 *value,\n> +\t\t\t  u64 *enabled, u64 *running)\n>   {\n>   \tunsigned long flags;\n>   \tint ret = 0;\n> +\tu64 now;\n>\n>   \t/*\n>   \t * Disabling interrupts avoids all counter scheduling (context\n> @@ -3718,14 +3720,21 @@ int perf_event_read_local(struct perf_event *event, u64 *value)\n>   \t\tgoto out;\n>   \t}\n>\n> +\tnow = event->shadow_ctx_time + perf_clock();\n> +\tif (enabled)\n> +\t\t*enabled = now - event->tstamp_enabled;\n>   \t/*\n>   \t * If the event is currently on this CPU, its either a per-task event,\n>   \t * or local to this CPU. Furthermore it means its ACTIVE (otherwise\n>   \t * oncpu == -1).\n>   \t */\n> -\tif (event->oncpu == smp_processor_id())\n> +\tif (event->oncpu == smp_processor_id()) {\n>   \t\tevent->pmu->read(event);\n> -\n> +\t\tif (running)\n> +\t\t\t*running = now - event->tstamp_running;\n> +\t} else if (running) {\n> +\t\t*running = event->total_time_running;\n> +\t}\n>   \t*value = local64_read(&event->count);\n>   out:\n>   \tlocal_irq_restore(flags);\n> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c\n> index dc498b6..39ce5d9 100644\n> --- a/kernel/trace/bpf_trace.c\n> +++ b/kernel/trace/bpf_trace.c\n> @@ -255,13 +255,13 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)\n>   \treturn &bpf_trace_printk_proto;\n>   }\n>\n> -BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)\n> -{\n> +static __always_inline int\n> +get_map_perf_counter(struct bpf_map *map, u64 flags,\n> +\t\tu64 *value, u64 *enabled, u64 *running) {\n\nNit: coding style\n\n>   \tstruct bpf_array *array = container_of(map, struct bpf_array, map);\n>   \tunsigned int cpu = smp_processor_id();\n>   \tu64 index = flags & BPF_F_INDEX_MASK;\n>   \tstruct bpf_event_entry *ee;\n> -\tu64 value = 0;\n>   \tint err;\n>\n>   \tif (unlikely(flags & ~(BPF_F_INDEX_MASK)))\n> @@ -275,7 +275,17 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)\n>   \tif (!ee)\n>   \t\treturn -ENOENT;\n>\n> -\terr = perf_event_read_local(ee->event, &value);\n> +\terr = perf_event_read_local(ee->event, value, enabled, running);\n> +\treturn err;\n\nerr can be removed entirely then.\n\n   return perf_event_read_local(ee->event, value, enabled, running);\n\n> +}\n> +\n> +\n\nNit: Two newlines slipped in.\n\n> +BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)\n> +{\n> +\tu64 value = 0;\n> +\tint err;\n> +\n> +\terr = get_map_perf_counter(map, flags, &value, NULL, NULL);\n>   \t/*\n>   \t * this api is ugly since we miss [-22..-2] range of valid\n>   \t * counter values, but that's uapi\n> @@ -285,6 +295,20 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)\n>   \treturn value;\n>   }\n\nCan you also move the bpf_perf_event_read_proto definition right\nunderneath here as we have with all other helpers?\n\n> +BPF_CALL_4(bpf_perf_event_read_value, struct bpf_map *, map, u64, flags,\n> +\tstruct bpf_perf_event_value *, buf, u32, size)\n\nNit: indent\n\n> +{\n> +\tint err;\n> +\n> +\tif (unlikely(size != sizeof(struct bpf_perf_event_value)))\n> +\t\treturn -EINVAL;\n> +\terr = get_map_perf_counter(map, flags, &buf->counter, &buf->enabled,\n> +                            &buf->running);\n\n^ This is only indented with spaces.\n\n> +\tif (err)\n> +\t\treturn err;\n> +\treturn 0;\n\nAlso here err can be removed entirely, make it\nless convoluted:\n\nBPF_CALL_4(bpf_perf_event_read_value, struct bpf_map *, map, u64, flags,\n           struct bpf_perf_event_value *, eval, u32, size)\n{\n        if (unlikely(size != sizeof(struct bpf_perf_event_value)))\n                return -EINVAL;\n\n        return get_map_perf_counter(map, flags, &eval->counter, &eval->enabled,\n                                    &eval->running);\n}\n\n> +}\n> +\n>   static const struct bpf_func_proto bpf_perf_event_read_proto = {\n>   \t.func\t\t= bpf_perf_event_read,\n>   \t.gpl_only\t= true,\n> @@ -293,6 +317,16 @@ static const struct bpf_func_proto bpf_perf_event_read_proto = {\n>   \t.arg2_type\t= ARG_ANYTHING,\n>   };\n>\n> +static const struct bpf_func_proto bpf_perf_event_read_value_proto = {\n> +\t.func\t\t= bpf_perf_event_read_value,\n> +\t.gpl_only\t= true,\n> +\t.ret_type\t= RET_INTEGER,\n> +\t.arg1_type\t= ARG_CONST_MAP_PTR,\n> +\t.arg2_type\t= ARG_ANYTHING,\n> +\t.arg3_type\t= ARG_PTR_TO_UNINIT_MEM,\n\nIf you do that, then error paths need to memset the region, e.g.\nsee bpf_skb_get_tunnel_opt() and bpf_skb_get_tunnel_key() helpers\nwhich operate similarly.\n\n> +\t.arg4_type\t= ARG_CONST_SIZE,\n> +};\n> +\n>   static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd);\n>\n>   static __always_inline u64\n> @@ -499,6 +533,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func\n>   \t\treturn &bpf_perf_event_output_proto;\n>   \tcase BPF_FUNC_get_stackid:\n>   \t\treturn &bpf_get_stackid_proto;\n> +\tcase BPF_FUNC_perf_event_read_value:\n> +\t\treturn &bpf_perf_event_read_value_proto;\n>   \tdefault:\n>   \t\treturn tracing_func_proto(func_id);\n>   \t}\n>","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xxZMm0vJ8z9s81\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 20 Sep 2017 06:33:40 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751541AbdISUdh (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tTue, 19 Sep 2017 16:33:37 -0400","from www62.your-server.de ([213.133.104.62]:55395 \"EHLO\n\twww62.your-server.de\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751348AbdISUdg (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Tue, 19 Sep 2017 16:33:36 -0400","from [85.7.161.218] (helo=localhost.localdomain)\n\tby www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-SHA:256)\n\t(Exim 4.85_2) (envelope-from <daniel@iogearbox.net>)\n\tid 1duPCx-0002Oj-0F; Tue, 19 Sep 2017 22:33:27 +0200"],"Message-ID":"<59C17F16.10503@iogearbox.net>","Date":"Tue, 19 Sep 2017 22:33:26 +0200","From":"Daniel Borkmann <daniel@iogearbox.net>","User-Agent":"Mozilla/5.0 (X11; Linux x86_64;\n\trv:31.0) Gecko/20100101 Thunderbird/31.7.0","MIME-Version":"1.0","To":"Yonghong Song <yhs@fb.com>, peterz@infradead.org,\n\trostedt@goodmis.org, ast@fb.com, netdev@vger.kernel.org","CC":"kernel-team@fb.com","Subject":"Re: [PATCH net-next v4 1/4] bpf: add helper\n\tbpf_perf_event_read_value for perf event array map","References":"<20170919070413.3838201-1-yhs@fb.com>\n\t<20170919070413.3838201-2-yhs@fb.com>","In-Reply-To":"<20170919070413.3838201-2-yhs@fb.com>","Content-Type":"text/plain; charset=windows-1252; format=flowed","Content-Transfer-Encoding":"7bit","X-Authenticated-Sender":"daniel@iogearbox.net","X-Virus-Scanned":"Clear (ClamAV 0.99.2/23853/Tue Sep 19 14:42:49 2017)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}}]