
[bpf-next,v2,27/35] bpf: eliminate rlimit-based memory accounting infra for bpf maps

Message ID: 20200727184506.2279656-28-guro@fb.com
State: Changes Requested
Delegated to: BPF Maintainers
Series: bpf: switch to memcg-based memory accounting

Commit Message

Roman Gushchin July 27, 2020, 6:44 p.m. UTC
Remove rlimit-based accounting infrastructure code, which is not used
anymore.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/bpf.h                           | 12 ----
 kernel/bpf/syscall.c                          | 64 +------------------
 .../selftests/bpf/progs/map_ptr_kern.c        |  5 --
 3 files changed, 2 insertions(+), 79 deletions(-)

Comments

Song Liu July 28, 2020, 5:47 a.m. UTC | #1
On Mon, Jul 27, 2020 at 12:26 PM Roman Gushchin <guro@fb.com> wrote:
>
> Remove rlimit-based accounting infrastructure code, which is not used
> anymore.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
[...]
>
>  static void bpf_map_put_uref(struct bpf_map *map)
> @@ -541,7 +484,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
>                    "value_size:\t%u\n"
>                    "max_entries:\t%u\n"
>                    "map_flags:\t%#x\n"
> -                  "memlock:\t%llu\n"
> +                  "memlock:\t%llu\n" /* deprecated */

I am not sure whether we can deprecate this one... How difficult is it
to keep this statistic?

Thanks,
Song
Andrii Nakryiko July 28, 2020, 5:58 a.m. UTC | #2
On Mon, Jul 27, 2020 at 10:47 PM Song Liu <song@kernel.org> wrote:
>
> On Mon, Jul 27, 2020 at 12:26 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > Remove rlimit-based accounting infrastructure code, which is not used
> > anymore.
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> [...]
> >
> >  static void bpf_map_put_uref(struct bpf_map *map)
> > @@ -541,7 +484,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
> >                    "value_size:\t%u\n"
> >                    "max_entries:\t%u\n"
> >                    "map_flags:\t%#x\n"
> > -                  "memlock:\t%llu\n"
> > +                  "memlock:\t%llu\n" /* deprecated */
>
> I am not sure whether we can deprecate this one.. How difficult is it
> to keep this statistics?
>

It's factually correct now that a BPF map doesn't use any memlock memory, no?

This is actually one way to detect whether bumping RLIMIT_MEMLOCK is
necessary or not: create a small map and check whether its fdinfo has
memlock: 0
:)
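A minimal userspace sketch of the fdinfo-parsing half of that probe. It assumes the field layout produced by bpf_map_show_fdinfo() in the patch below; the map-creation step (a plain bpf(BPF_MAP_CREATE, ...) syscall followed by reading /proc/self/fdinfo/<fd>) is omitted, and the helper name is illustrative:

```c
#include <stdio.h>
#include <string.h>

/* Extract the "memlock:" value from a bpf map fdinfo dump.
 * Returns -1 if the field is missing.  A result of 0 would mean the
 * kernel no longer charges map memory against RLIMIT_MEMLOCK, so no
 * rlimit bump is needed. */
static long long fdinfo_memlock(const char *fdinfo)
{
	const char *p = strstr(fdinfo, "memlock:");
	unsigned long long v;

	if (!p || sscanf(p, "memlock:\t%llu", &v) != 1)
		return -1;
	return (long long)v;
}
```

In a real probe the string would come from reading /proc/self/fdinfo/<map_fd> after creating a one-entry array map.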

> Thanks,
> Song
Song Liu July 28, 2020, 6:06 a.m. UTC | #3
On Mon, Jul 27, 2020 at 10:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 27, 2020 at 10:47 PM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Jul 27, 2020 at 12:26 PM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > Remove rlimit-based accounting infrastructure code, which is not used
> > > anymore.
> > >
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > [...]
> > >
> > >  static void bpf_map_put_uref(struct bpf_map *map)
> > > @@ -541,7 +484,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
> > >                    "value_size:\t%u\n"
> > >                    "max_entries:\t%u\n"
> > >                    "map_flags:\t%#x\n"
> > > -                  "memlock:\t%llu\n"
> > > +                  "memlock:\t%llu\n" /* deprecated */
> >
> > I am not sure whether we can deprecate this one.. How difficult is it
> > to keep this statistics?
> >
>
> It's factually correct now, that BPF map doesn't use any memlock memory, no?

I am not sure whether memlock really means memlock for all users... I bet there
are users who use memlock to check the total memory used by the map.

>
> This is actually one way to detect whether RLIMIT_MEMLOCK is necessary
> or not: create a small map, check if it's fdinfo has memlock: 0 or not
> :)

If we do show memlock=0, this is a good check...

Thanks,
Song
Roman Gushchin July 28, 2020, 7:08 p.m. UTC | #4
On Mon, Jul 27, 2020 at 11:06:42PM -0700, Song Liu wrote:
> On Mon, Jul 27, 2020 at 10:58 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 27, 2020 at 10:47 PM Song Liu <song@kernel.org> wrote:
> > >
> > > On Mon, Jul 27, 2020 at 12:26 PM Roman Gushchin <guro@fb.com> wrote:
> > > >
> > > > Remove rlimit-based accounting infrastructure code, which is not used
> > > > anymore.
> > > >
> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > [...]
> > > >
> > > >  static void bpf_map_put_uref(struct bpf_map *map)
> > > > @@ -541,7 +484,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
> > > >                    "value_size:\t%u\n"
> > > >                    "max_entries:\t%u\n"
> > > >                    "map_flags:\t%#x\n"
> > > > -                  "memlock:\t%llu\n"
> > > > +                  "memlock:\t%llu\n" /* deprecated */
> > >
> > > I am not sure whether we can deprecate this one.. How difficult is it
> > > to keep this statistics?
> > >
> >
> > It's factually correct now, that BPF map doesn't use any memlock memory, no?

Right.

> 
> I am not sure whether memlock really means memlock for all users... I bet there
> are users who use memlock to check total memory used by the map.

But this is just part of struct bpf_map, so I agree with Andrii:
it's a safe check.

> 
> >
> > This is actually one way to detect whether RLIMIT_MEMLOCK is necessary
> > or not: create a small map, check if it's fdinfo has memlock: 0 or not
> > :)
> 
> If we do show memlock=0, this is a good check...

The only question I have is whether it's worth checking at all. Bumping the rlimit
is a much cheaper operation than creating a temporary map and checking its
properties.
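For reference, the rlimit bump being compared against is usually the pattern below (a sketch of what BCC and many samples do; raising a finite hard limit requires CAP_SYS_RESOURCE):

```c
#include <sys/resource.h>

/* Classic pre-memcg workaround: lift RLIMIT_MEMLOCK for the whole
 * process so rlimit-based bpf map accounting never fails.  Returns 0
 * on success, -1 (with errno set) otherwise. */
static int bump_memlock_rlimit(void)
{
	struct rlimit r = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	return setrlimit(RLIMIT_MEMLOCK, &r);
}
```

It is cheap, but it changes the limit for every RLIMIT_MEMLOCK user in the process, which is the downside discussed later in the thread.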

So is there any win in comparison to just leaving the userspace code* as it is
for now?

* except runqslower and samples
Andrii Nakryiko July 28, 2020, 7:16 p.m. UTC | #5
On Tue, Jul 28, 2020 at 12:09 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Mon, Jul 27, 2020 at 11:06:42PM -0700, Song Liu wrote:
> > On Mon, Jul 27, 2020 at 10:58 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jul 27, 2020 at 10:47 PM Song Liu <song@kernel.org> wrote:
> > > >
> > > > On Mon, Jul 27, 2020 at 12:26 PM Roman Gushchin <guro@fb.com> wrote:
> > > > >
> > > > > Remove rlimit-based accounting infrastructure code, which is not used
> > > > > anymore.
> > > > >
> > > > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > > [...]
> > > > >
> > > > >  static void bpf_map_put_uref(struct bpf_map *map)
> > > > > @@ -541,7 +484,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
> > > > >                    "value_size:\t%u\n"
> > > > >                    "max_entries:\t%u\n"
> > > > >                    "map_flags:\t%#x\n"
> > > > > -                  "memlock:\t%llu\n"
> > > > > +                  "memlock:\t%llu\n" /* deprecated */
> > > >
> > > > I am not sure whether we can deprecate this one.. How difficult is it
> > > > to keep this statistics?
> > > >
> > >
> > > It's factually correct now, that BPF map doesn't use any memlock memory, no?
>
> Right.
>
> >
> > I am not sure whether memlock really means memlock for all users... I bet there
> > are users who use memlock to check total memory used by the map.
>
> But this is just the part of struct bpf_map, so I agree with Andrii,
> it's a safe check.
>
> >
> > >
> > > This is actually one way to detect whether RLIMIT_MEMLOCK is necessary
> > > or not: create a small map, check if it's fdinfo has memlock: 0 or not
> > > :)
> >
> > If we do show memlock=0, this is a good check...
>
> The only question I have if it's worth checking at all? Bumping the rlimit
> is a way cheaper operation than creating a temporarily map and checking its
> properties.
>

for perf and libbpf -- I think it's totally worth it. Bumping
RLIMIT_MEMLOCK automatically means potentially messing up some other
parts of the system (e.g., BCC just bumps it to INFINITY, allowing too
much memory to be over-allocated, potentially, by unrelated applications
that do rely on RLIMIT_MEMLOCK). It's one of the reasons why libbpf
doesn't do it automatically, actually. So knowing when the bump is not
necessary will allow libbpf to improve its diagnostic messages, and
would let perf/BCC/etc. avoid a potentially risky operation.

> So is there any win in comparison to just leaving the userspace code* as it is
> for now?
>
> * except runqslower and samples

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8357be349133..055c693d9928 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -112,11 +112,6 @@  struct bpf_map_ops {
 	const struct bpf_iter_seq_info *iter_seq_info;
 };
 
-struct bpf_map_memory {
-	u32 pages;
-	struct user_struct *user;
-};
-
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -137,7 +132,6 @@  struct bpf_map {
 	u32 btf_key_type_id;
 	u32 btf_value_type_id;
 	struct btf *btf;
-	struct bpf_map_memory memory;
 	char name[BPF_OBJ_NAME_LEN];
 	u32 btf_vmlinux_value_type_id;
 	bool bypass_spec_v1;
@@ -1117,12 +1111,6 @@  void bpf_map_inc_with_uref(struct bpf_map *map);
 struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
-int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
-void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
-int bpf_map_charge_init(struct bpf_map_memory *mem, u64 size);
-void bpf_map_charge_finish(struct bpf_map_memory *mem);
-void bpf_map_charge_move(struct bpf_map_memory *dst,
-			 struct bpf_map_memory *src);
 void *bpf_map_area_alloc(u64 size, int numa_node);
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
 void bpf_map_area_free(void *base);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 501b2c071d7b..ae51e2363cc1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -354,60 +354,6 @@  static void bpf_uncharge_memlock(struct user_struct *user, u32 pages)
 		atomic_long_sub(pages, &user->locked_vm);
 }
 
-int bpf_map_charge_init(struct bpf_map_memory *mem, u64 size)
-{
-	u32 pages = round_up(size, PAGE_SIZE) >> PAGE_SHIFT;
-	struct user_struct *user;
-	int ret;
-
-	if (size >= U32_MAX - PAGE_SIZE)
-		return -E2BIG;
-
-	user = get_current_user();
-	ret = bpf_charge_memlock(user, pages);
-	if (ret) {
-		free_uid(user);
-		return ret;
-	}
-
-	mem->pages = pages;
-	mem->user = user;
-
-	return 0;
-}
-
-void bpf_map_charge_finish(struct bpf_map_memory *mem)
-{
-	bpf_uncharge_memlock(mem->user, mem->pages);
-	free_uid(mem->user);
-}
-
-void bpf_map_charge_move(struct bpf_map_memory *dst,
-			 struct bpf_map_memory *src)
-{
-	*dst = *src;
-
-	/* Make sure src will not be used for the redundant uncharging. */
-	memset(src, 0, sizeof(struct bpf_map_memory));
-}
-
-int bpf_map_charge_memlock(struct bpf_map *map, u32 pages)
-{
-	int ret;
-
-	ret = bpf_charge_memlock(map->memory.user, pages);
-	if (ret)
-		return ret;
-	map->memory.pages += pages;
-	return ret;
-}
-
-void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages)
-{
-	bpf_uncharge_memlock(map->memory.user, pages);
-	map->memory.pages -= pages;
-}
-
 static int bpf_map_alloc_id(struct bpf_map *map)
 {
 	int id;
@@ -456,13 +402,10 @@  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
-	struct bpf_map_memory mem;
 
-	bpf_map_charge_move(&mem, &map->memory);
 	security_bpf_map_free(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
-	bpf_map_charge_finish(&mem);
 }
 
 static void bpf_map_put_uref(struct bpf_map *map)
@@ -541,7 +484,7 @@  static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "value_size:\t%u\n"
 		   "max_entries:\t%u\n"
 		   "map_flags:\t%#x\n"
-		   "memlock:\t%llu\n"
+		   "memlock:\t%llu\n" /* deprecated */
 		   "map_id:\t%u\n"
 		   "frozen:\t%u\n",
 		   map->map_type,
@@ -549,7 +492,7 @@  static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   map->value_size,
 		   map->max_entries,
 		   map->map_flags,
-		   map->memory.pages * 1ULL << PAGE_SHIFT,
+		   0LLU,
 		   map->id,
 		   READ_ONCE(map->frozen));
 	if (type) {
@@ -790,7 +733,6 @@  static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 static int map_create(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
-	struct bpf_map_memory mem;
 	struct bpf_map *map;
 	int f_flags;
 	int err;
@@ -887,9 +829,7 @@  static int map_create(union bpf_attr *attr)
 	security_bpf_map_free(map);
 free_map:
 	btf_put(map->btf);
-	bpf_map_charge_move(&mem, &map->memory);
 	map->ops->map_free(map);
-	bpf_map_charge_finish(&mem);
 	return err;
 }
 
diff --git a/tools/testing/selftests/bpf/progs/map_ptr_kern.c b/tools/testing/selftests/bpf/progs/map_ptr_kern.c
index 473665cac67e..49d1dcaf7999 100644
--- a/tools/testing/selftests/bpf/progs/map_ptr_kern.c
+++ b/tools/testing/selftests/bpf/progs/map_ptr_kern.c
@@ -26,17 +26,12 @@  __u32 g_line = 0;
 		return 0;	\
 })
 
-struct bpf_map_memory {
-	__u32 pages;
-} __attribute__((preserve_access_index));
-
 struct bpf_map {
 	enum bpf_map_type map_type;
 	__u32 key_size;
 	__u32 value_size;
 	__u32 max_entries;
 	__u32 id;
-	struct bpf_map_memory memory;
 } __attribute__((preserve_access_index));
 
 static inline int check_bpf_map_fields(struct bpf_map *map, __u32 key_size,