[bpf,1/2] bpf: prevent re-mmap()'ing BPF map as writable for initially r/o mapping

Message ID: 20200410000425.2597887-1-andriin@fb.com
State: Changes Requested
Delegated to: BPF Maintainers
Series: [bpf,1/2] bpf: prevent re-mmap()'ing BPF map as writable for initially r/o mapping

Commit Message

Andrii Nakryiko April 10, 2020, 12:04 a.m. UTC
The VM_MAYWRITE flag set during the initial memory mapping determines whether
already mmap()'ed pages can later be remapped as writable through an mprotect()
call. To prevent a user application from rewriting the contents of a BPF map
that was memory-mapped as read-only and subsequently frozen, remove the
VM_MAYWRITE flag completely for an initially read-only mapping.

Alternatively, we could treat any memory mapping of an unfrozen map as
writable and bump writecnt instead. But there is little legitimate reason to
map a BPF map as read-only and then re-mmap() it as writable through
mprotect(), instead of just mmap()'ing it as read/write from the very
beginning.
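
For illustration, the user-space sequence this closes off looks roughly like
the sketch below (map sizing, constants, and error handling are illustrative
assumptions, not part of this patch):

#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        union bpf_attr attr;
        int map_fd;
        void *p;

        /* create an mmap()'able array map: one 4096-byte value */
        memset(&attr, 0, sizeof(attr));
        attr.map_type = BPF_MAP_TYPE_ARRAY;
        attr.key_size = 4;
        attr.value_size = 4096;
        attr.max_entries = 1;
        attr.map_flags = BPF_F_MMAPABLE;
        map_fd = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));

        /* initially read-only mapping: VM_WRITE is not set, so the map's
         * writecnt is not bumped and the map can be frozen */
        p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, map_fd, 0);

        /* freeze the map so no further user-space writes are allowed */
        memset(&attr, 0, sizeof(attr));
        attr.map_fd = map_fd;
        syscall(SYS_bpf, BPF_MAP_FREEZE, &attr, sizeof(attr));

        /* without this patch VM_MAYWRITE is still set, so mprotect()
         * succeeds and the frozen map becomes writable again; with
         * VM_MAYWRITE cleared for initially r/o mappings it fails */
        if (mprotect(p, 4096, PROT_READ | PROT_WRITE))
                perror("mprotect");     /* expected with the fix: EACCES */

        return 0;
}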

Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 kernel/bpf/syscall.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jann Horn April 10, 2020, 8:51 a.m. UTC | #1
On Fri, Apr 10, 2020 at 2:04 AM Andrii Nakryiko <andriin@fb.com> wrote:
> VM_MAYWRITE flag during initial memory mapping determines if already mmap()'ed
> pages can be later remapped as writable ones through mprotect() call. To
> prevent user application to rewrite contents of memory-mapped as read-only and
> subsequently frozen BPF map, remove VM_MAYWRITE flag completely on initially
> read-only mapping.
>
> Alternatively, we could treat any memory-mapping on unfrozen map as writable
> and bump writecnt instead. But there is little legitimate reason to map
> BPF map as read-only and then re-mmap() it as writable through mprotect(),
> instead of just mmap()'ing it as read/write from the very beginning.
>
> Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  kernel/bpf/syscall.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 64783da34202..f7f6db50a085 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -635,6 +635,10 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
>         /* set default open/close callbacks */
>         vma->vm_ops = &bpf_map_default_vmops;
>         vma->vm_private_data = map;
> +       vma->vm_flags &= ~VM_MAYEXEC;
> +       if (!(vma->vm_flags & VM_WRITE))
> +               /* disallow re-mapping with PROT_WRITE */
> +               vma->vm_flags &= ~VM_MAYWRITE;

The .open and .close handlers for the VMA are also wrong:

/* called for any extra memory-mapped regions (except initial) */
static void bpf_map_mmap_open(struct vm_area_struct *vma)
{
        struct bpf_map *map = vma->vm_file->private_data;

        bpf_map_inc_with_uref(map);

        if (vma->vm_flags & VM_WRITE) {
                mutex_lock(&map->freeze_mutex);
                map->writecnt++;
                mutex_unlock(&map->freeze_mutex);
        }
}

/* called for all unmapped memory region (including initial) */
static void bpf_map_mmap_close(struct vm_area_struct *vma)
{
        struct bpf_map *map = vma->vm_file->private_data;

        if (vma->vm_flags & VM_WRITE) {
                mutex_lock(&map->freeze_mutex);
                map->writecnt--;
                mutex_unlock(&map->freeze_mutex);
        }

        bpf_map_put_with_uref(map);
}

static const struct vm_operations_struct bpf_map_default_vmops = {
        .open           = bpf_map_mmap_open,
        .close          = bpf_map_mmap_close,
};

You can use mprotect() to flip VM_WRITE off while a VMA exists, and
then the writecnt won't go down when you close it. Or you could even
get the writecnt to go negative if you map as writable, then
mprotect() to readonly, then split the VMA a few times, mprotect the
split VMAs to writable, and then unmap them.
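
As a sketch, that sequence looks roughly like this (the page-size constant and
map sizing are illustrative assumptions):

#include <sys/mman.h>

/* writecnt underflow with the current (VM_WRITE-based) accounting, tracked
 * in comments; map_fd is assumed to be an mmap()'able BPF array spanning at
 * least three 4096-byte pages */
static void writecnt_underflow(int map_fd)
{
        char *p = mmap(NULL, 3 * 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, map_fd, 0); /* initial mmap(): writecnt = 1 */

        /* VM_WRITE is cleared but no vm_ops callback runs: writecnt stays 1 */
        mprotect(p, 3 * 4096, PROT_READ);

        /* each call splits the VMA; .open runs on the new piece while it is
         * still read-only, so nothing is counted, and only afterwards does
         * that piece gain VM_WRITE: writecnt still 1 */
        mprotect(p, 4096, PROT_READ | PROT_WRITE);
        mprotect(p + 4096, 4096, PROT_READ | PROT_WRITE);

        /* .close now sees VM_WRITE on two pieces and decrements twice:
         * writecnt ends up at -1 */
        munmap(p, 3 * 4096);
}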

I think you'll want to also check for MAYWRITE here.

Also, the bpf_map_inc_with_uref/bpf_map_put_with_uref here look
superfluous - the VMA holds a reference to the file, and the file
holds a reference to the map.
Andrii Nakryiko April 10, 2020, 7 p.m. UTC | #2
On Fri, Apr 10, 2020 at 1:51 AM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Apr 10, 2020 at 2:04 AM Andrii Nakryiko <andriin@fb.com> wrote:
> > VM_MAYWRITE flag during initial memory mapping determines if already mmap()'ed
> > pages can be later remapped as writable ones through mprotect() call. To
> > prevent user application to rewrite contents of memory-mapped as read-only and
> > subsequently frozen BPF map, remove VM_MAYWRITE flag completely on initially
> > read-only mapping.
> >
> > Alternatively, we could treat any memory-mapping on unfrozen map as writable
> > and bump writecnt instead. But there is little legitimate reason to map
> > BPF map as read-only and then re-mmap() it as writable through mprotect(),
> > instead of just mmap()'ing it as read/write from the very beginning.
> >
> > Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
> > Reported-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  kernel/bpf/syscall.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 64783da34202..f7f6db50a085 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -635,6 +635,10 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
> >         /* set default open/close callbacks */
> >         vma->vm_ops = &bpf_map_default_vmops;
> >         vma->vm_private_data = map;
> > +       vma->vm_flags &= ~VM_MAYEXEC;
> > +       if (!(vma->vm_flags & VM_WRITE))
> > +               /* disallow re-mapping with PROT_WRITE */
> > +               vma->vm_flags &= ~VM_MAYWRITE;
>
> The .open and .close handlers for the VMA are also wrong:

Yes, it has to check VM_MAYWRITE now, my bad, thanks for catching
this! Extended selftest to validate that scenario as well.

>
> /* called for any extra memory-mapped regions (except initial) */
> static void bpf_map_mmap_open(struct vm_area_struct *vma)
> {
>         struct bpf_map *map = vma->vm_file->private_data;
>
>         bpf_map_inc_with_uref(map);
>
>         if (vma->vm_flags & VM_WRITE) {
>                 mutex_lock(&map->freeze_mutex);
>                 map->writecnt++;
>                 mutex_unlock(&map->freeze_mutex);
>         }
> }
>
> /* called for all unmapped memory region (including initial) */
> static void bpf_map_mmap_close(struct vm_area_struct *vma)
> {
>         struct bpf_map *map = vma->vm_file->private_data;
>
>         if (vma->vm_flags & VM_WRITE) {
>                 mutex_lock(&map->freeze_mutex);
>                 map->writecnt--;
>                 mutex_unlock(&map->freeze_mutex);
>         }
>
>         bpf_map_put_with_uref(map);
> }
>
> static const struct vm_operations_struct bpf_map_default_vmops = {
>         .open           = bpf_map_mmap_open,
>         .close          = bpf_map_mmap_close,
> };
>
> You can use mprotect() to flip VM_WRITE off while a VMA exists, and
> then the writecnt won't go down when you close it. Or you could even
> get the writecnt to go negative if you map as writable, then
> mprotect() to readonly, then split the VMA a few times, mprotect the
> split VMAs to writable, and then unmap them.
>
> I think you'll want to also check for MAYWRITE here.
>
> Also, the bpf_map_inc_with_uref/bpf_map_put_with_uref here look
> superfluous - the VMA holds a reference to the file, and the file
> holds a reference to the map.

Hm.. So the file from which the memory mapping was originally created will
stay referenced by the VMA infrastructure until the last VMA segment is
unmapped (and bpf_map_mmap_close is called for it), even if the file itself
is closed from user-space? That makes sense, though I didn't realize it at
the time I was implementing this. I'll drop the refcounting then.
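
Taken together, a sketch of where the two points above lead (accounting keyed
on VM_MAYWRITE, no extra uref pair; not the actual follow-up patch):

/* key the accounting on VM_MAYWRITE, which mprotect() cannot toggle, so
 * open/close stay balanced across protection changes and VMA splits; the
 * VMA's reference on vm_file (whose private_data pins the map) replaces
 * the explicit inc/put_with_uref pair */
static void bpf_map_mmap_open(struct vm_area_struct *vma)
{
        struct bpf_map *map = vma->vm_file->private_data;

        if (vma->vm_flags & VM_MAYWRITE) {
                mutex_lock(&map->freeze_mutex);
                map->writecnt++;
                mutex_unlock(&map->freeze_mutex);
        }
}

static void bpf_map_mmap_close(struct vm_area_struct *vma)
{
        struct bpf_map *map = vma->vm_file->private_data;

        if (vma->vm_flags & VM_MAYWRITE) {
                mutex_lock(&map->freeze_mutex);
                map->writecnt--;
                mutex_unlock(&map->freeze_mutex);
        }
}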

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64783da34202..f7f6db50a085 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -635,6 +635,10 @@  static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	/* set default open/close callbacks */
 	vma->vm_ops = &bpf_map_default_vmops;
 	vma->vm_private_data = map;
+	vma->vm_flags &= ~VM_MAYEXEC;
+	if (!(vma->vm_flags & VM_WRITE))
+		/* disallow re-mapping with PROT_WRITE */
+		vma->vm_flags &= ~VM_MAYWRITE;
 
 	err = map->ops->map_mmap(map, vma);
 	if (err)