diff mbox series

[bpf] bpf: use get_file_rcu() instead of get_file() for task_file iterator

Message ID 20200817174214.252601-1-yhs@fb.com
State Accepted
Delegated to: BPF Maintainers
Headers show
Series [bpf] bpf: use get_file_rcu() instead of get_file() for task_file iterator | expand

Commit Message

Yonghong Song Aug. 17, 2020, 5:42 p.m. UTC
With latest `bpftool prog` command, we observed the following kernel
panic.
    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0
    Oops: 0010 [#1] SMP
    CPU: 9 PID: 6023 ...
    RIP: 0010:0x0
    Code: Bad RIP value.
    RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286
    RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400
    RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600
    R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710
    R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009
    FS:  00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <IRQ>
     rcu_core+0x1a4/0x440
     __do_softirq+0xd3/0x2c8
     irq_exit+0x9d/0xa0
     smp_apic_timer_interrupt+0x68/0x120
     apic_timer_interrupt+0xf/0x20
     </IRQ>
    RIP: 0033:0x47ce80
    Code: Bad RIP value.
    RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
    RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c
    RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8
    RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a
    R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400
    R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380

After further analysis, the bug is triggered by
Commit eaaacd23910f ("bpf: Add task and task/file iterator targets")
which introduced task_file bpf iterator, which traverses all open file
descriptors for all tasks in the current namespace.
The latest `bpftool prog` calls a task_file bpf program to traverse
all files in the system in order to associate processes with progs/maps, etc.
When traversing files for a given task, rcu read_lock is taken to
access all files in a file_struct. But it used get_file() to grab
a file, which is not right. It is possible file->f_count is 0 and
get_file() will unconditionally increase it.
Later put_file() may cause all kind of issues with the above
as one of sympotoms.

The failure can be reproduced with the following steps in a few seconds:
    $ cat t.c
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define N 10000
    int fd[N];
    int main() {
      int i;

      for (i = 0; i < N; i++) {
        fd[i] = open("./note.txt", 'r');
        if (fd[i] < 0) {
           fprintf(stderr, "failed\n");
           return -1;
        }
      }
      for (i = 0; i < N; i++)
        close(fd[i]);

      return 0;
    }
    $ gcc -O2 t.c
    $ cat run.sh
    #/bin/bash
    for i in {1..100}
    do
      while true; do ./a.out; done &
    done
    $ ./run.sh
    $ while true; do bpftool prog >& /dev/null; done

This patch used get_file_rcu() which only grabs a file if the
file->f_count is not zero. This is to ensure the file pointer
is always valid. The above reproducer did not fail for more
than 30 minutes.

Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/task_iter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Josef Bacik Aug. 17, 2020, 6:27 p.m. UTC | #1
On 8/17/20 1:42 PM, Yonghong Song wrote:
> With latest `bpftool prog` command, we observed the following kernel
> panic.
>      BUG: kernel NULL pointer dereference, address: 0000000000000000
>      #PF: supervisor instruction fetch in kernel mode
>      #PF: error_code(0x0010) - not-present page
>      PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0
>      Oops: 0010 [#1] SMP
>      CPU: 9 PID: 6023 ...
>      RIP: 0010:0x0
>      Code: Bad RIP value.
>      RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286
>      RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c
>      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400
>      RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600
>      R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710
>      R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009
>      FS:  00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000
>      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0
>      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>      Call Trace:
>       <IRQ>
>       rcu_core+0x1a4/0x440
>       __do_softirq+0xd3/0x2c8
>       irq_exit+0x9d/0xa0
>       smp_apic_timer_interrupt+0x68/0x120
>       apic_timer_interrupt+0xf/0x20
>       </IRQ>
>      RIP: 0033:0x47ce80
>      Code: Bad RIP value.
>      RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
>      RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c
>      RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8
>      RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a
>      R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400
>      R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380
> 
> After further analysis, the bug is triggered by
> Commit eaaacd23910f ("bpf: Add task and task/file iterator targets")
> which introduced task_file bpf iterator, which traverses all open file
> descriptors for all tasks in the current namespace.
> The latest `bpftool prog` calls a task_file bpf program to traverse
> all files in the system in order to associate processes with progs/maps, etc.
> When traversing files for a given task, rcu read_lock is taken to
> access all files in a file_struct. But it used get_file() to grab
> a file, which is not right. It is possible file->f_count is 0 and
> get_file() will unconditionally increase it.
> Later put_file() may cause all kind of issues with the above
> as one of sympotoms.
> 
> The failure can be reproduced with the following steps in a few seconds:
>      $ cat t.c
>      #include <stdio.h>
>      #include <sys/types.h>
>      #include <sys/stat.h>
>      #include <fcntl.h>
>      #include <unistd.h>
> 
>      #define N 10000
>      int fd[N];
>      int main() {
>        int i;
> 
>        for (i = 0; i < N; i++) {
>          fd[i] = open("./note.txt", 'r');
>          if (fd[i] < 0) {
>             fprintf(stderr, "failed\n");
>             return -1;
>          }
>        }
>        for (i = 0; i < N; i++)
>          close(fd[i]);
> 
>        return 0;
>      }
>      $ gcc -O2 t.c
>      $ cat run.sh
>      #/bin/bash
>      for i in {1..100}
>      do
>        while true; do ./a.out; done &
>      done
>      $ ./run.sh
>      $ while true; do bpftool prog >& /dev/null; done
> 
> This patch used get_file_rcu() which only grabs a file if the
> file->f_count is not zero. This is to ensure the file pointer
> is always valid. The above reproducer did not fail for more
> than 30 minutes.
> 
> Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
> Suggested-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Also verified that the problem no longer happens in production.  Thanks,

Josef
Alexei Starovoitov Aug. 17, 2020, 9:49 p.m. UTC | #2
On Mon, Aug 17, 2020 at 11:27 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 8/17/20 1:42 PM, Yonghong Song wrote:
> > With latest `bpftool prog` command, we observed the following kernel
> > panic.
> >      BUG: kernel NULL pointer dereference, address: 0000000000000000
> >
> > This patch used get_file_rcu() which only grabs a file if the
> > file->f_count is not zero. This is to ensure the file pointer
> > is always valid. The above reproducer did not fail for more
> > than 30 minutes.
> >
> > Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
> > Suggested-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Applied. Thanks
diff mbox series

Patch

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 232df29793e9..f21b5e1e4540 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -178,10 +178,11 @@  task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info,
 		f = fcheck_files(curr_files, curr_fd);
 		if (!f)
 			continue;
+		if (!get_file_rcu(f))
+			continue;
 
 		/* set info->fd */
 		info->fd = curr_fd;
-		get_file(f);
 		rcu_read_unlock();
 		return f;
 	}