diff mbox series

[bpf-next,1/4] samples: bpf: fix bpf programs with kprobe/sys_connect event

Message ID 20200702021646.90347-2-danieltimlee@gmail.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series samples: bpf: refactor BPF map test with libbpf | expand

Commit Message

Daniel T. Lee July 2, 2020, 2:16 a.m. UTC
Currently, BPF programs with kprobe/sys_connect does not work properly.

Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
This commit modifies the bpf_load behavior of kprobe events in the x64
architecture. If the current kprobe event target starts with "sys_*",
add the prefix "__x64_" to the front of the event.

Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
solution to most of the problems caused by the commit below.

    commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
    pt_regs-based sys_*() to __x64_sys_*()")

However, there is a problem with the sys_connect kprobe event that does
not work properly. For __sys_connect event, parameters can be fetched
normally, but for __x64_sys_connect, parameters cannot be fetched.

Because of this problem, this commit fixes the sys_connect event by
specifying the __sys_connect directly and this will bypass the
"__x64_" appending rule of bpf_load.

Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
 samples/bpf/map_perf_test_kern.c         | 2 +-
 samples/bpf/test_map_in_map_kern.c       | 2 +-
 samples/bpf/test_probe_write_user_kern.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Comments

Yonghong Song July 2, 2020, 5:12 a.m. UTC | #1
On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> Currently, BPF programs with kprobe/sys_connect does not work properly.
> 
> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> This commit modifies the bpf_load behavior of kprobe events in the x64
> architecture. If the current kprobe event target starts with "sys_*",
> add the prefix "__x64_" to the front of the event.
> 
> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> solution to most of the problems caused by the commit below.
> 
>      commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
>      pt_regs-based sys_*() to __x64_sys_*()")
> 
> However, there is a problem with the sys_connect kprobe event that does
> not work properly. For __sys_connect event, parameters can be fetched
> normally, but for __x64_sys_connect, parameters cannot be fetched.
> 
> Because of this problem, this commit fixes the sys_connect event by
> specifying the __sys_connect directly and this will bypass the
> "__x64_" appending rule of bpf_load.

In the kernel code, we have

SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
                 int, addrlen)
{
         return __sys_connect(fd, uservaddr, addrlen);
}

Depending on compiler, there is no guarantee that __sys_connect will
not be inlined. I would prefer to still use the entry point
__x64_sys_* e.g.,
    SEC("kprobe/" SYSCALL(sys_write))

> 
> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
>   samples/bpf/map_perf_test_kern.c         | 2 +-
>   samples/bpf/test_map_in_map_kern.c       | 2 +-
>   samples/bpf/test_probe_write_user_kern.c | 2 +-
>   3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
> index 12e91ae64d4d..cebe2098bb24 100644
> --- a/samples/bpf/map_perf_test_kern.c
> +++ b/samples/bpf/map_perf_test_kern.c
> @@ -154,7 +154,7 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
>   	return 0;
>   }
>   
> -SEC("kprobe/sys_connect")
> +SEC("kprobe/__sys_connect")
>   int stress_lru_hmap_alloc(struct pt_regs *ctx)
>   {
>   	char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
> diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
> index 6cee61e8ce9b..b1562ba2f025 100644
> --- a/samples/bpf/test_map_in_map_kern.c
> +++ b/samples/bpf/test_map_in_map_kern.c
> @@ -102,7 +102,7 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
>   	return result ? *result : -ENOENT;
>   }
>   
> -SEC("kprobe/sys_connect")
> +SEC("kprobe/__sys_connect")
>   int trace_sys_connect(struct pt_regs *ctx)
>   {
>   	struct sockaddr_in6 *in6;
> diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
> index 6579639a83b2..9b3c3918c37d 100644
> --- a/samples/bpf/test_probe_write_user_kern.c
> +++ b/samples/bpf/test_probe_write_user_kern.c
> @@ -26,7 +26,7 @@ struct {
>    * This example sits on a syscall, and the syscall ABI is relatively stable
>    * of course, across platforms, and over time, the ABI may change.
>    */
> -SEC("kprobe/sys_connect")
> +SEC("kprobe/__sys_connect")
>   int bpf_prog1(struct pt_regs *ctx)
>   {
>   	struct sockaddr_in new_addr, orig_addr = {};
>
Daniel T. Lee July 2, 2020, 11:13 a.m. UTC | #2
On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> > Currently, BPF programs with kprobe/sys_connect does not work properly.
> >
> > Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > This commit modifies the bpf_load behavior of kprobe events in the x64
> > architecture. If the current kprobe event target starts with "sys_*",
> > add the prefix "__x64_" to the front of the event.
> >
> > Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> > solution to most of the problems caused by the commit below.
> >
> >      commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
> >      pt_regs-based sys_*() to __x64_sys_*()")
> >
> > However, there is a problem with the sys_connect kprobe event that does
> > not work properly. For __sys_connect event, parameters can be fetched
> > normally, but for __x64_sys_connect, parameters cannot be fetched.
> >
> > Because of this problem, this commit fixes the sys_connect event by
> > specifying the __sys_connect directly and this will bypass the
> > "__x64_" appending rule of bpf_load.
>
> In the kernel code, we have
>
> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
>                  int, addrlen)
> {
>          return __sys_connect(fd, uservaddr, addrlen);
> }
>
> Depending on compiler, there is no guarantee that __sys_connect will
> not be inlined. I would prefer to still use the entry point
> __x64_sys_* e.g.,
>     SEC("kprobe/" SYSCALL(sys_write))
>

As you mentioned, there is clearly a possibility that problems may arise
because the symbol does not exist according to the compiler.

However, in x64, when using Kprobe for __x64_sys_connect event, the
tests are not working properly because the parameters cannot be fetched,
and the test under selftests/bpf is using "kprobe/_sys_connect" directly.

I'm not sure how to deal with this problem. Any advice and suggestions
will be greatly appreciated.

Thanks for your time and effort for the review.
Daniel

> >
> > Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> >   samples/bpf/map_perf_test_kern.c         | 2 +-
> >   samples/bpf/test_map_in_map_kern.c       | 2 +-
> >   samples/bpf/test_probe_write_user_kern.c | 2 +-
> >   3 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
> > index 12e91ae64d4d..cebe2098bb24 100644
> > --- a/samples/bpf/map_perf_test_kern.c
> > +++ b/samples/bpf/map_perf_test_kern.c
> > @@ -154,7 +154,7 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
> >       return 0;
> >   }
> >
> > -SEC("kprobe/sys_connect")
> > +SEC("kprobe/__sys_connect")
> >   int stress_lru_hmap_alloc(struct pt_regs *ctx)
> >   {
> >       char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
> > diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
> > index 6cee61e8ce9b..b1562ba2f025 100644
> > --- a/samples/bpf/test_map_in_map_kern.c
> > +++ b/samples/bpf/test_map_in_map_kern.c
> > @@ -102,7 +102,7 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
> >       return result ? *result : -ENOENT;
> >   }
> >
> > -SEC("kprobe/sys_connect")
> > +SEC("kprobe/__sys_connect")
> >   int trace_sys_connect(struct pt_regs *ctx)
> >   {
> >       struct sockaddr_in6 *in6;
> > diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
> > index 6579639a83b2..9b3c3918c37d 100644
> > --- a/samples/bpf/test_probe_write_user_kern.c
> > +++ b/samples/bpf/test_probe_write_user_kern.c
> > @@ -26,7 +26,7 @@ struct {
> >    * This example sits on a syscall, and the syscall ABI is relatively stable
> >    * of course, across platforms, and over time, the ABI may change.
> >    */
> > -SEC("kprobe/sys_connect")
> > +SEC("kprobe/__sys_connect")
> >   int bpf_prog1(struct pt_regs *ctx)
> >   {
> >       struct sockaddr_in new_addr, orig_addr = {};
> >
Yonghong Song July 2, 2020, 4:04 p.m. UTC | #3
On 7/2/20 4:13 AM, Daniel T. Lee wrote:
> On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
>>> Currently, BPF programs with kprobe/sys_connect does not work properly.
>>>
>>> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
>>> This commit modifies the bpf_load behavior of kprobe events in the x64
>>> architecture. If the current kprobe event target starts with "sys_*",
>>> add the prefix "__x64_" to the front of the event.
>>>
>>> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
>>> solution to most of the problems caused by the commit below.
>>>
>>>       commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
>>>       pt_regs-based sys_*() to __x64_sys_*()")
>>>
>>> However, there is a problem with the sys_connect kprobe event that does
>>> not work properly. For __sys_connect event, parameters can be fetched
>>> normally, but for __x64_sys_connect, parameters cannot be fetched.
>>>
>>> Because of this problem, this commit fixes the sys_connect event by
>>> specifying the __sys_connect directly and this will bypass the
>>> "__x64_" appending rule of bpf_load.
>>
>> In the kernel code, we have
>>
>> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
>>                   int, addrlen)
>> {
>>           return __sys_connect(fd, uservaddr, addrlen);
>> }
>>
>> Depending on compiler, there is no guarantee that __sys_connect will
>> not be inlined. I would prefer to still use the entry point
>> __x64_sys_* e.g.,
>>      SEC("kprobe/" SYSCALL(sys_write))
>>
> 
> As you mentioned, there is clearly a possibility that problems may arise
> because the symbol does not exist according to the compiler.
> 
> However, in x64, when using Kprobe for __x64_sys_connect event, the
> tests are not working properly because the parameters cannot be fetched,
> and the test under selftests/bpf is using "kprobe/_sys_connect" directly.

This is the assembly code for __x64_sys_connect.

ffffffff818d3520 <__x64_sys_connect>:
ffffffff818d3520: e8 fb df 32 00        callq   0xffffffff81c01520 
<__fentry__>
ffffffff818d3525: 48 8b 57 60           movq    96(%rdi), %rdx
ffffffff818d3529: 48 8b 77 68           movq    104(%rdi), %rsi
ffffffff818d352d: 48 8b 7f 70           movq    112(%rdi), %rdi
ffffffff818d3531: e8 1a ff ff ff        callq   0xffffffff818d3450 
<__sys_connect>
ffffffff818d3536: 48 98                 cltq
ffffffff818d3538: c3                    retq
ffffffff818d3539: 0f 1f 80 00 00 00 00  nopl    (%rax)

In bpf program, the step is:
       struct pt_regs *real_regs = PT_REGS_PARM1(pt_regs);
       param1 = PT_REGS_PARM1(real_regs);
       param2 = PT_REGS_PARM2(real_regs);
       param3 = PT_REGS_PARM3(real_regs);
The same for s390.

For other architectures, no above indirection is needed.

I guess you can abstract the above into trace_common.h?

> 
> I'm not sure how to deal with this problem. Any advice and suggestions
> will be greatly appreciated.
> 
> Thanks for your time and effort for the review.
> Daniel
> 
>>>
>>> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
>>> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
>>> ---
>>>    samples/bpf/map_perf_test_kern.c         | 2 +-
>>>    samples/bpf/test_map_in_map_kern.c       | 2 +-
>>>    samples/bpf/test_probe_write_user_kern.c | 2 +-
>>>    3 files changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
>>> index 12e91ae64d4d..cebe2098bb24 100644
>>> --- a/samples/bpf/map_perf_test_kern.c
>>> +++ b/samples/bpf/map_perf_test_kern.c
>>> @@ -154,7 +154,7 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
>>>        return 0;
>>>    }
>>>
>>> -SEC("kprobe/sys_connect")
>>> +SEC("kprobe/__sys_connect")
>>>    int stress_lru_hmap_alloc(struct pt_regs *ctx)
>>>    {
>>>        char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
>>> diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
>>> index 6cee61e8ce9b..b1562ba2f025 100644
>>> --- a/samples/bpf/test_map_in_map_kern.c
>>> +++ b/samples/bpf/test_map_in_map_kern.c
>>> @@ -102,7 +102,7 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
>>>        return result ? *result : -ENOENT;
>>>    }
>>>
>>> -SEC("kprobe/sys_connect")
>>> +SEC("kprobe/__sys_connect")
>>>    int trace_sys_connect(struct pt_regs *ctx)
>>>    {
>>>        struct sockaddr_in6 *in6;
>>> diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
>>> index 6579639a83b2..9b3c3918c37d 100644
>>> --- a/samples/bpf/test_probe_write_user_kern.c
>>> +++ b/samples/bpf/test_probe_write_user_kern.c
>>> @@ -26,7 +26,7 @@ struct {
>>>     * This example sits on a syscall, and the syscall ABI is relatively stable
>>>     * of course, across platforms, and over time, the ABI may change.
>>>     */
>>> -SEC("kprobe/sys_connect")
>>> +SEC("kprobe/__sys_connect")
>>>    int bpf_prog1(struct pt_regs *ctx)
>>>    {
>>>        struct sockaddr_in new_addr, orig_addr = {};
>>>
Daniel T. Lee July 6, 2020, 10:26 a.m. UTC | #4
On Fri, Jul 3, 2020 at 1:04 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/2/20 4:13 AM, Daniel T. Lee wrote:
> > On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> >>> Currently, BPF programs with kprobe/sys_connect does not work properly.
> >>>
> >>> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> >>> This commit modifies the bpf_load behavior of kprobe events in the x64
> >>> architecture. If the current kprobe event target starts with "sys_*",
> >>> add the prefix "__x64_" to the front of the event.
> >>>
> >>> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> >>> solution to most of the problems caused by the commit below.
> >>>
> >>>       commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
> >>>       pt_regs-based sys_*() to __x64_sys_*()")
> >>>
> >>> However, there is a problem with the sys_connect kprobe event that does
> >>> not work properly. For __sys_connect event, parameters can be fetched
> >>> normally, but for __x64_sys_connect, parameters cannot be fetched.
> >>>
> >>> Because of this problem, this commit fixes the sys_connect event by
> >>> specifying the __sys_connect directly and this will bypass the
> >>> "__x64_" appending rule of bpf_load.
> >>
> >> In the kernel code, we have
> >>
> >> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
> >>                   int, addrlen)
> >> {
> >>           return __sys_connect(fd, uservaddr, addrlen);
> >> }
> >>
> >> Depending on compiler, there is no guarantee that __sys_connect will
> >> not be inlined. I would prefer to still use the entry point
> >> __x64_sys_* e.g.,
> >>      SEC("kprobe/" SYSCALL(sys_write))
> >>
> >
> > As you mentioned, there is clearly a possibility that problems may arise
> > because the symbol does not exist according to the compiler.
> >
> > However, in x64, when using Kprobe for __x64_sys_connect event, the
> > tests are not working properly because the parameters cannot be fetched,
> > and the test under selftests/bpf is using "kprobe/_sys_connect" directly.
>
> This is the assembly code for __x64_sys_connect.
>
> ffffffff818d3520 <__x64_sys_connect>:
> ffffffff818d3520: e8 fb df 32 00        callq   0xffffffff81c01520
> <__fentry__>
> ffffffff818d3525: 48 8b 57 60           movq    96(%rdi), %rdx
> ffffffff818d3529: 48 8b 77 68           movq    104(%rdi), %rsi
> ffffffff818d352d: 48 8b 7f 70           movq    112(%rdi), %rdi
> ffffffff818d3531: e8 1a ff ff ff        callq   0xffffffff818d3450
> <__sys_connect>
> ffffffff818d3536: 48 98                 cltq
> ffffffff818d3538: c3                    retq
> ffffffff818d3539: 0f 1f 80 00 00 00 00  nopl    (%rax)
>
> In bpf program, the step is:
>        struct pt_regs *real_regs = PT_REGS_PARM1(pt_regs);
>        param1 = PT_REGS_PARM1(real_regs);
>        param2 = PT_REGS_PARM2(real_regs);
>        param3 = PT_REGS_PARM3(real_regs);
> The same for s390.
>

I'm sorry that I seem to get it wrong,
But is it available to access 'struct pt_regs *' recursively?

It seems nested use of PT_REGS_PARM causes invalid memory access.

    $ sudo ./test_probe_write_user
    libbpf: load bpf program failed: Permission denied
    libbpf: -- BEGIN DUMP LOG ---
    libbpf:
    Unrecognized arg#0 type PTR
    ; struct pt_regs *real_regs = PT_REGS_PARM1(ctx);
    0: (79) r1 = *(u64 *)(r1 +112)
    ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
    1: (79) r6 = *(u64 *)(r1 +104)
    R1 invalid mem access 'inv'
    processed 2 insns (limit 1000000) max_states_per_insn 0
total_states 0 peak_states 0 mark_read 0

    libbpf: -- END LOG --
    libbpf: failed to load program 'kprobe/__x64_sys_connect'
    libbpf: failed to load object './test_probe_write_user_kern.o'
    ERROR: loading BPF object file failed

I'm not fully aware of the BPF verifier's internal structure.
Is there any workaround to solve this problem?

Thanks for your time and effort for the review.
Daniel.

>
> For other architectures, no above indirection is needed.
>
> I guess you can abstract the above into trace_common.h?
>
> >
> > I'm not sure how to deal with this problem. Any advice and suggestions
> > will be greatly appreciated.
> >
> > Thanks for your time and effort for the review.
> > Daniel
> >
> >>>
> >>> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> >>> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> >>> ---
> >>>    samples/bpf/map_perf_test_kern.c         | 2 +-
> >>>    samples/bpf/test_map_in_map_kern.c       | 2 +-
> >>>    samples/bpf/test_probe_write_user_kern.c | 2 +-
> >>>    3 files changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
> >>> index 12e91ae64d4d..cebe2098bb24 100644
> >>> --- a/samples/bpf/map_perf_test_kern.c
> >>> +++ b/samples/bpf/map_perf_test_kern.c
> >>> @@ -154,7 +154,7 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
> >>>        return 0;
> >>>    }
> >>>
> >>> -SEC("kprobe/sys_connect")
> >>> +SEC("kprobe/__sys_connect")
> >>>    int stress_lru_hmap_alloc(struct pt_regs *ctx)
> >>>    {
> >>>        char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
> >>> diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
> >>> index 6cee61e8ce9b..b1562ba2f025 100644
> >>> --- a/samples/bpf/test_map_in_map_kern.c
> >>> +++ b/samples/bpf/test_map_in_map_kern.c
> >>> @@ -102,7 +102,7 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
> >>>        return result ? *result : -ENOENT;
> >>>    }
> >>>
> >>> -SEC("kprobe/sys_connect")
> >>> +SEC("kprobe/__sys_connect")
> >>>    int trace_sys_connect(struct pt_regs *ctx)
> >>>    {
> >>>        struct sockaddr_in6 *in6;
> >>> diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
> >>> index 6579639a83b2..9b3c3918c37d 100644
> >>> --- a/samples/bpf/test_probe_write_user_kern.c
> >>> +++ b/samples/bpf/test_probe_write_user_kern.c
> >>> @@ -26,7 +26,7 @@ struct {
> >>>     * This example sits on a syscall, and the syscall ABI is relatively stable
> >>>     * of course, across platforms, and over time, the ABI may change.
> >>>     */
> >>> -SEC("kprobe/sys_connect")
> >>> +SEC("kprobe/__sys_connect")
> >>>    int bpf_prog1(struct pt_regs *ctx)
> >>>    {
> >>>        struct sockaddr_in new_addr, orig_addr = {};
> >>>
Andrii Nakryiko July 6, 2020, 11:49 p.m. UTC | #5
On Mon, Jul 6, 2020 at 3:28 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> On Fri, Jul 3, 2020 at 1:04 AM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 7/2/20 4:13 AM, Daniel T. Lee wrote:
> > > On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@fb.com> wrote:
> > >>
> > >>
> > >>
> > >> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> > >>> Currently, BPF programs with kprobe/sys_connect does not work properly.
> > >>>
> > >>> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > >>> This commit modifies the bpf_load behavior of kprobe events in the x64
> > >>> architecture. If the current kprobe event target starts with "sys_*",
> > >>> add the prefix "__x64_" to the front of the event.
> > >>>
> > >>> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> > >>> solution to most of the problems caused by the commit below.
> > >>>
> > >>>       commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
> > >>>       pt_regs-based sys_*() to __x64_sys_*()")
> > >>>
> > >>> However, there is a problem with the sys_connect kprobe event that does
> > >>> not work properly. For __sys_connect event, parameters can be fetched
> > >>> normally, but for __x64_sys_connect, parameters cannot be fetched.
> > >>>
> > >>> Because of this problem, this commit fixes the sys_connect event by
> > >>> specifying the __sys_connect directly and this will bypass the
> > >>> "__x64_" appending rule of bpf_load.
> > >>
> > >> In the kernel code, we have
> > >>
> > >> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
> > >>                   int, addrlen)
> > >> {
> > >>           return __sys_connect(fd, uservaddr, addrlen);
> > >> }
> > >>
> > >> Depending on compiler, there is no guarantee that __sys_connect will
> > >> not be inlined. I would prefer to still use the entry point
> > >> __x64_sys_* e.g.,
> > >>      SEC("kprobe/" SYSCALL(sys_write))
> > >>
> > >
> > > As you mentioned, there is clearly a possibility that problems may arise
> > > because the symbol does not exist according to the compiler.
> > >
> > > However, in x64, when using Kprobe for __x64_sys_connect event, the
> > > tests are not working properly because the parameters cannot be fetched,
> > > and the test under selftests/bpf is using "kprobe/_sys_connect" directly.
> >
> > This is the assembly code for __x64_sys_connect.
> >
> > ffffffff818d3520 <__x64_sys_connect>:
> > ffffffff818d3520: e8 fb df 32 00        callq   0xffffffff81c01520
> > <__fentry__>
> > ffffffff818d3525: 48 8b 57 60           movq    96(%rdi), %rdx
> > ffffffff818d3529: 48 8b 77 68           movq    104(%rdi), %rsi
> > ffffffff818d352d: 48 8b 7f 70           movq    112(%rdi), %rdi
> > ffffffff818d3531: e8 1a ff ff ff        callq   0xffffffff818d3450
> > <__sys_connect>
> > ffffffff818d3536: 48 98                 cltq
> > ffffffff818d3538: c3                    retq
> > ffffffff818d3539: 0f 1f 80 00 00 00 00  nopl    (%rax)
> >
> > In bpf program, the step is:
> >        struct pt_regs *real_regs = PT_REGS_PARM1(pt_regs);
> >        param1 = PT_REGS_PARM1(real_regs);
> >        param2 = PT_REGS_PARM2(real_regs);
> >        param3 = PT_REGS_PARM3(real_regs);
> > The same for s390.
> >
>
> I'm sorry that I seem to get it wrong,
> But is it available to access 'struct pt_regs *' recursively?
>
> It seems nested use of PT_REGS_PARM causes invalid memory access.
>
>     $ sudo ./test_probe_write_user
>     libbpf: load bpf program failed: Permission denied
>     libbpf: -- BEGIN DUMP LOG ---
>     libbpf:
>     Unrecognized arg#0 type PTR
>     ; struct pt_regs *real_regs = PT_REGS_PARM1(ctx);
>     0: (79) r1 = *(u64 *)(r1 +112)
>     ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
>     1: (79) r6 = *(u64 *)(r1 +104)
>     R1 invalid mem access 'inv'
>     processed 2 insns (limit 1000000) max_states_per_insn 0
> total_states 0 peak_states 0 mark_read 0
>
>     libbpf: -- END LOG --
>     libbpf: failed to load program 'kprobe/__x64_sys_connect'
>     libbpf: failed to load object './test_probe_write_user_kern.o'
>     ERROR: loading BPF object file failed
>
> I'm not fully aware of the BPF verifier's internal structure.
> Is there any workaround to solve this problem?

You need to use bpf_probe_read_kernel() to get those arguments from
real_args. Or better just use PT_REGS_PARM1_CORE(x) and others, which
does that for you (+ CO-RE relocation).


>
> Thanks for your time and effort for the review.
> Daniel.
>
> >
> > For other architectures, no above indirection is needed.
> >
> > I guess you can abstract the above into trace_common.h?
> >
> > >
> > > I'm not sure how to deal with this problem. Any advice and suggestions
> > > will be greatly appreciated.
> > >
> > > Thanks for your time and effort for the review.
> > > Daniel
> > >
> > >>>
> > >>> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > >>> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > >>> ---
> > >>>    samples/bpf/map_perf_test_kern.c         | 2 +-
> > >>>    samples/bpf/test_map_in_map_kern.c       | 2 +-
> > >>>    samples/bpf/test_probe_write_user_kern.c | 2 +-
> > >>>    3 files changed, 3 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
> > >>> index 12e91ae64d4d..cebe2098bb24 100644
> > >>> --- a/samples/bpf/map_perf_test_kern.c
> > >>> +++ b/samples/bpf/map_perf_test_kern.c
> > >>> @@ -154,7 +154,7 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
> > >>>        return 0;
> > >>>    }
> > >>>
> > >>> -SEC("kprobe/sys_connect")
> > >>> +SEC("kprobe/__sys_connect")
> > >>>    int stress_lru_hmap_alloc(struct pt_regs *ctx)
> > >>>    {
> > >>>        char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
> > >>> diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
> > >>> index 6cee61e8ce9b..b1562ba2f025 100644
> > >>> --- a/samples/bpf/test_map_in_map_kern.c
> > >>> +++ b/samples/bpf/test_map_in_map_kern.c
> > >>> @@ -102,7 +102,7 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
> > >>>        return result ? *result : -ENOENT;
> > >>>    }
> > >>>
> > >>> -SEC("kprobe/sys_connect")
> > >>> +SEC("kprobe/__sys_connect")
> > >>>    int trace_sys_connect(struct pt_regs *ctx)
> > >>>    {
> > >>>        struct sockaddr_in6 *in6;
> > >>> diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
> > >>> index 6579639a83b2..9b3c3918c37d 100644
> > >>> --- a/samples/bpf/test_probe_write_user_kern.c
> > >>> +++ b/samples/bpf/test_probe_write_user_kern.c
> > >>> @@ -26,7 +26,7 @@ struct {
> > >>>     * This example sits on a syscall, and the syscall ABI is relatively stable
> > >>>     * of course, across platforms, and over time, the ABI may change.
> > >>>     */
> > >>> -SEC("kprobe/sys_connect")
> > >>> +SEC("kprobe/__sys_connect")
> > >>>    int bpf_prog1(struct pt_regs *ctx)
> > >>>    {
> > >>>        struct sockaddr_in new_addr, orig_addr = {};
> > >>>
Daniel T. Lee July 7, 2020, 2:33 a.m. UTC | #6
On Tue, Jul 7, 2020 at 8:50 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 6, 2020 at 3:28 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> >
> > On Fri, Jul 3, 2020 at 1:04 AM Yonghong Song <yhs@fb.com> wrote:
> > >
> > >
> > >
> > > On 7/2/20 4:13 AM, Daniel T. Lee wrote:
> > > > On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@fb.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> > > >>> Currently, BPF programs with kprobe/sys_connect does not work properly.
> > > >>>
> > > >>> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > > >>> This commit modifies the bpf_load behavior of kprobe events in the x64
> > > >>> architecture. If the current kprobe event target starts with "sys_*",
> > > >>> add the prefix "__x64_" to the front of the event.
> > > >>>
> > > >>> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> > > >>> solution to most of the problems caused by the commit below.
> > > >>>
> > > >>>       commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
> > > >>>       pt_regs-based sys_*() to __x64_sys_*()")
> > > >>>
> > > >>> However, there is a problem with the sys_connect kprobe event that does
> > > >>> not work properly. For __sys_connect event, parameters can be fetched
> > > >>> normally, but for __x64_sys_connect, parameters cannot be fetched.
> > > >>>
> > > >>> Because of this problem, this commit fixes the sys_connect event by
> > > >>> specifying the __sys_connect directly and this will bypass the
> > > >>> "__x64_" appending rule of bpf_load.
> > > >>
> > > >> In the kernel code, we have
> > > >>
> > > >> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
> > > >>                   int, addrlen)
> > > >> {
> > > >>           return __sys_connect(fd, uservaddr, addrlen);
> > > >> }
> > > >>
> > > >> Depending on compiler, there is no guarantee that __sys_connect will
> > > >> not be inlined. I would prefer to still use the entry point
> > > >> __x64_sys_* e.g.,
> > > >>      SEC("kprobe/" SYSCALL(sys_write))
> > > >>
> > > >
> > > > As you mentioned, there is clearly a possibility that problems may arise
> > > > because the symbol does not exist according to the compiler.
> > > >
> > > > However, in x64, when using Kprobe for __x64_sys_connect event, the
> > > > tests are not working properly because the parameters cannot be fetched,
> > > > and the test under selftests/bpf is using "kprobe/_sys_connect" directly.
> > >
> > > This is the assembly code for __x64_sys_connect.
> > >
> > > ffffffff818d3520 <__x64_sys_connect>:
> > > ffffffff818d3520: e8 fb df 32 00        callq   0xffffffff81c01520
> > > <__fentry__>
> > > ffffffff818d3525: 48 8b 57 60           movq    96(%rdi), %rdx
> > > ffffffff818d3529: 48 8b 77 68           movq    104(%rdi), %rsi
> > > ffffffff818d352d: 48 8b 7f 70           movq    112(%rdi), %rdi
> > > ffffffff818d3531: e8 1a ff ff ff        callq   0xffffffff818d3450
> > > <__sys_connect>
> > > ffffffff818d3536: 48 98                 cltq
> > > ffffffff818d3538: c3                    retq
> > > ffffffff818d3539: 0f 1f 80 00 00 00 00  nopl    (%rax)
> > >
> > > In bpf program, the step is:
> > >        struct pt_regs *real_regs = PT_REGS_PARM1(pt_regs);
> > >        param1 = PT_REGS_PARM1(real_regs);
> > >        param2 = PT_REGS_PARM2(real_regs);
> > >        param3 = PT_REGS_PARM3(real_regs);
> > > The same for s390.
> > >
> >
> > I'm sorry that I seem to get it wrong,
> > But is it available to access 'struct pt_regs *' recursively?
> >
> > It seems nested use of PT_REGS_PARM causes invalid memory access.
> >
> >     $ sudo ./test_probe_write_user
> >     libbpf: load bpf program failed: Permission denied
> >     libbpf: -- BEGIN DUMP LOG ---
> >     libbpf:
> >     Unrecognized arg#0 type PTR
> >     ; struct pt_regs *real_regs = PT_REGS_PARM1(ctx);
> >     0: (79) r1 = *(u64 *)(r1 +112)
> >     ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
> >     1: (79) r6 = *(u64 *)(r1 +104)
> >     R1 invalid mem access 'inv'
> >     processed 2 insns (limit 1000000) max_states_per_insn 0
> > total_states 0 peak_states 0 mark_read 0
> >
> >     libbpf: -- END LOG --
> >     libbpf: failed to load program 'kprobe/__x64_sys_connect'
> >     libbpf: failed to load object './test_probe_write_user_kern.o'
> >     ERROR: loading BPF object file failed
> >
> > I'm not fully aware of the BPF verifier's internal structure.
> > Is there any workaround to solve this problem?
>
> You need to use bpf_probe_read_kernel() to get those arguments from
> real_args. Or better just use PT_REGS_PARM1_CORE(x) and others, which
> does that for you (+ CO-RE relocation).
>
>

Thanks for the tip!

I've just tried the old hack '_(P)':
(which is similar implementation with BPF_CORE_READ())

    #define _(P) ({typeof(P) val = 0; bpf_probe_read(&val,
sizeof(val), &P); val;})
    [...]
    struct pt_regs *regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
    void *sockaddr_arg = (void *)_(PT_REGS_PARM2(regs));
    int sockaddr_len = (int)_(PT_REGS_PARM3(regs));

and it works properly.

Just wondering, why is the pointer chasing of the original ctx
considered as an unsafe pointer here?

    ; struct pt_regs *real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
    0: (79) r1 = *(u64 *)(r1 +112)
    [...]
    ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
    4: (79) r6 = *(u64 *)(r1 +104)

Is it considered as an unsafe pointer since it is unknown what exists
in the pointer (r1 + 104), but the instruction is trying to access it?


I am a little concerned about using PT_REGS_PARM1_CORE
because it is not a CORE-related patch, but if using CORE is the
direction BPF wants to take, I will use PT_REGS_PARM1_CORE()
instead of _(P) hack using bpf_probe_read().

In addition, PT_REGS_PARM1_CORE() allows me to write code
neatly without having to define additional macro _(P).

Thank you for your time and effort for the review.
Daniel

> >
> > Thanks for your time and effort for the review.
> > Daniel.
> >
> > >
> > > For other architectures, no above indirection is needed.
> > >
> > > I guess you can abstract the above into trace_common.h?
> > >
> > > >
> > > > I'm not sure how to deal with this problem. Any advice and suggestions
> > > > will be greatly appreciated.
> > > >
> > > > Thanks for your time and effort for the review.
> > > > Daniel
> > > >
> > > >>>
> > > >>> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > > >>> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > > >>> ---
> > > >>>    samples/bpf/map_perf_test_kern.c         | 2 +-
> > > >>>    samples/bpf/test_map_in_map_kern.c       | 2 +-
> > > >>>    samples/bpf/test_probe_write_user_kern.c | 2 +-
> > > >>>    3 files changed, 3 insertions(+), 3 deletions(-)
> > > >>>
> > > >>> diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
> > > >>> index 12e91ae64d4d..cebe2098bb24 100644
> > > >>> --- a/samples/bpf/map_perf_test_kern.c
> > > >>> +++ b/samples/bpf/map_perf_test_kern.c
> > > >>> @@ -154,7 +154,7 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
> > > >>>        return 0;
> > > >>>    }
> > > >>>
> > > >>> -SEC("kprobe/sys_connect")
> > > >>> +SEC("kprobe/__sys_connect")
> > > >>>    int stress_lru_hmap_alloc(struct pt_regs *ctx)
> > > >>>    {
> > > >>>        char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
> > > >>> diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
> > > >>> index 6cee61e8ce9b..b1562ba2f025 100644
> > > >>> --- a/samples/bpf/test_map_in_map_kern.c
> > > >>> +++ b/samples/bpf/test_map_in_map_kern.c
> > > >>> @@ -102,7 +102,7 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
> > > >>>        return result ? *result : -ENOENT;
> > > >>>    }
> > > >>>
> > > >>> -SEC("kprobe/sys_connect")
> > > >>> +SEC("kprobe/__sys_connect")
> > > >>>    int trace_sys_connect(struct pt_regs *ctx)
> > > >>>    {
> > > >>>        struct sockaddr_in6 *in6;
> > > >>> diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
> > > >>> index 6579639a83b2..9b3c3918c37d 100644
> > > >>> --- a/samples/bpf/test_probe_write_user_kern.c
> > > >>> +++ b/samples/bpf/test_probe_write_user_kern.c
> > > >>> @@ -26,7 +26,7 @@ struct {
> > > >>>     * This example sits on a syscall, and the syscall ABI is relatively stable
> > > >>>     * of course, across platforms, and over time, the ABI may change.
> > > >>>     */
> > > >>> -SEC("kprobe/sys_connect")
> > > >>> +SEC("kprobe/__sys_connect")
> > > >>>    int bpf_prog1(struct pt_regs *ctx)
> > > >>>    {
> > > >>>        struct sockaddr_in new_addr, orig_addr = {};
> > > >>>
Andrii Nakryiko July 7, 2020, 5:15 a.m. UTC | #7
On Mon, Jul 6, 2020 at 7:33 PM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> On Tue, Jul 7, 2020 at 8:50 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 6, 2020 at 3:28 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> > >
> > > On Fri, Jul 3, 2020 at 1:04 AM Yonghong Song <yhs@fb.com> wrote:
> > > >
> > > >
> > > >
> > > > On 7/2/20 4:13 AM, Daniel T. Lee wrote:
> > > > > On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@fb.com> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> > > > >>> Currently, BPF programs with kprobe/sys_connect does not work properly.
> > > > >>>
> > > > >>> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > > > >>> This commit modifies the bpf_load behavior of kprobe events in the x64
> > > > >>> architecture. If the current kprobe event target starts with "sys_*",
> > > > >>> add the prefix "__x64_" to the front of the event.
> > > > >>>
> > > > >>> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> > > > >>> solution to most of the problems caused by the commit below.
> > > > >>>
> > > > >>>       commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
> > > > >>>       pt_regs-based sys_*() to __x64_sys_*()")
> > > > >>>
> > > > >>> However, there is a problem with the sys_connect kprobe event that does
> > > > >>> not work properly. For __sys_connect event, parameters can be fetched
> > > > >>> normally, but for __x64_sys_connect, parameters cannot be fetched.
> > > > >>>
> > > > >>> Because of this problem, this commit fixes the sys_connect event by
> > > > >>> specifying the __sys_connect directly and this will bypass the
> > > > >>> "__x64_" appending rule of bpf_load.
> > > > >>
> > > > >> In the kernel code, we have
> > > > >>
> > > > >> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
> > > > >>                   int, addrlen)
> > > > >> {
> > > > >>           return __sys_connect(fd, uservaddr, addrlen);
> > > > >> }
> > > > >>
> > > > >> Depending on compiler, there is no guarantee that __sys_connect will
> > > > >> not be inlined. I would prefer to still use the entry point
> > > > >> __x64_sys_* e.g.,
> > > > >>      SEC("kprobe/" SYSCALL(sys_write))
> > > > >>
> > > > >
> > > > > As you mentioned, there is clearly a possibility that problems may arise
> > > > > because the symbol does not exist according to the compiler.
> > > > >
> > > > > However, in x64, when using Kprobe for __x64_sys_connect event, the
> > > > > tests are not working properly because the parameters cannot be fetched,
> > > > > and the test under selftests/bpf is using "kprobe/_sys_connect" directly.
> > > >
> > > > This is the assembly code for __x64_sys_connect.
> > > >
> > > > ffffffff818d3520 <__x64_sys_connect>:
> > > > ffffffff818d3520: e8 fb df 32 00        callq   0xffffffff81c01520
> > > > <__fentry__>
> > > > ffffffff818d3525: 48 8b 57 60           movq    96(%rdi), %rdx
> > > > ffffffff818d3529: 48 8b 77 68           movq    104(%rdi), %rsi
> > > > ffffffff818d352d: 48 8b 7f 70           movq    112(%rdi), %rdi
> > > > ffffffff818d3531: e8 1a ff ff ff        callq   0xffffffff818d3450
> > > > <__sys_connect>
> > > > ffffffff818d3536: 48 98                 cltq
> > > > ffffffff818d3538: c3                    retq
> > > > ffffffff818d3539: 0f 1f 80 00 00 00 00  nopl    (%rax)
> > > >
> > > > In bpf program, the step is:
> > > >        struct pt_regs *real_regs = PT_REGS_PARM1(pt_regs);
> > > >        param1 = PT_REGS_PARM1(real_regs);
> > > >        param2 = PT_REGS_PARM2(real_regs);
> > > >        param3 = PT_REGS_PARM3(real_regs);
> > > > The same for s390.
> > > >
> > >
> > > I'm sorry that I seem to get it wrong,
> > > But is it available to access 'struct pt_regs *' recursively?
> > >
> > > It seems nested use of PT_REGS_PARM causes invalid memory access.
> > >
> > >     $ sudo ./test_probe_write_user
> > >     libbpf: load bpf program failed: Permission denied
> > >     libbpf: -- BEGIN DUMP LOG ---
> > >     libbpf:
> > >     Unrecognized arg#0 type PTR
> > >     ; struct pt_regs *real_regs = PT_REGS_PARM1(ctx);
> > >     0: (79) r1 = *(u64 *)(r1 +112)
> > >     ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
> > >     1: (79) r6 = *(u64 *)(r1 +104)
> > >     R1 invalid mem access 'inv'
> > >     processed 2 insns (limit 1000000) max_states_per_insn 0
> > > total_states 0 peak_states 0 mark_read 0
> > >
> > >     libbpf: -- END LOG --
> > >     libbpf: failed to load program 'kprobe/__x64_sys_connect'
> > >     libbpf: failed to load object './test_probe_write_user_kern.o'
> > >     ERROR: loading BPF object file failed
> > >
> > > I'm not fully aware of the BPF verifier's internal structure.
> > > Is there any workaround to solve this problem?
> >
> > You need to use bpf_probe_read_kernel() to get those arguments from
> > real_args. Or better just use PT_REGS_PARM1_CORE(x) and others, which
> > does that for you (+ CO-RE relocation).
> >
> >
>
> Thanks for the tip!
>
> I've just tried the old hack '_(P)':
> (which is similar implementation with BPF_CORE_READ())
>
>     #define _(P) ({typeof(P) val = 0; bpf_probe_read(&val,
> sizeof(val), &P); val;})
>     [...]
>     struct pt_regs *regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
>     void *sockaddr_arg = (void *)_(PT_REGS_PARM2(regs));
>     int sockaddr_len = (int)_(PT_REGS_PARM3(regs));
>
> and it works properly.
>
> Just wondering, why is the pointer chasing of the original ctx
> considered as an unsafe pointer here?
>
>     ; struct pt_regs *real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
>     0: (79) r1 = *(u64 *)(r1 +112)
>     [...]
>     ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
>     4: (79) r6 = *(u64 *)(r1 +104)
>
> Is it considered as an unsafe pointer since it is unknown what exists
> in the pointer (r1 + 104), but the instruction is trying to access it?
>

Yes.
Because after the initial pointer read, the verifier assumes that you
are reading a random piece of memory.

>
> I am a little concerned about using PT_REGS_PARM1_CORE
> because it is not a CORE-related patch, but if using CORE is the
> direction BPF wants to take, I will use PT_REGS_PARM1_CORE()
> instead of _(P) hack using bpf_probe_read().

bpf_probe_read() works as well. But yeah, BPF CO-RE is the way modern
tracing applications are leaning, look at selftests and see how many
are using CO-RE already. It's pretty much the only way to write
portable tracing BPF applications, short of taking Clang/LLVM
**runtime** dependency, the way BCC makes you do.

>
> In addition, PT_REGS_PARM1_CORE() allows me to write code
> neatly without having to define additional macro _(P).
>
> Thank you for your time and effort for the review.
> Daniel
>
> > >
> > > Thanks for your time and effort for the review.
> > > Daniel.
> > >
> > > >
> > > > For other architectures, no above indirection is needed.
> > > >
> > > > I guess you can abstract the above into trace_common.h?
> > > >
> > > > >
> > > > > I'm not sure how to deal with this problem. Any advice and suggestions
> > > > > will be greatly appreciated.
> > > > >
> > > > > Thanks for your time and effort for the review.
> > > > > Daniel
> > > > >
> > > > >>>
> > > > >>> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > > > >>> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > > > >>> ---

[...]
Daniel T. Lee July 7, 2020, 5:46 a.m. UTC | #8
On Tue, Jul 7, 2020 at 2:15 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 6, 2020 at 7:33 PM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> >
> > On Tue, Jul 7, 2020 at 8:50 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jul 6, 2020 at 3:28 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> > > >
> > > > On Fri, Jul 3, 2020 at 1:04 AM Yonghong Song <yhs@fb.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 7/2/20 4:13 AM, Daniel T. Lee wrote:
> > > > > > On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@fb.com> wrote:
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> > > > > >>> Currently, BPF programs with kprobe/sys_connect does not work properly.
> > > > > >>>
> > > > > >>> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > > > > >>> This commit modifies the bpf_load behavior of kprobe events in the x64
> > > > > >>> architecture. If the current kprobe event target starts with "sys_*",
> > > > > >>> add the prefix "__x64_" to the front of the event.
> > > > > >>>
> > > > > >>> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> > > > > >>> solution to most of the problems caused by the commit below.
> > > > > >>>
> > > > > >>>       commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
> > > > > >>>       pt_regs-based sys_*() to __x64_sys_*()")
> > > > > >>>
> > > > > >>> However, there is a problem with the sys_connect kprobe event that does
> > > > > >>> not work properly. For __sys_connect event, parameters can be fetched
> > > > > >>> normally, but for __x64_sys_connect, parameters cannot be fetched.
> > > > > >>>
> > > > > >>> Because of this problem, this commit fixes the sys_connect event by
> > > > > >>> specifying the __sys_connect directly and this will bypass the
> > > > > >>> "__x64_" appending rule of bpf_load.
> > > > > >>
> > > > > >> In the kernel code, we have
> > > > > >>
> > > > > >> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
> > > > > >>                   int, addrlen)
> > > > > >> {
> > > > > >>           return __sys_connect(fd, uservaddr, addrlen);
> > > > > >> }
> > > > > >>
> > > > > >> Depending on compiler, there is no guarantee that __sys_connect will
> > > > > >> not be inlined. I would prefer to still use the entry point
> > > > > >> __x64_sys_* e.g.,
> > > > > >>      SEC("kprobe/" SYSCALL(sys_write))
> > > > > >>
> > > > > >
> > > > > > As you mentioned, there is clearly a possibility that problems may arise
> > > > > > because the symbol does not exist according to the compiler.
> > > > > >
> > > > > > However, in x64, when using Kprobe for __x64_sys_connect event, the
> > > > > > tests are not working properly because the parameters cannot be fetched,
> > > > > > and the test under selftests/bpf is using "kprobe/_sys_connect" directly.
> > > > >
> > > > > This is the assembly code for __x64_sys_connect.
> > > > >
> > > > > ffffffff818d3520 <__x64_sys_connect>:
> > > > > ffffffff818d3520: e8 fb df 32 00        callq   0xffffffff81c01520
> > > > > <__fentry__>
> > > > > ffffffff818d3525: 48 8b 57 60           movq    96(%rdi), %rdx
> > > > > ffffffff818d3529: 48 8b 77 68           movq    104(%rdi), %rsi
> > > > > ffffffff818d352d: 48 8b 7f 70           movq    112(%rdi), %rdi
> > > > > ffffffff818d3531: e8 1a ff ff ff        callq   0xffffffff818d3450
> > > > > <__sys_connect>
> > > > > ffffffff818d3536: 48 98                 cltq
> > > > > ffffffff818d3538: c3                    retq
> > > > > ffffffff818d3539: 0f 1f 80 00 00 00 00  nopl    (%rax)
> > > > >
> > > > > In bpf program, the step is:
> > > > >        struct pt_regs *real_regs = PT_REGS_PARM1(pt_regs);
> > > > >        param1 = PT_REGS_PARM1(real_regs);
> > > > >        param2 = PT_REGS_PARM2(real_regs);
> > > > >        param3 = PT_REGS_PARM3(real_regs);
> > > > > The same for s390.
> > > > >
> > > >
> > > > I'm sorry that I seem to get it wrong,
> > > > But is it available to access 'struct pt_regs *' recursively?
> > > >
> > > > It seems nested use of PT_REGS_PARM causes invalid memory access.
> > > >
> > > >     $ sudo ./test_probe_write_user
> > > >     libbpf: load bpf program failed: Permission denied
> > > >     libbpf: -- BEGIN DUMP LOG ---
> > > >     libbpf:
> > > >     Unrecognized arg#0 type PTR
> > > >     ; struct pt_regs *real_regs = PT_REGS_PARM1(ctx);
> > > >     0: (79) r1 = *(u64 *)(r1 +112)
> > > >     ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
> > > >     1: (79) r6 = *(u64 *)(r1 +104)
> > > >     R1 invalid mem access 'inv'
> > > >     processed 2 insns (limit 1000000) max_states_per_insn 0
> > > > total_states 0 peak_states 0 mark_read 0
> > > >
> > > >     libbpf: -- END LOG --
> > > >     libbpf: failed to load program 'kprobe/__x64_sys_connect'
> > > >     libbpf: failed to load object './test_probe_write_user_kern.o'
> > > >     ERROR: loading BPF object file failed
> > > >
> > > > I'm not fully aware of the BPF verifier's internal structure.
> > > > Is there any workaround to solve this problem?
> > >
> > > You need to use bpf_probe_read_kernel() to get those arguments from
> > > real_args. Or better just use PT_REGS_PARM1_CORE(x) and others, which
> > > does that for you (+ CO-RE relocation).
> > >
> > >
> >
> > Thanks for the tip!
> >
> > I've just tried the old hack '_(P)':
> > (which is similar implementation with BPF_CORE_READ())
> >
> >     #define _(P) ({typeof(P) val = 0; bpf_probe_read(&val,
> > sizeof(val), &P); val;})
> >     [...]
> >     struct pt_regs *regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
> >     void *sockaddr_arg = (void *)_(PT_REGS_PARM2(regs));
> >     int sockaddr_len = (int)_(PT_REGS_PARM3(regs));
> >
> > and it works properly.
> >
> > Just wondering, why is the pointer chasing of the original ctx
> > considered as an unsafe pointer here?
> >
> >     ; struct pt_regs *real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
> >     0: (79) r1 = *(u64 *)(r1 +112)
> >     [...]
> >     ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
> >     4: (79) r6 = *(u64 *)(r1 +104)
> >
> > Is it considered as an unsafe pointer since it is unknown what exists
> > in the pointer (r1 + 104), but the instruction is trying to access it?
> >
>
> Yes.
> Because after the initial pointer read, the verifier assumes that you
> are reading a random piece of memory.
>

Thanks for the confirmation. It's very helpful to me.

> >
> > I am a little concerned about using PT_REGS_PARM1_CORE
> > because it is not a CORE-related patch, but if using CORE is the
> > direction BPF wants to take, I will use PT_REGS_PARM1_CORE()
> > instead of _(P) hack using bpf_probe_read().
>
> bpf_probe_read() works as well. But yeah, BPF CO-RE is the way modern
> tracing applications are leaning, look at selftests and see how many
> are using CO-RE already. It's pretty much the only way to write
> portable tracing BPF applications, short of taking Clang/LLVM
> **runtime** dependency, the way BCC makes you do.
>

I see. I'll stick with PT_REGS_PARM1_CORE approach.

I will send the next version of the patch soon.

Thank you for your time and effort for the review.
Daniel

> >
> > In addition, PT_REGS_PARM1_CORE() allows me to write code
> > neatly without having to define additional macro _(P).
> >
> > Thank you for your time and effort for the review.
> > Daniel
> >
> > > >
> > > > Thanks for your time and effort for the review.
> > > > Daniel.
> > > >
> > > > >
> > > > > For other architectures, no above indirection is needed.
> > > > >
> > > > > I guess you can abstract the above into trace_common.h?
> > > > >
> > > > > >
> > > > > > I'm not sure how to deal with this problem. Any advice and suggestions
> > > > > > will be greatly appreciated.
> > > > > >
> > > > > > Thanks for your time and effort for the review.
> > > > > > Daniel
> > > > > >
> > > > > >>>
> > > > > >>> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > > > > >>> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > > > > >>> ---
>
> [...]
diff mbox series

Patch

diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
index 12e91ae64d4d..cebe2098bb24 100644
--- a/samples/bpf/map_perf_test_kern.c
+++ b/samples/bpf/map_perf_test_kern.c
@@ -154,7 +154,7 @@  int stress_percpu_hmap_alloc(struct pt_regs *ctx)
 	return 0;
 }
 
-SEC("kprobe/sys_connect")
+SEC("kprobe/__sys_connect")
 int stress_lru_hmap_alloc(struct pt_regs *ctx)
 {
 	char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
index 6cee61e8ce9b..b1562ba2f025 100644
--- a/samples/bpf/test_map_in_map_kern.c
+++ b/samples/bpf/test_map_in_map_kern.c
@@ -102,7 +102,7 @@  static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
 	return result ? *result : -ENOENT;
 }
 
-SEC("kprobe/sys_connect")
+SEC("kprobe/__sys_connect")
 int trace_sys_connect(struct pt_regs *ctx)
 {
 	struct sockaddr_in6 *in6;
diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
index 6579639a83b2..9b3c3918c37d 100644
--- a/samples/bpf/test_probe_write_user_kern.c
+++ b/samples/bpf/test_probe_write_user_kern.c
@@ -26,7 +26,7 @@  struct {
  * This example sits on a syscall, and the syscall ABI is relatively stable
  * of course, across platforms, and over time, the ABI may change.
  */
-SEC("kprobe/sys_connect")
+SEC("kprobe/__sys_connect")
 int bpf_prog1(struct pt_regs *ctx)
 {
 	struct sockaddr_in new_addr, orig_addr = {};