diff mbox series

[RFC,bpf] tools: bpftool: Fix tags for bpf-to-bpf calls

Message ID 20180227121346.16199-1-sandipan@linux.vnet.ibm.com
State RFC, archived
Delegated to: BPF Maintainers
Headers show
Series [RFC,bpf] tools: bpftool: Fix tags for bpf-to-bpf calls | expand

Commit Message

Sandipan Das Feb. 27, 2018, 12:13 p.m. UTC
"Naveen N. Rao" wrote:
> I'm wondering if we can instead encode the bpf prog id in
> imm32. That way, we should be able to indicate the BPF
> function being called into.  Daniel, is that something we
> can consider?

Since each subprog does not get a separate id, we cannot fetch
the fd and therefore the tag of a subprog. Instead we can use
the tag of the complete program as shown below.

"Daniel Borkmann" wrote:
> I think one limitation that would still need to be addressed later
> with such approach would be regarding the xlated prog dump in bpftool,
> see 'BPF calls via JIT' in 7105e828c087 ("bpf: allow for correlation
> of maps and helpers in dump"). Any ideas for this (potentially if we
> could use off + imm for calls, we'd get to 48 bits, but that seems
> still not be enough as you say)?

As an alternative, this is what I am thinking of:

Currently, for bpf-to-bpf calls, if bpf_jit_kallsyms is enabled,
bpftool looks up the name of the corresponding symbol for the
JIT-ed subprogram and shows it in the xlated dump alongside the
actual call instruction. However, the lookup is based on the
target address which is calculated using the imm field of the
instruction. So, once again, if imm is truncated, we will end
up with the wrong address. Also, the subprog aux data (which
has been proposed as a mitigation for this) is not accessible
from this tool.

We can still access the tag for the complete bpf program and use
this with the correct offset in an objdump-like notation as an
alterative for the name of the subprog that is the target of a
bpf-to-bpf call instruction.

Currently, an xlated dump looks like this:
   0: (85) call pc+2#bpf_prog_5f76847930402518_F
   1: (b7) r0 = 1
   2: (95) exit
   3: (b7) r0 = 2
   4: (95) exit

With this patch, it will look like this:
   0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3
   1: (b7) r0 = 1
   2: (95) exit
   3: (b7) r0 = 2
   4: (95) exit

where 8f85936f29a7790a is the tag of the bpf program and 3 is
the offset to the start of the subprog from the start of the
program.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 tools/bpf/bpftool/prog.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

Comments

Daniel Borkmann Feb. 27, 2018, 2:44 p.m. UTC | #1
On 02/27/2018 01:13 PM, Sandipan Das wrote:
> "Naveen N. Rao" wrote:
>> I'm wondering if we can instead encode the bpf prog id in
>> imm32. That way, we should be able to indicate the BPF
>> function being called into.  Daniel, is that something we
>> can consider?
> 
> Since each subprog does not get a separate id, we cannot fetch
> the fd and therefore the tag of a subprog. Instead we can use
> the tag of the complete program as shown below.
> 
> "Daniel Borkmann" wrote:
>> I think one limitation that would still need to be addressed later
>> with such approach would be regarding the xlated prog dump in bpftool,
>> see 'BPF calls via JIT' in 7105e828c087 ("bpf: allow for correlation
>> of maps and helpers in dump"). Any ideas for this (potentially if we
>> could use off + imm for calls, we'd get to 48 bits, but that seems
>> still not be enough as you say)?
> 
> As an alternative, this is what I am thinking of:
> 
> Currently, for bpf-to-bpf calls, if bpf_jit_kallsyms is enabled,
> bpftool looks up the name of the corresponding symbol for the
> JIT-ed subprogram and shows it in the xlated dump alongside the
> actual call instruction. However, the lookup is based on the
> target address which is calculated using the imm field of the
> instruction. So, once again, if imm is truncated, we will end
> up with the wrong address. Also, the subprog aux data (which
> has been proposed as a mitigation for this) is not accessible
> from this tool.
> 
> We can still access the tag for the complete bpf program and use
> this with the correct offset in an objdump-like notation as an
> alterative for the name of the subprog that is the target of a
> bpf-to-bpf call instruction.
> 
> Currently, an xlated dump looks like this:
>    0: (85) call pc+2#bpf_prog_5f76847930402518_F
>    1: (b7) r0 = 1
>    2: (95) exit
>    3: (b7) r0 = 2
>    4: (95) exit
> 
> With this patch, it will look like this:
>    0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3

(Note the +2 is the insn->off already.)

>    1: (b7) r0 = 1
>    2: (95) exit
>    3: (b7) r0 = 2
>    4: (95) exit
> 
> where 8f85936f29a7790a is the tag of the bpf program and 3 is
> the offset to the start of the subprog from the start of the
> program.

The problem with this approach would be that right now the name is
something like bpf_prog_5f76847930402518_F where the subprog tag is
just a placeholder so in future, this may well adapt to e.g. the actual
function name from the elf file. Note that when kallsyms is enabled
then a name like bpf_prog_5f76847930402518_F will also appear in stack
traces, perf records, etc, so for correlation/debugging it would really
help to have them the same everywhere.

Worst case if there's nothing better, potentially what one could do in
bpf_prog_get_info_by_fd() is to dump an array of full addresses and
have the imm part as the index pointing to one of them, just unfortunate
that it's likely only needed in ppc64.
Naveen N. Rao March 1, 2018, 8:51 a.m. UTC | #2
Daniel Borkmann wrote:
> On 02/27/2018 01:13 PM, Sandipan Das wrote:
>> With this patch, it will look like this:
>>    0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3
> 
> (Note the +2 is the insn->off already.)
> 
>>    1: (b7) r0 = 1
>>    2: (95) exit
>>    3: (b7) r0 = 2
>>    4: (95) exit
>> 
>> where 8f85936f29a7790a is the tag of the bpf program and 3 is
>> the offset to the start of the subprog from the start of the
>> program.
> 
> The problem with this approach would be that right now the name is
> something like bpf_prog_5f76847930402518_F where the subprog tag is
> just a placeholder so in future, this may well adapt to e.g. the actual
> function name from the elf file. Note that when kallsyms is enabled
> then a name like bpf_prog_5f76847930402518_F will also appear in stack
> traces, perf records, etc, so for correlation/debugging it would really
> help to have them the same everywhere.
> 
> Worst case if there's nothing better, potentially what one could do in
> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
> have the imm part as the index pointing to one of them, just unfortunate
> that it's likely only needed in ppc64.

Ok. We seem to have discussed a few different aspects in this thread.  
Let me summarize the different aspects we have discussed:
1. Passing address of JIT'ed function to the JIT engines:
    Two approaches discussed:
    a. Existing approach, where the subprog address is encoded as an 
    offset from __bpf_call_base() in imm32 field of the BPF call 
    instruction. This requires the JIT'ed function to be within 2GB of 
    __bpf_call_base(), which won't be true on ppc64, at the least. So, 
    this won't on ppc64 (and any other architectures where vmalloc'ed 
    (module_alloc()) memory is from a different, far, address range).
    
    [As a side note, is it _actually_ guaranteed that JIT'ed functions 
    will be within 2GB (signed 32-bit...) on all other architectures 
    where BPF JIT is supported? I'm not quite sure how memory allocation 
    works on other architectures, but it looks like this can fail if 
    there are other larger allocations.]

    b. Pass the full 64-bit address of the call target in an auxiliary 
    field for the JIT engine to use (as implemented in this mail chain).  
    We can then use this to determine the call target if this is a 
    pseudo call.

    There is a third option we can consider:
    c. Convert BPF pseudo call instruction into a 2-instruction sequence 
    (similar to BPF_DW) and encode the full 64-bit call target in the 
    second bpf instruction. To distinguish this from other instruction 
    forms, we can set imm32 to -1.

    If we go with (b) or (c), we will need to take a call on whether we 
    will implement this in the same manner across all architectures, or 
    if we should have ppc64 (and any other affected architectures) work 
    differently from the rest.

    Further more, for (b), bpftool won't be able to derive the target 
    function call address, but approaches (a) and (c) are fine. More 
    about that below...

2. Indicating target function in bpftool:
    In the existing approach, bpftool can determine target address since 
    the offset is encoded in imm32 and is able to lookup the name from 
    kallsyms, if enabled.

    If we go with approach (b) for ppc64, this won't work and we will 
    have to minimally update bpftool to detect that the target address 
    is not available on ppc64.

    If we go with approach (c), the target address will be available and 
    we should be able to update bpftool to look that up.
 
    [As a side note, I suppose part of Sandipan's point with the 
    previous patch was to make the bpftool output consistent whether or 
    not JIT is enabled. It does look a bit weird that bpftool shows the 
    address of a JIT'ed function when asked to print the BPF bytecode.]

Thoughts?


- Naveen
Alexei Starovoitov March 5, 2018, 5:02 p.m. UTC | #3
On 3/1/18 12:51 AM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 02/27/2018 01:13 PM, Sandipan Das wrote:
>>> With this patch, it will look like this:
>>>    0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3
>>
>> (Note the +2 is the insn->off already.)
>>
>>>    1: (b7) r0 = 1
>>>    2: (95) exit
>>>    3: (b7) r0 = 2
>>>    4: (95) exit
>>>
>>> where 8f85936f29a7790a is the tag of the bpf program and 3 is
>>> the offset to the start of the subprog from the start of the
>>> program.
>>
>> The problem with this approach would be that right now the name is
>> something like bpf_prog_5f76847930402518_F where the subprog tag is
>> just a placeholder so in future, this may well adapt to e.g. the actual
>> function name from the elf file. Note that when kallsyms is enabled
>> then a name like bpf_prog_5f76847930402518_F will also appear in stack
>> traces, perf records, etc, so for correlation/debugging it would really
>> help to have them the same everywhere.
>>
>> Worst case if there's nothing better, potentially what one could do in
>> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
>> have the imm part as the index pointing to one of them, just unfortunate
>> that it's likely only needed in ppc64.
>
> Ok. We seem to have discussed a few different aspects in this thread.
> Let me summarize the different aspects we have discussed:
> 1. Passing address of JIT'ed function to the JIT engines:
>    Two approaches discussed:
>    a. Existing approach, where the subprog address is encoded as an
> offset from __bpf_call_base() in imm32 field of the BPF call
> instruction. This requires the JIT'ed function to be within 2GB of
> __bpf_call_base(), which won't be true on ppc64, at the least. So,
> this won't on ppc64 (and any other architectures where vmalloc'ed
> (module_alloc()) memory is from a different, far, address range).

it looks like ppc64 doesn't guarantee today that all of module_alloc()
will be within 32-bit, but I think it should be trivial to add such
guarantee. If so, we can define another __bpf_call_base specifically
for bpf-to-bpf calls when jit is on.
Then jit_subprogs() math will fit:
insn->imm = func[subprog]->bpf_func - __bpf_call_base_for_jited_progs;
and will make it easier for ppc64 jit to optimize and use
near calls for bpf-to-bpf calls while still using trampoline
for bpf-to-kernel.
Also it solves bpftool issue.
For all other archs we can keep
__bpf_call_base_for_jited_progs == __bpf_call_base

>    There is a third option we can consider:
>    c. Convert BPF pseudo call instruction into a 2-instruction sequence
>    (similar to BPF_DW) and encode the full 64-bit call target in the
> second bpf instruction. To distinguish this from other instruction
> forms, we can set imm32 to -1.

Adding new instruction just for that case looks like overkill.
Naveen N. Rao May 3, 2018, 3:20 p.m. UTC | #4
Alexei Starovoitov wrote:
> On 3/1/18 12:51 AM, Naveen N. Rao wrote:
>> Daniel Borkmann wrote:
>>>
>>> Worst case if there's nothing better, potentially what one could do in
>>> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
>>> have the imm part as the index pointing to one of them, just unfortunate
>>> that it's likely only needed in ppc64.
>>
>> Ok. We seem to have discussed a few different aspects in this thread.
>> Let me summarize the different aspects we have discussed:
>> 1. Passing address of JIT'ed function to the JIT engines:
>>    Two approaches discussed:
>>    a. Existing approach, where the subprog address is encoded as an
>> offset from __bpf_call_base() in imm32 field of the BPF call
>> instruction. This requires the JIT'ed function to be within 2GB of
>> __bpf_call_base(), which won't be true on ppc64, at the least. So,
>> this won't on ppc64 (and any other architectures where vmalloc'ed
>> (module_alloc()) memory is from a different, far, address range).
> 
> it looks like ppc64 doesn't guarantee today that all of module_alloc()
> will be within 32-bit, but I think it should be trivial to add such
> guarantee. If so, we can define another __bpf_call_base specifically
> for bpf-to-bpf calls when jit is on.

Ok, we prefer not to do that for powerpc (atleast, not for all of 
module_alloc()) at this point.

And since option (c) below is not preferable, I think we will implement 
what Daniel suggested above. This patchset already handles communicating 
the BPF function addresses to the JIT engine, and enhancing 
bpf_prog_get_info_by_fd() should address the concerns with bpftool.


- Naveen
diff mbox series

Patch

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index e8e2baaf93c2..93746d5d1e3c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -415,9 +415,11 @@  struct kernel_sym {
 };
 
 struct dump_data {
+	unsigned char prog_tag[BPF_TAG_SIZE];
 	unsigned long address_call_base;
 	struct kernel_sym *sym_mapping;
 	__u32 sym_count;
+	unsigned int curr_line;
 	char scratch_buff[SYM_MAX_NAME];
 };
 
@@ -499,16 +501,16 @@  static void print_insn(struct bpf_verifier_env *env, const char *fmt, ...)
 }
 
 static const char *print_call_pcrel(struct dump_data *dd,
-				    struct kernel_sym *sym,
-				    unsigned long address,
 				    const struct bpf_insn *insn)
 {
-	if (sym)
-		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-			 "%+d#%s", insn->off, sym->name);
-	else
-		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-			 "%+d#0x%lx", insn->off, address);
+	snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+		 "+%d#bpf_prog_%02x%02x%02x%02x%02x%02x%02x%02x+%d",
+		 insn->off,
+		 dd->prog_tag[0], dd->prog_tag[1],
+		 dd->prog_tag[2], dd->prog_tag[3],
+		 dd->prog_tag[4], dd->prog_tag[5],
+		 dd->prog_tag[6], dd->prog_tag[7],
+		 dd->curr_line + insn->off + 1);
 	return dd->scratch_buff;
 }
 
@@ -534,7 +536,7 @@  static const char *print_call(void *private_data,
 
 	sym = kernel_syms_search(dd, address);
 	if (insn->src_reg == BPF_PSEUDO_CALL)
-		return print_call_pcrel(dd, sym, address, insn);
+		return print_call_pcrel(dd, insn);
 	else
 		return print_call_helper(dd, sym, address);
 }
@@ -576,6 +578,7 @@  static void dump_xlated_plain(struct dump_data *dd, void *buf,
 		double_insn = insn[i].code == (BPF_LD | BPF_IMM | BPF_DW);
 
 		printf("% 4d: ", i);
+		dd->curr_line = i;
 		print_bpf_insn(&cbs, NULL, insn + i, true);
 
 		if (opcodes) {
@@ -628,6 +631,7 @@  static void dump_xlated_json(struct dump_data *dd, void *buf,
 
 		jsonw_start_object(json_wtr);
 		jsonw_name(json_wtr, "disasm");
+		dd->curr_line = i;
 		print_bpf_insn(&cbs, NULL, insn + i, true);
 
 		if (opcodes) {
@@ -788,6 +792,7 @@  static int do_dump(int argc, char **argv)
 
 			disasm_print_insn(buf, *member_len, opcodes, name);
 		} else {
+			memcpy(dd.prog_tag, info.tag, sizeof(info.tag));
 			kernel_syms_load(&dd);
 			if (json_output)
 				dump_xlated_json(&dd, buf, *member_len, opcodes);