
[PATCH/RFC,bpf-next,04/16] bpf: mark sub-register writes that really need zero extension to high bits

Message ID 1553623539-15474-5-git-send-email-jiong.wang@netronome.com
State RFC
Delegated to: BPF Maintainers
Series bpf: eliminate zero extensions for sub-register writes

Commit Message

Jiong Wang March 26, 2019, 6:05 p.m. UTC
The eBPF ISA specification requires the high 32 bits of a register to be cleared
whenever its low 32-bit sub-register is written. This applies to the destination
register of ALU32 instructions etc. JIT back-ends must guarantee this semantic
when doing code-gen.

The x86-64 and arm64 ISAs have the same semantic, so the corresponding JIT
back-ends don't need to do extra work. However, 32-bit arches (arm, nfp
etc.) and some 64-bit arches (powerpc, sparc etc.) need an explicit
zero-extension sequence to meet this semantic.

This is important, because for code like the following:

  u64_value = (u64) u32_value
  ... other uses of u64_value

the compiler can exploit the semantic described above and omit the zero
extensions for extending u32_value to u64_value; the hardware, runtime, or BPF
JIT back-end is responsible for guaranteeing them. Some benchmarks show that
~40% of insns are sub-register writes, meaning ~40% extra code-gen (even more
on arches that require two shifts for zero extension), because the JIT back-end
needs to emit extra code for all such instructions.

However, this is not always necessary: u32_value may never be cast into a u64,
which is quite common in real-life programs. So it would be really good if we
could identify the places where such a cast happens, and emit zero extensions
only for them, not for the others. This could save a lot of BPF code-gen.

Algo:
 - Record the index of each instruction that does a sub-register def (write).
   These indices need to stay with the function state so that path pruning and
   bpf-to-bpf function calls can be handled properly.

   The indices are kept up to date while walking the insns.

 - A full register read on an active sub-register def marks the def insn as
   needing zero extension on its dst register.

 - A new sub-register write overrides the old one.

   A new full register write frees the register of any pending zero
   extension.

 - When propagating register read64 marks during path pruning, also mark the
   def insns of registers whose sub-register defs are still hanging active, if
   the equivalent state shows a read64 of them.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |  4 +++
 kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 84 insertions(+), 5 deletions(-)

Comments

Edward Cree March 26, 2019, 6:44 p.m. UTC | #1
On 26/03/2019 18:05, Jiong Wang wrote:
> eBPF ISA specification requires high 32-bit cleared when low 32-bit
> sub-register is written. This applies to destination register of ALU32 etc.
> JIT back-ends must guarantee this semantic when doing code-gen.
>
> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
> extension sequence to meet such semantic.
>
> This is important, because for code the following:
>
>   u64_value = (u64) u32_value
>   ... other uses of u64_value
>
> compiler could exploit the semantic described above and save those zero
> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
> could go up to more for some arches which requires two shifts for zero
> extension) because JIT back-end needs to do extra code-gen for all such
> instructions.
>
> However this is not always necessary in case u32_value is never cast into
> a u64, which is quite normal in real life program. So, it would be really
> good if we could identify those places where such type cast happened, and
> only do zero extensions for them, not for the others. This could save a lot
> of BPF code-gen.
>
> Algo:
>  - Record indices of instructions that do sub-register def (write). And
>    these indices need to stay with function state so path pruning and bpf
>    to bpf function call could be handled properly.
>
>    These indices are kept up to date while doing insn walk.
>
>  - A full register read on an active sub-register def marks the def insn as
>    needing zero extension on dst register.
>
>  - A new sub-register write overrides the old one.
>
>    A new full register write makes the register free of zero extension on
>    dst register.
>
>  - When propagating register read64 during path pruning, it also marks def
>    insns whose defs are hanging active sub-register, if there is any read64
>    from shown from the equal state.
>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  include/linux/bpf_verifier.h |  4 +++
>  kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 84 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 27761ab..0ae9a3f 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -181,6 +181,9 @@ struct bpf_func_state {
>  	 */
>  	u32 subprogno;
>  
> +	/* tracks subreg definition. */
Ideally this comment should mention that the stored value is the insn_idx
 of the writing insn.  Perhaps also that this is safe because patching
 (bpf_patch_insn_data()) only happens after main verification completes.

-Ed
Jiong Wang March 26, 2019, 7:47 p.m. UTC | #2
On 26/03/2019 18:44, Edward Cree wrote:
> On 26/03/2019 18:05, Jiong Wang wrote:
>> [...]
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 27761ab..0ae9a3f 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -181,6 +181,9 @@ struct bpf_func_state {
>>   	 */
>>   	u32 subprogno;
>>   
>> +	/* tracks subreg definition. */
> Ideally this comment should mention that the stored value is the insn_idx
>   of the writing insn.  Perhaps also that this is safe because patching
>   (bpf_patch_insn_data()) only happens after main verification completes.

OK, will add more comments for this.

Regards,
Jiong
Alexei Starovoitov March 27, 2019, 4:50 p.m. UTC | #3
On Tue, Mar 26, 2019 at 06:05:27PM +0000, Jiong Wang wrote:
> [...]
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 27761ab..0ae9a3f 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -181,6 +181,9 @@ struct bpf_func_state {
>  	 */
>  	u32 subprogno;
>  
> +	/* tracks subreg definition. */
> +	s32 subreg_def[MAX_BPF_REG];

Same comment as Ed, and another question: why is it not part of bpf_reg_state?
Jiong Wang March 27, 2019, 5:06 p.m. UTC | #4
> On 27 Mar 2019, at 16:50, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Mar 26, 2019 at 06:05:27PM +0000, Jiong Wang wrote:
>> [...]
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 27761ab..0ae9a3f 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -181,6 +181,9 @@ struct bpf_func_state {
>> 	 */
>> 	u32 subprogno;
>> 
>> +	/* tracks subreg definition. */
>> +	s32 subreg_def[MAX_BPF_REG];
> 
> Same comment as Ed and another question: why it's not part of bpf_reg_state ?

Thanks for spotting this; indeed, it looks like a perfect fit to merge into bpf_reg_state.

Regards,
Jiong
Jiong Wang April 5, 2019, 8:44 p.m. UTC | #5
> On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
> 
> On 26/03/2019 18:05, Jiong Wang wrote:
>> [...]
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 27761ab..0ae9a3f 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -181,6 +181,9 @@ struct bpf_func_state {
>> 	 */
>> 	u32 subprogno;
>> 
>> +	/* tracks subreg definition. */
> Ideally this comment should mention that the stored value is the insn_idx
>  of the writing insn.  Perhaps also that this is safe because patching
>  (bpf_patch_insn_data()) only happens after main verification completes.

During full x86_64 host tests, I found one new issue.

"convert_ctx_accesses" will change the load size: a BPF_W load can be
transformed into BPF_DW or kept as BPF_W, depending on the underlying ctx field
size. And "convert_ctx_accesses" happens after zero-extension insertion.

So, a BPF_W load could have been marked and had zero extensions inserted after
it, yet the later "convert_ctx_accesses" then figures out that the transformed
load size is actually BPF_DW and rewrites it to that. The previously inserted
zero extensions then break things: the high 32 bits are wrongly cleared. For
example:

1: r2 = *(u32 *)(r1 + 80)                                                                
2: r1 = *(u32 *)(r1 + 76)                                                                
3: r3 = r1                                                                               
4: r3 += 14                                                                              
5: if r3 > r2 goto +35                                                                   
                                                                                         
Insns 1 and 2 could be turned into BPF_DW loads if they are loading xdp "data"
and "data_end". No zero extension should be inserted after them, as it would
destroy the pointers. However, they are treated as 32-bit loads initially, and
later, due to the 64-bit uses at insns 3 and 5, they are marked as needing zero
extension.
                                                                                         
I am thinking that normally the field sizes in *_md inside uapi/linux/bpf.h
are the same as those in the real underlying context; only when a field is of
pointer type can there be a u32 to u64 conversion. So, I guess we just need to
mark the dst register as a full 64-bit register write inside check_mem_access
for PTR_TO_CTX, when the reg type of the dst reg returned by check_ctx_access
is a pointer type.

Please let me know if I am thinking wrong.                                                               
                                                                                 
Thanks.
                                                     
Regards,                                                                         
Jiong
Alexei Starovoitov April 6, 2019, 3:41 a.m. UTC | #6
On Fri, Apr 05, 2019 at 09:44:49PM +0100, Jiong Wang wrote:
> 
> > On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
> > [...]
> 
> During full x86_64 host tests, found one new issue.                                    
>                                                                                          
> “convert_ctx_accesses” will change load size, A BPF_W load could be transformed          
> into BPF_DW or kept as BPF_W depending on the underlying ctx field size. And             
> “convert_ctx_accesses” happens after zero extension insertion.                           
>                                                                                          
> So, a BPF_W load could have been marked and zero extensions inserted after               
> it, however, the later happened “convert_ctx_accesses” then figured out it’s             
> transformed load size is actually BPF_DW then re-write to that. But the                  
> previously inserted zero extensions then break things, the high 32 bits are              
> wrongly cleared. For example:
> 
> 1: r2 = *(u32 *)(r1 + 80)                                                                
> 2: r1 = *(u32 *)(r1 + 76)                                                                
> 3: r3 = r1                                                                               
> 4: r3 += 14                                                                              
> 5: if r3 > r2 goto +35                                                                   
>                                                                                          
> insn 1 and 2 could be turned into BPF_DW load if they are loading xdp “data"
> and “data_end". There shouldn’t be zero-extension inserted after them will
> will destroy the pointer. However they are treated as 32-bit load initially,
> and later due to 64-bit use at insn 3 and 5, they are marked as needing zero
> extension.                                                                        
>                                                                                          
> I am thinking normally the field sizes in *_md inside uapi/linux/bpf.h are
> the same those in real underlying context, only when one field is pointer
> type, then it could be possible be a u32 to u64 conversion. So, I guess
> we just need to mark the dst register as a full 64-bit register write 
> inside check_mem_access when for PTR_TO_CTX, the reg type of the dust reg
> returned by check_ctx_access is ptr type.

Since the register containing ctx->data was used later in the load insn, and
its type was a pointer, the analysis should have marked it as a 64-bit access.

It feels like there is an issue in propagating 64-bit accesses through the
parentage chain. Since insn 5 above recognized r2 as a 64-bit access,
how come insn 1 was still allowed to poison the upper bits?
Jiong Wang April 6, 2019, 6:56 a.m. UTC | #7
Alexei Starovoitov writes:

> On Fri, Apr 05, 2019 at 09:44:49PM +0100, Jiong Wang wrote:
>> 
>> > On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
>> > 
>> > On 26/03/2019 18:05, Jiong Wang wrote:
>> >> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> >> sub-register is written. This applies to destination register of ALU32 etc.
>> >> JIT back-ends must guarantee this semantic when doing code-gen.
>> >> 
>> >> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> >> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> >> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> >> extension sequence to meet such semantic.
>> >> 
>> >> This is important, because for code the following:
>> >> 
>> >>  u64_value = (u64) u32_value
>> >>  ... other uses of u64_value
>> >> 
>> >> compiler could exploit the semantic described above and save those zero
>> >> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> >> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> >> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> >> could go up to more for some arches which requires two shifts for zero
>> >> extension) because JIT back-end needs to do extra code-gen for all such
>> >> instructions.
>> >> 
>> >> However this is not always necessary in case u32_value is never cast into
>> >> a u64, which is quite normal in real life program. So, it would be really
>> >> good if we could identify those places where such type cast happened, and
>> >> only do zero extensions for them, not for the others. This could save a lot
>> >> of BPF code-gen.
>> >> 
>> >> Algo:
>> >> - Record indices of instructions that do sub-register def (write). And
>> >>   these indices need to stay with function state so path pruning and bpf
>> >>   to bpf function call could be handled properly.
>> >> 
>> >>   These indices are kept up to date while doing insn walk.
>> >> 
>> >> - A full register read on an active sub-register def marks the def insn as
>> >>   needing zero extension on dst register.
>> >> 
>> >> - A new sub-register write overrides the old one.
>> >> 
>> >>   A new full register write makes the register free of zero extension on
>> >>   dst register.
>> >> 
>> >> - When propagating register read64 during path pruning, it also marks def
>> >>   insns whose defs are hanging active sub-register, if there is any read64
>> >>   from shown from the equal state.
>> >> 
>> >> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> >> ---
>> >> include/linux/bpf_verifier.h |  4 +++
>> >> kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>> >> 2 files changed, 84 insertions(+), 5 deletions(-)
>> >> 
>> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> >> index 27761ab..0ae9a3f 100644
>> >> --- a/include/linux/bpf_verifier.h
>> >> +++ b/include/linux/bpf_verifier.h
>> >> @@ -181,6 +181,9 @@ struct bpf_func_state {
>> >> 	 */
>> >> 	u32 subprogno;
>> >> 
>> >> +	/* tracks subreg definition. */
>> > Ideally this comment should mention that the stored value is the insn_idx
>> >  of the writing insn.  Perhaps also that this is safe because patching
>> >  (bpf_patch_insn_data()) only happens after main verification completes.
>> 
>> During full x86_64 host tests, found one new issue.                                    
>>                                                                                          
>> “convert_ctx_accesses” will change load size, A BPF_W load could be transformed          
>> into BPF_DW or kept as BPF_W depending on the underlying ctx field size. And             
>> “convert_ctx_accesses” happens after zero extension insertion.                           
>>                                                                                          
>> So, a BPF_W load could have been marked and zero extensions inserted after               
>> it, however, the later happened “convert_ctx_accesses” then figured out it’s             
>> transformed load size is actually BPF_DW then re-write to that. But the                  
>> previously inserted zero extensions then break things, the high 32 bits are              
>> wrongly cleared. For example:
>> 
>> 1: r2 = *(u32 *)(r1 + 80)                                                                
>> 2: r1 = *(u32 *)(r1 + 76)                                                                
>> 3: r3 = r1                                                                               
>> 4: r3 += 14                                                                              
>> 5: if r3 > r2 goto +35                                                                   
>>                                                                                          
>> insns 1 and 2 could be turned into BPF_DW loads if they are loading xdp
>> “data” and “data_end”. No zero extension should be inserted after them,
>> as it would destroy the pointer. However, they are treated as 32-bit
>> loads initially, and later, due to the 64-bit uses at insns 3 and 5,
>> they are marked as needing zero extension.
>>
>> I am thinking that normally the field sizes in *_md inside
>> uapi/linux/bpf.h are the same as those in the real underlying context;
>> only when a field is pointer-typed could there be a u32 to u64
>> conversion. So, I guess we just need to mark the dst register as a full
>> 64-bit register write inside check_mem_access when, for PTR_TO_CTX, the
>> reg type of the dst reg returned by check_ctx_access is a pointer type.
>
> Since the register containing ctx->data was used later in the load insn
> and its type was a pointer, the analysis should have marked it as a
> 64-bit access.
>
> It feels that there is an issue in propagating 64-bit access through the
> parentage chain. Since insn 5 above recognized r2 as a 64-bit access,
> then how come insn 1 was still allowed to poison the upper bits?

Guess my description was misleading. The high bits of insn 1 were not
poisoned, they were truncated; the analysis pass is correct here.

It is a BPF_W (4-byte) load, so initially it is marked as a sub-register
def and the JIT compiler doesn't need to guarantee the high 32 bits are
cleared. However, insn 5 later found it has a 64-bit use (as a 64-bit
operand in the comparison), so it became mandatory to guarantee the high
32 bits are cleared, and the sequence was transformed into:

1: r2 = *(u32 *)(r1 + 80)
2: r2 <<= 32
3: r2 >>= 32
4: r1 = *(u32 *)(r1 + 76)
5: r1 <<= 32
6: r1 >>= 32
7: r3 = r1
8: r3 += 14
9: if r3 > r2 goto +35

After the zero extension insertion, convert_ctx_access later transforms
the sequence further into something like:

1: r2 = *(u64 *)(r1 + 80)
2: r2 <<= 32
3: r2 >>= 32

However, the inserted zero extension (insns 2/3) is still there and will
clear the high 32 bits of the loaded 64-bit value.

This issue should have been exposed before. But as described in the cover
letter, the opt is disabled on x86. My previous test methodology on x86
was to force it on through sysctl for a couple of insn-matching unit
tests only, in the hope that other host arches like ppc would give it a
full run on the bpf selftests; until then the correctness of the opt had
not been verified by the full bpf selftests. I have done a full run of
some internal offload tests, which could be good coverage, but the
offload handles PTR_TO_CTX in a different way, so this issue was not
caught.

Now, as you suggested, the test methodology has switched to poisoning the
high 32 bits on x86, so the full bpf selftest run can be enabled on x86
and this issue is caught.

Regards,
Jiong
Alexei Starovoitov April 7, 2019, 2:51 a.m. UTC | #8
On Sat, Apr 06, 2019 at 07:56:25AM +0100, Jiong Wang wrote:
> 
> Alexei Starovoitov writes:
> 
> > On Fri, Apr 05, 2019 at 09:44:49PM +0100, Jiong Wang wrote:
> >> 
> >> > On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
> >> > 
> >> > On 26/03/2019 18:05, Jiong Wang wrote:
> >> >> eBPF ISA specification requires high 32-bit cleared when low 32-bit
> >> >> sub-register is written. This applies to destination register of ALU32 etc.
> >> >> JIT back-ends must guarantee this semantic when doing code-gen.
> >> >> 
> >> >> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
> >> >> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
> >> >> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
> >> >> extension sequence to meet such semantic.
> >> >> 
> >> >> This is important, because for code the following:
> >> >> 
> >> >>  u64_value = (u64) u32_value
> >> >>  ... other uses of u64_value
> >> >> 
> >> >> compiler could exploit the semantic described above and save those zero
> >> >> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
> >> >> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
> >> >> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
> >> >> could go up to more for some arches which requires two shifts for zero
> >> >> extension) because JIT back-end needs to do extra code-gen for all such
> >> >> instructions.
> >> >> 
> >> >> However this is not always necessary in case u32_value is never cast into
> >> >> a u64, which is quite normal in real life program. So, it would be really
> >> >> good if we could identify those places where such type cast happened, and
> >> >> only do zero extensions for them, not for the others. This could save a lot
> >> >> of BPF code-gen.
> >> >> 
> >> >> Algo:
> >> >> - Record indices of instructions that do sub-register def (write). And
> >> >>   these indices need to stay with function state so path pruning and bpf
> >> >>   to bpf function call could be handled properly.
> >> >> 
> >> >>   These indices are kept up to date while doing insn walk.
> >> >> 
> >> >> - A full register read on an active sub-register def marks the def insn as
> >> >>   needing zero extension on dst register.
> >> >> 
> >> >> - A new sub-register write overrides the old one.
> >> >> 
> >> >>   A new full register write makes the register free of zero extension on
> >> >>   dst register.
> >> >> 
> >> >> - When propagating register read64 during path pruning, it also marks def
> >> >>   insns whose defs are hanging active sub-register, if there is any read64
> >> >>   shown from the equal state.
> >> >> 
> >> >> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >> >> ---
> >> >> include/linux/bpf_verifier.h |  4 +++
> >> >> kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
> >> >> 2 files changed, 84 insertions(+), 5 deletions(-)
> >> >> 
> >> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> >> index 27761ab..0ae9a3f 100644
> >> >> --- a/include/linux/bpf_verifier.h
> >> >> +++ b/include/linux/bpf_verifier.h
> >> >> @@ -181,6 +181,9 @@ struct bpf_func_state {
> >> >> 	 */
> >> >> 	u32 subprogno;
> >> >> 
> >> >> +	/* tracks subreg definition. */
> >> > Ideally this comment should mention that the stored value is the insn_idx
> >> >  of the writing insn.  Perhaps also that this is safe because patching
> >> >  (bpf_patch_insn_data()) only happens after main verification completes.
> >> 
> >> During full x86_64 host tests, I found one new issue.
> >>
> >> “convert_ctx_accesses” can change the load size: a BPF_W load could be
> >> transformed into BPF_DW or kept as BPF_W depending on the underlying ctx
> >> field size. And “convert_ctx_accesses” happens after zero extension
> >> insertion.
> >>
> >> So, a BPF_W load could have been marked and zero extensions inserted
> >> after it; the later “convert_ctx_accesses” then figures out the
> >> transformed load size is actually BPF_DW and rewrites the load to that.
> >> But the previously inserted zero extensions then break things: the high
> >> 32 bits are wrongly cleared. For example:
> >> 
> >> 1: r2 = *(u32 *)(r1 + 80)                                                                
> >> 2: r1 = *(u32 *)(r1 + 76)                                                                
> >> 3: r3 = r1                                                                               
> >> 4: r3 += 14                                                                              
> >> 5: if r3 > r2 goto +35                                                                   
> >>                                                                                          
> >> insns 1 and 2 could be turned into BPF_DW loads if they are loading xdp
> >> “data” and “data_end”. No zero extension should be inserted after them,
> >> as it would destroy the pointer. However, they are treated as 32-bit
> >> loads initially, and later, due to the 64-bit uses at insns 3 and 5,
> >> they are marked as needing zero extension.
> >>
> >> I am thinking that normally the field sizes in *_md inside
> >> uapi/linux/bpf.h are the same as those in the real underlying context;
> >> only when a field is pointer-typed could there be a u32 to u64
> >> conversion. So, I guess we just need to mark the dst register as a full
> >> 64-bit register write inside check_mem_access when, for PTR_TO_CTX, the
> >> reg type of the dst reg returned by check_ctx_access is a pointer type.
> >
> > Since the register containing ctx->data was used later in the load insn
> > and its type was a pointer, the analysis should have marked it as a
> > 64-bit access.
> >
> > It feels that there is an issue in propagating 64-bit access through the
> > parentage chain. Since insn 5 above recognized r2 as a 64-bit access,
> > then how come insn 1 was still allowed to poison the upper bits?
> 
> Guess my description was misleading. The high bits of insn 1 were not
> poisoned, they were truncated; the analysis pass is correct here.
> 
> It is a BPF_W (4-byte) load, so initially it is marked as a sub-register
> def and the JIT compiler doesn't need to guarantee the high 32 bits are
> cleared. However, insn 5 later found it has a 64-bit use (as a 64-bit
> operand in the comparison), so it became mandatory to guarantee the high
> 32 bits are cleared, and the sequence was transformed into:
> 
> 1: r2 = *(u32 *)(r1 + 80)
> 2: r2 <<= 32
> 3: r2 >>= 32
> 4: r1 = *(u32 *)(r1 + 76)
> 5: r1 <<= 32
> 6: r1 >>= 32
> 7: r3 = r1
> 8: r3 += 14
> 9: if r3 > r2 goto +35
> 
> After the zero extension insertion, convert_ctx_access later transforms
> the sequence further into something like:
> 
> 1: r2 = *(u64 *)(r1 + 80)
> 2: r2 <<= 32
> 3: r2 >>= 32
> 
> However, the inserted zero extension (insns 2/3) is still there and will
> clear the high 32 bits of the loaded 64-bit value.
> 
> This issue should have been exposed before. But as described in the cover
> letter, the opt is disabled on x86. My previous test methodology on x86
> was to force it on through sysctl for a couple of insn-matching unit
> tests only, in the hope that other host arches like ppc would give it a
> full run on the bpf selftests; until then the correctness of the opt had
> not been verified by the full bpf selftests. I have done a full run of
> some internal offload tests, which could be good coverage, but the
> offload handles PTR_TO_CTX in a different way, so this issue was not
> caught.
> 
> Now, as you suggested, the test methodology has switched to poisoning the
> high 32 bits on x86, so the full bpf selftest run can be enabled on x86
> and this issue is caught.

Got it. Yes. checking that check_ctx_access returns ptr type will be enough.
Patch

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 27761ab..0ae9a3f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -181,6 +181,9 @@  struct bpf_func_state {
 	 */
 	u32 subprogno;
 
+	/* tracks subreg definition. */
+	s32 subreg_def[MAX_BPF_REG];
+
 	/* The following fields should be last. See copy_func_state() */
 	int acquired_refs;
 	struct bpf_reference_state *refs;
@@ -232,6 +235,7 @@  struct bpf_insn_aux_data {
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	int sanitize_stack_off; /* stack slot to be cleared */
 	bool seen; /* this insn was processed by the verifier */
+	bool zext_dst; /* this insn zero extend dst reg */
 	u8 alu_state; /* used in combination with alu_limit */
 	unsigned int orig_idx; /* original instruction index */
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b95c438..66e5e65 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -971,16 +971,19 @@  static void mark_reg_not_init(struct bpf_verifier_env *env,
 	__mark_reg_not_init(regs + regno);
 }
 
+#define DEF_NOT_SUBREG	(-1)
 static void init_reg_state(struct bpf_verifier_env *env,
 			   struct bpf_func_state *state)
 {
 	struct bpf_reg_state *regs = state->regs;
+	s32 *subreg_def = state->subreg_def;
 	int i;
 
 	for (i = 0; i < MAX_BPF_REG; i++) {
 		mark_reg_not_init(env, regs, i);
 		regs[i].live = REG_LIVE_NONE;
 		regs[i].parent = NULL;
+		subreg_def[i] = DEF_NOT_SUBREG;
 	}
 
 	/* frame pointer */
@@ -1149,18 +1152,66 @@  static int mark_reg_read(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* This function is supposed to be used by the following check_reg_arg only. */
+static bool insn_has_reg64(struct bpf_insn *insn)
+{
+	u8 code, class, op;
+
+	code = insn->code;
+	class = BPF_CLASS(code);
+	op = BPF_OP(code);
+
+	/* BPF_EXIT will reach here because of return value readability test for
+	 * "main" which has s32 return value.
+	 * BPF_CALL will reach here because of marking caller saved clobber with
+	 * DST_OP_NO_MARK for which we don't care the register def because they
+	 * are anyway marked as NOT_INIT already.
+	 *
+	 * So, return false for both.
+	 */
+	if (class == BPF_JMP && (op == BPF_EXIT || op == BPF_CALL))
+		return false;
+
+	if (class == BPF_ALU64 || class == BPF_JMP ||
+	    /* BPF_END always use BPF_ALU class. */
+	    (class == BPF_ALU && op == BPF_END && insn->imm == 64))
+		return true;
+
+	if (class == BPF_ALU || class == BPF_JMP32)
+		return false;
+
+	/* LD/ST/LDX/STX */
+	return BPF_SIZE(code) == BPF_DW;
+}
+
+static void mark_insn_zext(struct bpf_verifier_env *env,
+			   struct bpf_func_state *state, u8 regno)
+{
+	s32 def_idx = state->subreg_def[regno];
+
+	if (def_idx == DEF_NOT_SUBREG)
+		return;
+
+	env->insn_aux_data[def_idx].zext_dst = true;
+	/* The dst will be zero extended, so won't be sub-register anymore. */
+	state->subreg_def[regno] = DEF_NOT_SUBREG;
+}
+
 static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			 enum reg_arg_type t)
 {
 	struct bpf_verifier_state *vstate = env->cur_state;
 	struct bpf_func_state *state = vstate->frame[vstate->curframe];
+	struct bpf_insn *insn = env->prog->insnsi + env->insn_idx;
 	struct bpf_reg_state *regs = state->regs;
+	bool dw_reg;
 
 	if (regno >= MAX_BPF_REG) {
 		verbose(env, "R%d is invalid\n", regno);
 		return -EINVAL;
 	}
 
+	dw_reg = insn_has_reg64(insn);
 	if (t == SRC_OP) {
 		/* check whether register used as source operand can be read */
 		if (regs[regno].type == NOT_INIT) {
@@ -1168,9 +1219,12 @@  static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		/* We don't need to worry about FP liveness because it's read-only */
-		if (regno != BPF_REG_FP)
+		if (regno != BPF_REG_FP) {
+			if (dw_reg)
+				mark_insn_zext(env, state, regno);
 			return mark_reg_read(env, &regs[regno],
-					     regs[regno].parent, true);
+					     regs[regno].parent, dw_reg);
+		}
 	} else {
 		/* check whether register used as dest operand can be written to */
 		if (regno == BPF_REG_FP) {
@@ -1178,6 +1232,8 @@  static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		regs[regno].live |= REG_LIVE_WRITTEN;
+		state->subreg_def[regno] =
+			dw_reg ? DEF_NOT_SUBREG : env->insn_idx;
 		if (t == DST_OP)
 			mark_reg_unknown(env, regs, regno);
 	}
@@ -2360,6 +2416,9 @@  static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 	if (err)
 		return err;
 
+	/* arg_type doesn't differentiate 32 and 64-bit arg, always zext. */
+	mark_insn_zext(env, cur_func(env), regno);
+
 	if (arg_type == ARG_ANYTHING) {
 		if (is_pointer_value(env, regno)) {
 			verbose(env, "R%d leaks addr into helper function\n",
@@ -2880,10 +2939,13 @@  static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return err;
 
 	/* copy r1 - r5 args that callee can access.  The copy includes parent
-	 * pointers, which connects us up to the liveness chain
+	 * pointers, which connects us up to the liveness chain. subreg_def for
+	 * them need to be copied as well.
 	 */
-	for (i = BPF_REG_1; i <= BPF_REG_5; i++)
+	for (i = BPF_REG_1; i <= BPF_REG_5; i++) {
 		callee->regs[i] = caller->regs[i];
+		callee->subreg_def[i] = caller->subreg_def[i];
+	}
 
 	/* after the call registers r0 - r5 were scratched */
 	for (i = 0; i < CALLER_SAVED_REGS; i++) {
@@ -2928,8 +2990,11 @@  static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 
 	state->curframe--;
 	caller = state->frame[state->curframe];
-	/* return to the caller whatever r0 had in the callee */
+	/* return to the caller whatever r0 had in the callee, subreg_def should
+	 * be copied to caller as well.
+	 */
 	caller->regs[BPF_REG_0] = *r0;
+	caller->subreg_def[BPF_REG_0] = callee->subreg_def[BPF_REG_0];
 
 	/* Transfer references to the caller */
 	err = transfer_reference_state(caller, callee);
@@ -3118,6 +3183,9 @@  static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		check_reg_arg(env, caller_saved[i], DST_OP_NO_MARK);
 	}
 
+	/* helper call must return full 64-bit R0. */
+	cur_func(env)->subreg_def[BPF_REG_0] = DEF_NOT_SUBREG;
+
 	/* update return register (already marked as written above) */
 	if (fn->ret_type == RET_INTEGER) {
 		/* sets type to SCALAR_VALUE */
@@ -5114,6 +5182,8 @@  static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	 * Already marked as written above.
 	 */
 	mark_reg_unknown(env, regs, BPF_REG_0);
+	/* ld_abs load up to 32-bit skb data. */
+	cur_func(env)->subreg_def[BPF_REG_0] = env->insn_idx;
 	return 0;
 }
 
@@ -6092,12 +6162,17 @@  static int propagate_liveness(struct bpf_verifier_env *env,
 	BUILD_BUG_ON(BPF_REG_FP + 1 != MAX_BPF_REG);
 	parent_regs = vparent->frame[vparent->curframe]->regs;
 	regs = vstate->frame[vstate->curframe]->regs;
+	parent = vparent->frame[vparent->curframe];
 	/* We don't need to worry about FP liveness because it's read-only */
 	for (i = 0; i < BPF_REG_FP; i++) {
 		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
 					     REG_LIVE_READ64);
 		if (err < 0)
 			return err;
+
+		if (err > 0)
+			mark_insn_zext(env, parent, i);
+
 		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
 					     REG_LIVE_READ32);
 		if (err < 0)