diff mbox series

[bpf-next,v3,4/9] bpf/verifier: improve register value range tracking with ARSH

Message ID 20180420221842.742330-5-yhs@fb.com
State Changes Requested, archived
Delegated to: BPF Maintainers
Headers show
Series bpf: add bpf_get_stack helper | expand

Commit Message

Yonghong Song April 20, 2018, 10:18 p.m. UTC
When helpers like bpf_get_stack returns an int value
and later on used for arithmetic computation, the LSH and ARSH
operations are often required to get proper sign extension into
64-bit. For example, without this patch:
    54: R0=inv(id=0,umax_value=800)
    54: (bf) r8 = r0
    55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
    55: (67) r8 <<= 32
    56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
    56: (c7) r8 s>>= 32
    57: R8=inv(id=0)
With this patch:
    54: R0=inv(id=0,umax_value=800)
    54: (bf) r8 = r0
    55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
    55: (67) r8 <<= 32
    56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
    56: (c7) r8 s>>= 32
    57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
With better range of "R8", later on when "R8" is added to other register,
e.g., a map pointer or scalar-value register, the better register
range can be derived and verifier failure may be avoided.

In our later example,
    ......
    usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
    if (usize < 0)
        return 0;
    ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
    ......
Without improving ARSH value range tracking, the register representing
"max_len - usize" will have smin_value equal to S64_MIN and will be
rejected by verifier.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Comments

Alexei Starovoitov April 23, 2018, 12:16 a.m. UTC | #1
On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
> When helpers like bpf_get_stack returns an int value
> and later on used for arithmetic computation, the LSH and ARSH
> operations are often required to get proper sign extension into
> 64-bit. For example, without this patch:
>     54: R0=inv(id=0,umax_value=800)
>     54: (bf) r8 = r0
>     55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>     55: (67) r8 <<= 32
>     56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>     56: (c7) r8 s>>= 32
>     57: R8=inv(id=0)
> With this patch:
>     54: R0=inv(id=0,umax_value=800)
>     54: (bf) r8 = r0
>     55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>     55: (67) r8 <<= 32
>     56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>     56: (c7) r8 s>>= 32
>     57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
> With better range of "R8", later on when "R8" is added to other register,
> e.g., a map pointer or scalar-value register, the better register
> range can be derived and verifier failure may be avoided.
> 
> In our later example,
>     ......
>     usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>     if (usize < 0)
>         return 0;
>     ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>     ......
> Without improving ARSH value range tracking, the register representing
> "max_len - usize" will have smin_value equal to S64_MIN and will be
> rejected by verifier.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3c8bb92..01c215d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>  		/* We may learn something more from the var_off */
>  		__update_reg_bounds(dst_reg);
>  		break;
> +	case BPF_ARSH:
> +		if (umax_val >= insn_bitness) {
> +			/* Shifts greater than 31 or 63 are undefined.
> +			 * This includes shifts by a negative number.
> +			 */
> +			mark_reg_unknown(env, regs, insn->dst_reg);
> +			break;
> +		}
> +		if (dst_reg->smin_value < 0)
> +			dst_reg->smin_value >>= umin_val;
> +		else
> +			dst_reg->smin_value >>= umax_val;
> +		if (dst_reg->smax_value < 0)
> +			dst_reg->smax_value >>= umax_val;
> +		else
> +			dst_reg->smax_value >>= umin_val;
> +		if (src_known)
> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
> +						       umin_val);
> +		else
> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
> +		dst_reg->umin_value >>= umax_val;
> +		dst_reg->umax_value >>= umin_val;
> +		/* We may learn something more from the var_off */
> +		__update_reg_bounds(dst_reg);

I'm struggling to understand how these bounds are computed.
Could you add examples in the comments?
In particular if dst_reg is unknown (tnum.mask == -1)
the above tnum_rshift() will clear upper bits and will make it
64-bit positive, but that doesn't seem correct.
What am I missing?
Yonghong Song April 23, 2018, 2:49 a.m. UTC | #2
On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
> On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
>> When helpers like bpf_get_stack returns an int value
>> and later on used for arithmetic computation, the LSH and ARSH
>> operations are often required to get proper sign extension into
>> 64-bit. For example, without this patch:
>>      54: R0=inv(id=0,umax_value=800)
>>      54: (bf) r8 = r0
>>      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>      55: (67) r8 <<= 32
>>      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>      56: (c7) r8 s>>= 32
>>      57: R8=inv(id=0)
>> With this patch:
>>      54: R0=inv(id=0,umax_value=800)
>>      54: (bf) r8 = r0
>>      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>      55: (67) r8 <<= 32
>>      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>      56: (c7) r8 s>>= 32
>>      57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
>> With better range of "R8", later on when "R8" is added to other register,
>> e.g., a map pointer or scalar-value register, the better register
>> range can be derived and verifier failure may be avoided.
>>
>> In our later example,
>>      ......
>>      usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>>      if (usize < 0)
>>          return 0;
>>      ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>>      ......
>> Without improving ARSH value range tracking, the register representing
>> "max_len - usize" will have smin_value equal to S64_MIN and will be
>> rejected by verifier.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 3c8bb92..01c215d 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>>   		/* We may learn something more from the var_off */
>>   		__update_reg_bounds(dst_reg);
>>   		break;
>> +	case BPF_ARSH:
>> +		if (umax_val >= insn_bitness) {
>> +			/* Shifts greater than 31 or 63 are undefined.
>> +			 * This includes shifts by a negative number.
>> +			 */
>> +			mark_reg_unknown(env, regs, insn->dst_reg);
>> +			break;
>> +		}
>> +		if (dst_reg->smin_value < 0)
>> +			dst_reg->smin_value >>= umin_val;
>> +		else
>> +			dst_reg->smin_value >>= umax_val;
>> +		if (dst_reg->smax_value < 0)
>> +			dst_reg->smax_value >>= umax_val;
>> +		else
>> +			dst_reg->smax_value >>= umin_val;
>> +		if (src_known)
>> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
>> +						       umin_val);
>> +		else
>> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
>> +		dst_reg->umin_value >>= umax_val;
>> +		dst_reg->umax_value >>= umin_val;
>> +		/* We may learn something more from the var_off */
>> +		__update_reg_bounds(dst_reg);
> 
> I'm struggling to understand how these bounds are computed.
> Could you add examples in the comments?

Okay, let me try to add some comments for better understanding.

> In particular if dst_reg is unknown (tnum.mask == -1)
> the above tnum_rshift() will clear upper bits and will make it
> 64-bit positive, but that doesn't seem correct.
> What am I missing?

Considering this is arith shift, we probably should just have
dst_reg->var_off = tnum_unknown to be conservative.

I could miss something here as well. Let me try to write more
detailed explanation, hopefully to cover all corner cases.
Alexei Starovoitov April 23, 2018, 4:19 a.m. UTC | #3
On Sun, Apr 22, 2018 at 07:49:13PM -0700, Yonghong Song wrote:
> 
> 
> On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
> > On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
> > > When helpers like bpf_get_stack returns an int value
> > > and later on used for arithmetic computation, the LSH and ARSH
> > > operations are often required to get proper sign extension into
> > > 64-bit. For example, without this patch:
> > >      54: R0=inv(id=0,umax_value=800)
> > >      54: (bf) r8 = r0
> > >      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > >      55: (67) r8 <<= 32
> > >      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > >      56: (c7) r8 s>>= 32
> > >      57: R8=inv(id=0)
> > > With this patch:
> > >      54: R0=inv(id=0,umax_value=800)
> > >      54: (bf) r8 = r0
> > >      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > >      55: (67) r8 <<= 32
> > >      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > >      56: (c7) r8 s>>= 32
> > >      57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
> > > With better range of "R8", later on when "R8" is added to other register,
> > > e.g., a map pointer or scalar-value register, the better register
> > > range can be derived and verifier failure may be avoided.
> > > 
> > > In our later example,
> > >      ......
> > >      usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> > >      if (usize < 0)
> > >          return 0;
> > >      ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > >      ......
> > > Without improving ARSH value range tracking, the register representing
> > > "max_len - usize" will have smin_value equal to S64_MIN and will be
> > > rejected by verifier.
> > > 
> > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > ---
> > >   kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
> > >   1 file changed, 26 insertions(+)
> > > 
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 3c8bb92..01c215d 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
> > >   		/* We may learn something more from the var_off */
> > >   		__update_reg_bounds(dst_reg);
> > >   		break;
> > > +	case BPF_ARSH:
> > > +		if (umax_val >= insn_bitness) {
> > > +			/* Shifts greater than 31 or 63 are undefined.
> > > +			 * This includes shifts by a negative number.
> > > +			 */
> > > +			mark_reg_unknown(env, regs, insn->dst_reg);
> > > +			break;
> > > +		}
> > > +		if (dst_reg->smin_value < 0)
> > > +			dst_reg->smin_value >>= umin_val;
> > > +		else
> > > +			dst_reg->smin_value >>= umax_val;
> > > +		if (dst_reg->smax_value < 0)
> > > +			dst_reg->smax_value >>= umax_val;
> > > +		else
> > > +			dst_reg->smax_value >>= umin_val;
> > > +		if (src_known)
> > > +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
> > > +						       umin_val);
> > > +		else
> > > +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
> > > +		dst_reg->umin_value >>= umax_val;
> > > +		dst_reg->umax_value >>= umin_val;
> > > +		/* We may learn something more from the var_off */
> > > +		__update_reg_bounds(dst_reg);
> > 
> > I'm struggling to understand how these bounds are computed.
> > Could you add examples in the comments?
> 
> Okay, let me try to add some comments for better understanding.
> 
> > In particular if dst_reg is unknown (tnum.mask == -1)
> > the above tnum_rshift() will clear upper bits and will make it
> > 64-bit positive, but that doesn't seem correct.
> > What am I missing?
> 
> Considering this is arith shift, we probably should just have
> dst_reg->var_off = tnum_unknown to be conservative.
> 
> I could miss something here as well. Let me try to write more
> detailed explanation, hopefully to cover all corner cases.

Is there a use case for !src_known ?
I think test_verifier should have 100% line coverage of verifier.c
and every 'if' condition in the verifier needs to have real use case
behind it.
It's still on my todo list to get rid of [su][min|max]_value tracking
that was introduced without solid justification.
Yonghong Song April 23, 2018, 4:31 a.m. UTC | #4
On 4/22/18 9:19 PM, Alexei Starovoitov wrote:
> On Sun, Apr 22, 2018 at 07:49:13PM -0700, Yonghong Song wrote:
>>
>>
>> On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
>>> On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
>>>> When helpers like bpf_get_stack returns an int value
>>>> and later on used for arithmetic computation, the LSH and ARSH
>>>> operations are often required to get proper sign extension into
>>>> 64-bit. For example, without this patch:
>>>>       54: R0=inv(id=0,umax_value=800)
>>>>       54: (bf) r8 = r0
>>>>       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>>>       55: (67) r8 <<= 32
>>>>       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>>>       56: (c7) r8 s>>= 32
>>>>       57: R8=inv(id=0)
>>>> With this patch:
>>>>       54: R0=inv(id=0,umax_value=800)
>>>>       54: (bf) r8 = r0
>>>>       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>>>       55: (67) r8 <<= 32
>>>>       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>>>       56: (c7) r8 s>>= 32
>>>>       57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
>>>> With better range of "R8", later on when "R8" is added to other register,
>>>> e.g., a map pointer or scalar-value register, the better register
>>>> range can be derived and verifier failure may be avoided.
>>>>
>>>> In our later example,
>>>>       ......
>>>>       usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>>>>       if (usize < 0)
>>>>           return 0;
>>>>       ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>>>>       ......
>>>> Without improving ARSH value range tracking, the register representing
>>>> "max_len - usize" will have smin_value equal to S64_MIN and will be
>>>> rejected by verifier.
>>>>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>    kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
>>>>    1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 3c8bb92..01c215d 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>>>>    		/* We may learn something more from the var_off */
>>>>    		__update_reg_bounds(dst_reg);
>>>>    		break;
>>>> +	case BPF_ARSH:
>>>> +		if (umax_val >= insn_bitness) {
>>>> +			/* Shifts greater than 31 or 63 are undefined.
>>>> +			 * This includes shifts by a negative number.
>>>> +			 */
>>>> +			mark_reg_unknown(env, regs, insn->dst_reg);
>>>> +			break;
>>>> +		}
>>>> +		if (dst_reg->smin_value < 0)
>>>> +			dst_reg->smin_value >>= umin_val;
>>>> +		else
>>>> +			dst_reg->smin_value >>= umax_val;
>>>> +		if (dst_reg->smax_value < 0)
>>>> +			dst_reg->smax_value >>= umax_val;
>>>> +		else
>>>> +			dst_reg->smax_value >>= umin_val;
>>>> +		if (src_known)
>>>> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
>>>> +						       umin_val);
>>>> +		else
>>>> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
>>>> +		dst_reg->umin_value >>= umax_val;
>>>> +		dst_reg->umax_value >>= umin_val;
>>>> +		/* We may learn something more from the var_off */
>>>> +		__update_reg_bounds(dst_reg);
>>>
>>> I'm struggling to understand how these bounds are computed.
>>> Could you add examples in the comments?
>>
>> Okay, let me try to add some comments for better understanding.
>>
>>> In particular if dst_reg is unknown (tnum.mask == -1)
>>> the above tnum_rshift() will clear upper bits and will make it
>>> 64-bit positive, but that doesn't seem correct.
>>> What am I missing?
>>
>> Considering this is arith shift, we probably should just have
>> dst_reg->var_off = tnum_unknown to be conservative.
>>
>> I could miss something here as well. Let me try to write more
>> detailed explanation, hopefully to cover all corner cases.
> 
> Is there a use case for !src_known ?

For typical bpf programs, the shift amount should always be known...
If src_known is true, it must be dealing custom packets or custom
data structures in tracing, etc.


> I think test_verifier should have 100% line coverage of verifier.c
> and every 'if' condition in the verifier needs to have real use case
> behind it.
> It's still on my todo list to get rid of [su][min|max]_value tracking
> that was introduced without solid justification.
>
Alexei Starovoitov April 23, 2018, 4:40 a.m. UTC | #5
On Sun, Apr 22, 2018 at 09:31:19PM -0700, Yonghong Song wrote:
> 
> 
> On 4/22/18 9:19 PM, Alexei Starovoitov wrote:
> > On Sun, Apr 22, 2018 at 07:49:13PM -0700, Yonghong Song wrote:
> > > 
> > > 
> > > On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
> > > > On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
> > > > > When helpers like bpf_get_stack returns an int value
> > > > > and later on used for arithmetic computation, the LSH and ARSH
> > > > > operations are often required to get proper sign extension into
> > > > > 64-bit. For example, without this patch:
> > > > >       54: R0=inv(id=0,umax_value=800)
> > > > >       54: (bf) r8 = r0
> > > > >       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > > > >       55: (67) r8 <<= 32
> > > > >       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > > > >       56: (c7) r8 s>>= 32
> > > > >       57: R8=inv(id=0)
> > > > > With this patch:
> > > > >       54: R0=inv(id=0,umax_value=800)
> > > > >       54: (bf) r8 = r0
> > > > >       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > > > >       55: (67) r8 <<= 32
> > > > >       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > > > >       56: (c7) r8 s>>= 32
> > > > >       57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
> > > > > With better range of "R8", later on when "R8" is added to other register,
> > > > > e.g., a map pointer or scalar-value register, the better register
> > > > > range can be derived and verifier failure may be avoided.
> > > > > 
> > > > > In our later example,
> > > > >       ......
> > > > >       usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> > > > >       if (usize < 0)
> > > > >           return 0;
> > > > >       ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > > > >       ......
> > > > > Without improving ARSH value range tracking, the register representing
> > > > > "max_len - usize" will have smin_value equal to S64_MIN and will be
> > > > > rejected by verifier.
> > > > > 
> > > > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > > > ---
> > > > >    kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
> > > > >    1 file changed, 26 insertions(+)
> > > > > 
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index 3c8bb92..01c215d 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
> > > > >    		/* We may learn something more from the var_off */
> > > > >    		__update_reg_bounds(dst_reg);
> > > > >    		break;
> > > > > +	case BPF_ARSH:
> > > > > +		if (umax_val >= insn_bitness) {
> > > > > +			/* Shifts greater than 31 or 63 are undefined.
> > > > > +			 * This includes shifts by a negative number.
> > > > > +			 */
> > > > > +			mark_reg_unknown(env, regs, insn->dst_reg);
> > > > > +			break;
> > > > > +		}
> > > > > +		if (dst_reg->smin_value < 0)
> > > > > +			dst_reg->smin_value >>= umin_val;
> > > > > +		else
> > > > > +			dst_reg->smin_value >>= umax_val;
> > > > > +		if (dst_reg->smax_value < 0)
> > > > > +			dst_reg->smax_value >>= umax_val;
> > > > > +		else
> > > > > +			dst_reg->smax_value >>= umin_val;
> > > > > +		if (src_known)
> > > > > +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
> > > > > +						       umin_val);
> > > > > +		else
> > > > > +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
> > > > > +		dst_reg->umin_value >>= umax_val;
> > > > > +		dst_reg->umax_value >>= umin_val;
> > > > > +		/* We may learn something more from the var_off */
> > > > > +		__update_reg_bounds(dst_reg);
> > > > 
> > > > I'm struggling to understand how these bounds are computed.
> > > > Could you add examples in the comments?
> > > 
> > > Okay, let me try to add some comments for better understanding.
> > > 
> > > > In particular if dst_reg is unknown (tnum.mask == -1)
> > > > the above tnum_rshift() will clear upper bits and will make it
> > > > 64-bit positive, but that doesn't seem correct.
> > > > What am I missing?
> > > 
> > > Considering this is arith shift, we probably should just have
> > > dst_reg->var_off = tnum_unknown to be conservative.
> > > 
> > > I could miss something here as well. Let me try to write more
> > > detailed explanation, hopefully to cover all corner cases.
> > 
> > Is there a use case for !src_known ?
> 
> For typical bpf programs, the shift amount should always be known...
> If src_known is true, it must be dealing custom packets or custom
> data structures in tracing, etc.

In the example it was <<= 32 and s>>= 32 only because newly
introduced helper returns signed 32-bit integer that is later
used in the math. I have a hard time imagining useful C code that
needs arithmetic shift with a variable.
Edward Cree April 23, 2018, 12:25 p.m. UTC | #6
On 20/04/18 23:18, Yonghong Song wrote:
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3c8bb92..01c215d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>  		/* We may learn something more from the var_off */
>  		__update_reg_bounds(dst_reg);
>  		break;
> +	case BPF_ARSH:
> +		if (umax_val >= insn_bitness) {
> +			/* Shifts greater than 31 or 63 are undefined.
> +			 * This includes shifts by a negative number.
> +			 */
> +			mark_reg_unknown(env, regs, insn->dst_reg);
> +			break;
> +		}
> +		if (dst_reg->smin_value < 0)
> +			dst_reg->smin_value >>= umin_val;
> +		else
> +			dst_reg->smin_value >>= umax_val;
> +		if (dst_reg->smax_value < 0)
> +			dst_reg->smax_value >>= umax_val;
> +		else
> +			dst_reg->smax_value >>= umin_val;
> +		if (src_known)
> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
> +						       umin_val);
tnum_rshift is an unsigned shift, it won't do what you want here.
I think you could write a tnum_arshift that looks something like this
 (UNTESTED!):

    struct tnum tnum_arshift(struct tnum a, u8 shift)
    {
        return TNUM(((s64)a.value) >> shift, ((s64)a.mask) >> shift);
    }
Theory: if value sign bit is 1 then number is known negative so populate
 upper bits with known 1s.  If mask sign bit is 1 then number might be
 negative so populate upper bits with unknown.  Otherwise, number is
 known positive so populate upper bits with known 0s.

> +		else
> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
Applying the above here, tnum_arshift(tnum_unknown, ...) would always just
 return tnum_unknown, so just do "dst_reg->var_off = tnum_unknown;".
The reason for the corresponding logic in the BPF_RSH case is that a right
 logical shift _always_ populates upper bits with zeroes.
In any case these 'else' branches are currently never taken because they
 fall foul of the check Alexei added just before the switch,
    if (!src_known &&
        opcode != BPF_ADD && opcode != BPF_SUB && opcode != BPF_AND) {
        __mark_reg_unknown(dst_reg);
        return 0;
    }
So I can guarantee you haven't tested this code :-)

> +		dst_reg->umin_value >>= umax_val;
> +		dst_reg->umax_value >>= umin_val;
FWIW I think the way to handle umin/umax here is to blow them away and
 just rely on inferring new ubounds from the sbounds (i.e. the inverse of
 what we do just above in case BPF_RSH) since BPF_ARSH is essentially an
 operation on the signed value.  I don't think there is a need to support
 cases where the unsigned bounds of a signed shift of a value that may
 cross the sign boundary at (1<<63) are needed to verify a program.
(Unlike in the unsigned shift case, it is at least _possible_ for there to
 be information from the ubounds that we can't get from the sbounds - but
 it's a contrived case that isn't likely to be useful in real programs.)

-Ed
> +		/* We may learn something more from the var_off */
> +		__update_reg_bounds(dst_reg);
> +		break;
>  	default:
>  		mark_reg_unknown(env, regs, insn->dst_reg);
>  		break;
Yonghong Song April 23, 2018, 4:19 p.m. UTC | #7
On 4/23/18 5:25 AM, Edward Cree wrote:
> On 20/04/18 23:18, Yonghong Song wrote:
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 3c8bb92..01c215d 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>>   		/* We may learn something more from the var_off */
>>   		__update_reg_bounds(dst_reg);
>>   		break;
>> +	case BPF_ARSH:
>> +		if (umax_val >= insn_bitness) {
>> +			/* Shifts greater than 31 or 63 are undefined.
>> +			 * This includes shifts by a negative number.
>> +			 */
>> +			mark_reg_unknown(env, regs, insn->dst_reg);
>> +			break;
>> +		}
>> +		if (dst_reg->smin_value < 0)
>> +			dst_reg->smin_value >>= umin_val;
>> +		else
>> +			dst_reg->smin_value >>= umax_val;
>> +		if (dst_reg->smax_value < 0)
>> +			dst_reg->smax_value >>= umax_val;
>> +		else
>> +			dst_reg->smax_value >>= umin_val;
>> +		if (src_known)
>> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
>> +						       umin_val);
> tnum_rshift is an unsigned shift, it won't do what you want here.
> I think you could write a tnum_arshift that looks something like this
>   (UNTESTED!):
> 
>      struct tnum tnum_arshift(struct tnum a, u8 shift)
>      {
>          return TNUM(((s64)a.value) >> shift, ((s64)a.mask) >> shift);
>      }
> Theory: if value sign bit is 1 then number is known negative so populate
>   upper bits with known 1s.  If mask sign bit is 1 then number might be
>   negative so populate upper bits with unknown.  Otherwise, number is
>   known positive so populate upper bits with known 0s.

Right, my last version just used this tnum_arshift :-).


>> +		else
>> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
> Applying the above here, tnum_arshift(tnum_unknown, ...) would always just
>   return tnum_unknown, so just do "dst_reg->var_off = tnum_unknown;".
> The reason for the corresponding logic in the BPF_RSH case is that a right
>   logical shift _always_ populates upper bits with zeroes.
> In any case these 'else' branches are currently never taken because they
>   fall foul of the check Alexei added just before the switch,
>      if (!src_known &&
>          opcode != BPF_ADD && opcode != BPF_SUB && opcode != BPF_AND) {
>          __mark_reg_unknown(dst_reg);
>          return 0;
>      }
> So I can guarantee you haven't tested this code :-)

I just noticed this last night and removed the !src_known branch all 
together here and from LSH/RSH.

> 
>> +		dst_reg->umin_value >>= umax_val;
>> +		dst_reg->umax_value >>= umin_val;
> FWIW I think the way to handle umin/umax here is to blow them away and
>   just rely on inferring new ubounds from the sbounds (i.e. the inverse of
>   what we do just above in case BPF_RSH) since BPF_ARSH is essentially an
>   operation on the signed value.  I don't think there is a need to support
>   cases where the unsigned bounds of a signed shift of a value that may
>   cross the sign boundary at (1<<63) are needed to verify a program.
> (Unlike in the unsigned shift case, it is at least _possible_ for there to
>   be information from the ubounds that we can't get from the sbounds - but
>   it's a contrived case that isn't likely to be useful in real programs.)

This makes sense and will make code simpler and easy to understand.
Will make the change.

Thanks!

> 
> -Ed
>> +		/* We may learn something more from the var_off */
>> +		__update_reg_bounds(dst_reg);
>> +		break;
>>   	default:
>>   		mark_reg_unknown(env, regs, insn->dst_reg);
>>   		break;
diff mbox series

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c8bb92..01c215d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2975,6 +2975,32 @@  static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
 		/* We may learn something more from the var_off */
 		__update_reg_bounds(dst_reg);
 		break;
+	case BPF_ARSH:
+		if (umax_val >= insn_bitness) {
+			/* Shifts greater than 31 or 63 are undefined.
+			 * This includes shifts by a negative number.
+			 */
+			mark_reg_unknown(env, regs, insn->dst_reg);
+			break;
+		}
+		if (dst_reg->smin_value < 0)
+			dst_reg->smin_value >>= umin_val;
+		else
+			dst_reg->smin_value >>= umax_val;
+		if (dst_reg->smax_value < 0)
+			dst_reg->smax_value >>= umax_val;
+		else
+			dst_reg->smax_value >>= umin_val;
+		if (src_known)
+			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
+						       umin_val);
+		else
+			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
+		dst_reg->umin_value >>= umax_val;
+		dst_reg->umax_value >>= umin_val;
+		/* We may learn something more from the var_off */
+		__update_reg_bounds(dst_reg);
+		break;
 	default:
 		mark_reg_unknown(env, regs, insn->dst_reg);
 		break;