diff mbox series

[RFC,1/4] net/ipv4/fib: Remove run-time check in tnode_alloc()

Message ID 20190326153026.24493-2-dima@arista.com
State RFC
Delegated to: David Miller
Series net/fib: Speed up trie rebalancing for full view

Commit Message

Dmitry Safonov March 26, 2019, 3:30 p.m. UTC
TNODE_KMALLOC_MAX is not used anywhere, while the TNODE_VMALLOC_MAX check
in tnode_alloc() only adds extra cmp/jmp instructions to tnode
allocation. During rebalancing of the trie the function can be called
thousands of times. The run-time check takes up a cache line and a
branch predictor entry. Furthermore, this check is always false on
64-bit platforms: IPv4 has only a 4-byte address range, and bits is
limited by KEYLENGTH (32).

Move the check under unlikely() and change the comparison to
BITS_PER_LONG, optimizing allocation of tnodes during rebalancing (and
removing the check completely for platforms with BITS_PER_LONG >
KEYLENGTH).

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/ipv4/fib_trie.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
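
For readers who want to see why the proposed condition can disappear on
64-bit builds, here is a minimal user-space sketch. It is not kernel
code: KEYLENGTH is hard-coded to 32 as stated in the commit message,
BITS_PER_LONG is derived from sizeof(long), unlikely() is emulated with
the GCC/Clang __builtin_expect builtin, and bits_rejected() is a
hypothetical helper name used only for illustration.

```c
#include <limits.h>

/* Assumption: 32 key bits for IPv4, as stated in the commit message. */
#define KEYLENGTH 32
/* User-space stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))

#if defined(__GNUC__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/*
 * Hypothetical helper mirroring the condition from the patch.
 * The left operand is a compile-time constant, so when long is wider
 * than KEYLENGTH the compiler can fold the whole expression to 0.
 */
int bits_rejected(int bits)
{
	return (BITS_PER_LONG <= KEYLENGTH) && unlikely(bits >= BITS_PER_LONG);
}
```

With a 64-bit long the function always returns 0. With a 32-bit long it
returns 1 only for bits >= 32, so bits == 30 and bits == 31 still pass.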

Comments

Alexander Duyck April 1, 2019, 3:40 p.m. UTC | #1
On Tue, 2019-03-26 at 15:30 +0000, Dmitry Safonov wrote:
> TNODE_KMALLOC_MAX is not used anywhere, while TNODE_VMALLOC_MAX check in
> tnode_alloc() only adds additional cmp/jmp instructions to tnode
> allocation. During rebalancing of the trie the function can be called
> thousands of times. Runtime check takes cache line and predictor entry.
> Furthermore, this check is always false on 64-bit platforms and ipv4 has
> only 4 byte address range and bits are limited by KEYLENGTH (32).
> 
> Move the check under unlikely() and change comparison to BITS_PER_LONG,
> optimizing allocation of tnode during rebalancing (and removing it
> completely for platforms with BITS_PER_LONG > KEYLENGTH).
> 
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  net/ipv4/fib_trie.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index a573e37e0615..ad7d56c421cb 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -312,11 +312,6 @@ static inline void alias_free_mem_rcu(struct fib_alias *fa)
>  	call_rcu(&fa->rcu, __alias_free_mem);
>  }
>  
> -#define TNODE_KMALLOC_MAX \
> -	ilog2((PAGE_SIZE - TNODE_SIZE(0)) / sizeof(struct key_vector *))
> -#define TNODE_VMALLOC_MAX \
> -	ilog2((SIZE_MAX - TNODE_SIZE(0)) / sizeof(struct key_vector *))
> -
>  static void __node_free_rcu(struct rcu_head *head)
>  {
>  	struct tnode *n = container_of(head, struct tnode, rcu);
> @@ -333,8 +328,7 @@ static struct tnode *tnode_alloc(int bits)
>  {
>  	size_t size;
>  
> -	/* verify bits is within bounds */
> -	if (bits > TNODE_VMALLOC_MAX)
> +	if ((BITS_PER_LONG <= KEYLENGTH) && unlikely(bits >= BITS_PER_LONG))
>  		return NULL;
>  
>  	/* determine size and verify it is non-zero and didn't overflow */

I think it would be better if we left TNODE_VMALLOC_MAX instead of
replacing it with BITS_PER_LONG. This way we know that we are limited
by the size of the node on 32b systems, and by the KEYLENGTH on 64b
systems. The basic idea is to maintain the logic as to why we are doing
it this way instead of just burying things by using built in constants
that are close enough to work.

So for example I believe TNODE_VMALLOC_MAX is 31 on a 32b system. The
main reason for that is because we have to subtract the TNODE_SIZE from
the upper limit for size. By replacing TNODE_VMALLOC_MAX with
BITS_PER_LONG that becomes abstracted away and it becomes more likely
that somebody will mishandle it later.

- Alex
Dmitry Safonov April 1, 2019, 3:55 p.m. UTC | #2
Hi Alexander,

On 4/1/19 4:40 PM, Alexander Duyck wrote:
>> @@ -333,8 +328,7 @@ static struct tnode *tnode_alloc(int bits)
>>  {
>>  	size_t size;
>>  
>> -	/* verify bits is within bounds */
>> -	if (bits > TNODE_VMALLOC_MAX)
>> +	if ((BITS_PER_LONG <= KEYLENGTH) && unlikely(bits >= BITS_PER_LONG))
>>  		return NULL;
>>  
>>  	/* determine size and verify it is non-zero and didn't overflow */
> 
> I think it would be better if we left TNODE_VMALLOC_MAX instead of
> replacing it with BITS_PER_LONG. This way we know that we are limited
> by the size of the node on 32b systems, and by the KEYLENGTH on 64b
> systems. The basic idea is to maintain the logic as to why we are doing
> it this way instead of just burying things by using built in constants
> that are close enough to work.
> 
> So for example I believe TNODE_VMALLOC_MAX is 31 on a 32b system.

This is also true after the change: bits == 31 will *not* return.

> The
> main reason for that is because we have to subtract the TNODE_SIZE from
> the upper limit for size. By replacing TNODE_VMALLOC_MAX with
> BITS_PER_LONG that becomes abstracted away and it becomes more likely
> that somebody will mishandle it later.

So, I wanted to remove run-time check here on x86_64..
I could do it by adding !CONFIG_64BIT around the check.

But, I thought about the value of the check:
I believe it's here not to limit maximum allocated size:
kzalloc()/vzalloc() will fail and we should be fine with that.

In my opinion it's rather to check that (1UL << bits) wouldn't result in UB.


Thanks,
          Dmitry
Alexander Duyck April 1, 2019, 5:50 p.m. UTC | #3
On Mon, 2019-04-01 at 16:55 +0100, Dmitry Safonov wrote:
> Hi Alexander,
> 
> On 4/1/19 4:40 PM, Alexander Duyck wrote:
> > > @@ -333,8 +328,7 @@ static struct tnode *tnode_alloc(int bits)
> > >  {
> > >  	size_t size;
> > >  
> > > -	/* verify bits is within bounds */
> > > -	if (bits > TNODE_VMALLOC_MAX)
> > > +	if ((BITS_PER_LONG <= KEYLENGTH) && unlikely(bits >= BITS_PER_LONG))
> > >  		return NULL;
> > >  
> > >  	/* determine size and verify it is non-zero and didn't overflow */
> > 
> > I think it would be better if we left TNODE_VMALLOC_MAX instead of
> > replacing it with BITS_PER_LONG. This way we know that we are limited
> > by the size of the node on 32b systems, and by the KEYLENGTH on 64b
> > systems. The basic idea is to maintain the logic as to why we are doing
> > it this way instead of just burying things by using built in constants
> > that are close enough to work.
> > 
> > So for example I believe TNODE_VMALLOC_MAX is 31 on a 32b system.
> 
> This is also true after the change: bits == 31 will *not* return.

Actually now that I think about it TNODE_VMALLOC_MAX is likely much
less than 31. The logic that we have to be concerned with is:
	size = TNODE_SIZE(1ul << bits);

If size is a 32b value, and the size of a pointer is 4 bytes, then our
upper limit is roughly ilog2((4G - 28) / 4), which comes out to 29.
What we are trying to avoid is overflowing the size variable, not
actually limiting the vmalloc itself.
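
The estimate above can be reproduced with a small user-space sketch. The
values are assumptions for a 32-bit system taken from the estimate
itself: SIZE_MAX is 2^32 - 1, a pointer is 4 bytes, and the tnode
header is 28 bytes. ilog2_u32() is a simple stand-in for the kernel's
ilog2(), and the macro and function names are illustrative only.

```c
#include <stdint.h>

/* Assumed 32-bit values; the 28-byte header is the figure used above. */
#define SIZE_MAX_32   UINT32_MAX
#define PTR_SIZE_32   4u
#define TNODE_HDR_32  28u

/* Floor of log2(v) for v > 0; stand-in for the kernel's ilog2(). */
static int ilog2_u32(uint32_t v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Largest bits for which header + (1 << bits) pointers fits in 32 bits. */
int tnode_vmalloc_max_32(void)
{
	return ilog2_u32((SIZE_MAX_32 - TNODE_HDR_32) / PTR_SIZE_32);
}
```

The quotient is just below 2^30, so the result is 29 under these
assumptions.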

> > The
> > main reason for that is because we have to subtract the TNODE_SIZE from
> > the upper limit for size. By replacing TNODE_VMALLOC_MAX with
> > BITS_PER_LONG that becomes abstracted away and it becomes more likely
> > that somebody will mishandle it later.
> 
> So, I wanted to remove run-time check here on x86_64..
> I could do it by adding !CONFIG_64BIT around the check.

I have no problem with that. All I am suggesting is that if at all
possible we should use TNODE_VMALLOC_MAX instead of BITS_PER_LONG.

> But, I thought about the value of the check:
> I believe it's here not to limit maximum allocated size:
> kzalloc()/vzalloc() will fail and we should be fine with that.

No, the problem is we don't want to overflow size. The allocation will
succeed, but give us a much smaller allocation than we expected.
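
A rough user-space illustration of that wrap-around, using uint32_t to
model a 32-bit size_t. The 4-byte pointer and 28-byte header are the
same assumed values as in the estimate earlier in this mail, and
tnode_size_32() is a hypothetical stand-in for TNODE_SIZE(1ul << bits),
not the kernel macro.

```c
#include <stdint.h>

/* Assumed 32-bit values, matching the earlier estimate. */
#define PTR_SIZE_32   4u
#define TNODE_HDR_32  28u

/*
 * Size as a 32-bit size_t would hold it: header plus (1 << bits)
 * child pointers. Unsigned arithmetic wraps modulo 2^32.
 */
uint32_t tnode_size_32(int bits)
{
	return (uint32_t)(TNODE_HDR_32 + PTR_SIZE_32 * ((uint32_t)1 << bits));
}
```

For bits == 29 the result is 2^31 + 28, which still fits. For bits == 30
the pointer array size wraps to 0, leaving only the 28-byte header, so
an allocation of that size would succeed while being far too small.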

> In my opinion it's rather to check that (1UL << bits) wouldn't result in UB.

Sort of, however we have to keep in mind that 1ul << bits is an index,
so it is also multiplied by the size of a pointer. As such the logic
might be better expressed as sizeof(void*) << bits.
Dmitry Safonov April 4, 2019, 4:33 p.m. UTC | #4
On 4/1/19 6:50 PM, Alexander Duyck wrote:
> On Mon, 2019-04-01 at 16:55 +0100, Dmitry Safonov wrote:
> Actually now that I think about it TNODE_VMALLOC_MAX is likely much
> less than 31. The logic that we have to be concerned with is:
> 	size = TNODE_SIZE(1ul << bits);
> 
> If size is a 32b value, and the size of a pointer is 4 bytes, then our
> upper limit is roughly ilog2((4G - 28) / 4), which comes out to 29.
> What we are trying to avoid is overflowing the size variable, not
> actually limiting the vmalloc itself.

Oh yes, I see - managed to forget that size can also overflow inside
TNODE_SIZE().

>> So, I wanted to remove run-time check here on x86_64..
>> I could do it by adding !CONFIG_64BIT around the check.
> 
> I have no problem with that. All I am suggesting is that if at all
> possible we should use TNODE_VMALLOC_MAX instead of BITS_PER_LONG.

Yeah, will rework this part.

Thanks,
          Dmitry

Patch

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index a573e37e0615..ad7d56c421cb 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -312,11 +312,6 @@  static inline void alias_free_mem_rcu(struct fib_alias *fa)
 	call_rcu(&fa->rcu, __alias_free_mem);
 }
 
-#define TNODE_KMALLOC_MAX \
-	ilog2((PAGE_SIZE - TNODE_SIZE(0)) / sizeof(struct key_vector *))
-#define TNODE_VMALLOC_MAX \
-	ilog2((SIZE_MAX - TNODE_SIZE(0)) / sizeof(struct key_vector *))
-
 static void __node_free_rcu(struct rcu_head *head)
 {
 	struct tnode *n = container_of(head, struct tnode, rcu);
@@ -333,8 +328,7 @@  static struct tnode *tnode_alloc(int bits)
 {
 	size_t size;
 
-	/* verify bits is within bounds */
-	if (bits > TNODE_VMALLOC_MAX)
+	if ((BITS_PER_LONG <= KEYLENGTH) && unlikely(bits >= BITS_PER_LONG))
 		return NULL;
 
 	/* determine size and verify it is non-zero and didn't overflow */