net: optimize cmpxchg in ip_idents_reserve
diff mbox series

Message ID 1579058620-26684-1-git-send-email-zhangshaokun@hisilicon.com
State Changes Requested
Delegated to: David Miller
Headers show
Series
  • net: optimize cmpxchg in ip_idents_reserve
Related show

Commit Message

Shaokun Zhang Jan. 15, 2020, 3:23 a.m. UTC
From: Yuqi Jin <jinyuqi@huawei.com>

atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
the access number of the global variable @p_id in the loop. Let's
optimize it for performance.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
 net/ipv4/route.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

David Miller Jan. 16, 2020, 12:27 p.m. UTC | #1
From: Shaokun Zhang <zhangshaokun@hisilicon.com>
Date: Wed, 15 Jan 2020 11:23:40 +0800

> From: Yuqi Jin <jinyuqi@huawei.com>
> 
> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
> the access number of the global variable @p_id in the loop. Let's
> optimize it for performance.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Yang Guo <guoyang2@huawei.com>
> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>

I doubt this makes any measurable improvement in performance.

If you can document a specific measurable improvement under
a useful set of circumstances for real usage, then put those
details into the commit message and resubmit.

Otherwise, I'm not applying this, sorry.
Shaokun Zhang Jan. 16, 2020, 2:05 p.m. UTC | #2
Hi David,

On 2020/1/16 20:27, David Miller wrote:
> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
> 
>> From: Yuqi Jin <jinyuqi@huawei.com>
>>
>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>> the access number of the global variable @p_id in the loop. Let's
>> optimize it for performance.
>>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Yang Guo <guoyang2@huawei.com>
>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> 
> I doubt this makes any measurable improvement in performance.
> 
> If you can document a specific measurable improvement under
> a useful set of circumstances for real usage, then put those
> details into the commit message and resubmit.

Ok, I will do it and resubmit.

Thanks your reply,
Shaokun

> 
> Otherwise, I'm not applying this, sorry.
> 
> .
>
Eric Dumazet Jan. 16, 2020, 3:12 p.m. UTC | #3
On 1/16/20 4:27 AM, David Miller wrote:
> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
> 
>> From: Yuqi Jin <jinyuqi@huawei.com>
>>
>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>> the access number of the global variable @p_id in the loop. Let's
>> optimize it for performance.
>>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Yang Guo <guoyang2@huawei.com>
>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> 
> I doubt this makes any measurable improvement in performance.
> 
> If you can document a specific measurable improvement under
> a useful set of circumstances for real usage, then put those
> details into the commit message and resubmit.
> 
> Otherwise, I'm not applying this, sorry.
> 


Real difference that could be made here is to 
only use this cmpxchg() dance for CONFIG_UBSAN

When CONFIG_UBSAN is not set, atomic_add_return() is just fine.

(Supposedly UBSAN should not warn about that either, but this depends on compiler version)
Eric Dumazet Jan. 16, 2020, 3:19 p.m. UTC | #4
On 1/16/20 7:12 AM, Eric Dumazet wrote:
> 
> 
> On 1/16/20 4:27 AM, David Miller wrote:
>> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
>> Date: Wed, 15 Jan 2020 11:23:40 +0800
>>
>>> From: Yuqi Jin <jinyuqi@huawei.com>
>>>
>>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>>> the access number of the global variable @p_id in the loop. Let's
>>> optimize it for performance.
>>>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Yang Guo <guoyang2@huawei.com>
>>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
>>
>> I doubt this makes any measurable improvement in performance.
>>
>> If you can document a specific measurable improvement under
>> a useful set of circumstances for real usage, then put those
>> details into the commit message and resubmit.
>>
>> Otherwise, I'm not applying this, sorry.
>>
> 
> 
> Real difference that could be made here is to 
> only use this cmpxchg() dance for CONFIG_UBSAN
> 
> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
> 
> (Supposedly UBSAN should not warn about that either, but this depends on compiler version)

I will test something like :

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
        if (old != now && cmpxchg(p_tstamp, old, now) == old)
                delta = prandom_u32_max(now - old);
 
-       /* Do not use atomic_add_return() as it makes UBSAN unhappy */
+#ifdef CONFIG_UBSAN
+       /* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
        do {
                old = (u32)atomic_read(p_id);
                new = old + delta + segs;
        } while (atomic_cmpxchg(p_id, old, new) != old);
+#else
+       new = atomic_add_return(segs + delta, p_id);
+#endif
 
        return new - segs;
 }
Shaokun Zhang Jan. 17, 2020, 6:54 a.m. UTC | #5
+Cc Will Deacon,

On 2020/1/16 23:19, Eric Dumazet wrote:
> 
> 
> On 1/16/20 7:12 AM, Eric Dumazet wrote:
>>
>>
>> On 1/16/20 4:27 AM, David Miller wrote:
>>> From: Shaokun Zhang <zhangshaokun@hisilicon.com>
>>> Date: Wed, 15 Jan 2020 11:23:40 +0800
>>>
>>>> From: Yuqi Jin <jinyuqi@huawei.com>
>>>>
>>>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>>>> the access number of the global variable @p_id in the loop. Let's
>>>> optimize it for performance.
>>>>
>>>> Cc: "David S. Miller" <davem@davemloft.net>
>>>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>>>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>> Cc: Yang Guo <guoyang2@huawei.com>
>>>> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
>>>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
>>>
>>> I doubt this makes any measurable improvement in performance.
>>>
>>> If you can document a specific measurable improvement under
>>> a useful set of circumstances for real usage, then put those
>>> details into the commit message and resubmit.
>>>
>>> Otherwise, I'm not applying this, sorry.
>>>
>>
>>
>> Real difference that could be made here is to 
>> only use this cmpxchg() dance for CONFIG_UBSAN
>>
>> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
>>
>> (Supposedly UBSAN should not warn about that either, but this depends on compiler version)
> 
> I will test something like :
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
>         if (old != now && cmpxchg(p_tstamp, old, now) == old)
>                 delta = prandom_u32_max(now - old);
>  
> -       /* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +#ifdef CONFIG_UBSAN

I appreciate Eric's idea.

> +       /* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
>         do {
>                 old = (u32)atomic_read(p_id);
>                 new = old + delta + segs;
>         } while (atomic_cmpxchg(p_id, old, new) != old);

But about this, I still think that we can try to use atomic_try_cmpxchg instead
of atomic_cmpxchg, like refcount_add_not_zero in include/linux/refcount.h

I abstract the model, as follow:
while (get_cycles() <= (time_temp + t_cnt))
{
	do{
		old_temp = wk_in->num.counter;
		oldt = atomic64_cmpxchg(&wk_in->num, old_temp, old_temp+1);
	}while(oldt != old_temp);

	myndelay(delay_o);
	w_cnt+=1;
}

val = atomic64_read(&wk_in->num);
while (get_cycles() <= (time_temp + t_cnt))
{
	do{
		new_val = val + 1;
			
	}while(!atomic64_try_cmpxchg(&wk_in->num, &val, new_val));

	myndelay(delay_o);
	w_cnt += 1;
}

If we take the access global variable out of loop, the performance become better
on both x86 and arm64, as follow:
Kunpeng920: 24 cores call it and the successful operations per second
atomic64_read in loop 	   atomic64_read out of the loop
6291034	                   8751962

Intel 6248: 20 cores call it and the successful operations per second
atomic64_read in loop 	   atomic64_read out of the loop
8934792	                   9978998

So how about this? ;-)

                delta = prandom_u32_max(now - old);

+#ifdef CONFIG_UBSAN
        /* Do not use atomic_add_return() as it makes UBSAN unhappy */
+       old = (u32)atomic_read(p_id);
        do {
-               old = (u32)atomic_read(p_id);
                new = old + delta + segs;
-       } while (atomic_cmpxchg(p_id, old, new) != old);
+       } while (!atomic_try_cmpxchg(p_id, &old, new));
+#else
+       new = atomic_add_return(segs + delta, p_id);
+#endif

Thanks,
Shaokun


> +#else
> +       new = atomic_add_return(segs + delta, p_id);
> +#endif
>  
>         return new - segs;
>  }
> 
> 
> .
>
Peter Zijlstra Jan. 17, 2020, 12:32 p.m. UTC | #6
On Fri, Jan 17, 2020 at 02:54:03PM +0800, Shaokun Zhang wrote:
> So how about this? ;-)
> 
>                 delta = prandom_u32_max(now - old);
> 
> +#ifdef CONFIG_UBSAN
>         /* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +       old = (u32)atomic_read(p_id);
>         do {
> -               old = (u32)atomic_read(p_id);
>                 new = old + delta + segs;
> -       } while (atomic_cmpxchg(p_id, old, new) != old);
> +       } while (!atomic_try_cmpxchg(p_id, &old, new));
> +#else
> +       new = atomic_add_return(segs + delta, p_id);
> +#endif

That's crazy, just accept that UBSAN is taking bonghits and ignore it.
Use atomic_add_return() unconditionally.
Eric Dumazet Jan. 17, 2020, 4:35 p.m. UTC | #7
On 1/17/20 4:32 AM, Peter Zijlstra wrote:

> 
> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> Use atomic_add_return() unconditionally.
> 

Yes, we might simply add a comment so that people do not bug us if
their compiler is too old.

/* If UBSAN reports an error there, please make sure your compiler
 * supports -fno-strict-overflow before reporting it.
 */
return atomic_add_return(segs + delta, p_id) - segs;
Arvind Sankar Jan. 17, 2020, 6:03 p.m. UTC | #8
On Fri, Jan 17, 2020 at 08:35:07AM -0800, Eric Dumazet wrote:
> 
> 
> On 1/17/20 4:32 AM, Peter Zijlstra wrote:
> 
> > 
> > That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> > Use atomic_add_return() unconditionally.
> > 
> 
> Yes, we might simply add a comment so that people do not bug us if
> their compiler is too old.
> 
> /* If UBSAN reports an error there, please make sure your compiler
>  * supports -fno-strict-overflow before reporting it.
>  */
> return atomic_add_return(segs + delta, p_id) - segs;
> 

Do we need that comment any more? The flag was apparently introduced in
gcc-4.2 and we only support 4.6+ anyway?
Eric Dumazet Jan. 17, 2020, 6:16 p.m. UTC | #9
On 1/17/20 10:03 AM, Arvind Sankar wrote:
> On Fri, Jan 17, 2020 at 08:35:07AM -0800, Eric Dumazet wrote:
>>
>>
>> On 1/17/20 4:32 AM, Peter Zijlstra wrote:
>>
>>>
>>> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
>>> Use atomic_add_return() unconditionally.
>>>
>>
>> Yes, we might simply add a comment so that people do not bug us if
>> their compiler is too old.
>>
>> /* If UBSAN reports an error there, please make sure your compiler
>>  * supports -fno-strict-overflow before reporting it.
>>  */
>> return atomic_add_return(segs + delta, p_id) - segs;
>>
> 
> Do we need that comment any more? The flag was apparently introduced in
> gcc-4.2 and we only support 4.6+ anyway?

Wasńt it the case back in 2016 already for linux-4.8 ?

What will prevent someone to send another report to netdev/lkml ?

 -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.

Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for 
test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.


commit adb03115f4590baa280ddc440a8eff08a6be0cb7
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Sep 20 18:06:17 2016 -0700

    net: get rid of an signed integer overflow in ip_idents_reserve()
    
    Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()
    
    [] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
    [] signed integer overflow:
    [] -2117905507 + -695755206 cannot be represented in type 'int'
    
    Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
    so that the arithmetics can be done using unsigned int.
    
    Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Jiri Pirko <jiri@resnulli.us>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Arvind Sankar Jan. 17, 2020, 6:38 p.m. UTC | #10
On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
> Wasńt it the case back in 2016 already for linux-4.8 ?
> 
> What will prevent someone to send another report to netdev/lkml ?
> 
>  -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
> 
> Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for 
> test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.
> 

No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
Eric Dumazet Jan. 17, 2020, 6:48 p.m. UTC | #11
On 1/17/20 10:38 AM, Arvind Sankar wrote:
> On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
>> Wasńt it the case back in 2016 already for linux-4.8 ?
>>
>> What will prevent someone to send another report to netdev/lkml ?
>>
>>  -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
>>
>> Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for 
>> test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.
>>
> 
> No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
> required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
> 

This seems good to me, for gcc at least.

Maybe it is time to enfore -fno-strict-overflow in KBUILD_CFLAGS 
instead of making it conditional.

Thanks.
Shaokun Zhang Jan. 19, 2020, 3:46 a.m. UTC | #12
Hi Peter,

On 2020/1/17 20:32, Peter Zijlstra wrote:
> On Fri, Jan 17, 2020 at 02:54:03PM +0800, Shaokun Zhang wrote:
>> So how about this? ;-)
>>
>>                 delta = prandom_u32_max(now - old);
>>
>> +#ifdef CONFIG_UBSAN
>>         /* Do not use atomic_add_return() as it makes UBSAN unhappy */
>> +       old = (u32)atomic_read(p_id);
>>         do {
>> -               old = (u32)atomic_read(p_id);
>>                 new = old + delta + segs;
>> -       } while (atomic_cmpxchg(p_id, old, new) != old);
>> +       } while (!atomic_try_cmpxchg(p_id, &old, new));
>> +#else
>> +       new = atomic_add_return(segs + delta, p_id);
>> +#endif
> 
> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> Use atomic_add_return() unconditionally.

We have used the atomic_add_return[1], but it makes the UBSAN unhappy followed
by the comment.
It seems that Eric also agreed to do it if some comments are added. I will do
it later.

Thanks,
Shaokun

[1] https://lkml.org/lkml/2019/7/26/217

> 
> .
>
Eric Dumazet Jan. 19, 2020, 4:12 a.m. UTC | #13
On Sat, Jan 18, 2020 at 7:47 PM Shaokun Zhang
<zhangshaokun@hisilicon.com> wrote:
>

> We have used the atomic_add_return[1], but it makes the UBSAN unhappy followed
> by the comment.
> It seems that Eric also agreed to do it if some comments are added. I will do
> it later.
>
> Thanks,
> Shaokun
>
> [1] https://lkml.org/lkml/2019/7/26/217
>

In case you have missed it, we needed a proper analysis.
My feedback was quite simple :

<quote>
Have you first checked that current UBSAN versions will not complain anymore ?
</quote>

You never did this work, never replied to my question, and months
later you come back
with a convoluted patch while we simply can proceed with a revert now
we are sure that linux kernels are compiled with the proper option.

As mentioned yesterday, no need for a comment.
Instead the changelog should be explaining why the revert is now safe.
Peter Zijlstra Jan. 20, 2020, 8:18 a.m. UTC | #14
On Fri, Jan 17, 2020 at 10:48:19AM -0800, Eric Dumazet wrote:
> 
> 
> On 1/17/20 10:38 AM, Arvind Sankar wrote:
> > On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
> >> Wasńt it the case back in 2016 already for linux-4.8 ?
> >>
> >> What will prevent someone to send another report to netdev/lkml ?
> >>
> >>  -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
> >>
> >> Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for 
> >> test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.
> >>
> > 
> > No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
> > required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
> > 
> 
> This seems good to me, for gcc at least.
> 
> Maybe it is time to enfore -fno-strict-overflow in KBUILD_CFLAGS 
> instead of making it conditional.

IIRC there was a bug in UBSAN vs -fwrapv/-fno-strict-overflow that was
only fixed in gcc-8 or 9 or so.

So while the -fwrapv/-fno-strict-overflow flag has been correctly
supported since like forever, UBSAN was buggy until quite recent when
used in conjustion with that flag.
Shaokun Zhang Jan. 21, 2020, 2:40 a.m. UTC | #15
Hi Eric,

On 2020/1/19 12:12, Eric Dumazet wrote:
> On Sat, Jan 18, 2020 at 7:47 PM Shaokun Zhang
> <zhangshaokun@hisilicon.com> wrote:
>>
> 
>> We have used the atomic_add_return[1], but it makes the UBSAN unhappy followed
>> by the comment.
>> It seems that Eric also agreed to do it if some comments are added. I will do
>> it later.
>>
>> Thanks,
>> Shaokun
>>
>> [1] https://lkml.org/lkml/2019/7/26/217
>>
> 
> In case you have missed it, we needed a proper analysis.
> My feedback was quite simple :
> 
> <quote>
> Have you first checked that current UBSAN versions will not complain anymore ?
> </quote>
> 
> You never did this work, never replied to my question, and months

Yeah, I'm not sure how to deal with the UBSAN issue and you said that some of
you would work this.

> later you come back
> with a convoluted patch while we simply can proceed with a revert now

After several months, we thought that we can do it like refcount_add_not_zero,
so we submit this patch.

> we are sure that linux kernels are compiled with the proper option.
> 
> As mentioned yesterday, no need for a comment.
> Instead the changelog should be explaining why the revert is now safe.
> 

Ok, it is really needed to consider this.

Thanks,
Shaokun

> .
>
Peter Zijlstra Jan. 22, 2020, 8:49 a.m. UTC | #16
On Sat, Jan 18, 2020 at 08:12:48PM -0800, Eric Dumazet wrote:
> Instead the changelog should be explaining why the revert is now safe.

FWIW, it was always safe, UBSAN was just buggy.

Patch
diff mbox series

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 87e979f2b74a..7e28c7121c20 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -496,10 +496,10 @@  u32 ip_idents_reserve(u32 hash, int segs)
 		delta = prandom_u32_max(now - old);
 
 	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+	old = (u32)atomic_read(p_id);
 	do {
-		old = (u32)atomic_read(p_id);
 		new = old + delta + segs;
-	} while (atomic_cmpxchg(p_id, old, new) != old);
+	} while (!atomic_try_cmpxchg(p_id, &old, new));
 
 	return new - segs;
 }