diff mbox series

[v3] Add single-threaded fast path to rand()

Message ID PAWPR08MB8982DEE9CC5B731BEB69572483B72@PAWPR08MB8982.eurprd08.prod.outlook.com
State New
Headers show
Series [v3] Add single-threaded fast path to rand() | expand

Commit Message

Wilco Dijkstra July 29, 2024, 2:23 p.m. UTC
Improve performance of rand() and __random() by adding a single-threaded fast
path.  Bench-random-lock shows about 5x speedup on Neoverse V1.

---

Comments

Wilco Dijkstra Sept. 10, 2024, 12:11 p.m. UTC | #1
ping
 
Improve performance of rand() and __random() by adding a single-threaded fast
path.  Bench-random-lock shows about 5x speedup on Neoverse V1.

---

diff --git a/stdlib/random.c b/stdlib/random.c
index 62f22fac8d58c7977f09c134bf80a797750da645..174603a8915fd8aa4b3ae64d023003c9e2c038f2 100644
--- a/stdlib/random.c
+++ b/stdlib/random.c
@@ -51,6 +51,7 @@
    SUCH DAMAGE.*/
 
 #include <libc-lock.h>
+#include <sys/single_threaded.h>
 #include <limits.h>
 #include <stddef.h>
 #include <stdlib.h>
@@ -288,6 +289,12 @@ __random (void)
 {
   int32_t retval;
 
+  if (SINGLE_THREAD_P)
+    {
+      (void) __random_r (&unsafe_state, &retval);
+      return retval;
+    }
+
   __libc_lock_lock (lock);
 
   (void) __random_r (&unsafe_state, &retval);
Wilco Dijkstra Feb. 20, 2025, 3:14 p.m. UTC | #2
ping
 
Improve performance of rand() and __random() by adding a single-threaded fast
path.  Bench-random-lock shows about 5x speedup on Neoverse V1.

---

diff --git a/stdlib/random.c b/stdlib/random.c
index 62f22fac8d58c7977f09c134bf80a797750da645..174603a8915fd8aa4b3ae64d023003c9e2c038f2 100644
--- a/stdlib/random.c
+++ b/stdlib/random.c
@@ -51,6 +51,7 @@
    SUCH DAMAGE.*/
 
 #include <libc-lock.h>
+#include <sys/single_threaded.h>
 #include <limits.h>
 #include <stddef.h>
 #include <stdlib.h>
@@ -288,6 +289,12 @@ __random (void)
 {
   int32_t retval;
 
+  if (SINGLE_THREAD_P)
+    {
+      (void) __random_r (&unsafe_state, &retval);
+      return retval;
+    }
+
   __libc_lock_lock (lock);
 
   (void) __random_r (&unsafe_state, &retval);
Adhemerval Zanella Netto Feb. 21, 2025, 6:42 p.m. UTC | #3
On 29/07/24 11:23, Wilco Dijkstra wrote:
> Improve performance of rand() and __random() by adding a single-threaded fast
> path.  Bench-random-lock shows about 5x speedup on Neoverse V1.

LGTM, although I am not sure which kind of workload does really
benefit from this optimization.

> 
> ---
> 
> diff --git a/stdlib/random.c b/stdlib/random.c
> index 62f22fac8d58c7977f09c134bf80a797750da645..174603a8915fd8aa4b3ae64d023003c9e2c038f2 100644
> --- a/stdlib/random.c
> +++ b/stdlib/random.c
> @@ -51,6 +51,7 @@
>     SUCH DAMAGE.*/
>  
>  #include <libc-lock.h>
> +#include <sys/single_threaded.h>
>  #include <limits.h>
>  #include <stddef.h>
>  #include <stdlib.h>
> @@ -288,6 +289,12 @@ __random (void)
>  {
>    int32_t retval;
>  
> +  if (SINGLE_THREAD_P)
> +    {
> +      (void) __random_r (&unsafe_state, &retval);
> +      return retval;
> +    }
> +
>    __libc_lock_lock (lock);
>  
>    (void) __random_r (&unsafe_state, &retval);
Wilco Dijkstra Feb. 24, 2025, 2:01 p.m. UTC | #4
Hi Adhemerval,

>On 29/07/24 11:23, Wilco Dijkstra wrote:
>> Improve performance of rand() and __random() by adding a single-threaded fast
>> path.  Bench-random-lock shows about 5x speedup on Neoverse V1.
>
> LGTM, although I am not sure which kind of workload does really
> benefit from this optimization.

It's a very common pitfall when one creates benchmarks and wants some randomized
inputs. Then they find out 90+% of the time is spent in rand() and not in the code they
try to benchmark... We've been asked plenty of times to improve rand() performance
by various customers, so this is just a quick fix for that common issue.

However using a simpler and faster LFSR without lots of global state would be even
better since that could just use an atomic for the state update and avoid locks. 
Rand() is not required to be perfectly random, let alone cryptographically safe...

Cheers,
Wilco
Adhemerval Zanella Netto Feb. 25, 2025, 7:29 p.m. UTC | #5
On 24/02/25 11:01, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>> On 29/07/24 11:23, Wilco Dijkstra wrote:
>>> Improve performance of rand() and __random() by adding a single-threaded fast
>>> path.  Bench-random-lock shows about 5x speedup on Neoverse V1.
>>
>> LGTM, although I am not sure which kind of workload does really
>> benefit from this optimization.
> 
> It's a very common pitfall when one creates benchmarks and wants some randomized
> inputs. Then they find out 90+% of the time is spent in rand() and not in the code they
> try to benchmark... We've been asked plenty of times to improve rand() performance
> by various customers, so this is just a quick fix for that common issue.

Right, I guess users still rely on such bad interface for RNG with a seed.

> 
> However using a simpler and faster LFSR without lots of global state would be even
> better since that could just use an atomic for the state update and avoid locks. 
> Rand() is not required to be perfectly random, let alone cryptographically safe...

Yeah, although I think we most likely need a symbol version for such change.

> 
> Cheers,
> Wilco

In any case,

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
diff mbox series

Patch

diff --git a/stdlib/random.c b/stdlib/random.c
index 62f22fac8d58c7977f09c134bf80a797750da645..174603a8915fd8aa4b3ae64d023003c9e2c038f2 100644
--- a/stdlib/random.c
+++ b/stdlib/random.c
@@ -51,6 +51,7 @@ 
    SUCH DAMAGE.*/
 
 #include <libc-lock.h>
+#include <sys/single_threaded.h>
 #include <limits.h>
 #include <stddef.h>
 #include <stdlib.h>
@@ -288,6 +289,12 @@  __random (void)
 {
   int32_t retval;
 
+  if (SINGLE_THREAD_P)
+    {
+      (void) __random_r (&unsafe_state, &retval);
+      return retval;
+    }
+
   __libc_lock_lock (lock);
 
   (void) __random_r (&unsafe_state, &retval);