Message ID: 20210713142632.2813759-1-adhemerval.zanella@linaro.org
State: New
Series: linux: Simplify get_nprocs
* Adhemerval Zanella:

> This patch simplifies the memory allocation code and uses the sched
> routines instead of reimplementing it.  This still uses a stack
> allocated buffer, so it can be used in the malloc initialization code.

Could you add a run-time test for that limit?  With the added test, I
think this is fine.

Thanks,
Florian
On 13/07/2021 11:38, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> This patch simplifies the memory allocation code and uses the sched
>> routines instead of reimplementing it.  This still uses a stack
>> allocated buffer, so it can be used in the malloc initialization code.
>
> Could you add a run-time test for that limit?  With the added test, I
> think this is fine.

What kind of test?  To check whether, on a larger system with more than
32k logical CPUs, the value is capped?
* Adhemerval Zanella:

> On 13/07/2021 11:38, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>> This patch simplifies the memory allocation code and uses the sched
>>> routines instead of reimplementing it.  This still uses a stack
>>> allocated buffer, so it can be used in the malloc initialization code.
>>
>> Could you add a run-time test for that limit?  With the added test, I
>> think this is fine.
>
> What kind of test?  To check whether, on a larger system with more than
> 32k logical CPUs, the value is capped?

Sorry, no, to check that sched_getaffinity with a 4K buffer succeeds.

Thanks,
Florian
On 13/07/2021 15:39, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> On 13/07/2021 11:38, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> This patch simplifies the memory allocation code and uses the sched
>>>> routines instead of reimplementing it.  This still uses a stack
>>>> allocated buffer, so it can be used in the malloc initialization code.
>>>
>>> Could you add a run-time test for that limit?  With the added test, I
>>> think this is fine.
>>
>> What kind of test?  To check whether, on a larger system with more than
>> 32k logical CPUs, the value is capped?
>
> Sorry, no, to check that sched_getaffinity with a 4K buffer succeeds.

Ah right, I will work on it.
On 13/07/2021 16:03, Adhemerval Zanella wrote:
> On 13/07/2021 15:39, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>> On 13/07/2021 11:38, Florian Weimer wrote:
>>>> * Adhemerval Zanella:
>>>>
>>>>> This patch simplifies the memory allocation code and uses the sched
>>>>> routines instead of reimplementing it.  This still uses a stack
>>>>> allocated buffer, so it can be used in the malloc initialization code.
>>>>
>>>> Could you add a run-time test for that limit?  With the added test, I
>>>> think this is fine.
>>>
>>> What kind of test?  To check whether, on a larger system with more than
>>> 32k logical CPUs, the value is capped?
>>
>> Sorry, no, to check that sched_getaffinity with a 4K buffer succeeds.
>
> Ah right, I will work on it.

Although we currently call get_nprocs from some internal uses (malloc,
for instance), I haven't seen any regression with my change.  Do we
really need to add an explicit test for it?
* Adhemerval Zanella:

> Although we currently call get_nprocs from some internal uses (malloc,
> for instance), I haven't seen any regression with my change.  Do we
> really need to add an explicit test for it?

Yes, we want to know if such large systems arrive.  Think of it as a
reminder for something that might happen twenty years from now.

Thanks,
Florian
On 13/07/2021 16:07, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> Although we currently call get_nprocs from some internal uses (malloc,
>> for instance), I haven't seen any regression with my change.  Do we
>> really need to add an explicit test for it?
>
> Yes, we want to know if such large systems arrive.  Think of it as a
> reminder for something that might happen twenty years from now.

Right, but what exactly would we be exercising that malloc
initialization does not already exercise (it already calls
sched_getaffinity with the 4K stack-allocated buffer)?
* Adhemerval Zanella:

> On 13/07/2021 16:07, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>> Although we currently call get_nprocs from some internal uses (malloc,
>>> for instance), I haven't seen any regression with my change.  Do we
>>> really need to add an explicit test for it?
>>
>> Yes, we want to know if such large systems arrive.  Think of it as a
>> reminder for something that might happen twenty years from now.
>
> Right, but what exactly would we be exercising that malloc
> initialization does not already exercise (it already calls
> sched_getaffinity with the 4K stack-allocated buffer)?

malloc initialization will still succeed if sched_getaffinity fails, so
we'll never learn about it unless we have a test for it.

Thanks,
Florian
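A minimal sketch of the kind of run-time check Florian asks for, written
against the public CPU_ALLOC/CPU_ALLOC_SIZE macros rather than glibc's
internal syscall wrappers.  This is an assumption about the eventual
test's shape, not the committed glibc test; the function name
affinity_fits_in_4k is hypothetical.

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical sketch: CPU_ALLOC_SIZE (32768) is 4096 bytes on both
   32-bit and 64-bit targets, so if sched_getaffinity succeeds with a
   buffer of that size, get_nprocs will not fall back to the capped
   max_num_cpus value.  */
static int
affinity_fits_in_4k (void)
{
  enum { max_num_cpus = 32768 };
  size_t size = CPU_ALLOC_SIZE (max_num_cpus);
  assert (size == 4096);
  cpu_set_t *set = CPU_ALLOC (max_num_cpus);
  if (set == NULL)
    return 0;
  /* pid 0 queries the affinity mask of the calling thread.  */
  int r = sched_getaffinity (0, size, set);
  CPU_FREE (set);
  return r == 0;
}

int
main (void)
{
  assert (affinity_fits_in_4k ());
  puts ("sched_getaffinity succeeded with a 4K buffer");
  return 0;
}

Such a test would start failing the day the kernel needs more than a 4K
affinity buffer, which is exactly the reminder being requested.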
diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
index 1391e360b8..e752b0a111 100644
--- a/sysdeps/unix/sysv/linux/getsysstats.c
+++ b/sysdeps/unix/sysv/linux/getsysstats.c
@@ -29,61 +29,25 @@
 #include <sys/sysinfo.h>
 #include <sysdep.h>
 
-/* Compute the population count of the entire array.  */
-static int
-__get_nprocs_count (const unsigned long int *array, size_t length)
-{
-  int count = 0;
-  for (size_t i = 0; i < length; ++i)
-    if (__builtin_add_overflow (count, __builtin_popcountl (array[i]),
-                                &count))
-      return INT_MAX;
-  return count;
-}
-
-/* __get_nprocs with a large buffer.  */
-static int
-__get_nprocs_large (void)
-{
-  /* This code cannot use scratch_buffer because it is used during
-     malloc initialization.  */
-  size_t pagesize = GLRO (dl_pagesize);
-  unsigned long int *page = __mmap (0, pagesize, PROT_READ | PROT_WRITE,
-                                    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-  if (page == MAP_FAILED)
-    return 2;
-  int r = INTERNAL_SYSCALL_CALL (sched_getaffinity, 0, pagesize, page);
-  int count;
-  if (r > 0)
-    count = __get_nprocs_count (page, pagesize / sizeof (unsigned long int));
-  else if (r == -EINVAL)
-    /* One page is still not enough to store the bits.  A more-or-less
-       arbitrary value.  This assumes that such large systems never
-       happen in practice.  */
-    count = GLRO (dl_pagesize) * CHAR_BIT;
-  else
-    count = 2;
-  __munmap (page, GLRO (dl_pagesize));
-  return count;
-}
-
 int
 __get_nprocs (void)
 {
-  /* Fast path for most systems.  The kernel expects a buffer size
-     that is a multiple of 8.  */
-  unsigned long int small_buffer[1024 / CHAR_BIT / sizeof (unsigned long int)];
-  int r = INTERNAL_SYSCALL_CALL (sched_getaffinity, 0,
-                                 sizeof (small_buffer), small_buffer);
+  /* This cannot use malloc because it is used during malloc
+     initialization.  */
+  enum { max_num_cpus = 32768 };
+  size_t cpu_bits_size = CPU_ALLOC_SIZE (max_num_cpus);
+  __cpu_mask cpu_bits[cpu_bits_size / sizeof (__cpu_mask)];
+  int r = INTERNAL_SYSCALL_CALL (sched_getaffinity, 0, cpu_bits_size,
+                                 cpu_bits);
   if (r > 0)
-    return __get_nprocs_count (small_buffer, r / sizeof (unsigned long int));
+    return CPU_COUNT_S (cpu_bits_size, (cpu_set_t *) cpu_bits);
   else if (r == -EINVAL)
-    /* The kernel requests a larger buffer to store the data.  */
-    return __get_nprocs_large ();
-  else
-    /* Some other error.  2 is conservative (not a uniprocessor
-       system, so atomics are needed).  */
-    return 2;
+    /* The input buffer is still not enough to store the number of CPUs.
+       This is an arbitrary value, assuming such systems are rare and
+       there are no offline CPUs.  */
+    return max_num_cpus;
+  /* Some other error.  2 is conservative (not a uniprocessor system, so
+     atomics are needed).  */
+  return 2;
 }
 libc_hidden_def (__get_nprocs)
 weak_alias (__get_nprocs, get_nprocs)
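For readers outside glibc, the patch's approach can be illustrated with
the public API: size a stack buffer for 32768 CPUs with CPU_ALLOC_SIZE
and count the set bits with CPU_COUNT_S.  This is a standalone sketch,
not glibc internals; INTERNAL_SYSCALL_CALL is replaced by the exported
sched_getaffinity wrapper, and the helper name count_affinity_cpus is
made up for the example.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Mirror the patch: a 32768-bit affinity mask fits in a 4096-byte
   stack buffer, so no heap allocation is needed.  */
static int
count_affinity_cpus (void)
{
  enum { max_num_cpus = 32768 };
  size_t cpu_bits_size = CPU_ALLOC_SIZE (max_num_cpus);  /* 4096 bytes.  */
  __cpu_mask cpu_bits[cpu_bits_size / sizeof (__cpu_mask)];
  if (sched_getaffinity (0, cpu_bits_size, (cpu_set_t *) cpu_bits) != 0)
    return 2;  /* Conservative fallback, as in the patch.  */
  return CPU_COUNT_S (cpu_bits_size, (cpu_set_t *) cpu_bits);
}

int
main (void)
{
  printf ("CPUs in affinity mask: %d\n", count_affinity_cpus ());
  return 0;
}

Keeping the buffer on the stack is what allows get_nprocs to be called
from malloc initialization, before the allocator itself is usable.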