Message ID: 558DB0A0.2040707@gmail.com
State: New
Michael, given the approach is accepted by Carlos and Roland, I have some minor textual suggestions for the patch itself. On 06/26/2015 10:05 PM, Michael Kerrisk (man-pages) wrote: > --- a/man2/sched_setaffinity.2 > +++ b/man2/sched_setaffinity.2 > @@ -223,6 +223,47 @@ system call returns the size (in bytes) of the > .I cpumask_t > data type that is used internally by the kernel to > represent the CPU set bit mask. > +.SS Handling systems with more than 1024 CPUs What if the system has exactly 1024 CPUs ? Suggestion: systems with 1024 or more CPUs > +The > +.I cpu_set_t > +data type used by glibc has a fixed size of 128 bytes, > +meaning that the maximum CPU number that can be represented is 1023. > +.\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=15630 > +.\" and https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html No objection, although I have never really noticed external references in man-pages (esp. web refs). Shouldn't these be generally avoided ? (and yes, I have noticed the FIXME) > +If the system has more than 1024 CPUs, then calls of the form: 1024 or more CPUs. > + > + sched_getaffinity(pid, sizeof(cpu_set_t), &mask); > + > +will fail with the error > +.BR EINVAL , > +the error produced by the underlying system call for the case where the > +.I mask > +size specified in > +.I cpusetsize > +is smaller than the size of the affinity mask used by the kernel. > +.PP > +The underlying system calls (which represent CPU masks as bit masks of type > +.IR "unsigned long\ *" ) > +impose no restriction on the size of the mask. > +To handle systems with more than 1024 CPUs, one must dynamically allocate the > +.I mask > +argument using > +.BR CPU_ALLOC (3) I would rewrite the sentence to avoid "one must". > +and manipulate the mask using the "_S" macros described in and manipulate the macros ending with "_S" as described in > +.BR CPU_ALLOC (3). 
> +Using an allocation based on the number of online CPUs: > + > + cpu_set_t *mask = CPU_ALLOC(CPU_ALLOC_SIZE( > + sysconf(_SC_NPROCESSORS_CONF))); > + > +is probably sufficient, although the value returned by the > +.BR sysconf () > +call can in theory change during the lifetime of the process. > +Alternatively, one can obtain a value that is guaranteed to be stable for Like above, I would replace "one can obtain a value" by "a value can be obtained". > +the lifetime of the process by proby for the size of the required mask using s/proby/probing/. > +.BR sched_getaffinity () > +calls with increasing mask sizes until the call does not fail with the error > +.BR EINVAL . I would replace "until the call does not fail with error ..." by "while the call succeeds". Also, the sentence is too long, IMHO. Best regards Tolga Dalman
On 06/26/2015 10:05 PM, Michael Kerrisk (man-pages) wrote: > +.SS Handling systems with more than 1024 CPUs > +The > +.I cpu_set_t > +data type used by glibc has a fixed size of 128 bytes, > +meaning that the maximum CPU number that can be represented is 1023. > +.\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=15630 > +.\" and https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html > +If the system has more than 1024 CPUs, then calls of the form: > + > + sched_getaffinity(pid, sizeof(cpu_set_t), &mask); > + > +will fail with the error > +.BR EINVAL , > +the error produced by the underlying system call for the case where the > +.I mask > +size specified in > +.I cpusetsize > +is smaller than the size of the affinity mask used by the kernel. I think it is best to leave this as unspecified as possible. Kernel behavior already changed once, and I can imagine it changing again. Carlos and I tried to get clarification of the future direction of the kernel interface here: <https://sourceware.org/ml/libc-alpha/2015-06/msg00210.html> No reply so far, unless I missed something. > +.PP > +The underlying system calls (which represent CPU masks as bit masks of type > +.IR "unsigned long\ *" ) > +impose no restriction on the size of the mask. > +To handle systems with more than 1024 CPUs, one must dynamically allocate the > +.I mask > +argument using > +.BR CPU_ALLOC (3) > +and manipulate the mask using the "_S" macros described in > +.BR CPU_ALLOC (3). > +Using an allocation based on the number of online CPUs: > + > + cpu_set_t *mask = CPU_ALLOC(CPU_ALLOC_SIZE( > + sysconf(_SC_NPROCESSORS_CONF))); I believe this is incorrect in several ways: CPU_ALLOC uses the raw CPU counts. CPU_ALLOC_SIZE converts from the raw count to the size in bytes. (This API is misdesigned.) sysconf(_SC_NPROCESSORS_CONF) is not related to the kernel CPU mask size, so it is not the correct value. 
> +is probably sufficient, although the value returned by the > +.BR sysconf () > +call can in theory change during the lifetime of the process. > +Alternatively, one can obtain a value that is guaranteed to be stable for > +the lifetime of the process by proby for the size of the required mask using > +.BR sched_getaffinity () > +calls with increasing mask sizes until the call does not fail with the error This is the only possible way right now if you do not want to read sysconf values. It's also worth noting that the system call and the glibc function have different return values.
Hello Florian, Thanks for your comments, and sorry for the delayed follow-up. On 07/01/2015 02:37 PM, Florian Weimer wrote: > On 06/26/2015 10:05 PM, Michael Kerrisk (man-pages) wrote: > >> +.SS Handling systems with more than 1024 CPUs >> +The >> +.I cpu_set_t >> +data type used by glibc has a fixed size of 128 bytes, >> +meaning that the maximum CPU number that can be represented is 1023. >> +.\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=15630 >> +.\" and https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html >> +If the system has more than 1024 CPUs, then calls of the form: >> + >> + sched_getaffinity(pid, sizeof(cpu_set_t), &mask); >> + >> +will fail with the error >> +.BR EINVAL , >> +the error produced by the underlying system call for the case where the >> +.I mask >> +size specified in >> +.I cpusetsize >> +is smaller than the size of the affinity mask used by the kernel. > > I think it is best to leave this as unspecified as possible. Kernel > behavior already changed once, and I can imagine it changing again. Hmmm. Something needs to be said about what the kernel is doing though. Otherwise, it's hard to make sense of this subsection. Did you have a suggested rewording that removes the piece you find problematic? > Carlos and I tried to get clarification of the future direction of the > kernel interface here: > > <https://sourceware.org/ml/libc-alpha/2015-06/msg00210.html> > > No reply so far, unless I missed something. Okay >> +.PP >> +The underlying system calls (which represent CPU masks as bit masks of type >> +.IR "unsigned long\ *" ) >> +impose no restriction on the size of the mask. >> +To handle systems with more than 1024 CPUs, one must dynamically allocate the >> +.I mask >> +argument using >> +.BR CPU_ALLOC (3) >> +and manipulate the mask using the "_S" macros described in >> +.BR CPU_ALLOC (3). 
>> +Using an allocation based on the number of online CPUs: >> + >> + cpu_set_t *mask = CPU_ALLOC(CPU_ALLOC_SIZE( >> + sysconf(_SC_NPROCESSORS_CONF))); > > I believe this is incorrect in several ways: > > CPU_ALLOC uses the raw CPU counts. CPU_ALLOC_SIZE converts from the raw > count to the size in bytes. (This API is misdesigned.) D'oh! Yes, the use of CPU_ALLOC_SIZE() was clearly misguided. > sysconf(_SC_NPROCESSORS_CONF) is not related to the kernel CPU mask > size, so it is not the correct value. Yes, I understand now. >> +is probably sufficient, although the value returned by the >> +.BR sysconf () >> +call can in theory change during the lifetime of the process. >> +Alternatively, one can obtain a value that is guaranteed to be stable for >> +the lifetime of the process by proby for the size of the required mask using >> +.BR sched_getaffinity () >> +calls with increasing mask sizes until the call does not fail with the error > > This is the only possible way right now if you do not want to read > sysconf values. Okay. I've amended the text to remove the first piece. > It's also worth noting that the system call and the glibc function have > different return values. Yes, I already cover that elsewhere in the page. See the quoted text below. Okay, so now I have: C library/kernel differences This manual page describes the glibc interface for the CPU affinity calls. The actual system call interface is slightly different, with the mask being typed as unsigned long *, reflecting the fact that the underlying implementation of CPU sets is a simple bit mask. On success, the raw sched_getaffinity() system call returns the size (in bytes) of the cpumask_t data type that is used internally by the kernel to represent the CPU set bit mask. Handling systems with more than 1024 CPUs The underlying system calls (which represent CPU masks as bit masks of type unsigned long *) impose no restriction on the size of the CPU mask.
However, the cpu_set_t data type used by glibc has a fixed size of 128 bytes, meaning that the maximum CPU number that can be represented is 1023. If the system has more than 1024 CPUs, then calls of the form: sched_getaffinity(pid, sizeof(cpu_set_t), &mask); will fail with the error EINVAL, the error produced by the underlying system call for the case where the mask size specified in cpusetsize is smaller than the size of the affinity mask used by the kernel. When working on systems with more than 1024 CPUs, one must dynamically allocate the mask argument. Currently, the only way to do this is by probing for the size of the required mask using sched_getaffinity() calls with increasing mask sizes (until the call does not fail with the error EINVAL). Better? Cheers, Michael
Hello Tolga, On 06/29/2015 11:40 PM, Tolga Dalman wrote: > Michael, > > given the approach is accepted by Carlos and Roland, I have > some minor textual suggestions for the patch itself. > > On 06/26/2015 10:05 PM, Michael Kerrisk (man-pages) wrote: >> --- a/man2/sched_setaffinity.2 >> +++ b/man2/sched_setaffinity.2 >> @@ -223,6 +223,47 @@ system call returns the size (in bytes) of the >> .I cpumask_t >> data type that is used internally by the kernel to >> represent the CPU set bit mask. >> +.SS Handling systems with more than 1024 CPUs > > What if the system has exactly 1024 CPUs ? > Suggestion: systems with 1024 or more CPUs I think you've missed something here. CPUs are numbered starting at 0. "more than 1024 CPUs" is correct here, I believe. > >> +The >> +.I cpu_set_t >> +data type used by glibc has a fixed size of 128 bytes, >> +meaning that the maximum CPU number that can be represented is 1023. >> +.\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=15630 >> +.\" and https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html > > No objection, although I have never really noticed external references > in man-pages (esp. web refs). Shouldn't these be generally avoided ? > (and yes, I have noticed the FIXME) Those pieces are comments in the page source (not rendered by man(1)). >> +If the system has more than 1024 CPUs, then calls of the form: > > 1024 or more CPUs. See above >> + >> + sched_getaffinity(pid, sizeof(cpu_set_t), &mask); >> + >> +will fail with the error >> +.BR EINVAL , >> +the error produced by the underlying system call for the case where the >> +.I mask >> +size specified in >> +.I cpusetsize >> +is smaller than the size of the affinity mask used by the kernel. >> +.PP >> +The underlying system calls (which represent CPU masks as bit masks of type >> +.IR "unsigned long\ *" ) >> +impose no restriction on the size of the mask.
>> +To handle systems with more than 1024 CPUs, one must dynamically allocate the >> +.I mask >> +argument using >> +.BR CPU_ALLOC (3) > > I would rewrite the sentence to avoid "one must". This is a "voice" thing. I personally find "one must" is okay. >> +and manipulate the mask using the "_S" macros described in > > and manipulate the macros ending with "_S" as described in I think you've misread the text. I think it's okay. >> +.BR CPU_ALLOC (3). >> +Using an allocation based on the number of online CPUs: >> + >> + cpu_set_t *mask = CPU_ALLOC(CPU_ALLOC_SIZE( >> + sysconf(_SC_NPROCESSORS_CONF))); >> + >> +is probably sufficient, although the value returned by the >> +.BR sysconf () >> +call can in theory change during the lifetime of the process. >> +Alternatively, one can obtain a value that is guaranteed to be stable for > > Like above, I would replace "one can obtain a value" by "a value can be obtained". See above. >> +the lifetime of the process by proby for the size of the required mask using > > s/proby/probing/. Thanks--I'd already spotted that one and fixed. >> +.BR sched_getaffinity () >> +calls with increasing mask sizes until the call does not fail with the error >> +.BR EINVAL . > > I would replace "until the call does not fail with error ..." by "while the call succeeds". I think you've misunderstood the logic here... Take another look at the sentence. Thanks, Michael
On 07/21/2015 05:03 PM, Michael Kerrisk (man-pages) wrote: > Hello Florian, > > Thanks for your comments, and sorry for the delayed follow-up. > > On 07/01/2015 02:37 PM, Florian Weimer wrote: >> On 06/26/2015 10:05 PM, Michael Kerrisk (man-pages) wrote: >> >>> +.SS Handling systems with more than 1024 CPUs >>> +The >>> +.I cpu_set_t >>> +data type used by glibc has a fixed size of 128 bytes, >>> +meaning that the maximum CPU number that can be represented is 1023. >>> +.\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=15630 >>> +.\" and https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html >>> +If the system has more than 1024 CPUs, then calls of the form: >>> + >>> + sched_getaffinity(pid, sizeof(cpu_set_t), &mask); >>> + >>> +will fail with the error >>> +.BR EINVAL , >>> +the error produced by the underlying system call for the case where the >>> +.I mask >>> +size specified in >>> +.I cpusetsize >>> +is smaller than the size of the affinity mask used by the kernel. >> >> I think it is best to leave this as unspecified as possible. Kernel >> behavior already changed once, and I can imagine it changing again. > > Hmmm. Something needs to be said about what the kernel is doing though. > Otherwise, it's hard to make sense of this subsection. Did you have a > suggested rewording that removes the piece you find problematic? What about this? “If the kernel affinity mask is larger than 1024 then … is smaller than the size of the affinity mask used by the kernel. Depending on the system CPU topology, the kernel affinity mask can be substantially larger than the number of active CPUs in the system. ” I.e., make clear that the size of the mask can be quite different from the CPU count. > Handling systems with more than 1024 CPUs > The underlying system calls (which represent CPU masks as bit > masks of type unsigned long *) impose no restriction on the > size of the CPU mask. 
However, the cpu_set_t data type used by > glibc has a fixed size of 128 bytes, meaning that the maximum > CPU number that can be represented is 1023. If the system has > more than 1024 CPUs, then calls of the form: > > sched_getaffinity(pid, sizeof(cpu_set_t), &mask); > > will fail with the error EINVAL, the error produced by the > underlying system call for the case where the mask size specified in cpusetsize is smaller than the size of the affinity > mask used by the kernel. > > When working on systems with more than 1024 CPUs, one must > dynamically allocate the mask argument. Currently, the only > way to do this is by probing for the size of the required mask > using sched_getaffinity() calls with increasing mask sizes > (until the call does not fail with the error EINVAL). > > Better? “more than 1024 CPUs” should be “large [kernel CPU] affinity masks” throughout.
Hello Florian, On 22 July 2015 at 18:02, Florian Weimer <fweimer@redhat.com> wrote: > On 07/21/2015 05:03 PM, Michael Kerrisk (man-pages) wrote: >> Hello Florian, >> >> Thanks for your comments, and sorry for the delayed follow-up. >> >> On 07/01/2015 02:37 PM, Florian Weimer wrote: >>> On 06/26/2015 10:05 PM, Michael Kerrisk (man-pages) wrote: >>> >>>> +.SS Handling systems with more than 1024 CPUs >>>> +The >>>> +.I cpu_set_t >>>> +data type used by glibc has a fixed size of 128 bytes, >>>> +meaning that the maximum CPU number that can be represented is 1023. >>>> +.\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=15630 >>>> +.\" and https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html >>>> +If the system has more than 1024 CPUs, then calls of the form: >>>> + >>>> + sched_getaffinity(pid, sizeof(cpu_set_t), &mask); >>>> + >>>> +will fail with the error >>>> +.BR EINVAL , >>>> +the error produced by the underlying system call for the case where the >>>> +.I mask >>>> +size specified in >>>> +.I cpusetsize >>>> +is smaller than the size of the affinity mask used by the kernel. >>> >>> I think it is best to leave this as unspecified as possible. Kernel >>> behavior already changed once, and I can imagine it changing again. >> >> Hmmm. Something needs to be said about what the kernel is doing though. >> Otherwise, it's hard to make sense of this subsection. Did you have a >> suggested rewording that removes the piece you find problematic? > > What about this? > > “If the kernel affinity mask is larger than 1024 then > … > is smaller than the size of the affinity mask used by the kernel. > Depending on the system CPU topology, the kernel affinity mask can > be substantially larger than the number of active CPUs in the system. > ” Looks good. I've taken that. > I.e., make clear that the size of the mask can be quite different from > the CPU count. 
> >> Handling systems with more than 1024 CPUs >> The underlying system calls (which represent CPU masks as bit >> masks of type unsigned long *) impose no restriction on the >> size of the CPU mask. However, the cpu_set_t data type used by >> glibc has a fixed size of 128 bytes, meaning that the maximum >> CPU number that can be represented is 1023. If the system has >> more than 1024 CPUs, then calls of the form: >> >> sched_getaffinity(pid, sizeof(cpu_set_t), &mask); >> >> will fail with the error EINVAL, the error produced by the >> underlying system call for the case where the mask size specified in cpusetsize is smaller than the size of the affinity >> mask used by the kernel. >> >> When working on systems with more than 1024 CPUs, one must >> dynamically allocate the mask argument. Currently, the only >> way to do this is by probing for the size of the required mask >> using sched_getaffinity() calls with increasing mask sizes >> (until the call does not fail with the error EINVAL). >> >> Better? > > “more than 1024 CPUs” should be “large [kernel CPU] affinity masks” > throughout. Done. Thanks for your further input. So now we have: C library/kernel differences This manual page describes the glibc interface for the CPU affinity calls. The actual system call interface is slightly different, with the mask being typed as unsigned long *, reflecting the fact that the underlying implementation of CPU sets is a simple bit mask. On success, the raw sched_getaffinity() system call returns the size (in bytes) of the cpumask_t data type that is used internally by the kernel to represent the CPU set bit mask. Handling systems with large CPU affinity masks The underlying system calls (which represent CPU masks as bit masks of type unsigned long *) impose no restriction on the size of the CPU mask. However, the cpu_set_t data type used by glibc has a fixed size of 128 bytes, meaning that the maximum CPU number that can be represented is 1023.
If the kernel CPU affinity mask is larger than 1024, then calls of the form: sched_getaffinity(pid, sizeof(cpu_set_t), &mask); will fail with the error EINVAL, the error produced by the underlying system call for the case where the mask size specified in cpusetsize is smaller than the size of the affinity mask used by the kernel. (Depending on the system CPU topology, the kernel affinity mask can be substantially larger than the number of active CPUs in the system.) When working on systems with large kernel CPU affinity masks, one must dynamically allocate the mask argument. Currently, the only way to do this is by probing for the size of the required mask using sched_getaffinity() calls with increasing mask sizes (until the call does not fail with the error EINVAL). Cheers, Michael
--- a/man2/sched_setaffinity.2 +++ b/man2/sched_setaffinity.2 @@ -223,6 +223,47 @@ system call returns the size (in bytes) of the .I cpumask_t data type that is used internally by the kernel to represent the CPU set bit mask. +.SS Handling systems with more than 1024 CPUs +The +.I cpu_set_t +data type used by glibc has a fixed size of 128 bytes, +meaning that the maximum CPU number that can be represented is 1023. +.\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=15630 +.\" and https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html +If the system has more than 1024 CPUs, then calls of the form: + + sched_getaffinity(pid, sizeof(cpu_set_t), &mask); + +will fail with the error +.BR EINVAL , +the error produced by the underlying system call for the case where the +.I mask +size specified in +.I cpusetsize +is smaller than the size of the affinity mask used by the kernel. +.PP +The underlying system calls (which represent CPU masks as bit masks of type +.IR "unsigned long\ *" ) +impose no restriction on the size of the mask. +To handle systems with more than 1024 CPUs, one must dynamically allocate the +.I mask +argument using +.BR CPU_ALLOC (3) +and manipulate the mask using the "_S" macros described in +.BR CPU_ALLOC (3). +Using an allocation based on the number of online CPUs: + + cpu_set_t *mask = CPU_ALLOC(CPU_ALLOC_SIZE( + sysconf(_SC_NPROCESSORS_CONF))); + +is probably sufficient, although the value returned by the +.BR sysconf () +call can in theory change during the lifetime of the process. +Alternatively, one can obtain a value that is guaranteed to be stable for +the lifetime of the process by proby for the size of the required mask using +.BR sched_getaffinity () +calls with increasing mask sizes until the call does not fail with the error +.BR EINVAL . .SH EXAMPLE The program below creates a child process. The parent and child then each assign themselves to a specified CPU