diff mbox

pselect: Use linux pselect syscall when available

Message ID 1450496359-12069-1-git-send-email-nic.dade@gmail.com
State New
Headers show

Commit Message

Nicolas S. Dade Dec. 19, 2015, 3:39 a.m. UTC
This supercedes the previous pselect patch from 16 Dec 2015.

Linux has a pselect syscall since 2.6.something. Using it
rather than emulating it with sigprocmask+select+sigprocmask
is smaller code, and works properly. (The emulation has
race conditions when unblocked signals arrive before or
after the select)

The tv.nsec >= 1E9 handling comes from uclibc's linux select()
implementation, which itself uses pselect() internally if the
pselect syscall exists. I though it would be good to do the
same here.

Note that although the libc pselect() API has 6 arguments,
the linux kernel syscall as 7 arguments. There is an extra,
somewhat vestigial, sizeof the signal mask argument.

Signed-off-by: Nicolas S. Dade <nic.dade@gmail.com>
---
 libc/sysdeps/linux/common/pselect.c | 52 +++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

Comments

Rich Felker Dec. 19, 2015, 4:44 a.m. UTC | #1
On Fri, Dec 18, 2015 at 07:39:19PM -0800, Nicolas S. Dade wrote:
> This supercedes the previous pselect patch from 16 Dec 2015.
> 
> Linux has a pselect syscall since 2.6.something. Using it
> rather than emulating it with sigprocmask+select+sigprocmask
> is smaller code, and works properly. (The emulation has
> race conditions when unblocked signals arrive before or
> after the select)
> 
> The tv.nsec >= 1E9 handling comes from uclibc's linux select()
> implementation, which itself uses pselect() internally if the
> pselect syscall exists. I though it would be good to do the
> same here.
> 
> Note that although the libc pselect() API has 6 arguments,
> the linux kernel syscall as 7 arguments. There is an extra,
> somewhat vestigial, sizeof the signal mask argument.
> 
> Signed-off-by: Nicolas S. Dade <nic.dade@gmail.com>
> ---
>  libc/sysdeps/linux/common/pselect.c | 52 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/libc/sysdeps/linux/common/pselect.c b/libc/sysdeps/linux/common/pselect.c
> index bf19ce3..3f1dd28 100644
> --- a/libc/sysdeps/linux/common/pselect.c
> +++ b/libc/sysdeps/linux/common/pselect.c
> @@ -30,6 +30,57 @@ static int __NC(pselect)(int nfds, fd_set *readfds, fd_set *writefds,
>  			 fd_set *exceptfds, const struct timespec *timeout,
>  			 const sigset_t *sigmask)
>  {
> +#ifdef __NR_pselect6
> +#define NSEC_PER_SEC 1000000000L
> +	struct timespec _ts, *ts = 0;
> +	if (timeout) {
> +		/* The Linux kernel can in some situations update the timeout value.
> +		 * We do not want that so use a local variable.
> +		 */
> +		_ts = *timeout;
> +
> +		/* GNU extension: allow for timespec values where the sub-sec
> +		* field is equal to or more than 1 second.  The kernel will
> +		* reject this on us, so take care of the time shift ourself.
> +		* Some applications (like readline and linphone) do this.
> +		* See 'clarification on select() type calls and invalid timeouts'
> +		* on the POSIX general list for more information.
> +		*/
> +		if (_ts.tv_nsec >= NSEC_PER_SEC) {
> +			_ts.tv_sec += _ts.tv_nsec / NSEC_PER_SEC;
> +			_ts.tv_nsec %= NSEC_PER_SEC;
> +		}
> +
> +		ts = &_ts;
> +	}
> +
> +	/* The pselect6 syscall API is strange. It wants a 7th arg to be
> +	 * the sizeof(*sigmask). However syscalls with > 6 arguments aren't
> +	 * supported on linux. So arguments 6 and 7 are stuffed in a struct
> +	 * and a pointer to that struct is passed as the 6th argument to
> +	 * the syscall.
> +	 * Glibc stuffs arguments 6 and 7 in a ulong[2]. Linux reads
> +	 * them as if there were a struct { sigset_t*; size_t } in
> +	 * userspace. There woudl be trouble if userspace and the kernel are
> +	 * compiled differently enough that size_t isn't the same as ulong,
> +	 * but not enough to trigger the compat layer in linux. I can't
> +	 * think of such a case, so I'm using linux's struct.
> +	 * Furthermore Glibc sets the sigsetsize to _NSIG/8. However linux
> +	 * checks for sizeof(sigset_t), which internally is a ulong array.
> +	 * This means that if _NSIG isn't a multiple of BITS_PER_LONG then
> +	 * linux will refuse glibc's value. So I prefer sizeof(sigset_t) for
> +	 * the value of sigsetsize.
> +	 */
> +	struct {
> +		const sigset_t *sigmask;
> +		size_t sigsetsize;
> +	} args67 = {
> +		sigmask,
> +		sizeof(sigset_t),
> +	};

This might work if uclibc defines sigset_t not to have the ridiculous
expansion space for 1024 signals like glibc does, but it still seems
fragile. I'd use _NSIG/8. That's what we do in musl. (If the size you
pass to the kernel is wrong you get EINVAL or something like that.)

Rich
Waldemar Brodkorb Dec. 19, 2015, 8:58 a.m. UTC | #2
Hi Nicolas,
Nicolas S. Dade wrote,

> This supercedes the previous pselect patch from 16 Dec 2015.

Can you resend next time with PATCH v3 in the subject and
a short Changelog after the comment ---. Thanks.

What you think about Rich comment?

best regards
 Waldemar
Nicolas S. Dade Dec. 19, 2015, 7:34 p.m. UTC | #3
> v3

Sure. Next time.

> if ... expansion space

Well uclibc does not do that. It defines sigset_t to match the kernel, and
even has a comment about that.

/* A 'sigset_t' has a bit for each signal.
 * glibc has space for 1024 signals (!), but most arches supported
 * by Linux have 64 signals, and only MIPS has 128.
 * There seems to be some historical baggage in sparc[64]
 * where they might have (or had in the past) 32 signals only,
 * I hope it's irrelevant now.
 * Signal 0 does not exist, so we have signals 1..64, not 0..63.
 * In uclibc, kernel and userspace sigset_t is always the same.
 * BTW, struct sigaction is also the same on kernel and userspace side.
 */
#if defined(__mips__)
# define _SIGSET_NWORDS (128 / (8 * sizeof (unsigned long)))
#else
# define _SIGSET_NWORDS (64 / (8 * sizeof (unsigned long)))
#endif
typedef struct {
    unsigned long __val[_SIGSET_NWORDS];
} __sigset_t;


The patched uclibc pselect works properly on ARM (32-bit), both in behavior
and watching the operation in strace. ARM is the only uclibc target to
which I have easy access.

-Nicolas Dade

On Sat, Dec 19, 2015 at 12:58 AM, Waldemar Brodkorb <wbx@openadk.org> wrote:

> Hi Nicolas,
> Nicolas S. Dade wrote,
>
> > This supercedes the previous pselect patch from 16 Dec 2015.
>
> Can you resend next time with PATCH v3 in the subject and
> a short Changelog after the comment ---. Thanks.
>
> What you think about Rich comment?
>
> best regards
>  Waldemar
>
diff mbox

Patch

diff --git a/libc/sysdeps/linux/common/pselect.c b/libc/sysdeps/linux/common/pselect.c
index bf19ce3..3f1dd28 100644
--- a/libc/sysdeps/linux/common/pselect.c
+++ b/libc/sysdeps/linux/common/pselect.c
@@ -30,6 +30,57 @@  static int __NC(pselect)(int nfds, fd_set *readfds, fd_set *writefds,
 			 fd_set *exceptfds, const struct timespec *timeout,
 			 const sigset_t *sigmask)
 {
+#ifdef __NR_pselect6
+#define NSEC_PER_SEC 1000000000L
+	struct timespec _ts, *ts = 0;
+	if (timeout) {
+		/* The Linux kernel can in some situations update the timeout value.
+		 * We do not want that so use a local variable.
+		 */
+		_ts = *timeout;
+
+		/* GNU extension: allow for timespec values where the sub-sec
+		* field is equal to or more than 1 second.  The kernel will
+		* reject this on us, so take care of the time shift ourself.
+		* Some applications (like readline and linphone) do this.
+		* See 'clarification on select() type calls and invalid timeouts'
+		* on the POSIX general list for more information.
+		*/
+		if (_ts.tv_nsec >= NSEC_PER_SEC) {
+			_ts.tv_sec += _ts.tv_nsec / NSEC_PER_SEC;
+			_ts.tv_nsec %= NSEC_PER_SEC;
+		}
+
+		ts = &_ts;
+	}
+
+	/* The pselect6 syscall API is strange. It wants a 7th arg to be
+	 * the sizeof(*sigmask). However syscalls with > 6 arguments aren't
+	 * supported on linux. So arguments 6 and 7 are stuffed in a struct
+	 * and a pointer to that struct is passed as the 6th argument to
+	 * the syscall.
+	 * Glibc stuffs arguments 6 and 7 in a ulong[2]. Linux reads
+	 * them as if there were a struct { sigset_t*; size_t } in
+	 * userspace. There woudl be trouble if userspace and the kernel are
+	 * compiled differently enough that size_t isn't the same as ulong,
+	 * but not enough to trigger the compat layer in linux. I can't
+	 * think of such a case, so I'm using linux's struct.
+	 * Furthermore Glibc sets the sigsetsize to _NSIG/8. However linux
+	 * checks for sizeof(sigset_t), which internally is a ulong array.
+	 * This means that if _NSIG isn't a multiple of BITS_PER_LONG then
+	 * linux will refuse glibc's value. So I prefer sizeof(sigset_t) for
+	 * the value of sigsetsize.
+	 */
+	struct {
+		const sigset_t *sigmask;
+		size_t sigsetsize;
+	} args67 = {
+		sigmask,
+		sizeof(sigset_t),
+	};
+
+	return INLINE_SYSCALL(pselect6, 6, nfds, readfds, writefds, exceptfds, ts, &args67);
+#else
 	struct timeval tval;
 	int retval;
 	sigset_t savemask;
@@ -57,6 +108,7 @@  static int __NC(pselect)(int nfds, fd_set *readfds, fd_set *writefds,
 		sigprocmask (SIG_SETMASK, &savemask, NULL);
 
 	return retval;
+#endif
 }
 CANCELLABLE_SYSCALL(int, pselect, (int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
 				   const struct timespec *timeout, const sigset_t *sigmask),