diff mbox

pselect: Use linux pselect syscall when available

Message ID 1450313828-6669-1-git-send-email-nic.dade@gmail.com
State New
Headers show

Commit Message

Nicolas S. Dade Dec. 17, 2015, 12:57 a.m. UTC
Linux has a pselect syscall since 2.6.something. Using it
rather than emulating it with sigprocmask+select+sigprocmask
is smaller code, and works properly. (The emulation has
race conditions when unblocked signals arrive before or
after the select)

The tv.nsec >= 1E9 handling comes from uclibc's linux select()
implementation, which itself uses pselect() internally if the
pselect syscall exists. I though it would be good to do the
same here.

Signed-off-by: Nicolas S. Dade <nic.dade@gmail.com>
---
 libc/sysdeps/linux/common/pselect.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

Comments

Waldemar Brodkorb Dec. 17, 2015, 8:05 p.m. UTC | #1
Hi Nicolas,
Nicolas S. Dade wrote,

> Linux has a pselect syscall since 2.6.something. Using it
> rather than emulating it with sigprocmask+select+sigprocmask
> is smaller code, and works properly. (The emulation has
> race conditions when unblocked signals arrive before or
> after the select)

Did you see the race condition in real code or do you have 
a testcase for it?

best regards
 Waldemar
Rich Felker Dec. 18, 2015, 6:37 p.m. UTC | #2
On Thu, Dec 17, 2015 at 09:05:48PM +0100, Waldemar Brodkorb wrote:
> Hi Nicolas,
> Nicolas S. Dade wrote,
> 
> > Linux has a pselect syscall since 2.6.something. Using it
> > rather than emulating it with sigprocmask+select+sigprocmask
> > is smaller code, and works properly. (The emulation has
> > race conditions when unblocked signals arrive before or
> > after the select)
> 
> Did you see the race condition in real code or do you have 
> a testcase for it?

The race conditions should be easy to reproduce since you have a
relatively huge syscall-latency-length (several us) window to race
with. Fake pselect implementations are historically known to be
harmful; the whole reason pselect was invented was to fix these races
in old code that used sigprocmask followed by select.

Rich
Nicolas S. Dade Dec. 19, 2015, 3:28 a.m. UTC | #3
> real code / testcase

Should be easy.... yup. With an unpatched uclibc this code hangs in
pselect() because the pending signal fires during the window between the
sigprocmkas() and the select(). A proper pselect() returns with errno EINTR.

---------------------------------------------------------------------------------------------
#define _GNU_SOURCE // gimme the good stuff

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/select.h>

// our SIGALRM handler
void handler(int signum) {
        (void)signum;
        write(1, "got signal\n", 11);
}

int main(int argc, char** argv) {
        (void)argc;
        (void)argv;

        // block SIGALRM. We want to handle it only when we're ready
        sigset_t wait_mask, mask_sigchld;
        sigemptyset(&mask_sigchld);
        sigaddset(&mask_sigchld, SIGALRM);
        sigprocmask(SIG_BLOCK,  &mask_sigchld, &wait_mask);
        sigdelset(&wait_mask, SIGALRM);

        // register a signal handler so we can see when the signal arrives
        struct sigaction act;
        memset(&act, 0, sizeof(act));
        sigemptyset(&act.sa_mask); // just in case an empty set isn't all
0's (total paranoia)
        act.sa_handler = handler;
        sigaction(SIGALRM, &act, NULL);

        // send ourselves a SIGARLM. It will pend until we unblock that
signal in pselect()
        printf("sending ourselves a signal\n");
        kill(getpid(), SIGALRM);

        printf("signal is pending; calling pselect()\n");
        int rc = pselect(0, NULL, NULL, NULL, NULL, &wait_mask);
        int e = errno;
        printf("pselect() returned %d, errno %d (%s)\n", rc, e,
strerror(e));

        return 0;
}


On Fri, Dec 18, 2015 at 10:37 AM, Rich Felker <dalias@libc.org> wrote:

> On Thu, Dec 17, 2015 at 09:05:48PM +0100, Waldemar Brodkorb wrote:
> > Hi Nicolas,
> > Nicolas S. Dade wrote,
> >
> > > Linux has a pselect syscall since 2.6.something. Using it
> > > rather than emulating it with sigprocmask+select+sigprocmask
> > > is smaller code, and works properly. (The emulation has
> > > race conditions when unblocked signals arrive before or
> > > after the select)
> >
> > Did you see the race condition in real code or do you have
> > a testcase for it?
>
> The race conditions should be easy to reproduce since you have a
> relatively huge syscall-latency-length (several us) window to race
> with. Fake pselect implementations are historically known to be
> harmful; the whole reason pselect was invented was to fix these races
> in old code that used sigprocmask followed by select.
>
> Rich
>
Nicolas S. Dade Dec. 19, 2015, 3:32 a.m. UTC | #4
Off list a kind person pointed out that the pselect syscall really takes 7
arguments, and that my patch worked (or rather, didn't fail) for me by dumb
luck. Please disregard the original patch. I'll resend a new patch shortly.

-Nicolas S. Dade
diff mbox

Patch

diff --git a/libc/sysdeps/linux/common/pselect.c b/libc/sysdeps/linux/common/pselect.c
index bf19ce3..928577e 100644
--- a/libc/sysdeps/linux/common/pselect.c
+++ b/libc/sysdeps/linux/common/pselect.c
@@ -30,6 +30,32 @@  static int __NC(pselect)(int nfds, fd_set *readfds, fd_set *writefds,
 			 fd_set *exceptfds, const struct timespec *timeout,
 			 const sigset_t *sigmask)
 {
+#ifdef __NR_pselect6
+#define NSEC_PER_SEC 1000000000L
+	struct timespec _ts, *ts = 0;
+	if (timeout) {
+		/* The Linux kernel can in some situations update the timeout value.
+		 * We do not want that so use a local variable.
+		 */
+		_ts.tv_sec = timeout->tv_sec;
+		_ts.tv_nsec = timeout->tv_nsec;
+
+		/* GNU extension: allow for timespec values where the sub-sec
+		* field is equal to or more than 1 second.  The kernel will
+		* reject this on us, so take care of the time shift ourself.
+		* Some applications (like readline and linphone) do this.
+		* See 'clarification on select() type calls and invalid timeouts'
+		* on the POSIX general list for more information.
+		*/
+		if (_ts.tv_nsec >= NSEC_PER_SEC) {
+			_ts.tv_sec += _ts.tv_nsec / NSEC_PER_SEC;
+			_ts.tv_nsec %= NSEC_PER_SEC;
+		}
+
+		ts = &_ts;
+	}
+	return INLINE_SYSCALL(pselect6, 6, nfds, readfds, writefds, exceptfds, ts, sigmask);
+#else
 	struct timeval tval;
 	int retval;
 	sigset_t savemask;
@@ -57,6 +83,7 @@  static int __NC(pselect)(int nfds, fd_set *readfds, fd_set *writefds,
 		sigprocmask (SIG_SETMASK, &savemask, NULL);
 
 	return retval;
+#endif
 }
 CANCELLABLE_SYSCALL(int, pselect, (int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
 				   const struct timespec *timeout, const sigset_t *sigmask),