Message ID | 1357148750.21409.17169.camel@edumazet-glaptop |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Eric Dumazet <eric.dumazet@gmail.com> wrote: > First, thanks for working on this issue. No problem! > It seems the real problem is the epi->event.events = event->events; > which is done without taking ep->lock Yes. I am hoping it is possible to do it without a lock there, but your change is more obviously correct. > While a smp_mb() could reduce the race window, I believe there is still > a race, and the following patch would close it. I'm not an experienced kernel hacker, can you describe where the race would be? > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index be56b21..25e5c53 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -1313,7 +1313,10 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even > * otherwise we might miss an event that happens between the > * f_op->poll() call and the new event set registering. > */ > + spin_lock_irq(&ep->lock); > epi->event.events = event->events; > + spin_unlock_irq(&ep->lock); > + > pt._key = event->events; > epi->event.data = event->data; /* protected by mtx */ > if (epi->event.events & EPOLLWAKEUP) { -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-01-02 at 18:40 +0000, Eric Wong wrote: > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > First, thanks for working on this issue. > > No problem! > > > It seems the real problem is the epi->event.events = event->events; > > which is done without taking ep->lock > > Yes. I am hoping it is possible to do it without a lock there, > but your change is more obviously correct. > > > While a smp_mb() could reduce the race window, I believe there is still > > a race, and the following patch would close it. > > I'm not an experienced kernel hacker, can you describe where the race > would be? It would be for example in ep_send_events_proc() doing : if (epi->event.events & EPOLLONESHOT) epi->event.events &= EP_PRIVATE_BITS; And this could happen at the same time. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2013-01-02 at 18:40 +0000, Eric Wong wrote: > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > It seems the real problem is the epi->event.events = event->events; > > > which is done without taking ep->lock > > > > Yes. I am hoping it is possible to do it without a lock there, > > but your change is more obviously correct. > > > > > While a smp_mb() could reduce the race window, I believe there is still > > > a race, and the following patch would close it. > > > > I'm not an experienced kernel hacker, can you describe where the race > > would be? > > It would be for example in ep_send_events_proc() doing : > > if (epi->event.events & EPOLLONESHOT) > epi->event.events &= EP_PRIVATE_BITS; > > And this could happen at the same time. That modification in ep_send_events_proc() is protected by ep->mtx (as is ep_modify()), though. Maybe there are other places, but I don't see it. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-01-02 at 19:32 +0000, Eric Wong wrote: > That modification in ep_send_events_proc() is protected by ep->mtx > (as is ep_modify()), though. Maybe there are other places, but I > don't see it. Yes, and using a mutex for protecting this field while its read from interrupt context (so without mutex synch help) is why there were races. Some users rely on barriers included in spin_lock/spin_unlock, others in explicit barriers, or before your patch pure luck. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index be56b21..25e5c53 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1313,7 +1313,10 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even * otherwise we might miss an event that happens between the * f_op->poll() call and the new event set registering. */ + spin_lock_irq(&ep->lock); epi->event.events = event->events; + spin_unlock_irq(&ep->lock); + pt._key = event->events; epi->event.data = event->data; /* protected by mtx */ if (epi->event.events & EPOLLWAKEUP) {