Message ID: 20170914211441.67326-1-willemb@google.com
State: Accepted, archived
Delegated to: David Miller
Series: [net] packet: hold bind lock when rebinding to fanout hook
On Thu, 2017-09-14 at 17:14 -0400, Willem de Bruijn wrote:
> Packet socket bind operations must hold the po->bind_lock. This keeps
> po->running consistent with whether the socket is actually on a ptype
> list to receive packets.
>
> fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
> binds the fanout object to receive through packet_rcv_fanout.
>
> Make it hold the po->bind_lock when testing po->running and rebinding.
> Else, it can race with other rebind operations, such as that in
> packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
> can result in a socket being added to a fanout group twice, causing
> use-after-free KASAN bug reports, among others.
>
> Reported independently by both trinity and syzkaller.
> Verified that the syzkaller reproducer passes after this patch.
>
> Reported-by: nixioaming <nixiaoming@huawei.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/packet/af_packet.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c26172995511..d288f52c53f7 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1684,10 +1684,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>
>  	mutex_lock(&fanout_mutex);
>
> -	err = -EINVAL;
> -	if (!po->running)
> -		goto out;
> -
>  	err = -EALREADY;
>  	if (po->fanout)
>  		goto out;
> @@ -1749,7 +1745,10 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>  		list_add(&match->list, &fanout_list);
>  	}
>  	err = -EINVAL;
> -	if (match->type == type &&
> +
> +	spin_lock(&po->bind_lock);
> +	if (po->running &&
> +	    match->type == type &&
>  	    match->prot_hook.type == po->prot_hook.type &&
>  	    match->prot_hook.dev == po->prot_hook.dev) {
>  		err = -ENOSPC;
> @@ -1761,6 +1760,13 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>  			err = 0;
>  		}
>  	}
> +	spin_unlock(&po->bind_lock);
> +
> +	if (err && !refcount_read(&match->sk_ref)) {

It seems sk_ref is always read/changed under mutex_lock(&fanout_mutex)
protection.

Not sure why we use a refcount_t (or an atomic_t in older kernels).

All these atomic/spinlock/mutexes are a maze.

> +		list_del(&match->list);
> +		kfree(match);
> +	}
> +
>  out:
>  	if (err && rollover) {
>  		kfree(rollover);
On Thu, Sep 14, 2017 at 5:14 PM, Willem de Bruijn <willemb@google.com> wrote:
> Packet socket bind operations must hold the po->bind_lock. This keeps
> po->running consistent with whether the socket is actually on a ptype
> list to receive packets.
>
> fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
> binds the fanout object to receive through packet_rcv_fanout.
>
> Make it hold the po->bind_lock when testing po->running and rebinding.
> Else, it can race with other rebind operations, such as that in
> packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
> can result in a socket being added to a fanout group twice, causing
> use-after-free KASAN bug reports, among others.
>
> Reported independently by both trinity and syzkaller.
> Verified that the syzkaller reproducer passes after this patch.

I forgot to add the Fixes tag, sorry.

Fixes: dc99f600698d ("packet: Add fanout support.")

> Reported-by: nixioaming <nixiaoming@huawei.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
On Thu, Sep 14, 2017 at 2:14 PM, Willem de Bruijn <willemb@google.com> wrote:
> Packet socket bind operations must hold the po->bind_lock. This keeps
> po->running consistent with whether the socket is actually on a ptype
> list to receive packets.
>
> fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
> binds the fanout object to receive through packet_rcv_fanout.
>
> Make it hold the po->bind_lock when testing po->running and rebinding.
> Else, it can race with other rebind operations, such as that in
> packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
> can result in a socket being added to a fanout group twice, causing
> use-after-free KASAN bug reports, among others.
>
> Reported independently by both trinity and syzkaller.
> Verified that the syzkaller reproducer passes after this patch.
>
> Reported-by: nixioaming <nixiaoming@huawei.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/packet/af_packet.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c26172995511..d288f52c53f7 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1684,10 +1684,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>
>  	mutex_lock(&fanout_mutex);
>
> -	err = -EINVAL;
> -	if (!po->running)
> -		goto out;
> -
>  	err = -EALREADY;
>  	if (po->fanout)
>  		goto out;
> @@ -1749,7 +1745,10 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>  		list_add(&match->list, &fanout_list);
>  	}
>  	err = -EINVAL;
> -	if (match->type == type &&
> +
> +	spin_lock(&po->bind_lock);
> +	if (po->running &&

As you move the po->running check later, after setting po->rollover, I
wonder if po->rollover possibly depends on po->running on some other
path?

> +	    match->type == type &&
>  	    match->prot_hook.type == po->prot_hook.type &&
>  	    match->prot_hook.dev == po->prot_hook.dev) {
>  		err = -ENOSPC;
> @@ -1761,6 +1760,13 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>  			err = 0;
>  		}
>  	}
> +	spin_unlock(&po->bind_lock);
> +
> +	if (err && !refcount_read(&match->sk_ref)) {
> +		list_del(&match->list);
> +		kfree(match);
> +	}

This looks correct but still seems odd; it smells like the refcount is
not being used in the expected way.
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index c26172995511..d288f52c53f7 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -1684,10 +1684,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>>
>>  	mutex_lock(&fanout_mutex);
>>
>> -	err = -EINVAL;
>> -	if (!po->running)
>> -		goto out;
>> -
>>  	err = -EALREADY;
>>  	if (po->fanout)
>>  		goto out;
>> @@ -1749,7 +1745,10 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>>  		list_add(&match->list, &fanout_list);
>>  	}
>>  	err = -EINVAL;
>> -	if (match->type == type &&
>> +
>> +	spin_lock(&po->bind_lock);
>> +	if (po->running &&
>
> As you move the po->running check later, after setting po->rollover, I
> wonder if po->rollover possibly depends on po->running on some other
> path?

The rollover code does not explicitly check po->running, if that is
what you are concerned about.

A newly allocated po->rollover structure will also only be accessed
from the datapath once the socket is added to a fanout group. Both
before and after this patch, that happens well after the po->running
check.

If the socket already had a po->rollover, then it must already have had
a po->fanout, too, so fanout_add does not reach this code.

>> +	    match->type == type &&
>>  	    match->prot_hook.type == po->prot_hook.type &&
>>  	    match->prot_hook.dev == po->prot_hook.dev) {
>>  		err = -ENOSPC;
>> @@ -1761,6 +1760,13 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
>>  			err = 0;
>>  		}
>>  	}
>> +	spin_unlock(&po->bind_lock);
>> +
>> +	if (err && !refcount_read(&match->sk_ref)) {
>> +		list_del(&match->list);
>> +		kfree(match);
>> +	}
>
> This looks correct but still seems odd; it smells like the refcount is
> not being used in the expected way.

It tests whether the object has no references, in which case it must
have been newly allocated and the fanout join operation must have
failed.

I don't see an obviously simpler patch. The entire code could perhaps
be restructured eventually. There is, for instance, no reason to test
match->{type, prot_hook.type, prot_hook.dev} when having just allocated
the structure. Nor to test whether sk_ref exceeds PACKET_FANOUT_MAX.
Conversely, this sk_ref == 0 test only makes sense when having taken
the (!match) branch earlier. But a refactor is out of scope for a bug
fix.
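[Editor's note] The test discussed above, free the match only when nobody ever took a reference, can be modeled in plain userspace C. Everything here (struct match, try_join, cleanup_on_error, the int counter) is illustrative, not the kernel's code; since all updates happen under one mutex in fanout_add, a plain counter suffices in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct match {
	int sk_ref;	/* stands in for refcount_read(&match->sk_ref) */
};

/* Join takes a reference on success and leaves sk_ref untouched on
 * failure (e.g. the socket is not running, or the types mismatch). */
static int try_join(struct match *m, int ok)
{
	if (!ok)
		return -22;	/* -EINVAL */
	m->sk_ref++;
	return 0;
}

/* Mirrors the patch's error path: free the match only if the join
 * failed AND nobody holds a reference, meaning this very call must
 * have newly allocated it. Returns 1 if the match was freed. */
static int cleanup_on_error(struct match **mp, int err)
{
	if (err && (*mp)->sk_ref == 0) {
		free(*mp);
		*mp = NULL;
		return 1;
	}
	return 0;
}
```

A pre-existing group always has sk_ref >= 1 from its earlier members, so the cleanup never touches it even when a later join attempt fails.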
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Fri, 15 Sep 2017 10:07:46 -0400

> On Thu, Sep 14, 2017 at 5:14 PM, Willem de Bruijn <willemb@google.com> wrote:
>> Packet socket bind operations must hold the po->bind_lock. This keeps
>> po->running consistent with whether the socket is actually on a ptype
>> list to receive packets.
>>
>> fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
>> binds the fanout object to receive through packet_rcv_fanout.
>>
>> Make it hold the po->bind_lock when testing po->running and rebinding.
>> Else, it can race with other rebind operations, such as that in
>> packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
>> can result in a socket being added to a fanout group twice, causing
>> use-after-free KASAN bug reports, among others.
>>
>> Reported independently by both trinity and syzkaller.
>> Verified that the syzkaller reproducer passes after this patch.
>
> I forgot to add the Fixes tag, sorry.
>
> Fixes: dc99f600698d ("packet: Add fanout support.")

Applied and queued up for stable as it fixes this race and I can't
see any new problems it introduces.

But boy is this one messy area.

The scariest thing to me now is the save/restore sequence done by
packet_set_ring(), for example.

	spin_lock(&po->bind_lock);
	was_running = po->running;
	num = po->num;
	if (was_running) {
		po->num = 0;
		__unregister_prot_hook(sk, false);
	}
	spin_unlock(&po->bind_lock);

	...

	spin_lock(&po->bind_lock);
	if (was_running) {
		po->num = num;
		register_prot_hook(sk);
	}
	spin_unlock(&po->bind_lock);

The socket is also locked during this sequence but that doesn't
prevent parallel changes to the running state.

Since po->bind_lock is dropped, it's possible for another thread
to grab bind_lock and bind it meanwhile.

The above code seems to assume that can't happen, and that
register_prot_hook() will always see po->running set to zero
and rebind the socket.

If the race happens we'll have weird state, because we did not
rebind, yet we modified po->num.

We seem to have a hierarchy of sleeping and non-sleeping locks
that do not work well together.
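[Editor's note] The window described above can be compressed into a small userspace model. All names here (struct psock, register_hook, save_and_unbind, restore) are illustrative, not kernel code, and the locking itself is elided; the point is only what happens to the state if a concurrent bind sneaks in between the two locked sections.

```c
#include <assert.h>

struct psock {
	int running;	/* is the protocol hook registered? */
	int num;	/* bound protocol number */
};

static void register_hook(struct psock *po)
{
	if (!po->running)	/* like register_prot_hook(): no-op if already bound */
		po->running = 1;
}

/* First locked section of packet_set_ring(): save the state and unbind. */
static void save_and_unbind(struct psock *po, int *was_running, int *num)
{
	*was_running = po->running;
	*num = po->num;
	if (*was_running) {
		po->num = 0;
		po->running = 0;
	}
}

/* Second locked section: restore. If a concurrent bind ran in the
 * window (simulated by the caller), the saved num clobbers whatever
 * that bind set -- the "weird state" described above. */
static void restore(struct psock *po, int was_running, int num)
{
	if (was_running) {
		po->num = num;
		register_hook(po);
	}
}
```

Running the sequence with a simulated bind in the window shows the stale saved protocol number winning over the concurrent binder's choice.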
On Wed, Sep 20, 2017 at 5:03 PM, David Miller <davem@davemloft.net> wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Fri, 15 Sep 2017 10:07:46 -0400
>
>> On Thu, Sep 14, 2017 at 5:14 PM, Willem de Bruijn <willemb@google.com> wrote:
>>> Packet socket bind operations must hold the po->bind_lock. This keeps
>>> po->running consistent with whether the socket is actually on a ptype
>>> list to receive packets.
>>>
>>> fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
>>> binds the fanout object to receive through packet_rcv_fanout.
>>>
>>> Make it hold the po->bind_lock when testing po->running and rebinding.
>>> Else, it can race with other rebind operations, such as that in
>>> packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
>>> can result in a socket being added to a fanout group twice, causing
>>> use-after-free KASAN bug reports, among others.
>>>
>>> Reported independently by both trinity and syzkaller.
>>> Verified that the syzkaller reproducer passes after this patch.
>>
>> I forgot to add the Fixes tag, sorry.
>>
>> Fixes: dc99f600698d ("packet: Add fanout support.")
>
> Applied and queued up for stable as it fixes this race and I can't
> see any new problems it introduces.
>
> But boy is this one messy area.

Yeah, I've been staring at this code for a while now. But I don't see
an obvious way to simplify it.

packet_notifier does not hold the socket lock, so even if all of bind,
set_ring and fanout_add do hold that, bind_lock is still needed.

packet_mmap may not take the socket lock, so pg_vec_lock must remain
to synchronize with packet_set_ring, even if for no other reason.

I had a look at using rcu pointers for rx_ring and tx_ring, to avoid
taking that lock in the datapath and possibly updating without the
unbind/bind dance. But that update needs to be atomic with purging the
socket queue, so the socket must be properly quiesced.

> The scariest thing to me now is the save/restore sequence done by
> packet_set_ring(), for example.
>
>	spin_lock(&po->bind_lock);
>	was_running = po->running;
>	num = po->num;
>	if (was_running) {
>		po->num = 0;
>		__unregister_prot_hook(sk, false);
>	}
>	spin_unlock(&po->bind_lock);
>
>	...
>
>	spin_lock(&po->bind_lock);
>	if (was_running) {
>		po->num = num;
>		register_prot_hook(sk);
>	}
>	spin_unlock(&po->bind_lock);
>
> The socket is also locked during this sequence but that doesn't
> prevent parallel changes to the running state.
>
> Since po->bind_lock is dropped, it's possible for another thread
> to grab bind_lock and bind it meanwhile.
>
> The above code seems to assume that can't happen, and that
> register_prot_hook() will always see po->running set to zero
> and rebind the socket.
>
> If the race happens we'll have weird state, because we did not
> rebind, yet we modified po->num.

It appears that the only path that may try to bind without holding the
socket lock is packet_notifier. That skips register_prot_hook if
!po->num.

> We seem to have a hierarchy of sleeping and non-sleeping locks
> that do not work well together.

Given the number of recent bugs that were fixed by locking the socket
inside a particular setsockopt case, I think that we should lock that
entire function, similar to other protocol families (after verifying
that all cases can indeed sleep).

Another issue that looks fragile is the test for po->fanout in
packet_do_bind before taking the socket lock.
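[Editor's note] The "lock the entire setsockopt" idea could be sketched as follows. This is a hypothetical userspace model, not the af_packet code: the handler names are invented, a pthread_mutex_t stands in for lock_sock()/release_sock(), and the point is simply that one lock/unlock pair around the whole dispatch means no case can observe another case half-done, provided every handler may sleep under the lock.

```c
#include <assert.h>
#include <pthread.h>

struct sock_model {
	pthread_mutex_t lock;	/* stands in for the socket lock */
	int fanout_id;
	int ring_frames;
};

static int do_set_fanout(struct sock_model *sk, int val)
{
	sk->fanout_id = val;	/* would be fanout_add() */
	return 0;
}

static int do_set_ring(struct sock_model *sk, int val)
{
	sk->ring_frames = val;	/* would be packet_set_ring() */
	return 0;
}

/* Coarse-grained variant: acquire the socket lock once, dispatch, and
 * release it, instead of locking inside individual option handlers. */
static int model_setsockopt(struct sock_model *sk, int opt, int val)
{
	int err = -92;	/* -ENOPROTOOPT */

	pthread_mutex_lock(&sk->lock);
	switch (opt) {
	case 1:
		err = do_set_fanout(sk, val);
		break;
	case 2:
		err = do_set_ring(sk, val);
		break;
	}
	pthread_mutex_unlock(&sk->lock);
	return err;
}
```

The trade-off, as noted above, is that every case must be verified to tolerate sleeping under the lock before the fine-grained locking can be collapsed this way.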
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c26172995511..d288f52c53f7 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1684,10 +1684,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 
 	mutex_lock(&fanout_mutex);
 
-	err = -EINVAL;
-	if (!po->running)
-		goto out;
-
 	err = -EALREADY;
 	if (po->fanout)
 		goto out;
@@ -1749,7 +1745,10 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		list_add(&match->list, &fanout_list);
 	}
 	err = -EINVAL;
-	if (match->type == type &&
+
+	spin_lock(&po->bind_lock);
+	if (po->running &&
+	    match->type == type &&
 	    match->prot_hook.type == po->prot_hook.type &&
 	    match->prot_hook.dev == po->prot_hook.dev) {
 		err = -ENOSPC;
@@ -1761,6 +1760,13 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 			err = 0;
 		}
 	}
+	spin_unlock(&po->bind_lock);
+
+	if (err && !refcount_read(&match->sk_ref)) {
+		list_del(&match->list);
+		kfree(match);
+	}
+
 out:
 	if (err && rollover) {
 		kfree(rollover);
Packet socket bind operations must hold the po->bind_lock. This keeps
po->running consistent with whether the socket is actually on a ptype
list to receive packets.

fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
binds the fanout object to receive through packet_rcv_fanout.

Make it hold the po->bind_lock when testing po->running and rebinding.
Else, it can race with other rebind operations, such as that in
packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
can result in a socket being added to a fanout group twice, causing
use-after-free KASAN bug reports, among others.

Reported independently by both trinity and syzkaller.
Verified that the syzkaller reproducer passes after this patch.

Reported-by: nixioaming <nixiaoming@huawei.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/packet/af_packet.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)