Message ID | 1476946352-15770-1-git-send-email-xiyou.wangcong@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Le 20/10/2016 à 08:52, Cong Wang a écrit : > A kernel warning inside __local_bh_enable_ip() was reported by people > running SELinux, this is caused due to some SELinux functions > (indirectly) call peernet2id() with IRQ disabled in process context, > when we re-enable BH with IRQ disabled kernel complains. Shut up this > warning by saving IRQ context in peernet2id(), BH is still implicitly > disabled. Franckly, reverting the original patch seems better for me. The intention for that patch was "it's not needed", see the commit log: "We never read or change netns id in hardirq context, the only place we read netns id in softirq context is in vxlan_xmit(). So, it should be enough to just disable BH." Now, we see that "it's needed" and that the analysis was wrong. If a race is introduced by this patch, it will be hard to detect and fix it. Regards, Nicolas
On 10/20/2016 02:52 AM, Cong Wang wrote: > A kernel warning inside __local_bh_enable_ip() was reported by people > running SELinux, this is caused due to some SELinux functions > (indirectly) call peernet2id() with IRQ disabled in process context, > when we re-enable BH with IRQ disabled kernel complains. Shut up this > warning by saving IRQ context in peernet2id(), BH is still implicitly > disabled. Not sure this suffices; kill_fasync() -> send_sigio() -> send_sigio_to_task() -> sigio_perm() -> security_file_send_sigiotask() -> selinux_file_send_sigiotask() -> ... -> audit_log() -> ... -> peernet2id() > > Fixes: bc51dddf98c9 ("netns: avoid disabling irq for netns id") > Reported-by: Stephen Smalley <sds@tycho.nsa.gov> > Reported-by: Elad Raz <e@eladraz.com> > Tested-by: Paul Moore <paul@paul-moore.com> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> > --- > net/core/net_namespace.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c > index 989434f..17b52a2 100644 > --- a/net/core/net_namespace.c > +++ b/net/core/net_namespace.c > @@ -230,11 +230,16 @@ int peernet2id_alloc(struct net *net, struct net *peer) > /* This function returns, if assigned, the id of a peer netns. */ > int peernet2id(struct net *net, struct net *peer) > { > + unsigned long flags; > int id; > > - spin_lock_bh(&net->nsid_lock); > + /* This is ugly, technically we only need to disable BH, however some > + * callers (indirectly) call this with IRQ disabled, which implies > + * that we should save the IRQ context here. > + */ > + spin_lock_irqsave(&net->nsid_lock, flags); > id = __peernet2id(net, peer); > - spin_unlock_bh(&net->nsid_lock); > + spin_unlock_irqrestore(&net->nsid_lock, flags); > return id; > } > EXPORT_SYMBOL(peernet2id); >
On Thu, Oct 20, 2016 at 3:17 AM, Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote: > > Now, we see that "it's needed" and that the analysis was wrong. If a race is > introduced by this patch, it will be hard to detect and fix it. It is _not_ needed for protection, it is needed to shut up a warning, I thought this is pretty clear in the changelog. If we could invent some way to save BH (*lock_bhsave()), we don't need to bother this at all, currently we don't.
On Thu, Oct 20, 2016 at 7:58 AM, Stephen Smalley <sds@tycho.nsa.gov> wrote: > On 10/20/2016 02:52 AM, Cong Wang wrote: >> A kernel warning inside __local_bh_enable_ip() was reported by people >> running SELinux, this is caused due to some SELinux functions >> (indirectly) call peernet2id() with IRQ disabled in process context, >> when we re-enable BH with IRQ disabled kernel complains. Shut up this >> warning by saving IRQ context in peernet2id(), BH is still implicitly >> disabled. > > Not sure this suffices; kill_fasync() -> send_sigio() -> > send_sigio_to_task() -> sigio_perm() -> security_file_send_sigiotask() > -> selinux_file_send_sigiotask() -> ... -> audit_log() -> ... -> > peernet2id() Oh, this is a new one. kill_fasync() is called in IRQ handler, so we actually do multicast in IRQ context.... It makes no sense, netlink multicast could be very expensive if we have many listeners. I am Cc'ing Richard who added that multicast in audit_log_end(). It seems not easy to just move the multicast to a workqueue, since the skb is copied from audit_buffer which is freed immediately after that, probably need another queue like audit_skb_queue.
On Thu, Oct 20, 2016 at 2:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > On Thu, Oct 20, 2016 at 7:58 AM, Stephen Smalley <sds@tycho.nsa.gov> wrote: >> On 10/20/2016 02:52 AM, Cong Wang wrote: >>> A kernel warning inside __local_bh_enable_ip() was reported by people >>> running SELinux, this is caused due to some SELinux functions >>> (indirectly) call peernet2id() with IRQ disabled in process context, >>> when we re-enable BH with IRQ disabled kernel complains. Shut up this >>> warning by saving IRQ context in peernet2id(), BH is still implicitly >>> disabled. >> >> Not sure this suffices; kill_fasync() -> send_sigio() -> >> send_sigio_to_task() -> sigio_perm() -> security_file_send_sigiotask() >> -> selinux_file_send_sigiotask() -> ... -> audit_log() -> ... -> >> peernet2id() > > Oh, this is a new one. kill_fasync() is called in IRQ handler, so we actually > do multicast in IRQ context.... It makes no sense, netlink multicast could > be very expensive if we have many listeners. I'm sure there are a few others I don't know about, but I believe the only commonly used audit multicast listener is systemd. > I am Cc'ing Richard who added that multicast in audit_log_end(). It seems > not easy to just move the multicast to a workqueue, since the skb is copied > from audit_buffer which is freed immediately after that, probably need another > queue like audit_skb_queue. This approach would double the queue size which is something I want to avoid. I would suggest sticking with a single queue and dealing with the netlink message link fixup and multicast send in the existing netlink unicast thread; basically we would just be moving the multicast code from audit_log_end() into kauditd_thread(). This is the same approach I mentioned earlier off-list. However, that isn't something I want to mess with as a regression fix, mostly because I really want to see this regression gone by -rc2 as it is making SELinux testing a real pain. If the patch posted at the top of this thread isn't a suitable fix, we really should revert the original patch.
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 989434f..17b52a2 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -230,11 +230,16 @@ int peernet2id_alloc(struct net *net, struct net *peer) /* This function returns, if assigned, the id of a peer netns. */ int peernet2id(struct net *net, struct net *peer) { + unsigned long flags; int id; - spin_lock_bh(&net->nsid_lock); + /* This is ugly, technically we only need to disable BH, however some + * callers (indirectly) call this with IRQ disabled, which implies + * that we should save the IRQ context here. + */ + spin_lock_irqsave(&net->nsid_lock, flags); id = __peernet2id(net, peer); - spin_unlock_bh(&net->nsid_lock); + spin_unlock_irqrestore(&net->nsid_lock, flags); return id; } EXPORT_SYMBOL(peernet2id);