nfnetlink: avoid unbound loop on busy Netlink socket
I see a problem with how ctnetlink GET requests are being
processed in the kernel (2.6.32.24) under high load.
The symptom is Netlink looping around nfnetlink_rcv_msg(), which
happens simply because netlink_unicast() returned -EAGAIN when
trying to write the newly created Netlink skb to the socket receive
buffer in ctnetlink_get_conntrack(). In this case a (possibly)
infinite loop is entered. I suspect it is effectively infinite when
the userland process that should receive those messages is itself
stuck in sendmsg() and, if it is single threaded, therefore cannot
read anything.
I tried to reproduce this several times; on a few occasions the loop
disappeared and the box proceeded normally after some minutes.
I have no explanation for this.
The attached patch tries to solve this by simply no longer replaying
the request (and with it the netlink_unicast() of the reply skb) on
-EAGAIN, and failing with -ENOBUFS instead. The reasoning is that
once a Netlink overrun has been detected, it seems counterintuitive
to insist on sending yet another Netlink message.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
===================================================================
@@ -138,7 +138,6 @@
 		return 0;
 
 	type = nlh->nlmsg_type;
-replay:
 	ss = nfnetlink_get_subsys(type);
 	if (!ss) {
 #ifdef CONFIG_MODULES
@@ -169,7 +168,7 @@
 
 		err = nc->call(net->nfnl, skb, nlh, (const struct nlattr **)cda);
 		if (err == -EAGAIN)
-			goto replay;
+			err = -ENOBUFS;
 		return err;
 	}
 }