
[RFC] netlink broadcast return value

Message ID 4985A4C5.4050908@netfilter.org
State RFC, archived
Delegated to: David Miller

Commit Message

Pablo Neira Ayuso Feb. 1, 2009, 1:33 p.m. UTC
Currently, and according to my interpretation of the source code,
netlink_broadcast() reports an error to the caller only if no messages
at all were delivered:

1) If at least one message has been delivered correctly, return 0.
2) Otherwise, if no messages at all were delivered due to skb_clone()
failure, return -ENOBUFS.
3) Otherwise, if there are no listeners, return -ESRCH.

I would need the caller to know whether delivery to any of the
listeners has failed, as follows:

1) If it fails to deliver the message to any listener (for whatever
reason), return -ENOBUFS.
2) If all messages were delivered OK, return 0.
3) If there are no listeners, return -ESRCH.
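
The two sets of rules can be compared with a small user-space model. This
is only a sketch: struct bcast_outcome and both function names are
hypothetical, and the flags merely stand in for what a broadcast pass
would record. It is not the kernel implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of the outcome of one broadcast. */
struct bcast_outcome {
	bool delivered;        /* at least one listener got the message */
	bool clone_failure;    /* skb_clone() failed for some listener */
	bool delivery_failure; /* queueing to some listener failed */
};

/* Current rules: any successful delivery hides every failure. */
static int current_result(const struct bcast_outcome *o)
{
	if (o->delivered)
		return 0;
	if (o->clone_failure)
		return -ENOBUFS;
	return -ESRCH;
}

/* Proposed rules: any failure is reported, even after a success. */
static int proposed_result(const struct bcast_outcome *o)
{
	if (o->clone_failure || o->delivery_failure)
		return -ENOBUFS;
	if (o->delivered)
		return 0;
	return -ESRCH;
}
```

The interesting case is a partial delivery (one listener served, another
overrun): the current rules yield 0, the proposed rules yield -ENOBUFS.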

In the current ctnetlink code and in Netfilter in general, we could add
reliable logging and connection tracking event delivery by dropping the
packets whose events were not successfully delivered over Netlink. Of
course, this option would be settable via /proc, as this approach reduces
performance (in terms of connections filtered per second by a stateful
firewall) but provides reliable logging and event delivery (for
conntrackd) in return.
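
As a rough illustration of that policy, assuming the new return value:
the sketch below is a user-space model, not ctnetlink code. The enum, the
function name and the "reliable" flag (standing in for the /proc setting)
are all hypothetical, and treating -ESRCH (no listeners) as "accept" is
an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

/*
 * Hypothetical decision helper.
 * reliable:  stands in for the /proc knob mentioned above.
 * event_err: what the event broadcast returned for this packet.
 */
static enum verdict packet_verdict(bool reliable, int event_err)
{
	/* Drop only when reliability is requested and delivery failed. */
	if (reliable && event_err == -ENOBUFS)
		return VERDICT_DROP;
	return VERDICT_ACCEPT;
}
```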

I have checked the whole kernel code for current users of
netlink_broadcast() to see how they handle the errors reported and how
a change in the return value would affect them. Here follows a short
summary:

= current list of clients of netlink_broadcast() =
== netlink_broadcast() ==

                                          Handling

drivers/scsi/scsi_transport_iscsi.c	: printk error
drivers/connector/connector.c		: cn_netlink_send() return value
include/net/netlink.h			: nlmsg_multicast() return value
lib/kobject_uevent.c			: ignores return value
net/core/rtnetlink.c			: ignores return value
net/ipv4/netfilter/ipt_ULOG.c		: ignores return value
net/bridge/netfilter/ebt_ulog.c		: ignores return value
net/decnet/netfilter/dn_rtmsg.c		: ignores return value
security/selinux/netlink.c		: ignores return value

== cn_netlink_send (uses netlink_broadcast return value) ==
drivers/w1/w1_netlink.c			: ignores return value
drivers/video/uvesafb.c			: printk error (if err != -ESRCH)

== nlmsg_multicast (calls netlink_broadcast) ==
drivers/scsi/scsi_transport_fc.c	: printk error (if err != -ESRCH)
include/net/genetlink.h			: genlmsg_multicast() return value
net/xfrm/xfrm_user.c			: xfrm_send_migrate() return value
					  xfrm_exp_state_notify() return value
					  xfrm_aevent_state_notify() return value
					  xfrm_notify_sa_flush() return value
					  xfrm_notify_sa() return value
					  xfrm_send_acquire() return value
					  xfrm_exp_policy_notify() return value
					  xfrm_notify_policy() return value
					  xfrm_notify_policy_flush() return value
					  xfrm_send_report() return value
					  xfrm_send_mapping() return value
					  ...
					  later they all ignore the return value

== genlmsg_multicast (calls nlmsg_multicast) ==
net/netlink/genetlink.c			: ignores return value
drivers/acpi/event.c			: printk error
fs/dquot.c				: printk error (if err != -ESRCH)
net/wireless/nl80211.c			: ignores return value

In short, I think that the change that I'm proposing would also require
to fix some netlink_broadcast() clients to skip ENOBUFS errors: they are
not meaningful for them since they assume that Netlink is unreliable and
so the return value does not provide any useful information.
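
A sketch of what such a client-side fix could look like, for callers that
currently printk on error. The helper name is hypothetical and this is a
user-space model of the check only, not code from any of the files listed
above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical filter for callers that log broadcast errors.
 * -ESRCH (no listeners) and -ENOBUFS (some listener missed the
 * message) are expected on an unreliable channel, so only other
 * errors are worth reporting.
 */
static bool broadcast_error_worth_logging(int err)
{
	switch (err) {
	case 0:
	case -ESRCH:
	case -ENOBUFS:
		return false;
	default:
		return true;
	}
}
```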

Please, let me know how crazy this idea is ;).

Comments

David Miller Feb. 2, 2009, 10:05 p.m. UTC | #1
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 01 Feb 2009 14:33:57 +0100

> In short, I think that the change that I'm proposing would also require
> to fix some netlink_broadcast() clients to skip ENOBUFS errors: they are
> not meaningful for them since they assume that Netlink is unreliable and
> so the return value does not provide any useful information.

I think this analysis is accurate.

Please proceed to do the netlink_broadcast() client changes
and then submit those with this patch and I'll queue it all
up for net-next-2.6

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Inaky Perez-Gonzalez Feb. 2, 2009, 10:35 p.m. UTC | #2
On Sunday 01 February 2009, Pablo Neira Ayuso wrote:

> == genlmsg_multicast (calls nlmsg_multicast) ==
> net/netlink/genetlink.c			: ignores return value
> drivers/acpi/event.c			: printk error
> fs/dquot.c				: printk error (if err != -ESRCH)
> net/wireless/nl80211.c			: ignores return value
>
> In short, I think that the change that I'm proposing would also require
> to fix some netlink_broadcast() clients to skip ENOBUFS errors: they are
> not meaningful for them since they assume that Netlink is unreliable and
> so the return value does not provide any useful information.

You are missing a few callers in net/wimax. Which kernel version did you
do the analysis on?
Pablo Neira Ayuso Feb. 3, 2009, 10:07 a.m. UTC | #3
Hi Iñaky,

Inaky Perez-Gonzalez wrote:
> On Sunday 01 February 2009, Pablo Neira Ayuso wrote:
> 
>> == genlmsg_multicast (calls nlmsg_multicast) ==
>> net/netlink/genetlink.c			: ignores return value
>> drivers/acpi/event.c			: printk error
>> fs/dquot.c				: printk error (if err != -ESRCH)
>> net/wireless/nl80211.c			: ignores return value
>>
>> In short, I think that the change that I'm proposing would also require
>> to fix some netlink_broadcast() clients to skip ENOBUFS errors: they are
>> not meaningful for them since they assume that Netlink is unreliable and
>> so the return value does not provide any useful information.
> 
> You are missing a few callers in net/wimax. Which kernel version did you
> do the analysis on?

I was using Patrick's tree (nf-2.6.git) which did not contain net/wimax
yet. I'm going to re-check against David's tree and let you know.
Patrick McHardy Feb. 9, 2009, 2:17 p.m. UTC | #4
David Miller wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Sun, 01 Feb 2009 14:33:57 +0100
> 
>> In short, I think that the change that I'm proposing would also require
>> to fix some netlink_broadcast() clients to skip ENOBUFS errors: they are
>> not meaningful for them since they assume that Netlink is unreliable and
>> so the return value does not provide any useful information.
> 
> I think this analysis is accurate.

We have at least one case where the caller wants to know of
any successful delivery. Keymanager queries done by xfrm_state
want to know whether an acquire was delivered to any keymanager.
So we need to continue to indicate this, maybe using a different
errno code than -ENOBUFS. I don't have a suggestion which one to
use though.
Pablo Neira Ayuso Feb. 9, 2009, 10:51 p.m. UTC | #5
Patrick McHardy wrote:
> David Miller wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Sun, 01 Feb 2009 14:33:57 +0100
>>
>>> In short, I think that the change that I'm proposing would also require
>>> to fix some netlink_broadcast() clients to skip ENOBUFS errors: they are
>>> not meaningful for them since they assume that Netlink is unreliable and
>>> so the return value does not provide any useful information.
>>
>> I think this analysis is accurate.
> 
> We have at least one case where the caller wants to know of
> any successful delivery. Keymanager queries done by xfrm_state
> want to know whether an acquire was delivered to any keymanager.
> So we need to continue to indicate this, maybe using a different
> errno code than -ENOBUFS. I don't have a suggestion which one to
> use though.

Indeed, I have missed that spot. I'm not very familiar with that code,
however, I see that the creation of a state depends on the netlink
broadcast return value, but how useful is that? I think that the state
should be created even if the broadcast fails, the userspace daemon
should request a resync to the kernel as soon as it hits ENOBUFS, then
it would be in sync again with that state.
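
On the userspace side, the resync trigger could be sketched as below. A
netlink socket reports ENOBUFS from recv() when the kernel had to drop
messages for it; everything else here (the function names and the return
convention) is hypothetical, and the resync request itself is left to the
caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* True when a failed receive means events were lost. */
static bool needs_resync(ssize_t ret, int err)
{
	return ret < 0 && err == ENOBUFS;
}

/*
 * Receive one event from a netlink socket.
 * Returns 0 with *out_len set on success, 1 if the caller should
 * request a resync (dump) from the kernel, -1 on any other error.
 */
static int read_event(int fd, void *buf, size_t len, ssize_t *out_len)
{
	ssize_t ret;
	int err;

	/* Retry if the call was interrupted by a signal. */
	do {
		ret = recv(fd, buf, len, 0);
		err = errno;
	} while (ret < 0 && err == EINTR);

	if (needs_resync(ret, err))
		return 1;
	if (ret < 0)
		return -1;
	*out_len = ret;
	return 0;
}
```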
Patrick McHardy Feb. 9, 2009, 11:23 p.m. UTC | #6
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> David Miller wrote:
>>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>>> Date: Sun, 01 Feb 2009 14:33:57 +0100
>>>
>>>> In short, I think that the change that I'm proposing would also require
>>>> to fix some netlink_broadcast() clients to skip ENOBUFS errors: they are
>>>> not meaningful for them since they assume that Netlink is unreliable and
>>>> so the return value does not provide any useful information.
>>> I think this analysis is accurate.
>> We have at least one case where the caller wants to know of
>> any successful delivery. Keymanager queries done by xfrm_state
>> want to know whether an acquire was delivered to any keymanager.
>> So we need to continue to indicate this, maybe using a different
>> errno code than -ENOBUFS. I don't have a suggestion which one to
>> use though.
> 
> Indeed, I have missed that spot. I'm not very familiar with that code,
> however, I see that the creation of a state depends on the netlink
> broadcast return value, but how useful is that? I think that the state
> should be created even if the broadcast fails, the userspace daemon
> should request a resync to the kernel as soon as it hits ENOBUFS, then
> it would be in sync again with that state.

The idea is that the kernel is performing an active query. I agree
that there's nothing wrong with installing the SA and indicating the
error to userspace. Userspace could dump the SADB and look for new
larval states, however that's unlikely to be very useful since once
an overflow occurs, you probably have a lot of states.

But unless I'm missing something, there's nothing wrong with this
as long as the error is ignored. The fact that something was received
by some listener doesn't have any meaning anyways, it might have
been "ip monitor". Which somehow raises doubt about your proposed
interface change though, I think anything that wants a reliable
answer whether a packet was delivered to a process handling it
appropriately should use unicast.

Patch

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 480184a..26e1a89 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -943,6 +943,7 @@  struct netlink_broadcast_data {
 	u32 pid;
 	u32 group;
 	int failure;
+	int delivery_failure;
 	int congested;
 	int delivered;
 	gfp_t allocation;
@@ -992,6 +993,7 @@  static inline int do_one_broadcast(struct sock *sk,
 		p->skb2 = NULL;
 	} else if ((val = netlink_broadcast_deliver(sk, p->skb2)) < 0) {
 		netlink_overrun(sk);
+		p->delivery_failure = 1;
 	} else {
 		p->congested |= val;
 		p->delivered = 1;
@@ -1018,6 +1020,7 @@  int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
 	info.pid = pid;
 	info.group = group;
 	info.failure = 0;
+	info.delivery_failure = 0;
 	info.congested = 0;
 	info.delivered = 0;
 	info.allocation = allocation;
@@ -1038,13 +1041,14 @@  int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
 	if (info.skb2)
 		kfree_skb(info.skb2);
 
+	if (info.delivery_failure || info.failure)
+		return -ENOBUFS;
+
 	if (info.delivered) {
 		if (info.congested && (allocation & __GFP_WAIT))
 			yield();
 		return 0;
 	}
-	if (info.failure)
-		return -ENOBUFS;
 	return -ESRCH;
 }
 EXPORT_SYMBOL(netlink_broadcast);