[RFC,0/3] Allow postponed netfilter handling for socket matches

Message ID 560035B4.9010504@zonque.org
State RFC, archived
Delegated to: David Miller

Commit Message

Daniel Mack Sept. 21, 2015, 4:52 p.m. UTC
Hi,

Thanks for your feedback, Florian!

On 09/17/2015 06:00 PM, Florian Westphal wrote:
> Daniel Mack <daniel@zonque.org> wrote:

>> That would be a new netfilter hook then, something that is called after
>> LOCAL_IN, for ingress only? In a sense, it would be called from the
>> protocol handlers, just as my patches do right now, but instead of
>> conditionally re-iterating the same rules again, we would walk a
>> different chain?
> 
> Yes, something like that.  Obviously, you'll need to dru^W brib^W
> convince a LOT of people before that could ever fly.
> 
> I think we should not do this and that this 'match on ingress sk
> properties' is just bad[tm].
> 
> f.e. you'd also have to move all of the stuff you want into
> sock_common ... 8-(

Hmm, I'm not sure whether I understand which problems you see, or which
corner cases I am missing in my assessment. I did a quick test with the
attached 4 patches that

1) Allow hook callbacks to look at the socket passed to nf_hook(), so
   skb->sk does not have to be set

2) Make nft_meta look at pkt->sk rather than skb->sk (only for cgroups
   as proof of concept)

3) Introduce a new POST_DEMUX netfilter chain (the name is not
   perfect, admittedly)

4) Iterate POST_DEMUX chains for v4 TCP and UDP unicast+multicast
   sockets.


With some really trivial modifications to libnftnl/nftables (which just
map strings to the new enum value), this works fine in my tests.
Multicast receivers that match a netclass ID in the ruleset won't see
any packets, while others do.

Some more considerations: if we cannot determine a socket for a packet
and hence don't deliver it, it's IMO perfectly fine not to run the
netfilter rules for them. All we need to achieve with this chain is that
for packets that _are_ delivered to a socket, all the necessary rules
have been processed, at a time when we know who the final receiver of
the skb is.
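
To illustrate the per-receiver behaviour described above, here is a minimal
user-space sketch in C. It is not the patch code and does not use any kernel
API: toy_sock, toy_post_demux_hook() and toy_deliver() are hypothetical names,
and the single "drop this class ID" rule stands in for a real ruleset. It only
shows the shape of the idea: the verdict is taken once per socket after demux,
and with no socket found the hook never runs.

```c
#include <assert.h>
#include <stddef.h>

/* User-space sketch only: none of these names exist in the kernel. */
#define VERDICT_DROP   0
#define VERDICT_ACCEPT 1	/* mirrors the "ret != 1" check in the patch */

struct toy_sock {
	unsigned int classid;	/* stand-in for the receiver's class ID */
	unsigned int delivered;	/* packets this receiver has seen */
};

/* Hypothetical one-rule "chain": drop for receivers in blocked_classid. */
static int toy_post_demux_hook(const struct toy_sock *sk,
			       unsigned int blocked_classid)
{
	return sk->classid == blocked_classid ? VERDICT_DROP : VERDICT_ACCEPT;
}

/*
 * Fan one packet out to every receiver found by the demux step.
 * The hook runs once per receiver, so the verdict can differ per socket.
 * With count == 0 (no socket found) the hook is never run.
 * Returns the number of receivers that got the packet.
 */
static unsigned int toy_deliver(struct toy_sock **stack, size_t count,
				unsigned int blocked_classid)
{
	unsigned int delivered = 0;
	size_t i;

	assert(stack != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (toy_post_demux_hook(stack[i], blocked_classid) !=
		    VERDICT_ACCEPT)
			continue;	/* skip this receiver only */
		stack[i]->delivered++;
		delivered++;
	}
	return delivered;
}
```

With three receivers of which two carry the blocked class ID, toy_deliver()
reaches only the third one, which is the multicast behaviour reported for the
test setup.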

I'm happy to discuss the side effects of such an approach.


Thanks,
Daniel

Comments

Florian Westphal Sept. 21, 2015, 7:05 p.m. UTC | #1
Daniel Mack <daniel@zonque.org> wrote:
> >> LOCAL_IN, for ingress only? In a sense, it would be called from the
> >> protocol handlers, just as my patches do right now, but instead of
> >> conditionally re-iterating the same rules again, we would walk a
> >> different chain?
> > 
> > Yes, something like that.  Obviously, you'll need to dru^W brib^W
> > convince a LOT of people before that could ever fly.
> > 
> > I think we should not do this and that this 'match on ingress sk
> > properties' is just bad[tm].
> > 
> > f.e. you'd also have to move all of the stuff you want into
> > sock_common ... 8-(
> 
> Hmm, I'm not sure whether I understand which problems you see, or which
> corner cases I am missing in my assessment. I did a quick test with the
> attached 4 patches that
> 
> 1) Allow hook callbacks to look at the socket passed to nf_hook(), so
>    skb->sk does not have to be set
> 
> 2) Make nft_meta look at pkt->sk rather than skb->sk (only for cgroups
>    as proof of concept)
> 
> 3) Introduce a new POST_DEMUX netfilter chain (the name is not
>    perfect, admittedly)
> 
> 4) Iterate POST_DEMUX chains for v4 TCP and UDP unicast+multicast
>    sockets.
> 
> With some really trivial modifications to libnftnl/nftables (which just
> map strings to the new enum value), this works fine in my tests.
> Multicast receivers that match a netclass ID in the ruleset won't see
> any packets, while others do.
> 
> Some more considerations: if we cannot determine a socket for a packet
> and hence don't deliver it, it's IMO perfectly fine not to run the
> netfilter rules for them. All we need to achieve with this chain is that
> for packets that _are_ delivered to a socket, all the necessary rules
> have been processed, at a time when we know who the final receiver of
> the skb is.

Not sure if that's true.  What about timewait sockets?

It's easy to imagine someone using this feature and then complaining
that it doesn't match some packets, at which point we'd have to grow
sock_common to accommodate all sk members we support matching for :-/

If we'd have kernel releases where we drop features this wouldn't be
much of an issue since we could back out in case it causes issues later.

But once we add your proposed feature we cannot go back...

I'm not sure; I dislike this feature proposal but I can't think of
any alternative [other than "don't do this"] :-(
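
The sock_common concern can be sketched in user-space C as follows. This is
an illustration under stated assumptions, not kernel code: the toy_* types are
hypothetical and only loosely modelled on the fact that a timewait minisocket
shares just a common prefix with a full socket. A field that lives outside
that prefix (a class ID here) cannot be read from a timewait socket, so a
match on it has no meaningful answer unless the field is moved into the
shared part.

```c
#include <assert.h>
#include <stddef.h>

/* Toy layouts; all names are hypothetical, not the kernel's structs. */
enum { TOY_ESTABLISHED = 1, TOY_TIME_WAIT = 2 };

struct toy_sock_common {
	int state;
};

struct toy_full_sock {
	struct toy_sock_common common;	/* must be the first member */
	unsigned int classid;		/* only exists on full sockets */
};

struct toy_timewait_sock {
	struct toy_sock_common common;	/* must be the first member */
	/* no classid here */
};

/* Returns 1 on match, 0 on mismatch, -1 when the field is unavailable. */
static int toy_match_classid(const struct toy_sock_common *sk,
			     unsigned int classid)
{
	const struct toy_full_sock *full;

	assert(sk != NULL);

	if (sk->state == TOY_TIME_WAIT)
		return -1;	/* reading classid would run past the minisocket */

	/* Valid only because common is the first member of toy_full_sock. */
	full = (const struct toy_full_sock *)sk;
	return full->classid == classid;
}
```

A rule author would see the -1 case as "the rule did not match some packets",
which is the complaint anticipated above.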

Patch

From 1898df7d6a35967972bae412994623a8d7c262cd Mon Sep 17 00:00:00 2001
From: Daniel Mack <daniel@zonque.org>
Date: Wed, 16 Sep 2015 14:58:08 +0200
Subject: [PATCH RFC 4/4] net: tcp_ipv4, udp_ipv4: hook up post demux netfilter
 chains

Run the POST_DEMUX netfilter chain rules after the destination socket
has been looked up.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/ipv4/tcp_ipv4.c |  8 ++++++++
 net/ipv4/udp.c      | 15 +++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 93898e0..33f968e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -78,6 +78,7 @@ 
 
 #include <linux/inet.h>
 #include <linux/ipv6.h>
+#include <linux/netfilter.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -1594,6 +1595,13 @@  int tcp_v4_rcv(struct sk_buff *skb)
 	if (!sk)
 		goto no_tcp_socket;
 
+	ret = nf_hook(NFPROTO_IPV4, NF_INET_POST_DEMUX, sk,
+		      skb, skb->dev, NULL, NULL);
+	if (ret != 1) {
+		sock_put(sk);
+		return 0;
+	}
+
 process:
 	if (sk->sk_state == TCP_TIME_WAIT)
 		goto do_time_wait;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c0a15e7..0056c20 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -97,6 +97,7 @@ 
 #include <linux/mm.h>
 #include <linux/inet.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <linux/slab.h>
 #include <net/tcp_states.h>
 #include <linux/skbuff.h>
@@ -1632,7 +1633,14 @@  static void flush_stack(struct sock **stack, unsigned int count,
 	struct sock *sk;
 
 	for (i = 0; i < count; i++) {
+		int ret;
 		sk = stack[i];
+
+		ret = nf_hook(NFPROTO_IPV4, NF_INET_POST_DEMUX, sk,
+			      skb, skb->dev, NULL, NULL);
+		if (ret != 1)
+			continue;
+
 		if (likely(!skb1))
 			skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
 
@@ -1819,6 +1827,13 @@  int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk) {
 		int ret;
 
+		ret = nf_hook(NFPROTO_IPV4, NF_INET_POST_DEMUX, sk,
+			      skb, skb->dev, NULL, NULL);
+		if (ret != 1) {
+			sock_put(sk);
+			return 0;
+		}
+
 		if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
 			skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
 						 inet_compute_pseudo);
-- 
2.5.0