
[5/7] net: add netfilter ingress hook

Message ID 20150410201719.GC5968@salvia
State RFC
Delegated to: Pablo Neira

Commit Message

Pablo Neira Ayuso April 10, 2015, 8:17 p.m. UTC
On Fri, Apr 10, 2015 at 02:36:11PM +0100, Patrick McHardy wrote:
> On 10.04, Thomas Graf wrote:
> > On 04/10/15 at 02:15pm, Pablo Neira Ayuso wrote:
> > >  static int __netif_receive_skb_ingress(struct sk_buff *skb, bool pfmemalloc,
> > >  				       struct net_device *orig_dev)
> > >  {
> > > @@ -3772,6 +3800,8 @@ skip_taps:
> > >  	if (!skb)
> > >  		return NET_RX_DROP;
> > >  #endif
> > > +	if (nf_hook_ingress_active(skb))
> > > +		return nf_hook_ingress(skb, pt_prev, orig_dev, pfmemalloc);
> > >  
> > >  	return __netif_receive_skb_finish(skb, pfmemalloc, pt_prev, orig_dev);
> > >  }
> > 
> > I would prefer that we avoid having every subsystem manage its own
> > ingress filter pointers in net_device. From a net_device perspective,
> > all it takes is a single pointer to a singly linked list of filters
> > that need to be run through. These entries could represent an ingress
> > qdisc, a netfilter chain or something else (an L2 ingress qdisc?).
> 
> I'm wondering if the hook is the right abstraction at all. Netfilter hooks
> require async resumption (okfn) support, which is why all the refactoring is
> needed. Is that something that we need for NFPROTO_NETDEV? For ingress,
> userspace queueing *might* actually work if the missing pieces are added,
> but for offloaded rules it obviously cannot work.
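
A minimal userspace sketch of the single-pointer design Thomas describes: net_device carries one list head, and each subsystem (ingress qdisc, netfilter chain, ...) contributes an entry with its own callback. All names here (struct ingress_filter, run_ingress(), the INGRESS_* verdicts, struct fake_net_device) are hypothetical and are not kernel APIs; struct pkt stands in for struct sk_buff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts, not the kernel's NET_RX_* or NF_* values. */
enum ingress_verdict { INGRESS_PASS, INGRESS_DROP };

struct pkt; /* opaque stand-in for struct sk_buff */

/* One entry per subsystem hooked at ingress (ingress qdisc, netfilter chain, ...). */
struct ingress_filter {
	enum ingress_verdict (*run)(struct pkt *p, void *priv);
	void *priv;
	struct ingress_filter *next;
};

/* Under this proposal, the only ingress field net_device would need. */
struct fake_net_device {
	struct ingress_filter *ingress_list;
};

/* Walk the filters in list order; the first one that drops the packet wins. */
static enum ingress_verdict run_ingress(struct fake_net_device *dev, struct pkt *p)
{
	struct ingress_filter *f;

	for (f = dev->ingress_list; f != NULL; f = f->next)
		if (f->run(p, f->priv) == INGRESS_DROP)
			return INGRESS_DROP;
	return INGRESS_PASS;
}

/* Example filters: one counts packets and passes them, one drops everything. */
static enum ingress_verdict count_and_pass(struct pkt *p, void *priv)
{
	(void)p;
	(*(int *)priv)++;
	return INGRESS_PASS;
}

static enum ingress_verdict drop_all(struct pkt *p, void *priv)
{
	(void)p;
	(void)priv;
	return INGRESS_DROP;
}
```

With a list of count_and_pass followed by drop_all, run_ingress() bumps the counter once and then returns INGRESS_DROP; an empty list passes every packet.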

For userspace queueing from ingress, we still have to call
skb_share_check() and hold a reference to orig_dev from the escape
path. But this support is still missing in nf_tables (actually, we
only support NFPROTO_IPV4 and NFPROTO_IPV6 at the moment; see the
attached patch). Regarding offload, this path will not see any packets.
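
The reference handling mentioned above can be modelled as follows, assuming the packet has already been unshared with skb_share_check() and that orig_dev is the only pointer that must stay valid while the packet sits in userspace. struct fake_dev, fake_dev_hold()/fake_dev_put() (standing in for dev_hold()/dev_put()) and the queue-entry helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for net_device; refcnt models dev_hold()/dev_put(). */
struct fake_dev {
	int refcnt;
};

static void fake_dev_hold(struct fake_dev *d) { d->refcnt++; }
static void fake_dev_put(struct fake_dev *d) { d->refcnt--; }

/* Hypothetical queue entry: only orig_dev is pinned while queued. */
struct ingress_queue_entry {
	void *skb;                 /* assumed already unshared via skb_share_check() */
	struct fake_dev *orig_dev;
};

static struct ingress_queue_entry *ingress_enqueue(void *skb, struct fake_dev *orig_dev)
{
	struct ingress_queue_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->skb = skb;
	e->orig_dev = orig_dev;
	fake_dev_hold(orig_dev);   /* keep the device alive while userspace holds the packet */
	return e;
}

/* Called when the verdict comes back: release the reference and hand the packet back. */
static void *ingress_reinject(struct ingress_queue_entry *e)
{
	void *skb = e->skb;

	fake_dev_put(e->orig_dev); /* drop the reference taken at enqueue time */
	free(e);
	return skb;
}
```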

Comments

Patrick McHardy April 10, 2015, 9:33 p.m. UTC | #1
On 10.04, Pablo Neira Ayuso wrote:
> On Fri, Apr 10, 2015 at 02:36:11PM +0100, Patrick McHardy wrote:
> > 
> > I'm wondering if the hook is the right abstraction at all. Netfilter hooks
> > require async resumption (okfn) support, which is why all the refactoring is
> > needed. Is that something that we need for NFPROTO_NETDEV? For ingress,
> > userspace queueing *might* actually work if the missing pieces are added,
> > but for offloaded rules it obviously cannot work.
> 
> For userspace queueing from ingress, we still have to call
> skb_share_check() and hold a reference to orig_dev from the escape
> path. But this support is still missing in nf_tables (actually, we
> only support NFPROTO_IPV4 and NFPROTO_IPV6 at the moment; see the
> attached patch). Regarding offload, this path will not see any packets.

We do support all families using the regular NF_QUEUE verdict of course.
But yes, nf_queue.c will simply drop packets that don't have a netfilter
AF registered.
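
To illustrate the drop behaviour described above, here is a sketch of a per-family lookup in which a missing entry turns a queue request into a drop. The table, its size, the types and fake_nf_queue() are hypothetical; only the family numbers (NFPROTO_IPV4 = 2, NFPROTO_IPV6 = 10) are taken from the kernel's enum.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_NFPROTO_NUMPROTO 13 /* hypothetical table size */

struct fake_afinfo {
	const char *name;
};

/* Hypothetical per-family registry: only IPv4 and IPv6 are registered. */
static const struct fake_afinfo ipv4_info = { "ipv4" };
static const struct fake_afinfo ipv6_info = { "ipv6" };
static const struct fake_afinfo *afinfo_table[FAKE_NFPROTO_NUMPROTO] = {
	[2]  = &ipv4_info, /* NFPROTO_IPV4 */
	[10] = &ipv6_info, /* NFPROTO_IPV6 */
};

enum fake_verdict { FAKE_DROP, FAKE_QUEUED };

/* No afinfo registered for the family means the packet is dropped. */
static enum fake_verdict fake_nf_queue(unsigned int pf)
{
	if (pf >= FAKE_NFPROTO_NUMPROTO || afinfo_table[pf] == NULL)
		return FAKE_DROP;
	return FAKE_QUEUED;
}
```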

But my question is whether queueing is even worth considering for the
NFPROTO_NETDEV family. As I said, it will at best work for ingress
anyway, and even that will be trickier than just calling
skb_share_check(): we need to keep valid references to all the data
you currently store in the CB, including the packet_type, the device,
things attached to the skb at this point in the stack, etc.

If we decide not to support queueing for this family, we don't have to
use netfilter hooks for this, and all the refactoring for async resume
becomes unnecessary.
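
The cost referred to here comes from the okfn continuation: when a hook may queue a packet, the caller's "what to do next" (okfn) and its arguments must be stored so that processing can resume after userspace returns a verdict. A hypothetical single-slot sketch of that pattern; the verdict names loosely mirror NF_ACCEPT/NF_DROP/NF_QUEUE but are not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical verdicts loosely modelled on NF_ACCEPT / NF_DROP / NF_QUEUE. */
enum verdict { V_ACCEPT, V_DROP, V_QUEUE };

struct pkt {
	int id;
};

/* The continuation: what the receive path does once the packet is accepted. */
typedef int (*okfn_t)(struct pkt *p);

/* State that must survive until userspace returns a verdict. */
struct pending {
	struct pkt *p;
	okfn_t okfn;
	int in_use;
};

static struct pending slot; /* single-slot "queue", enough for the sketch */

static int run_hook(struct pkt *p, enum verdict v, okfn_t okfn)
{
	switch (v) {
	case V_ACCEPT:
		return okfn(p);  /* synchronous: continue immediately */
	case V_QUEUE:
		slot.p = p;      /* asynchronous: stash packet and continuation */
		slot.okfn = okfn;
		slot.in_use = 1;
		return 0;        /* the caller must not touch p after this */
	default:
		return -1;       /* dropped */
	}
}

/* Called later, when the userspace verdict arrives: resume or drop. */
static int reinject(enum verdict v)
{
	struct pending e = slot;

	if (!e.in_use)
		return -1;
	slot.in_use = 0;
	return v == V_ACCEPT ? e.okfn(e.p) : -1;
}

/* Example continuation that records which packet was delivered. */
static int delivered;
static int deliver(struct pkt *p)
{
	delivered = p->id;
	return 1;
}
```

Everything stored in struct pending is exactly the state that has to stay valid while the packet is queued, which is why the receive path must be restructured to support this.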

> From db2fba74dea98b69ee7615fca86b9847bc42887f Mon Sep 17 00:00:00 2001
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Fri, 10 Apr 2015 21:40:58 +0200
> Subject: [PATCH] netfilter: nf_tables: restrict nft_queue to AF_INET and
>  AF_INET6
> 
> Other families need the corresponding struct nf_afinfo in place to work.
> Restrict it to NFPROTO_IPV4 and NFPROTO_IPV6 until the necessary code is in
> place.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso April 11, 2015, 12:55 p.m. UTC | #2
On Fri, Apr 10, 2015 at 10:33:12PM +0100, Patrick McHardy wrote:
> On 10.04, Pablo Neira Ayuso wrote:
> > On Fri, Apr 10, 2015 at 02:36:11PM +0100, Patrick McHardy wrote:
> > > 
> > > I'm wondering if the hook is the right abstraction at all. Netfilter hooks
> > > require async resumption (okfn) support, which is why all the refactoring is
> > > needed. Is that something that we need for NFPROTO_NETDEV? For ingress,
> > > userspace queueing *might* actually work if the missing pieces are added,
> > > but for offloaded rules it obviously cannot work.
> > 
> > For userspace queueing from ingress, we still have to call
> > skb_share_check() and hold a reference to orig_dev from the escape
> > path. But this support is still missing in nf_tables (actually, we
> > only support NFPROTO_IPV4 and NFPROTO_IPV6 at the moment; see the
> > attached patch). Regarding offload, this path will not see any packets.
> 
> We do support all families using the regular NF_QUEUE verdict of course.
> But yes, nf_queue.c will simply drop packets that don't have a netfilter
> AF registered.
> 
> But my question is whether queueing is even worth considering for the
> NFPROTO_NETDEV family. As I said, it will at best work for ingress
> anyway, and even that will be trickier than just calling
> skb_share_check(): we need to keep valid references to all the data
> you currently store in the CB, including the packet_type, the device,
> things attached to the skb at this point in the stack, etc.

I think we only need to hold the reference on orig_dev. The pt_prev
pointer in the skb CB can actually be removed. Other things attached to
the skb are already handled by nf_queue to make sure they don't vanish.

> If we decide not to support queueing for this family, we don't have to
> use netfilter hooks for this, and all the refactoring for async resume
> becomes unnecessary.

I think the refactoring is worth it. Have a look at the current state of
this function: it has grown features over time and accumulated many
gotos that force you to jump back and forth when reading this code.

Regarding nf_queue support at ingress, I don't see any major technical
obstacle to supporting it at this moment, and I think that existing
programs that inspect traffic from userspace (e.g. IPS) can benefit
from this feature.
Patrick McHardy April 11, 2015, 1:06 p.m. UTC | #3
On 11.04, Pablo Neira Ayuso wrote:
> On Fri, Apr 10, 2015 at 10:33:12PM +0100, Patrick McHardy wrote:
> > On 10.04, Pablo Neira Ayuso wrote:
> > > On Fri, Apr 10, 2015 at 02:36:11PM +0100, Patrick McHardy wrote:
> > We do support all families using the regular NF_QUEUE verdict of course.
> > But yes, nf_queue.c will simply drop packets that don't have a netfilter
> > AF registered.
> > 
> > But my question is whether queueing is even worth considering for the
> > NFPROTO_NETDEV family. As I said, it will at best work for ingress
> > anyway, and even that will be trickier than just calling
> > skb_share_check(): we need to keep valid references to all the data
> > you currently store in the CB, including the packet_type, the device,
> > things attached to the skb at this point in the stack, etc.
> 
> I think we only need to hold the reference on orig_dev. The pt_prev
> pointer in the skb CB can actually be removed. Other things attached to
> the skb are already handled by nf_queue to make sure they don't vanish.

Are you sure? What about removable protocols or packet sockets?

> > If we decide not to support queueing for this family, we don't have to
> > use netfilter hooks for this, and all the refactoring for async resume
> > becomes unnecessary.
> 
> I think the refactoring is worth it. Have a look at the current state of
> this function: it has grown features over time and accumulated many
> gotos that force you to jump back and forth when reading this code.
> 
> Regarding nf_queue support at ingress, I don't see any major technical
> obstacle to supporting it at this moment, and I think that existing
> programs that inspect traffic from userspace (e.g. IPS) can benefit
> from this feature.

Yeah, that might be useful, although they seem to be pretty fine with
getting only IPv4 and IPv6. I guess ARP might be interesting as well,
but we also have hooks for that already.

Regarding the refactoring, there seem to be concerns about the performance
impact. My suggestion would be to use nf_hook() and make sure no queueing
can happen, and therefore no okfn invocations. Then you can simply add
this as a function call to the existing code, without the need for any
refactoring or storing state.

You don't lose anything; it just massively simplifies the patches. If
queueing support is added later, you can still change it.
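
By contrast, the alternative suggested above keeps the hook synchronous: if queueing is never allowed, the verdict is known when the call returns, so the existing receive path stays as it is and no continuation or stored state is needed. A hypothetical sketch (it does not use the real nf_hook() signature); the ingress_hooks_active flag plays the role of the nf_hook_ingress_active() check in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes standing in for NET_RX_SUCCESS / NET_RX_DROP. */
enum { RX_SUCCESS = 0, RX_DROP = 1 };

struct pkt {
	int mark;
};

/* Plays the role of the nf_hook_ingress_active() check: any hooks registered? */
static bool ingress_hooks_active;

/* Synchronous hook: the verdict is known on return. Queueing is never
 * offered, so there is no continuation (okfn) to store. */
static bool ingress_hook_accepts(const struct pkt *p)
{
	return p->mark != 0xdead; /* toy rule: drop packets marked 0xdead */
}

/* Stands in for the existing tail of the receive path, left unchanged. */
static int finish_receive(struct pkt *p)
{
	(void)p;
	return RX_SUCCESS;
}

static int receive(struct pkt *p)
{
	/* The whole change is one extra inline function call. */
	if (ingress_hooks_active && !ingress_hook_accepts(p))
		return RX_DROP;
	return finish_receive(p);
}
```

When no hooks are registered, the only cost on the fast path is the flag test.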
Pablo Neira Ayuso April 11, 2015, 1:32 p.m. UTC | #4
On Sat, Apr 11, 2015 at 02:06:48PM +0100, Patrick McHardy wrote:
> On 11.04, Pablo Neira Ayuso wrote:
> > On Fri, Apr 10, 2015 at 10:33:12PM +0100, Patrick McHardy wrote:
> > > On 10.04, Pablo Neira Ayuso wrote:
> > > > On Fri, Apr 10, 2015 at 02:36:11PM +0100, Patrick McHardy wrote:
> > > We do support all families using the regular NF_QUEUE verdict of course.
> > > But yes, nf_queue.c will simply drop packets that don't have a netfilter
> > > AF registered.
> > > 
> > > But my question is whether queueing is even worth considering for the
> > > NFPROTO_NETDEV family. As I said, it will at best work for ingress
> > > anyway, and even that will be trickier than just calling
> > > skb_share_check(): we need to keep valid references to all the data
> > > you currently store in the CB, including the packet_type, the device,
> > > things attached to the skb at this point in the stack, etc.
> > 
> > I think we only need to hold the reference on orig_dev. The pt_prev
> > pointer in the skb CB can actually be removed. Other things attached to
> > the skb are already handled by nf_queue to make sure they don't vanish.
> 
> Are you sure? What about removable protocols or packet sockets?

pt_prev will always be NULL if we enter the netfilter ingress hook, so
there is no need to store it.

> > > If we decide not to support queueing for this family, we don't have to
> > > use netfilter hooks for this, and all the refactoring for async resume
> > > becomes unnecessary.
> > 
> > I think the refactoring is worth it. Have a look at the current state of
> > this function: it has grown features over time and accumulated many
> > gotos that force you to jump back and forth when reading this code.
> > 
> > Regarding nf_queue support at ingress, I don't see any major technical
> > obstacle to supporting it at this moment, and I think that existing
> > programs that inspect traffic from userspace (e.g. IPS) can benefit
> > from this feature.
> 
> Yeah, that might be useful, although they seem to be pretty fine with
> getting only IPv4 and IPv6. I guess ARP might be interesting as well,
> but we also have hooks for that already.

For security applications, I guess they will be happy to get pretty
much everything that they can inspect.

> Regarding the refactoring, there seem to be concerns about the performance
> impact. My suggestion would be to use nf_hook() and make sure no queueing
> can happen, and therefore no okfn invocations. Then you can simply add
> this as a function call to the existing code, without the need for any
> refactoring or storing state.

I'll come back with numbers and more feedback anyway.

> You don't lose anything; it just massively simplifies the patches. If
> queueing support is added later, you can still change it.

I'll explore this; it seems like a good alternative if performance
becomes a real issue.

Thanks.

Patch

From db2fba74dea98b69ee7615fca86b9847bc42887f Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 10 Apr 2015 21:40:58 +0200
Subject: [PATCH] netfilter: nf_tables: restrict nft_queue to AF_INET and
 AF_INET6

Other families need the corresponding struct nf_afinfo in place to work.
Restrict it to NFPROTO_IPV4 and NFPROTO_IPV6 until the necessary code is in
place.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_queue.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_queue.c b/net/netfilter/nft_queue.c
index e8ae2f6..42ca976 100644
--- a/net/netfilter/nft_queue.c
+++ b/net/netfilter/nft_queue.c
@@ -129,4 +129,5 @@  module_exit(nft_queue_module_exit);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Eric Leblond <eric@regit.org>");
-MODULE_ALIAS_NFT_EXPR("queue");
+MODULE_ALIAS_NFT_AF_EXPR(AF_INET, "queue");
+MODULE_ALIAS_NFT_AF_EXPR(AF_INET6, "queue");
-- 
1.7.10.4
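
The effect of the patch on the module alias strings can be illustrated by expanding them in userspace. This assumes MODULE_ALIAS_NFT_EXPR(name) yields "nft-expr-" name and MODULE_ALIAS_NFT_AF_EXPR(family, name) yields "nft-expr-", the stringified family number, "-" and the name; check include/net/netfilter/nf_tables.h for the actual definitions. STRINGIFY()/STRINGIFY_1() mimic the kernel's two-level __stringify(), and the FAKE_AF_* constants and NFT_*_ALIAS macros are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification, so a macro argument such as FAKE_AF_INET is
 * expanded to its value before being turned into a string literal. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

#define FAKE_AF_INET  2
#define FAKE_AF_INET6 10

/* Assumed alias shapes; the real macros wrap these strings in MODULE_ALIAS(). */
#define NFT_EXPR_ALIAS(name)            "nft-expr-" name
#define NFT_AF_EXPR_ALIAS(family, name) "nft-expr-" STRINGIFY(family) "-" name

/* Before the patch: one family-agnostic alias. */
static const char *const queue_aliases_before[] = { NFT_EXPR_ALIAS("queue") };

/* After the patch: one alias per supported family. */
static const char *const queue_aliases_after[] = {
	NFT_AF_EXPR_ALIAS(FAKE_AF_INET, "queue"),
	NFT_AF_EXPR_ALIAS(FAKE_AF_INET6, "queue"),
};
```

Under that assumption, the single family-agnostic alias is replaced by two aliases that only match module requests naming family 2 (AF_INET) or 10 (AF_INET6).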