diff mbox

PF_RING: Include in main line kernel?

Message ID 4AD64251.50903@candelatech.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Ben Greear Oct. 14, 2009, 9:27 p.m. UTC
On 10/14/2009 01:36 PM, Evgeniy Polyakov wrote:
> On Wed, Oct 14, 2009 at 10:17:30PM +0200, Luca Deri (deri@ntop.org) wrote:
>> The reason why I decided to patch dev.c is because I wanted PF_RING to
>> decide whether the packet journey shall continue or not. In other
>> words with my solution PF_RING applications can decide whether the
>> received packets will also be delivered to upper layers (and to other
>> kernel network components). This configurable 'early packet drop'
>> allows the overall performance to be significantly increased as
>> received packets are not supposed to be delivered to upper layers;
>> this is a typical situations for many monitoring devices.
>
> This is a feature many projects implement themself indeed. What about
> creating special return value from the packet handler which will
> indicate that packet was already consumed and no further work should be
> done on it?

Maybe something similar to the attached patch?

Thanks,
Ben

Comments

Luca Deri Oct. 14, 2009, 9:34 p.m. UTC | #1
Ben
the patch implements what I looked for.

Thanks Luca

On Oct 14, 2009, at 11:27 PM, Ben Greear wrote:

> On 10/14/2009 01:36 PM, Evgeniy Polyakov wrote:
>> On Wed, Oct 14, 2009 at 10:17:30PM +0200, Luca Deri (deri@ntop.org)  
>> wrote:
>>> The reason why I decided to patch dev.c is because I wanted  
>>> PF_RING to
>>> decide whether the packet journey shall continue or not. In other
>>> words with my solution PF_RING applications can decide whether the
>>> received packets will also be delivered to upper layers (and to  
>>> other
>>> kernel network components). This configurable 'early packet drop'
>>> allows the overall performance to be significantly increased as
>>> received packets are not supposed to be delivered to upper layers;
>>> this is a typical situations for many monitoring devices.
>>
>> This is a feature many projects implement themself indeed. What about
>> creating special return value from the packet handler which will
>> indicate that packet was already consumed and no further work  
>> should be
>> done on it?
>
> Maybe something similar to the attached patch?
>
> Thanks,
> Ben
>
> -- 
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>
> <patch0.patch>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 14, 2009, 9:49 p.m. UTC | #2
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 14 Oct 2009 14:27:45 -0700

> Maybe something similar to the attached patch?

This is not something I'm interested in applying.

It makes implementing proprietary complete networking stacks
for Linux way too easy.

Instead I'd rather have a GPL exported function that allows indication
of consumption somehow.  That's why we have a special hook for
bonding, so it cannot be abused in proprietary modules.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Oct. 14, 2009, 11:29 p.m. UTC | #3
On 10/14/2009 02:49 PM, David Miller wrote:
> From: Ben Greear<greearb@candelatech.com>
> Date: Wed, 14 Oct 2009 14:27:45 -0700
>
>> Maybe something similar to the attached patch?
>
> This is not something I'm interested in applying.
>
> It makes implementing proprietary complete networking stacks
> for Linux way too easy.
 >
> Instead I'd rather have a GPL exported function that allows indication
> of consumption somehow.

This would mean one hard-coded hook for every application that wanted
this feature, or is there some way to have a gpl_ptype_all?

Thanks,
Ben
Evgeniy Polyakov Oct. 15, 2009, 7:02 a.m. UTC | #4
On Wed, Oct 14, 2009 at 02:49:23PM -0700, David Miller (davem@davemloft.net) wrote:
> > Maybe something similar to the attached patch?
> 
> This is not something I'm interested in applying.
> 
> It makes implementing proprietary complete networking stacks
> for Linux way too easy.

Such kernels will be tainted and thus its bugs and problems will be
clearly indicated by the proprietary module.

> Instead I'd rather have a GPL exported function that allows indication
> of consumption somehow.  That's why we have a special hook for
> bonding, so it cannot be abused in proprietary modules.

I believe bonding with its hook is a historical heritage and priority
absence in packet hooks which do not currently allow something to be
registered very first and steal packets from the stack.

Looks like bonding could be implemented as a packet handler with Ben's
patch applied, isn't it?
David Miller Oct. 15, 2009, 7:22 a.m. UTC | #5
From: Evgeniy Polyakov <zbr@ioremap.net>
Date: Thu, 15 Oct 2009 11:02:34 +0400

> On Wed, Oct 14, 2009 at 02:49:23PM -0700, David Miller (davem@davemloft.net) wrote:
>> Instead I'd rather have a GPL exported function that allows indication
>> of consumption somehow.  That's why we have a special hook for
>> bonding, so it cannot be abused in proprietary modules.
> 
> I believe bonding with its hook is a historical heritage and priority
> absence in packet hooks which do not currently allow something to be
> registered very first and steal packets from the stack.

It is in same category as ingress packet scheduler hook.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Oct. 15, 2009, 4:33 p.m. UTC | #6
On 10/15/2009 12:02 AM, Evgeniy Polyakov wrote:
> On Wed, Oct 14, 2009 at 02:49:23PM -0700, David Miller (davem@davemloft.net) wrote:
>>> Maybe something similar to the attached patch?
>>
>> This is not something I'm interested in applying.
>>
>> It makes implementing proprietary complete networking stacks
>> for Linux way too easy.
>
> Such kernels will be tainted and thus its bugs and problems will be
> clearly indicated by the proprietary module.
>
>> Instead I'd rather have a GPL exported function that allows indication
>> of consumption somehow.  That's why we have a special hook for
>> bonding, so it cannot be abused in proprietary modules.
>
> I believe bonding with its hook is a historical heritage and priority
> absence in packet hooks which do not currently allow something to be
> registered very first and steal packets from the stack.
>
> Looks like bonding could be implemented as a packet handler with Ben's
> patch applied, isn't it?

Bonding, bridging, mac-vlans, pktgen-rx logic, sniffers, and others could.  The only trick is
ordering...it may be better to keep the hard-coded hooks in a definite
order than to allow users to switch them around.

Or, could have a precedence value when adding hooks (protocols) so that modules
can properly order themselves by default, but users *could* reorder
them if they really wanted to (might be useful to have mac-vlans before
bonding some of the time, and after it others, for instance...)

Thanks,
Ben
Harald Welte Oct. 18, 2009, 12:43 p.m. UTC | #7
Hi Dave,

On Wed, Oct 14, 2009 at 02:49:23PM -0700, David Miller wrote:

> > Maybe something similar to the attached patch?
> 
> This is not something I'm interested in applying.
> 
> It makes implementing proprietary complete networking stacks
> for Linux way too easy.

How does it make it any easier?  Even right now you can implement an entire
protocol family in your own module, either by registering as netpoll handler,
or even using the regular dev_add_pack().
Harald Welte Oct. 18, 2009, 12:45 p.m. UTC | #8
Hi Ben,

On Thu, Oct 15, 2009 at 09:33:53AM -0700, Ben Greear wrote:

> Bonding, bridging, mac-vlans, pktgen-rx logic, sniffers, and others could.
> The only trick is ordering...it may be better to keep the hard-coded hooks in
> a definite order than to allow users to switch them around.

why does this sound like the netfilter hooks with their priorities to me?  The
only difference is that netfilter hooks are inside the layer 3 protocols at the
moment, whereas this is one layer lower...
Evgeniy Polyakov Oct. 18, 2009, 2:18 p.m. UTC | #9
On Sun, Oct 18, 2009 at 02:43:37PM +0200, Harald Welte (laforge@gnumonks.org) wrote:
> How does it make it any easier?  Even right now you can implement an entire
> protocol family in your own module, either by registering as netpoll handler,
> or even using the regular dev_add_pack().

Well, it does, since packet will be processed by the main stack after
that, and module will work with the copy only. But I agree that this is
a weak argument.

If it is still a blocking one, what about implementing additional
gpl-only list of handlers which will have 'consumed' skb check? I
believe it would be enough to put it only in single place after the
bridge?
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 907d118..da78f0a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -78,6 +78,7 @@  struct wireless_dev;
 #define NET_RX_CN_MOD		3   /* Storm on its way! */
 #define NET_RX_CN_HIGH		4   /* The storm is here */
 #define NET_RX_BAD		5  /* packet dropped due to kernel error */
+#define NET_RX_CONSUMED       6 /* pkt is consumed, stop rx logic here. */
 
 /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It
  * indicates that the device will soon be dropping packets, or already drops
diff --git a/net/core/dev.c b/net/core/dev.c
index 0101178..d5024b9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2105,6 +2105,10 @@  static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	}
 
 	return br_handle_frame_hook(port, skb);
@@ -2128,6 +2132,10 @@  static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	}
 	return macvlan_handle_frame_hook(skb);
 }
@@ -2185,6 +2193,10 @@  static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	} else {
 		/* Huh? Why does turning on AF_PACKET affect this? */
 		skb->tc_verd = SET_TC_OK2MUNGE(skb->tc_verd);
@@ -2300,8 +2312,13 @@  int netif_receive_skb(struct sk_buff *skb)
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
 		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
 		    ptype->dev == orig_dev) {
-			if (pt_prev)
+			if (pt_prev) {
 				ret = deliver_skb(skb, pt_prev, orig_dev);
+				if (ret == NET_RX_CONSUMED) {
+					kfree_skb(skb); /* we made a copy in deliver_skb */
+					goto out;
+				}
+			}
 			pt_prev = ptype;
 		}
 	}
@@ -2336,8 +2353,13 @@  ncls:
 		if (ptype->type == type &&
 		    (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
 		     ptype->dev == orig_dev)) {
-			if (pt_prev)
+			if (pt_prev) {
 				ret = deliver_skb(skb, pt_prev, orig_dev);
+				if (ret == NET_RX_CONSUMED) {
+					kfree_skb(skb); /* we made a copy in deliver_skb */
+					goto out;
+				}
+			}
 			pt_prev = ptype;
 		}
 	}