diff mbox

net/bridge: use the maximum hard_header_len of ports for bridging device

Message ID 20090323.152028.205335400.davem@davemloft.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

David Miller March 23, 2009, 10:20 p.m. UTC
From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Mon, 23 Mar 2009 08:51:22 -0700

> That ensures big enough header for locally generated packets, but
> any drivers that need bigger headroom still must handle bridged packets
> that come in with smaller space. When bridging packets, the skb comes
> from the allocation by the receiving driver. Almost all drivers will
> use dev_alloc_skb() which will allocate NET_SKB_PAD (16) bytes of
> additional headroom. This is used to hold copy of ethernet header for
> the bridge/netfilter code.
> 
> So your patch is fine as an optimization but a driver can not safely
> depend on any additional headroom. The driver must check if there
> is space, and if no space is available, reallocate and copy.

We had some plans to deal with this kind of issue for wireless
too.  Let me see if I can find the RFC patch from that discussion...

Here it is, similar code would be added to the ipv4/ipv6 forwarding
paths:

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stephen Hemminger March 23, 2009, 10:45 p.m. UTC | #1
On Mon, 23 Mar 2009 15:20:28 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Mon, 23 Mar 2009 08:51:22 -0700
> 
> > That ensures big enough header for locally generated packets, but
> > any drivers that need bigger headroom still must handle bridged packets
> > that come in with smaller space. When bridging packets, the skb comes
> > from the allocation by the receiving driver. Almost all drivers will
> > use dev_alloc_skb() which will allocate NET_SKB_PAD (16) bytes of
> > additional headroom. This is used to hold copy of ethernet header for
> > the bridge/netfilter code.
> > 
> > So your patch is fine as an optimization but a driver can not safely
> > depend on any additional headroom. The driver must check if there
> > is space, and if no space is available, reallocate and copy.
> 
> We had some plans to deal with this kind of issue for wireless
> too.  Let me see if I can find the RFC patch from that discussion...
> 
> Here it is, similar code would be added to the ipv4/ipv6 forwarding
> paths:
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7c1d446..6c06fba 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -600,6 +600,7 @@ struct net_device
>   * Cache line mostly used on receive path (including eth_type_trans())
>   */
>  	unsigned long		last_rx;	/* Time of last Rx	*/
> +	unsigned int		rx_alloc_extra;
>  	/* Interface address info used in eth_type_trans() */
>  	unsigned char		dev_addr[MAX_ADDR_LEN];	/* hw address, (before bcast 
>  							because most packets are unicast) */
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index bdd7c35..531e483 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -42,6 +42,22 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
>  		if (nf_bridge_maybe_copy_header(skb))
>  			kfree_skb(skb);
>  		else {
> +			unsigned int headroom = skb_headroom(skb);
> +			unsigned int hh_len = LL_RESERVED_SPACE(skb->dev);
> +
> +			if (headroom < hh_len) {
> +				struct net_device *in_dev;
> +				unsigned int extra;
> +
> +				in_dev = __dev_get_by_index(dev_net(skb->dev),
> +							    skb->iif);
> +				BUG_ON(!in_dev);
> +
> +				extra = hh_len - headroom;
> +				if (extra >= in_dev->rx_alloc_extra)
> +					in_dev->rx_alloc_extra = extra;
> +			}

So you dynamically compute the additional space but if the space was
an awkward size, could it cause driver to breaks alignment assumptions?

And you didn't fixup the skb that is about to gag in the skb to make
more space, so transmitting device driver (gfar) is going to overwrite or die.

In summary, good idea, but may not solve the problem
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 23, 2009, 10:47 p.m. UTC | #2
From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Mon, 23 Mar 2009 15:45:17 -0700

> So you dynamically compute the additional space but if the space was
> an awkward size, could it cause driver to breaks alignment assumptions?

Yes, you'd need to 16-byte align or something like that.

> And you didn't fixup the skb that is about to gag in the skb to make
> more space, so transmitting device driver (gfar) is going to overwrite or die.

This particular instance will do the headroom reallocation,
that's unavoidable during the size transition event.

The headroom checks can't ever be removed, but we won't hit
them in the fast path after the adjustment is made by my
patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Li March 25, 2009, 8:43 a.m. UTC | #3
On Tue, Mar 24, 2009 at 6:20 AM, David Miller <davem@davemloft.net> wrote:
> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Mon, 23 Mar 2009 08:51:22 -0700
>
>> That ensures big enough header for locally generated packets, but
>> any drivers that need bigger headroom still must handle bridged packets
>> that come in with smaller space. When bridging packets, the skb comes
>> from the allocation by the receiving driver. Almost all drivers will
>> use dev_alloc_skb() which will allocate NET_SKB_PAD (16) bytes of
>> additional headroom. This is used to hold copy of ethernet header for
>> the bridge/netfilter code.
>>
>> So your patch is fine as an optimization but a driver can not safely
>> depend on any additional headroom. The driver must check if there
>> is space, and if no space is available, reallocate and copy.
>
> We had some plans to deal with this kind of issue for wireless
> too.  Let me see if I can find the RFC patch from that discussion...
>
> Here it is, similar code would be added to the ipv4/ipv6 forwarding
> paths:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7c1d446..6c06fba 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -600,6 +600,7 @@ struct net_device
>  * Cache line mostly used on receive path (including eth_type_trans())
>  */
>        unsigned long           last_rx;        /* Time of last Rx      */
> +       unsigned int            rx_alloc_extra;
>        /* Interface address info used in eth_type_trans() */
>        unsigned char           dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast
>                                                        because most packets are unicast) */
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index bdd7c35..531e483 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -42,6 +42,22 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
>                if (nf_bridge_maybe_copy_header(skb))
>                        kfree_skb(skb);
>                else {
> +                       unsigned int headroom = skb_headroom(skb);
> +                       unsigned int hh_len = LL_RESERVED_SPACE(skb->dev);
> +
> +                       if (headroom < hh_len) {
> +                               struct net_device *in_dev;
> +                               unsigned int extra;
> +
> +                               in_dev = __dev_get_by_index(dev_net(skb->dev),
> +                                                           skb->iif);
> +                               BUG_ON(!in_dev);
> +
> +                               extra = hh_len - headroom;
> +                               if (extra >= in_dev->rx_alloc_extra)
> +                                       in_dev->rx_alloc_extra = extra;
> +                       }
> +
>                        skb_push(skb, ETH_HLEN);
>
>                        dev_queue_xmit(skb);

Dynamically adjusting is a good idea, but the rx_alloc_extra can only
go up not the other way down in your code.  Another thought is that if
you re-allocate skb here the driver would be saved from checking the
headroom in the fastpath, am I right?

- Leo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 25, 2009, 9:35 p.m. UTC | #4
From: Li Yang <leoli@freescale.com>
Date: Wed, 25 Mar 2009 16:43:53 +0800

> Dynamically adjusting is a good idea, but the rx_alloc_extra can only
> go up not the other way down in your code.

That's not a problem.

> Another thought is that if you re-allocate skb here the driver would
> be saved from checking the headroom in the fastpath, am I right?

You're going to need it for other paths.  There are other kinds of
tunnels et al. that eat that headroom space on you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7c1d446..6c06fba 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -600,6 +600,7 @@  struct net_device
  * Cache line mostly used on receive path (including eth_type_trans())
  */
 	unsigned long		last_rx;	/* Time of last Rx	*/
+	unsigned int		rx_alloc_extra;
 	/* Interface address info used in eth_type_trans() */
 	unsigned char		dev_addr[MAX_ADDR_LEN];	/* hw address, (before bcast 
 							because most packets are unicast) */
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index bdd7c35..531e483 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -42,6 +42,22 @@  int br_dev_queue_push_xmit(struct sk_buff *skb)
 		if (nf_bridge_maybe_copy_header(skb))
 			kfree_skb(skb);
 		else {
+			unsigned int headroom = skb_headroom(skb);
+			unsigned int hh_len = LL_RESERVED_SPACE(skb->dev);
+
+			if (headroom < hh_len) {
+				struct net_device *in_dev;
+				unsigned int extra;
+
+				in_dev = __dev_get_by_index(dev_net(skb->dev),
+							    skb->iif);
+				BUG_ON(!in_dev);
+
+				extra = hh_len - headroom;
+				if (extra >= in_dev->rx_alloc_extra)
+					in_dev->rx_alloc_extra = extra;
+			}
+
 			skb_push(skb, ETH_HLEN);
 
 			dev_queue_xmit(skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5c459f2..74a2515 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -255,11 +255,12 @@  struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 		unsigned int length, gfp_t gfp_mask)
 {
 	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
+	unsigned int extra = dev->rx_alloc_extra + NET_SKB_PAD;
 	struct sk_buff *skb;
 
-	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
+	skb = __alloc_skb(length + extra, gfp_mask, 0, node);
 	if (likely(skb)) {
-		skb_reserve(skb, NET_SKB_PAD);
+		skb_reserve(skb, extra);
 		skb->dev = dev;
 	}
 	return skb;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index f35eaea..86f0e36 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1562,13 +1562,13 @@  int ieee80211_subif_start_xmit(struct sk_buff *skb,
 	 * be cloned. This could happen, e.g., with Linux bridge code passing
 	 * us broadcast frames. */
 
-	if (head_need > 0 || skb_cloned(skb)) {
+	if (head_need > 0 || skb_header_cloned(skb)) {
 #if 0
 		printk(KERN_DEBUG "%s: need to reallocate buffer for %d bytes "
 		       "of headroom\n", dev->name, head_need);
 #endif
 
-		if (skb_cloned(skb))
+		if (skb_header_cloned(skb))
 			I802_DEBUG_INC(local->tx_expand_skb_head_cloned);
 		else
 			I802_DEBUG_INC(local->tx_expand_skb_head);