diff mbox

net: preserve IP control block during GSO segmentation

Message ID 145225444176.22215.2003378381077166898.stgit@zurg
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Konstantin Khlebnikov Jan. 8, 2016, noon UTC
Skb_gso_segment() uses skb control block during segmentation.
This patch adds 32-bytes room for previous control block which
will be copied into all resulting segments.

This patch fixes kernel crash during fragmenting forwarded packets.
Fragmentation requires valid IP CB in skb for clearing ip options.
Also patch removes custom save/restore in ovs code, now it's redundant.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
---
 include/linux/skbuff.h     |    3 ++-
 net/core/dev.c             |    5 +++++
 net/ipv4/ip_output.c       |    1 +
 net/openvswitch/datapath.c |    4 +---
 net/xfrm/xfrm_output.c     |    2 ++
 5 files changed, 11 insertions(+), 4 deletions(-)

Comments

Konstantin Khlebnikov Jan. 8, 2016, 12:10 p.m. UTC | #1
On Fri, Jan 8, 2016 at 3:00 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> Skb_gso_segment() uses skb control block during segmentation.
> This patch adds 32-bytes room for previous control block which
> will be copied into all resulting segments.
>
> This patch fixes kernel crash during fragmenting forwarded packets.
> Fragmentation requires valid IP CB in skb for clearing ip options.
> Also patch removes custom save/restore in ovs code, now it's redundant.
>
> Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
> Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com

This patch is alternative for [PATCH] net: prevent corruption of skb when using
skb_gso_segment by Thadeu Lima de Souza Cascardo. This version have
no stack allocations and no changes in code. It just shifts gso cb.
David Laight Jan. 8, 2016, 12:13 p.m. UTC | #2
From: Of Konstantin Khlebnikov

> Sent: 08 January 2016 12:01

> Skb_gso_segment() uses skb control block during segmentation.

> This patch adds 32-bytes room for previous control block which

> will be copied into all resulting segments.

> 

> This patch fixes kernel crash during fragmenting forwarded packets.

> Fragmentation requires valid IP CB in skb for clearing ip options.

> Also patch removes custom save/restore in ovs code, now it's redundant.

> 

...
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h

> index 4355129fff91..9147f9f34cbe 100644

> --- a/include/linux/skbuff.h

> +++ b/include/linux/skbuff.h

> @@ -3446,7 +3446,8 @@ struct skb_gso_cb {

>  	int	encap_level;

>  	__u16	csum_start;

>  };

> -#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb)

> +#define SKB_SGO_CB_OFFSET	32

> +#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)((skb)->cb + SKB_SGO_CB_OFFSET))


You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct skb_gso_cb)
so that the end of 'cb' is always used.
(Assuming the former is a multiple of 4.)

It might be worth using an on-stack structure passed through as a separate
parameter - it doesn't look as though it has to be queued with the skb.
(Clearly a bigger change.)

	David
Thadeu Lima de Souza Cascardo Jan. 8, 2016, 12:20 p.m. UTC | #3
On Fri, Jan 08, 2016 at 12:13:49PM +0000, David Laight wrote:
> From: Of Konstantin Khlebnikov
> > Sent: 08 January 2016 12:01
> > Skb_gso_segment() uses skb control block during segmentation.
> > This patch adds 32-bytes room for previous control block which
> > will be copied into all resulting segments.
> > 
> > This patch fixes kernel crash during fragmenting forwarded packets.
> > Fragmentation requires valid IP CB in skb for clearing ip options.
> > Also patch removes custom save/restore in ovs code, now it's redundant.
> > 
> ...
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 4355129fff91..9147f9f34cbe 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -3446,7 +3446,8 @@ struct skb_gso_cb {
> >  	int	encap_level;
> >  	__u16	csum_start;
> >  };
> > -#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb)
> > +#define SKB_SGO_CB_OFFSET	32
> > +#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)((skb)->cb + SKB_SGO_CB_OFFSET))
> 
> You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct skb_gso_cb)
> so that the end of 'cb' is always used.
> (Assuming the former is a multiple of 4.)
> 
> It might be worth using an on-stack structure passed through as a separate
> parameter - it doesn't look as though it has to be queued with the skb.
> (Clearly a bigger change.)
> 

I considered that as an option. But the bigger change and the use of the extra
stack for all users, plus the extra parameters indicated I should go the other
way.

In my opinion, at least in the IP fragmentation case, saving/restoring cb is not
such a big problem since we are in slow path already.

Cascardo.

> 	David
>
kernel test robot Jan. 8, 2016, 12:24 p.m. UTC | #4
Hi Konstantin,

[auto build test ERROR on net/master]
[also build test ERROR on v4.4-rc8 next-20160108]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Konstantin-Khlebnikov/net-preserve-IP-control-block-during-GSO-segmentation/20160108-200335
config: xtensa-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   net/openvswitch/datapath.c: In function 'queue_gso_packets':
>> net/openvswitch/datapath.c:360:18: error: 'ovs_cb' undeclared (first use in this function)
      *OVS_CB(skb) = ovs_cb;
                     ^
   net/openvswitch/datapath.c:360:18: note: each undeclared identifier is reported only once for each function it appears in

vim +/ovs_cb +360 net/openvswitch/datapath.c

e8eedb85 Pravin B Shelar 2014-11-06  354  		later_key.ip.frag = OVS_FRAG_TYPE_LATER;
e8eedb85 Pravin B Shelar 2014-11-06  355  	}
e8eedb85 Pravin B Shelar 2014-11-06  356  
ccb1352e Jesse Gross     2011-10-25  357  	/* Queue all of the segments. */
ccb1352e Jesse Gross     2011-10-25  358  	skb = segs;
ccb1352e Jesse Gross     2011-10-25  359  	do {
e8eedb85 Pravin B Shelar 2014-11-06 @360  		*OVS_CB(skb) = ovs_cb;
e8eedb85 Pravin B Shelar 2014-11-06  361  		if (gso_type & SKB_GSO_UDP && skb != segs)
e8eedb85 Pravin B Shelar 2014-11-06  362  			key = &later_key;
e8eedb85 Pravin B Shelar 2014-11-06  363  

:::::: The code at line 360 was first introduced by commit
:::::: e8eedb85bd238613332570ac6ae683fee94fbe36 openvswitch: Remove redundant key ref from upcall_info.

:::::: TO: Pravin B Shelar <pshelar@nicira.com>
:::::: CC: Pravin B Shelar <pshelar@nicira.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Zang MingJie Jan. 11, 2016, 7:45 a.m. UTC | #5
On 01/08/2016 08:13 PM, David Laight wrote:
> You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct skb_gso_cb)
> so that the end of 'cb' is always used.
> (Assuming the former is a multiple of 4.)
>
> It might be worth using an on-stack structure passed through as a separate
> parameter - it doesn't look as though it has to be queued with the skb.
> (Clearly a bigger change.)

I would definitely prefer the stack structure.

As a kernel developer, sometime I can hardly figure out which struct 
current cb is without debug it, and the worst they are not documented 
anywhere. I can hardly know the life time of the cb types.

If using a stack, things can be much easier. only an extra var to store 
the stack top:

    int cb_top;

and several macro to manage the stack:

    SKB_CB_PUSH(skb, type)
    SKB_CB_POP(skb)
    SKB_CB_TOP(skb, type)

and maybe a debug variable to store current cb type.

All current cb macro can be replaced by SKB_CB_TOP.


Although it is a big change, I think it worths, for both performance and 
maintainability
Cong Wang Jan. 12, 2016, 12:58 a.m. UTC | #6
On Sun, Jan 10, 2016 at 11:45 PM, Zang MingJie <zealot0630@gmail.com> wrote:
> On 01/08/2016 08:13 PM, David Laight wrote:
>>
>> You could set SKB_SGO_CB_OFFSET to sizeof ((skb)->cb) - sizeof (struct
>> skb_gso_cb)
>> so that the end of 'cb' is always used.
>> (Assuming the former is a multiple of 4.)
>>
>> It might be worth using an on-stack structure passed through as a separate
>> parameter - it doesn't look as though it has to be queued with the skb.
>> (Clearly a bigger change.)
>
>
> I would definitely prefer the stack structure.
>
> As a kernel developer, sometime I can hardly figure out which struct current
> cb is without debug it, and the worst they are not documented anywhere. I
> can hardly know the life time of the cb types.

NAK.

Skb control block was not designed to be used like a stack but like a union
for different layers, just that people begin to mess it up.
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4355129fff91..9147f9f34cbe 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3446,7 +3446,8 @@  struct skb_gso_cb {
 	int	encap_level;
 	__u16	csum_start;
 };
-#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb)
+#define SKB_SGO_CB_OFFSET	32
+#define SKB_GSO_CB(skb) ((struct skb_gso_cb *)((skb)->cb + SKB_SGO_CB_OFFSET))
 
 static inline int skb_tnl_header_len(const struct sk_buff *inner_skb)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index ae00b894e675..7f00f2439770 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2542,6 +2542,8 @@  static inline bool skb_needs_check(struct sk_buff *skb, bool tx_path)
  *
  *	It may return NULL if the skb requires no segmentation.  This is
  *	only possible when GSO is used for verifying header integrity.
+ *
+ *	Segmentation preserves SKB_SGO_CB_OFFSET bytes of previous skb cb.
  */
 struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 				  netdev_features_t features, bool tx_path)
@@ -2556,6 +2558,9 @@  struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 			return ERR_PTR(err);
 	}
 
+	BUILD_BUG_ON(SKB_SGO_CB_OFFSET +
+		     sizeof(*SKB_GSO_CB(skb)) > sizeof(skb->cb));
+
 	SKB_GSO_CB(skb)->mac_offset = skb_headroom(skb);
 	SKB_GSO_CB(skb)->encap_level = 0;
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4233cbe47052..59ed4b89b67a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -240,6 +240,7 @@  static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	 * from host network stack.
 	 */
 	features = netif_skb_features(skb);
+	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
 	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
 	if (IS_ERR_OR_NULL(segs)) {
 		kfree_skb(skb);
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 91a8b004dc51..b1b380ee667d 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -336,12 +336,10 @@  static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
 	unsigned short gso_type = skb_shinfo(skb)->gso_type;
 	struct sw_flow_key later_key;
 	struct sk_buff *segs, *nskb;
-	struct ovs_skb_cb ovs_cb;
 	int err;
 
-	ovs_cb = *OVS_CB(skb);
+	BUILD_BUG_ON(sizeof(*OVS_CB(skb)) > SKB_SGO_CB_OFFSET);
 	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
-	*OVS_CB(skb) = ovs_cb;
 	if (IS_ERR(segs))
 		return PTR_ERR(segs);
 	if (segs == NULL)
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index cc3676eb6239..ff4a91fcab9f 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -167,6 +167,8 @@  static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb
 {
 	struct sk_buff *segs;
 
+	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
+	BUILD_BUG_ON(sizeof(*IP6CB(skb)) > SKB_SGO_CB_OFFSET);
 	segs = skb_gso_segment(skb, 0);
 	kfree_skb(skb);
 	if (IS_ERR(segs))