diff mbox

[net,v2] mld, igmp: Fix reserved tailroom calculation

Message ID 1456787013-3277-1-git-send-email-bpoirier@suse.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Benjamin Poirier Feb. 29, 2016, 11:03 p.m. UTC
The current reserved_tailroom calculation fails to take hlen and tlen into
account.

skb:
[__hlen__|__data____________|__tlen___|__extra__]
^                                               ^
head                                            skb_end_offset

In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:

[__hlen__|__data____________|__extra__|__tlen___]
^                                               ^
head                                            skb_end_offset

The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.

The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
	reserved_tailroom = skb_tailroom - mtu
else:
	reserved_tailroom = tlen

Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.

Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---

Notes:
    Changes v1->v2
    As suggested by Hannes, move the code to an inline helper and express it
    using "if" rather than "min".

 include/linux/skbuff.h | 24 ++++++++++++++++++++++++
 net/ipv4/igmp.c        |  3 +--
 net/ipv6/mcast.c       |  3 +--
 3 files changed, 26 insertions(+), 4 deletions(-)

Comments

Hannes Frederic Sowa March 1, 2016, 10:09 a.m. UTC | #1
On 01.03.2016 00:03, Benjamin Poirier wrote:
> The current reserved_tailroom calculation fails to take hlen and tlen into
> account.
>
> skb:
> [__hlen__|__data____________|__tlen___|__extra__]
> ^                                               ^
> head                                            skb_end_offset
>
> In this representation, hlen + data + tlen is the size passed to alloc_skb.
> "extra" is the extra space made available in __alloc_skb because of
> rounding up by kmalloc. We can reorder the representation like so:
>
> [__hlen__|__data____________|__extra__|__tlen___]
> ^                                               ^
> head                                            skb_end_offset
>
> The maximum space available for ip headers and payload without
> fragmentation is min(mtu, data + extra). Therefore,
> reserved_tailroom
> = data + extra + tlen - min(mtu, data + extra)
> = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
> = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
>
> Compare the second line to the current expression:
> reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
> and we can see that hlen and tlen are not taken into account.
>
> The min() in the third line can be expanded into:
> if mtu < skb_tailroom - tlen:
> 	reserved_tailroom = skb_tailroom - mtu
> else:
> 	reserved_tailroom = tlen
>
> Depending on hlen, tlen, mtu and the number of multicast address records,
> the current code may output skbs that have less tailroom than
> dev->needed_tailroom or it may output more skbs than needed because not all
> space available is used.
>
> Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

I like it, thanks a lot!

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Daniel Borkmann March 1, 2016, 10:18 a.m. UTC | #2
On 03/01/2016 12:03 AM, Benjamin Poirier wrote:
[...]
> Notes:
>      Changes v1->v2
>      As suggested by Hannes, move the code to an inline helper and express it
>      using "if" rather than "min".

The code is correct, thanks!

Therefore:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

However, I actually think that v1 was much better/easier as a fix though. :/

Meaning 1) it's likely easier to backport, and 2) that we now need a comment
above each skb->reserved_tailroom assignment probably says that min() might
perhaps have been more self-documenting ...

skb_tailroom_reserve() looks quite generic, but it only makes sense to use in
combination with skb_availroom(), which would have been good to put a note to
the header comment. Also "the required headroom should already have been
reserved before using this function", places one more requirement for usage.

If we really want to go that path, maybe rather a skb_setroom() that is coupled
with skb_availroom() like:

static inline int __skb_tailroom(const struct sk_buff *skb)
{
	return skb->end - skb->tail;
}

static inline void skb_setroom(struct sk_buff *skb,
                                unsigned int needed_headroom,
                                unsigned int size,
                                unsigned int needed_tailroom)
{
         SKB_LINEAR_ASSERT(skb);

         skb_reserve(skb, needed_headroom);
         skb->reserved_tailroom = needed_tailroom;

         if (size < __skb_tailroom(skb) - needed_tailroom)
                 skb->reserved_tailroom = __skb_tailroom(skb) - size;
}

Then, skb_tailroom() would internally use __skb_tailroom(), too. And we can also
spare us the two unneeded skb_is_nonlinear() checks in our helper which will
currently always evaluate to false anyway.

... just a thought.

Thanks again,
Daniel
Hannes Frederic Sowa March 1, 2016, 4 p.m. UTC | #3
On 01.03.2016 11:18, Daniel Borkmann wrote:
> On 03/01/2016 12:03 AM, Benjamin Poirier wrote:
> [...]
>> Notes:
>>      Changes v1->v2
>>      As suggested by Hannes, move the code to an inline helper and
>> express it
>>      using "if" rather than "min".
>
> The code is correct, thanks!
>
> Therefore:
>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
>
> However, I actually think that v1 was much better/easier as a fix
> though. :/
>
> Meaning 1) it's likely easier to backport, and 2) that we now need a
> comment
> above each skb->reserved_tailroom assignment probably says that min() might
> perhaps have been more self-documenting ...
>
> skb_tailroom_reserve() looks quite generic, but it only makes sense to
> use in
> combination with skb_availroom(), which would have been good to put a
> note to
> the header comment. Also "the required headroom should already have been
> reserved before using this function", places one more requirement for
> usage.
>
> If we really want to go that path, maybe rather a skb_setroom() that is
> coupled
> with skb_availroom() like:
>
> static inline int __skb_tailroom(const struct sk_buff *skb)
> {
>      return skb->end - skb->tail;
> }
>
> static inline void skb_setroom(struct sk_buff *skb,
>                                 unsigned int needed_headroom,
>                                 unsigned int size,
>                                 unsigned int needed_tailroom)
> {
>          SKB_LINEAR_ASSERT(skb);
>
>          skb_reserve(skb, needed_headroom);
>          skb->reserved_tailroom = needed_tailroom;
>
>          if (size < __skb_tailroom(skb) - needed_tailroom)
>                  skb->reserved_tailroom = __skb_tailroom(skb) - size;
> }
>
> Then, skb_tailroom() would internally use __skb_tailroom(), too. And we
> can also
> spare us the two unneeded skb_is_nonlinear() checks in our helper which
> will
> currently always evaluate to false anyway.
>
> ... just a thought.

I think it is fine. The code will be inlined anyway and probably skb 
linear assertions will be optimized away.

I like the current code in its form:

skb_reserve(skb, header);
skb_tailroom_reserve(skb, mtu, tailroom);

Combined form has too many arguments, so one has to look up the header 
file again.

Bye,
Hannes
David Miller March 3, 2016, 8:42 p.m. UTC | #4
From: Benjamin Poirier <bpoirier@suse.com>
Date: Mon, 29 Feb 2016 15:03:33 -0800

> The current reserved_tailroom calculation fails to take hlen and tlen into
> account.
 ...
> Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Applied and queued up for -stable, thanks.
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ce9ff7..d3fcd45 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1985,6 +1985,30 @@  static inline void skb_reserve(struct sk_buff *skb, int len)
 	skb->tail += len;
 }
 
+/**
+ *	skb_tailroom_reserve - adjust reserved_tailroom
+ *	@skb: buffer to alter
+ *	@mtu: maximum amount of headlen permitted
+ *	@needed_tailroom: minimum amount of reserved_tailroom
+ *
+ *	Set reserved_tailroom so that headlen can be as large as possible but
+ *	not larger than mtu and tailroom cannot be smaller than
+ *	needed_tailroom.
+ *	The required headroom should already have been reserved before using
+ *	this function.
+ */
+static inline void skb_tailroom_reserve(struct sk_buff *skb, unsigned int mtu,
+					unsigned int needed_tailroom)
+{
+	SKB_LINEAR_ASSERT(skb);
+	if (mtu < skb_tailroom(skb) - needed_tailroom)
+		/* use at most mtu */
+		skb->reserved_tailroom = skb_tailroom(skb) - mtu;
+	else
+		/* use up to all available space */
+		skb->reserved_tailroom = needed_tailroom;
+}
+
 #define ENCAP_TYPE_ETHER	0
 #define ENCAP_TYPE_IPPROTO	1
 
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 05e4cba..b3086cf 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -356,9 +356,8 @@  static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
 	skb_dst_set(skb, &rt->dst);
 	skb->dev = dev;
 
-	skb->reserved_tailroom = skb_end_offset(skb) -
-				 min(mtu, skb_end_offset(skb));
 	skb_reserve(skb, hlen);
+	skb_tailroom_reserve(skb, mtu, tlen);
 
 	skb_reset_network_header(skb);
 	pip = ip_hdr(skb);
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 5ee56d0..d64ee7e 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1574,9 +1574,8 @@  static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
 		return NULL;
 
 	skb->priority = TC_PRIO_CONTROL;
-	skb->reserved_tailroom = skb_end_offset(skb) -
-				 min(mtu, skb_end_offset(skb));
 	skb_reserve(skb, hlen);
+	skb_tailroom_reserve(skb, mtu, tlen);
 
 	if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) {
 		/* <draft-ietf-magma-mld-source-05.txt>: