diff mbox

mld, igmp: Fix reserved tailroom calculation

Message ID 1456603028-12589-2-git-send-email-bpoirier@suse.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Benjamin Poirier Feb. 27, 2016, 7:57 p.m. UTC
The current reserved_tailroom calculation fails to take hlen and tlen into
account.

skb:
[__hlen__|__data____________|__tlen___|__extra__]
^                                               ^
head                                            skb_end_offset

In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:

[__hlen__|__data____________|__extra__|__tlen___]
^                                               ^
head                                            skb_end_offset

The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.

Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.

Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 net/ipv4/igmp.c  | 4 ++--
 net/ipv6/mcast.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Daniel Borkmann Feb. 29, 2016, 2:57 p.m. UTC | #1
Hi Benjamin,

[ -Cc my old RH address ]

On 02/27/2016 08:57 PM, Benjamin Poirier wrote:
> The current reserved_tailroom calculation fails to take hlen and tlen into
> account.
>
> skb:
> [__hlen__|__data____________|__tlen___|__extra__]
> ^                                               ^
> head                                            skb_end_offset
>
> In this representation, hlen + data + tlen is the size passed to alloc_skb.
> "extra" is the extra space made available in __alloc_skb because of
> rounding up by kmalloc. We can reorder the representation like so:
>
> [__hlen__|__data____________|__extra__|__tlen___]
> ^                                               ^
> head                                            skb_end_offset
>
> The maximum space available for ip headers and payload without
> fragmentation is min(mtu, data + extra). Therefore,
> reserved_tailroom
> = data + extra + tlen - min(mtu, data + extra)
> = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
> = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
>
> Compare the second line to the current expression:
> reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
> and we can see that hlen and tlen are not taken into account.
>
> Depending on hlen, tlen, mtu and the number of multicast address records,
> the current code may output skbs that have less tailroom than
> dev->needed_tailroom or it may output more skbs than needed because not all
> space available is used.
>
> Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>   net/ipv4/igmp.c  | 4 ++--
>   net/ipv6/mcast.c | 4 ++--
>   2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 05e4cba..b5d28a4 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
[...]

[ cutting the IPv4 part off as diff is the same ]

> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> index 5ee56d0..c157edc 100644
> --- a/net/ipv6/mcast.c
> +++ b/net/ipv6/mcast.c
> @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
>   		return NULL;
>
>   	skb->priority = TC_PRIO_CONTROL;
> -	skb->reserved_tailroom = skb_end_offset(skb) -
> -				 min(mtu, skb_end_offset(skb));
>   	skb_reserve(skb, hlen);
> +	skb->reserved_tailroom = skb_tailroom(skb) -
> +		min_t(int, mtu, skb_tailroom(skb) - tlen);

Are you sure this is correct? Wouldn't that mean (assuming we allocated
enough space), that I could now fill a larger than MTU frame?

>   	if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) {
>   		/* <draft-ietf-magma-mld-source-05.txt>:

Thanks,
Daniel
Benjamin Poirier Feb. 29, 2016, 3:19 p.m. UTC | #2
On 2016/02/29 15:57, Daniel Borkmann wrote:
[...]
> 
> [ cutting the IPv4 part off as diff is the same ]
> 
> >diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> >index 5ee56d0..c157edc 100644
> >--- a/net/ipv6/mcast.c
> >+++ b/net/ipv6/mcast.c
> >@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
> >  		return NULL;
> >
> >  	skb->priority = TC_PRIO_CONTROL;
> >-	skb->reserved_tailroom = skb_end_offset(skb) -
> >-				 min(mtu, skb_end_offset(skb));
> >  	skb_reserve(skb, hlen);
> >+	skb->reserved_tailroom = skb_tailroom(skb) -
> >+		min_t(int, mtu, skb_tailroom(skb) - tlen);
> 
> Are you sure this is correct? Wouldn't that mean (assuming we allocated
> enough space), that I could now fill a larger than MTU frame?

Quoting back a part of the log:

> >The maximum space available for ip headers and payload without
> >fragmentation is min(mtu, data + extra). Therefore,
> >reserved_tailroom
> >= data + extra + tlen - min(mtu, data + extra)
> >= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
> >= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

The min() takes care of the situation you describe, ie. if the allocated
space is large, reserved_tailroom will be large enough that we do not
use more space than the mtu.

I tested the mld and igmp code with different driver parameters, mtu
values, number of multicast address records and even allocation
failures. If you think the formula is wrong, please provide a
counter-example with hlen, tlen, mtu and size values.

Regards,
-Benjamin
Daniel Borkmann Feb. 29, 2016, 3:38 p.m. UTC | #3
On 02/29/2016 04:19 PM, Benjamin Poirier wrote:
> On 2016/02/29 15:57, Daniel Borkmann wrote:
> [...]
>>
>> [ cutting the IPv4 part off as diff is the same ]
>>
>>> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
>>> index 5ee56d0..c157edc 100644
>>> --- a/net/ipv6/mcast.c
>>> +++ b/net/ipv6/mcast.c
>>> @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
>>>   		return NULL;
>>>
>>>   	skb->priority = TC_PRIO_CONTROL;
>>> -	skb->reserved_tailroom = skb_end_offset(skb) -
>>> -				 min(mtu, skb_end_offset(skb));
>>>   	skb_reserve(skb, hlen);
>>> +	skb->reserved_tailroom = skb_tailroom(skb) -
>>> +		min_t(int, mtu, skb_tailroom(skb) - tlen);
>>
>> Are you sure this is correct? Wouldn't that mean (assuming we allocated
>> enough space), that I could now fill a larger than MTU frame?
>
> Quoting back a part of the log:
>
>>> The maximum space available for ip headers and payload without
>>> fragmentation is min(mtu, data + extra). Therefore,
>>> reserved_tailroom
>>> = data + extra + tlen - min(mtu, data + extra)
>>> = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
>>> = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
>
> The min() takes care of the situation you describe, ie. if the allocated
> space is large, reserved_tailroom will be large enough that we do not
> use more space than the mtu.

Hmm, sorry, you are right, I had a bug in my thought process wrt the
skb_reserve() that is now done first.

Code is fine, patch would be against -net tree:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

Thanks, Benjamin!
Hannes Frederic Sowa Feb. 29, 2016, 3:43 p.m. UTC | #4
On 29.02.2016 16:19, Benjamin Poirier wrote:
> On 2016/02/29 15:57, Daniel Borkmann wrote:
> [...]
>>
>> [ cutting the IPv4 part off as diff is the same ]
>>
>>> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
>>> index 5ee56d0..c157edc 100644
>>> --- a/net/ipv6/mcast.c
>>> +++ b/net/ipv6/mcast.c
>>> @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
>>>   		return NULL;
>>>
>>>   	skb->priority = TC_PRIO_CONTROL;
>>> -	skb->reserved_tailroom = skb_end_offset(skb) -
>>> -				 min(mtu, skb_end_offset(skb));
>>>   	skb_reserve(skb, hlen);
>>> +	skb->reserved_tailroom = skb_tailroom(skb) -
>>> +		min_t(int, mtu, skb_tailroom(skb) - tlen);
>>
>> Are you sure this is correct? Wouldn't that mean (assuming we allocated
>> enough space), that I could now fill a larger than MTU frame?
>
> Quoting back a part of the log:
>
>>> The maximum space available for ip headers and payload without
>>> fragmentation is min(mtu, data + extra). Therefore,
>>> reserved_tailroom
>>> = data + extra + tlen - min(mtu, data + extra)
>>> = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
>>> = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
>
> The min() takes care of the situation you describe, ie. if the allocated
> space is large, reserved_tailroom will be large enough that we do not
> use more space than the mtu.
>
> I tested the mld and igmp code with different driver parameters, mtu
> values, number of multicast address records and even allocation
> failures. If you think the formula is wrong, please provide a
> counter-example with hlen, tlen, mtu and size values.

I think the code is fine albeit I think we should remove the min macro 
and just do something:

if (skb_tailroom(skb) > mtu)
	skb->reserved_tailroom = skb_tailroom(skb) - mtu;

Does that make sense? I think it is much more readable.

Thanks,
Hannes
Benjamin Poirier Feb. 29, 2016, 6:08 p.m. UTC | #5
On 2016/02/29 16:43, Hannes Frederic Sowa wrote:
> On 29.02.2016 16:19, Benjamin Poirier wrote:
> >On 2016/02/29 15:57, Daniel Borkmann wrote:
> >[...]
> >>
> >>[ cutting the IPv4 part off as diff is the same ]
> >>
> >>>diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> >>>index 5ee56d0..c157edc 100644
> >>>--- a/net/ipv6/mcast.c
> >>>+++ b/net/ipv6/mcast.c
> >>>@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
> >>>  		return NULL;
> >>>
> >>>  	skb->priority = TC_PRIO_CONTROL;
> >>>-	skb->reserved_tailroom = skb_end_offset(skb) -
> >>>-				 min(mtu, skb_end_offset(skb));
> >>>  	skb_reserve(skb, hlen);
> >>>+	skb->reserved_tailroom = skb_tailroom(skb) -
> >>>+		min_t(int, mtu, skb_tailroom(skb) - tlen);
> >>
> >>Are you sure this is correct? Wouldn't that mean (assuming we allocated
> >>enough space), that I could now fill a larger than MTU frame?
> >
> >Quoting back a part of the log:
> >
> >>>The maximum space available for ip headers and payload without
> >>>fragmentation is min(mtu, data + extra). Therefore,
> >>>reserved_tailroom
> >>>= data + extra + tlen - min(mtu, data + extra)
> >>>= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
> >>>= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
> >
> >The min() takes care of the situation you describe, ie. if the allocated
> >space is large, reserved_tailroom will be large enough that we do not
> >use more space than the mtu.
> >
> >I tested the mld and igmp code with different driver parameters, mtu
> >values, number of multicast address records and even allocation
> >failures. If you think the formula is wrong, please provide a
> >counter-example with hlen, tlen, mtu and size values.
> 
> I think the code is fine albeit I think we should remove the min macro and
> just do something:
> 
> if (skb_tailroom(skb) > mtu)
> 	skb->reserved_tailroom = skb_tailroom(skb) - mtu;
> 
> Does that make sense? I think it is much more readable.

That is not equivalent. It fails to take tlen into account.

For igmp, consider this case:
with hlen = 16, mtu = 9000, tlen = 8,
additionally, suppose that the first iteration of the allocation loop
(alloc_skb(9000 + 16 + 8, ...) which requires 4 pages) fails and the
second iteration (alloc_skb((9000 >> 1) + 16 + 8, ...) which requires 2
pages) succeeds:
	size = (9000 >> 1) + 16 + 8 = 4524
	skb_end_offset = 8192 - 320 = 7872
	tailroom = 7872 - 16 = 7856

	data = 9000 >> 1 = 4500
	extra = 7872 - 4524 = 3348

	reserved tailroom (patch version)
		= 4500 + 3348 + 8 - min(9000, 4500 + 3348)
		= 8
	reserved tailroom (your version)
		= 0

	Headers are ipv4 + igmpv3 = 24 + 8 = 32, records are 8 bytes
	With 978 igmpv3 records, with your version, we would output an
	skb that has less tailroom (0) than dev->needed_tailroom (8).

For mld, consider this case:
with hlen = 16, mtu = 9000, tlen = 8:
	size = 3776 (SKB_MAX_ORDER case)
	skb_end_offset = 3776
	tailroom = 3776 - 16 = 3760

	data = 3776 - 16 - 8 = 3752
	extra = 0

	reserved tailroom (patch version)
		= 3752 + 0 + 8 - min(9000, 3752 + 0)
		= 8
	reserved tailroom (your version)
		= 0

	Headers are ipv6 + icmpv6 = 48 + 8 = 56, records are 20 bytes
	With 185 mld records, with your formula, we would output an skb that
	has less tailroom (4) than dev->needed_tailroom (8).

If you think we should write the expression with "if" instead of "min",
instead of the current

+	skb->reserved_tailroom = skb_tailroom(skb) -
+		min_t(int, mtu, skb_tailroom(skb) - tlen);

it should be:

+	if (mtu < skb_tailroom(skb) - tlen)
+		skb->reserved_tailroom = skb_tailroom(skb) - mtu;
+	else
+		skb->reserved_tailroom = tlen;

The second alternative does not look more readable to me but I have been
looking at that expression for a while. If you think that it is more
readable, I will resend the patch expressed that way. Please let me
know.
Hannes Frederic Sowa Feb. 29, 2016, 6:28 p.m. UTC | #6
On 29.02.2016 19:08, Benjamin Poirier wrote:
> If you think we should write the expression with "if" instead of "min",
> instead of the current
>
> +	skb->reserved_tailroom = skb_tailroom(skb) -
> +		min_t(int, mtu, skb_tailroom(skb) - tlen);
>
> it should be:
>
> +	if (mtu < skb_tailroom(skb) - tlen)
> +		skb->reserved_tailroom = skb_tailroom(skb) - mtu;
> +	else
> +		skb->reserved_tailroom = tlen;
>
> The second alternative does not look more readable to me but I have been
> looking at that expression for a while. If you think that it is more
> readable, I will resend the patch expressed that way. Please let me
> know.

I would still find it more readable actually, but no strong opinion, I 
would leave it up to you.

Could it make sense to put this code into a static inline helper and 
reuse it for both, igmp and mld?

Thanks,
Hannes
diff mbox

Patch

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 05e4cba..b5d28a4 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -356,9 +356,9 @@  static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
 	skb_dst_set(skb, &rt->dst);
 	skb->dev = dev;
 
-	skb->reserved_tailroom = skb_end_offset(skb) -
-				 min(mtu, skb_end_offset(skb));
 	skb_reserve(skb, hlen);
+	skb->reserved_tailroom = skb_tailroom(skb) -
+		min_t(int, mtu, skb_tailroom(skb) - tlen);
 
 	skb_reset_network_header(skb);
 	pip = ip_hdr(skb);
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 5ee56d0..c157edc 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1574,9 +1574,9 @@  static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
 		return NULL;
 
 	skb->priority = TC_PRIO_CONTROL;
-	skb->reserved_tailroom = skb_end_offset(skb) -
-				 min(mtu, skb_end_offset(skb));
 	skb_reserve(skb, hlen);
+	skb->reserved_tailroom = skb_tailroom(skb) -
+		min_t(int, mtu, skb_tailroom(skb) - tlen);
 
 	if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) {
 		/* <draft-ietf-magma-mld-source-05.txt>: