diff mbox

[RFC] fix IP_ECN_set_ce

Message ID 1355898095-7444-1-git-send-email-roy.qing.li@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Li RongQing Dec. 19, 2012, 6:21 a.m. UTC
From: Li RongQing <roy.qing.li@gmail.com>

1. ECN uses the two least significant (right-most) bits of the DiffServ
field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)

2. When setting CE, we should check if ECN Capable Transport supports,
both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
    00: Non ECN-Capable Transport — Non-ECT
    10: ECN Capable Transport — ECT(0)
    01: ECN Capable Transport — ECT(1)
    11: Congestion Encountered — CE

3. Remove the misunderstand comment

4. fix the checksum computation

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 include/net/inet_ecn.h |   22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

Comments

Julian Anastasov Dec. 19, 2012, 8:11 a.m. UTC | #1
Hello,

On Wed, 19 Dec 2012, roy.qing.li@gmail.com wrote:

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> 1. ECN uses the two least significant (right-most) bits of the DiffServ
> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
> 
> 2. When setting CE, we should check if ECN Capable Transport supports,
> both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
>     00: Non ECN-Capable Transport — Non-ECT
>     10: ECN Capable Transport — ECT(0)
>     01: ECN Capable Transport — ECT(1)
>     11: Congestion Encountered — CE
> 
> 3. Remove the misunderstand comment
> 
> 4. fix the checksum computation
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
>  include/net/inet_ecn.h |   22 ++++------------------
>  1 file changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
> index aab7375..545a683 100644
> --- a/include/net/inet_ecn.h
> +++ b/include/net/inet_ecn.h
> @@ -73,27 +73,13 @@ static inline void INET_ECN_dontxmit(struct sock *sk)
>  
>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>  {
> -	u32 check = (__force u32)iph->check;
> -	u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
> -
> -	/*
> -	 * After the last operation we have (in binary):
> -	 * INET_ECN_NOT_ECT => 01
> -	 * INET_ECN_ECT_1   => 10
> -	 * INET_ECN_ECT_0   => 11
> -	 * INET_ECN_CE      => 00
> -	 */

	I think, the above comment explains how an
increment (iph->tos + 1) serves the purpose to check
for ECT_1 and ECT_0, there is no such thing as
addressing the next byte from header. It is just an
optimized logic that avoids complex INET_ECN_is_XXX
checks.

> -	if (!(ecn & 2))
> +	u32 ecn = iph->tos & INET_ECN_MASK;
> +
> +	if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>  		return !ecn;

	May be return INET_ECN_is_ce(ecn) ?

>  
> -	/*
> -	 * The following gives us:
> -	 * INET_ECN_ECT_1 => check += htons(0xFFFD)
> -	 * INET_ECN_ECT_0 => check += htons(0xFFFE)
> -	 */
> -	check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
> +	csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
>  
> -	iph->check = (__force __sum16)(check + (check>=0xFFFF));
>  	iph->tos |= INET_ECN_CE;
>  	return 1;
>  }
> -- 
> 1.7.10.4

Regards

--
Julian Anastasov <ja@ssi.bg>
Li RongQing Dec. 19, 2012, 8:41 a.m. UTC | #2
2012/12/19 Julian Anastasov <ja@ssi.bg>:
>
>         Hello,
>
> On Wed, 19 Dec 2012, roy.qing.li@gmail.com wrote:
>
>> From: Li RongQing <roy.qing.li@gmail.com>
>>
>> 1. ECN uses the two least significant (right-most) bits of the DiffServ
>> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
>>
>> 2. When setting CE, we should check if ECN Capable Transport supports,
>> both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
>>     00: Non ECN-Capable Transport — Non-ECT
>>     10: ECN Capable Transport — ECT(0)
>>     01: ECN Capable Transport — ECT(1)
>>     11: Congestion Encountered — CE
>>
>> 3. Remove the misunderstand comment
>>
>> 4. fix the checksum computation
>>
>> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
>> ---
>>  include/net/inet_ecn.h |   22 ++++------------------
>>  1 file changed, 4 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
>> index aab7375..545a683 100644
>> --- a/include/net/inet_ecn.h
>> +++ b/include/net/inet_ecn.h
>> @@ -73,27 +73,13 @@ static inline void INET_ECN_dontxmit(struct sock *sk)
>>
>>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>>  {
>> -     u32 check = (__force u32)iph->check;
>> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>> -
>> -     /*
>> -      * After the last operation we have (in binary):
>> -      * INET_ECN_NOT_ECT => 01
>> -      * INET_ECN_ECT_1   => 10
>> -      * INET_ECN_ECT_0   => 11
>> -      * INET_ECN_CE      => 00
>> -      */
>
>         I think, the above comment explains how an
> increment (iph->tos + 1) serves the purpose to check
> for ECT_1 and ECT_0, there is no such thing as
> addressing the next byte from header. It is just an
> optimized logic that avoids complex INET_ECN_is_XXX
> checks.
Thanks for your reply.
Do you mean this comment are valuable?


>
>> -     if (!(ecn & 2))
>> +     u32 ecn = iph->tos & INET_ECN_MASK;
>> +
>> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>>               return !ecn;
>
>         May be return INET_ECN_is_ce(ecn) ?
>

I like to set the return value to void, since noone cares about the
return value.

-Roy

>>
>> -     /*
>> -      * The following gives us:
>> -      * INET_ECN_ECT_1 => check += htons(0xFFFD)
>> -      * INET_ECN_ECT_0 => check += htons(0xFFFE)
>> -      */
>> -     check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
>> +     csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
>>
>> -     iph->check = (__force __sum16)(check + (check>=0xFFFF));
>>       iph->tos |= INET_ECN_CE;
>>       return 1;
>>  }
>> --
>> 1.7.10.4
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov Dec. 19, 2012, 8:58 a.m. UTC | #3
Hello,

On Wed, 19 Dec 2012, RongQing Li wrote:

> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
> >>  {
> >> -     u32 check = (__force u32)iph->check;
> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
> >> -
> >> -     /*
> >> -      * After the last operation we have (in binary):
> >> -      * INET_ECN_NOT_ECT => 01
> >> -      * INET_ECN_ECT_1   => 10
> >> -      * INET_ECN_ECT_0   => 11
> >> -      * INET_ECN_CE      => 00
> >> -      */
> >
> >         I think, the above comment explains how an
> > increment (iph->tos + 1) serves the purpose to check
> > for ECT_1 and ECT_0, there is no such thing as
> > addressing the next byte from header. It is just an
> > optimized logic that avoids complex INET_ECN_is_XXX
> > checks.
> Thanks for your reply.
> Do you mean this comment are valuable?

	It looks better to me with the comment and the
original checks. But I can't comment the correctness of
the other changes in your patch.

> >> -     if (!(ecn & 2))
> >> +     u32 ecn = iph->tos & INET_ECN_MASK;
> >> +
> >> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
> >>               return !ecn;
> >
> >         May be return INET_ECN_is_ce(ecn) ?
> >
> 
> I like to set the return value to void, since noone cares about the
> return value.

	It is used by INET_ECN_set_ce and its users in
net/sched/

> -Roy

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Li RongQing Dec. 19, 2012, 9:11 a.m. UTC | #4
2012/12/19 Julian Anastasov <ja@ssi.bg>:
>
>         Hello,
>
> On Wed, 19 Dec 2012, RongQing Li wrote:
>
>> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>> >>  {
>> >> -     u32 check = (__force u32)iph->check;
>> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>> >> -
>> >> -     /*
>> >> -      * After the last operation we have (in binary):
>> >> -      * INET_ECN_NOT_ECT => 01
>> >> -      * INET_ECN_ECT_1   => 10
>> >> -      * INET_ECN_ECT_0   => 11
>> >> -      * INET_ECN_CE      => 00
>> >> -      */
>> >
>> >         I think, the above comment explains how an
>> > increment (iph->tos + 1) serves the purpose to check
>> > for ECT_1 and ECT_0, there is no such thing as
>> > addressing the next byte from header. It is just an
>> > optimized logic that avoids complex INET_ECN_is_XXX
>> > checks.
>> Thanks for your reply.
>> Do you mean this comment are valuable?
>
>         It looks better to me with the comment and the
> original checks. But I can't comment the correctness of
> the other changes in your patch.

I do not know how they are useful, and how the original check
works, since the value in comments are wrong, the correct is:

enum {
        INET_ECN_NOT_ECT = 0,
        INET_ECN_ECT_1 = 1,
        INET_ECN_ECT_0 = 2,
        INET_ECN_CE = 3,
        INET_ECN_MASK = 3,
};


   00: Non ECN-Capable Transport — Non-ECT
    10: ECN Capable Transport — ECT(0)
    01: ECN Capable Transport — ECT(1)
    11: Congestion Encountered — CE

-Roy


>
>> >> -     if (!(ecn & 2))
>> >> +     u32 ecn = iph->tos & INET_ECN_MASK;
>> >> +
>> >> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>> >>               return !ecn;
>> >
>> >         May be return INET_ECN_is_ce(ecn) ?
>> >
>>
>> I like to set the return value to void, since noone cares about the
>> return value.
>
>         It is used by INET_ECN_set_ce and its users in
> net/sched/
>
>> -Roy
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 19, 2012, 9:31 a.m. UTC | #5
From: RongQing Li <roy.qing.li@gmail.com>
Date: Wed, 19 Dec 2012 17:11:59 +0800

> 2012/12/19 Julian Anastasov <ja@ssi.bg>:
>>
>>         Hello,
>>
>> On Wed, 19 Dec 2012, RongQing Li wrote:
>>
>>> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>>> >>  {
>>> >> -     u32 check = (__force u32)iph->check;
>>> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>>> >> -
>>> >> -     /*
>>> >> -      * After the last operation we have (in binary):
>>> >> -      * INET_ECN_NOT_ECT => 01
>>> >> -      * INET_ECN_ECT_1   => 10
>>> >> -      * INET_ECN_ECT_0   => 11
>>> >> -      * INET_ECN_CE      => 00
>>> >> -      */
>>> >
>>> >         I think, the above comment explains how an
>>> > increment (iph->tos + 1) serves the purpose to check
>>> > for ECT_1 and ECT_0, there is no such thing as
>>> > addressing the next byte from header. It is just an
>>> > optimized logic that avoids complex INET_ECN_is_XXX
>>> > checks.
>>> Thanks for your reply.
>>> Do you mean this comment are valuable?
>>
>>         It looks better to me with the comment and the
>> original checks. But I can't comment the correctness of
>> the other changes in your patch.
> 
> I do not know how they are useful, and how the original check
> works, since the value in comments are wrong, the correct is:
> 
> enum {
>         INET_ECN_NOT_ECT = 0,
>         INET_ECN_ECT_1 = 1,
>         INET_ECN_ECT_0 = 2,
>         INET_ECN_CE = 3,
>         INET_ECN_MASK = 3,
> };
> 
> 
>    00: Non ECN-Capable Transport ― Non-ECT
>     10: ECN Capable Transport ― ECT(0)
>     01: ECN Capable Transport ― ECT(1)
>     11: Congestion Encountered ― CE

You really don't understand the comment, it is saying what
the incremented value corresponds to, ECN wise.

If iph->tos + 1 is 01, we had INET_ECN_NOT_ECT in iph->tos to
begine with, and so on an so forth.

Because you are having so much trouble with this most fundamental
aspect of this code, I have zero confidence in your being able to
make reasonable changes here.

I am not applying this patch.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 19, 2012, 4:14 p.m. UTC | #6
On Wed, 2012-12-19 at 14:21 +0800, roy.qing.li@gmail.com wrote:
> From: Li RongQing <roy.qing.li@gmail.com>
> 
> 1. ECN uses the two least significant (right-most) bits of the DiffServ
> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
> 
> 2. When setting CE, we should check if ECN Capable Transport supports,
> both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
>     00: Non ECN-Capable Transport — Non-ECT
>     10: ECN Capable Transport — ECT(0)
>     01: ECN Capable Transport — ECT(1)
>     11: Congestion Encountered — CE
> 
> 3. Remove the misunderstand comment
> 
> 4. fix the checksum computation
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

This is total crap.

Its perfectly clear to me and compiler generates fast code.

If you don't understand this code, please don't touch it.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index aab7375..545a683 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -73,27 +73,13 @@  static inline void INET_ECN_dontxmit(struct sock *sk)
 
 static inline int IP_ECN_set_ce(struct iphdr *iph)
 {
-	u32 check = (__force u32)iph->check;
-	u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
-
-	/*
-	 * After the last operation we have (in binary):
-	 * INET_ECN_NOT_ECT => 01
-	 * INET_ECN_ECT_1   => 10
-	 * INET_ECN_ECT_0   => 11
-	 * INET_ECN_CE      => 00
-	 */
-	if (!(ecn & 2))
+	u32 ecn = iph->tos & INET_ECN_MASK;
+
+	if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
 		return !ecn;
 
-	/*
-	 * The following gives us:
-	 * INET_ECN_ECT_1 => check += htons(0xFFFD)
-	 * INET_ECN_ECT_0 => check += htons(0xFFFE)
-	 */
-	check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
+	csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
 
-	iph->check = (__force __sum16)(check + (check>=0xFFFF));
 	iph->tos |= INET_ECN_CE;
 	return 1;
 }