diff mbox

[PATCH] MIPS checksum fix

Message ID 20080919120752.GA19877@linux-mips.org
State Not Applicable, archived
Headers show

Commit Message

Ralf Baechle Sept. 19, 2008, 12:07 p.m. UTC
On Fri, Sep 19, 2008 at 01:47:43PM +0200, Ralf Baechle wrote:

> I'm interested in test reports of this on all sorts of configurations -
> 32-bit, 64-bit, big / little endian, R2 processors and pre-R2.  In
> particular Cavium being the only MIPS64 R2 implementation would be
> interesting.  This definately is stuff which should go upstream for 2.6.27.

There was a trivial bug in the R2 code.

From 97ad23f4696a322cb3bc379a25a8c0f6526751d6 Mon Sep 17 00:00:00 2001
From: Ralf Baechle <ralf@linux-mips.org>
Date: Fri, 19 Sep 2008 14:05:53 +0200
Subject: [PATCH] [MIPS] Fix 64-bit csum_partial, __csum_partial_copy_user and csum_partial_copy

On 64-bit machines it wouldn't handle a possible carry when adding the
32-bit folded checksum and checksum argument.

While at it, add a few trivial optimizations, also for R2 processors.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Comments

Maciej W. Rozycki Sept. 19, 2008, 12:15 p.m. UTC | #1
On Fri, 19 Sep 2008, Ralf Baechle wrote:

> +	beqz	t7, 1f			/* odd buffer alignment? */
> +	 lui	v1, 0x00ff

 Well, .set reorder to move something from before the branch would have 
been a little bit better for the common aligned case. ;)  There is nothing 
special about branch delay slots in the whole epilogue, so one from just 
before the return might simply be relocated here.

  Maciej
Atsushi Nemoto Sept. 19, 2008, 2:09 p.m. UTC | #2
On Fri, 19 Sep 2008 14:07:52 +0200, Ralf Baechle <ralf@linux-mips.org> wrote:
> From 97ad23f4696a322cb3bc379a25a8c0f6526751d6 Mon Sep 17 00:00:00 2001
> From: Ralf Baechle <ralf@linux-mips.org>
> Date: Fri, 19 Sep 2008 14:05:53 +0200
> Subject: [PATCH] [MIPS] Fix 64-bit csum_partial, __csum_partial_copy_user and csum_partial_copy

... and __csum_partial_copy_nocheck, you mean? ;)

> On 64-bit machines it wouldn't handle a possible carry when adding the
> 32-bit folded checksum and checksum argument.
> 
> While at it, add a few trivial optimizations, also for R2 processors.

I think it would be better splitting bugfix and optimization.  This
code is too complex to do many things at a time, isn't it?

> @@ -53,12 +53,14 @@
>  #define UNIT(unit)  ((unit)*NBYTES)
>  
>  #define ADDC(sum,reg)						\
> -	.set	push;						\
> -	.set	noat;						\
>  	ADD	sum, reg;					\
>  	sltu	v1, sum, reg;					\
>  	ADD	sum, v1;					\
> -	.set	pop

Is this required?  Just a cleanup?

> @@ -254,8 +256,6 @@ LEAF(csum_partial)
>  1:	ADDC(sum, t1)
>  
>  	/* fold checksum */
> -	.set	push
> -	.set	noat
>  #ifdef USE_DOUBLE
>  	dsll32	v1, sum, 0
>  	daddu	sum, v1
> @@ -263,24 +263,25 @@ LEAF(csum_partial)
>  	dsra32	sum, sum, 0
>  	addu	sum, v1
>  #endif
> -	sll	v1, sum, 16
> -	addu	sum, v1
> -	sltu	v1, sum, v1
> -	srl	sum, sum, 16
> -	addu	sum, v1
>  
>  	/* odd buffer alignment? */
> -	beqz	t7, 1f
> -	 nop
> -	sll	v1, sum, 8
> +#ifdef CPU_MIPSR2
> +	wsbh	v1, sum	
> +	movn	sum, v1, t7
> +#else
> +	beqz	t7, 1f			/* odd buffer alignment? */
> +	 lui	v1, 0x00ff
> +	addu	v1, 0x00ff
> +	and	t0, sum, v1
> +	sll	t0, t0, 8
>  	srl	sum, sum, 8
> -	or	sum, v1
> -	andi	sum, 0xffff
> -	.set	pop
> +	and	sum, sum, v1
> +	or	sum, sum, t0
>  1:
> +#endif

Is this just an optimization?  or contain any fixes?
Maciej W. Rozycki Sept. 19, 2008, 3:02 p.m. UTC | #3
On Fri, 19 Sep 2008, Atsushi Nemoto wrote:

> I think it would be better splitting bugfix and optimization.  This
> code is too complex to do many things at a time, isn't it?

 It's probably easier for people to test a single patch now and it can
then be split at the commitment time.  Of course if something actually
breaks, then keeping changes combined won't help tracking down the cause.  
;)

  Maciej
Ralf Baechle Sept. 20, 2008, 3:09 p.m. UTC | #4
On Fri, Sep 19, 2008 at 11:09:52PM +0900, Atsushi Nemoto wrote:

> I think it would be better splitting bugfix and optimization.  This
> code is too complex to do many things at a time, isn't it?
> 
> > @@ -53,12 +53,14 @@
> >  #define UNIT(unit)  ((unit)*NBYTES)
> >  
> >  #define ADDC(sum,reg)						\
> > -	.set	push;						\
> > -	.set	noat;						\
> >  	ADD	sum, reg;					\
> >  	sltu	v1, sum, reg;					\
> >  	ADD	sum, v1;					\
> > -	.set	pop
> 
> Is this required?  Just a cleanup?

It papers over potencially important warnings so had to go.  I argue the
caller of ADDC should set noat mode, if at all.

  Ralf
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bryan Phillippe Sept. 23, 2008, 9:52 p.m. UTC | #5
On Sep 19, 2008, at 5:07 AM, Ralf Baechle wrote:

> On Fri, Sep 19, 2008 at 01:47:43PM +0200, Ralf Baechle wrote:
>
>> I'm interested in test reports of this on all sorts of  
>> configurations -
>> 32-bit, 64-bit, big / little endian, R2 processors and pre-R2.  In
>> particular Cavium being the only MIPS64 R2 implementation would be
>> interesting.  This definately is stuff which should go upstream for  
>> 2.6.27.
>
> There was a trivial bug in the R2 code.
>
>> From 97ad23f4696a322cb3bc379a25a8c0f6526751d6 Mon Sep 17 00:00:00  
>> 2001
> From: Ralf Baechle <ralf@linux-mips.org>
> Date: Fri, 19 Sep 2008 14:05:53 +0200
> Subject: [PATCH] [MIPS] Fix 64-bit csum_partial,  
> __csum_partial_copy_user and csum_partial_copy
>
> On 64-bit machines it wouldn't handle a possible carry when adding the
> 32-bit folded checksum and checksum argument.
>
> While at it, add a few trivial optimizations, also for R2 processors.

I tried this patch (with and without Atsushi's addition, shown below)  
on a MIPS64 today and the checksums were all bad (i.e. worse than the  
original problem).

Note that I had to manually create the diff, because of "malformed  
patch" errors at line 21 (second hunk).

If anyone would like to send me an updated unified diff for this  
issue, I can re-test today within the next day.

--
-bp

> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/ 
> csum_partial.S
> index 8d77841..c77a7a0 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -53,12 +53,14 @@
> #define UNIT(unit)  ((unit)*NBYTES)
>
> #define ADDC(sum,reg)						\
> -	.set	push;						\
> -	.set	noat;						\
> 	ADD	sum, reg;					\
> 	sltu	v1, sum, reg;					\
> 	ADD	sum, v1;					\
> -	.set	pop
> +
> +#define ADDC32(sum,reg)						\
> +	addu	sum, reg;					\
> +	sltu	v1, sum, reg;					\
> +	addu	sum, v1;					\
>
> #define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3)	\
> 	LOAD	_t0, (offset + UNIT(0))(src);			\
> @@ -254,8 +256,6 @@ LEAF(csum_partial)
> 1:	ADDC(sum, t1)
>
> 	/* fold checksum */
> -	.set	push
> -	.set	noat
> #ifdef USE_DOUBLE
> 	dsll32	v1, sum, 0
> 	daddu	sum, v1
> @@ -263,24 +263,25 @@ LEAF(csum_partial)
> 	dsra32	sum, sum, 0
> 	addu	sum, v1
> #endif
> -	sll	v1, sum, 16
> -	addu	sum, v1
> -	sltu	v1, sum, v1
> -	srl	sum, sum, 16
> -	addu	sum, v1
>
> 	/* odd buffer alignment? */
> -	beqz	t7, 1f
> -	 nop
> -	sll	v1, sum, 8
> +#ifdef CPU_MIPSR2
> +	wsbh	v1, sum	
> +	movn	sum, v1, t7
> +#else
> +	beqz	t7, 1f			/* odd buffer alignment? */
> +	 lui	v1, 0x00ff
> +	addu	v1, 0x00ff
> +	and	t0, sum, v1
> +	sll	t0, t0, 8
> 	srl	sum, sum, 8
> -	or	sum, v1
> -	andi	sum, 0xffff
> -	.set	pop
> +	and	sum, sum, v1
> +	or	sum, sum, t0
> 1:
> +#endif
> 	.set	reorder
> 	/* Add the passed partial csum.  */
> -	ADDC(sum, a2)
> +	ADDC32(sum, a2)
> 	jr	ra
> 	.set	noreorder
> 	END(csum_partial)
> @@ -656,8 +657,6 @@ EXC(	sb	t0, NBYTES-2(dst), .Ls_exc)
> 	ADDC(sum, t2)
> .Ldone:
> 	/* fold checksum */
> -	.set	push
> -	.set	noat
> #ifdef USE_DOUBLE
> 	dsll32	v1, sum, 0
> 	daddu	sum, v1
> @@ -665,23 +664,23 @@ EXC(	sb	t0, NBYTES-2(dst), .Ls_exc)
> 	dsra32	sum, sum, 0
> 	addu	sum, v1
> #endif
> -	sll	v1, sum, 16
> -	addu	sum, v1
> -	sltu	v1, sum, v1
> -	srl	sum, sum, 16
> -	addu	sum, v1
>
> -	/* odd buffer alignment? */
> -	beqz	odd, 1f
> -	 nop
> -	sll	v1, sum, 8
> +#ifdef CPU_MIPSR2
> +	wsbh	v1, sum
> +	movn	sum, v1, odd
> +#else
> +	beqz	odd, 1f			/* odd buffer alignment? */
> +	 lui	v1, 0x00ff
> +	addu	v1, 0x00ff
> +	and	t0, sum, v1
> +	sll	t0, t0, 8
> 	srl	sum, sum, 8
> -	or	sum, v1
> -	andi	sum, 0xffff
> -	.set	pop
> +	and	sum, sum, v1
> +	or	sum, sum, t0
> 1:
> +#endif
> 	.set reorder
> -	ADDC(sum, psum)
> +	ADDC32(sum, psum)
> 	jr	ra
> 	.set noreorder
>
>



Begin forwarded message:

> From: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
> Date: September 19, 2008 8:43:19 AM PDT
> To: u1@terran.org
> Cc: macro@linux-mips.org, linux-mips@linux-mips.org
> Subject: Re: MIPS checksum bug
>
> On Fri, 19 Sep 2008 01:17:04 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp 
> > wrote:
>> Thank you for testing.  Though this patch did not fixed your problem,
>> I still have a doubt on 64-bit optimization.
>>
>> If your hardware could run 32-bit kernel, could you confirm the
>> problem can happens in 32-bit too or not?
>
> I think I found possible breakage on 64-bit path.
>
> There are some "lw" (and "ulw") used in 64-bit path and they should be
> "lwu" (and "ulwu" ... but there is no such pseudo insn) to avoid
> sign-extention.
>
> Here is a completely untested patch.
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/ 
> csum_partial.S
> index 8d77841..40f9174 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -39,12 +39,14 @@
> #ifdef USE_DOUBLE
>
> #define LOAD   ld
> +#define LOAD32 lwu
> #define ADD    daddu
> #define NBYTES 8
>
> #else
>
> #define LOAD   lw
> +#define LOAD32 lw
> #define ADD    addu
> #define NBYTES 4
>
> @@ -60,6 +62,14 @@
> 	ADD	sum, v1;					\
> 	.set	pop
>
> +#define ADDC32(sum,reg)						\
> +	.set	push;						\
> +	.set	noat;						\
> +	addu	sum, reg;					\
> +	sltu	v1, sum, reg;					\
> +	addu	sum, v1;					\
> +	.set	pop
> +
> #define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3)	\
> 	LOAD	_t0, (offset + UNIT(0))(src);			\
> 	LOAD	_t1, (offset + UNIT(1))(src);			\
> @@ -132,7 +142,7 @@ LEAF(csum_partial)
> 	beqz	t8, .Lqword_align
> 	 andi	t8, src, 0x8
>
> -	lw	t0, 0x00(src)
> +	LOAD32	t0, 0x00(src)
> 	LONG_SUBU	a1, a1, 0x4
> 	ADDC(sum, t0)
> 	PTR_ADDU	src, src, 0x4
> @@ -211,7 +221,7 @@ LEAF(csum_partial)
> 	LONG_SRL	t8, t8, 0x2
>
> .Lend_words:
> -	lw	t0, (src)
> +	LOAD32	t0, (src)
> 	LONG_SUBU	t8, t8, 0x1
> 	ADDC(sum, t0)
> 	.set	reorder				/* DADDI_WAR */
> @@ -229,6 +239,9 @@ LEAF(csum_partial)
>
> 	/* Still a full word to go  */
> 	ulw	t1, (src)
> +#ifdef USE_DOUBLE
> +	add	t1, zero	/* clear upper 32bit */
> +#endif
> 	PTR_ADDIU	src, 4
> 	ADDC(sum, t1)
>
> @@ -280,7 +293,7 @@ LEAF(csum_partial)
> 1:
> 	.set	reorder
> 	/* Add the passed partial csum.  */
> -	ADDC(sum, a2)
> +	ADDC32(sum, a2)
> 	jr	ra
> 	.set	noreorder
> 	END(csum_partial)
> @@ -681,7 +694,7 @@ EXC(	sb	t0, NBYTES-2(dst), .Ls_exc)
> 	.set	pop
> 1:
> 	.set reorder
> -	ADDC(sum, psum)
> +	ADDC32(sum, psum)
> 	jr	ra
> 	.set noreorder
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ralf Baechle Sept. 23, 2008, 10:06 p.m. UTC | #6
On Tue, Sep 23, 2008 at 02:52:24PM -0700, Bryan Phillippe wrote:

> If anyone would like to send me an updated unified diff for this issue, I 
> can re-test today within the next day.

An updated fix is already in the lmo and kernel.org git repositories.

  Ralf
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 8d77841..c77a7a0 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -53,12 +53,14 @@ 
 #define UNIT(unit)  ((unit)*NBYTES)
 
 #define ADDC(sum,reg)						\
-	.set	push;						\
-	.set	noat;						\
 	ADD	sum, reg;					\
 	sltu	v1, sum, reg;					\
 	ADD	sum, v1;					\
-	.set	pop
+
+#define ADDC32(sum,reg)						\
+	addu	sum, reg;					\
+	sltu	v1, sum, reg;					\
+	addu	sum, v1;					\
 
 #define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3)	\
 	LOAD	_t0, (offset + UNIT(0))(src);			\
@@ -254,8 +256,6 @@  LEAF(csum_partial)
 1:	ADDC(sum, t1)
 
 	/* fold checksum */
-	.set	push
-	.set	noat
 #ifdef USE_DOUBLE
 	dsll32	v1, sum, 0
 	daddu	sum, v1
@@ -263,24 +263,25 @@  LEAF(csum_partial)
 	dsra32	sum, sum, 0
 	addu	sum, v1
 #endif
-	sll	v1, sum, 16
-	addu	sum, v1
-	sltu	v1, sum, v1
-	srl	sum, sum, 16
-	addu	sum, v1
 
 	/* odd buffer alignment? */
-	beqz	t7, 1f
-	 nop
-	sll	v1, sum, 8
+#ifdef CPU_MIPSR2
+	wsbh	v1, sum	
+	movn	sum, v1, t7
+#else
+	beqz	t7, 1f			/* odd buffer alignment? */
+	 lui	v1, 0x00ff
+	addu	v1, 0x00ff
+	and	t0, sum, v1
+	sll	t0, t0, 8
 	srl	sum, sum, 8
-	or	sum, v1
-	andi	sum, 0xffff
-	.set	pop
+	and	sum, sum, v1
+	or	sum, sum, t0
 1:
+#endif
 	.set	reorder
 	/* Add the passed partial csum.  */
-	ADDC(sum, a2)
+	ADDC32(sum, a2)
 	jr	ra
 	.set	noreorder
 	END(csum_partial)
@@ -656,8 +657,6 @@  EXC(	sb	t0, NBYTES-2(dst), .Ls_exc)
 	ADDC(sum, t2)
 .Ldone:
 	/* fold checksum */
-	.set	push
-	.set	noat
 #ifdef USE_DOUBLE
 	dsll32	v1, sum, 0
 	daddu	sum, v1
@@ -665,23 +664,23 @@  EXC(	sb	t0, NBYTES-2(dst), .Ls_exc)
 	dsra32	sum, sum, 0
 	addu	sum, v1
 #endif
-	sll	v1, sum, 16
-	addu	sum, v1
-	sltu	v1, sum, v1
-	srl	sum, sum, 16
-	addu	sum, v1
 
-	/* odd buffer alignment? */
-	beqz	odd, 1f
-	 nop
-	sll	v1, sum, 8
+#ifdef CPU_MIPSR2
+	wsbh	v1, sum
+	movn	sum, v1, odd
+#else
+	beqz	odd, 1f			/* odd buffer alignment? */
+	 lui	v1, 0x00ff
+	addu	v1, 0x00ff
+	and	t0, sum, v1
+	sll	t0, t0, 8
 	srl	sum, sum, 8
-	or	sum, v1
-	andi	sum, 0xffff
-	.set	pop
+	and	sum, sum, v1
+	or	sum, sum, t0
 1:
+#endif
 	.set reorder
-	ADDC(sum, psum)
+	ADDC32(sum, psum)
 	jr	ra
 	.set noreorder