[v3,2/2] powerpc: add support for csum_add()

Message ID 1d1362c8aa696e316d3ba97dce2342df6f6ee6cf.1432047904.git.christophe.leroy@c-s.fr (mailing list archive)
State Accepted
Delegated to: Scott Wood

Commit Message

Christophe Leroy May 19, 2015, 3:18 p.m. UTC
The C version of csum_add() as defined in include/net/checksum.h gives the
following assembly on ppc32:
       0:       7c 04 1a 14     add     r0,r4,r3
       4:       7c 64 00 10     subfc   r3,r4,r0
       8:       7c 63 19 10     subfe   r3,r3,r3
       c:       7c 63 00 50     subf    r3,r3,r0
and the following on ppc64:
   0xc000000000001af8 <+0>:	add     r3,r3,r4
   0xc000000000001afc <+4>:	cmplw   cr7,r3,r4
   0xc000000000001b00 <+8>:	mfcr    r4
   0xc000000000001b04 <+12>:	rlwinm  r4,r4,29,31,31
   0xc000000000001b08 <+16>:	add     r3,r4,r3
   0xc000000000001b0c <+20>:	clrldi  r3,r3,32
   0xc000000000001b10 <+24>:	blr

include/net/checksum.h also offers the possibility to define an arch-specific
function.
This patch provides a specific csum_add() inline function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/checksum.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
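
For reference, the generic fallback that this patch overrides looks essentially like the following. This is a paraphrase of the include/net/checksum.h version from this era, not a verbatim copy; the comparison is how C expresses the end-around carry, and it is what compiles into the subfc/subfe sequence quoted above on ppc32:

#include <linux/types.h>

/* Paraphrased generic csum_add(): add the two partial sums, then add
 * the carry back in.  (res < addend) is 1 exactly when the 32-bit add
 * wrapped, implementing the ones' complement end-around carry. */
static inline __wsum csum_add(__wsum csum, __wsum addend)
{
	u32 res = (__force u32)csum;

	res += (__force u32)addend;
	return (__force __wsum)(res + (res < (__force u32)addend));
}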

Comments

David Laight May 22, 2015, 3:57 p.m. UTC | #1
From: Linuxppc-dev Christophe Leroy

> Sent: 19 May 2015 16:19
...
> diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
> index 5e43d2d..e8d9ef4 100644
> --- a/arch/powerpc/include/asm/checksum.h
> +++ b/arch/powerpc/include/asm/checksum.h
> @@ -130,6 +130,22 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
>  	return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
>  }
> 
> +#define HAVE_ARCH_CSUM_ADD
> +static inline __wsum csum_add(__wsum csum, __wsum addend)
> +{
> +#ifdef __powerpc64__
> +	u64 res = (__force u64)csum;
> +
> +	res += (__force u64)addend;
> +	return (__force __wsum)((u32)res + (res >> 32));
> +#else
> +	asm("addc %0,%0,%1;"
> +	    "addze %0,%0;"
> +	    : "+r" (csum) : "r" (addend));
> +	return csum;
> +#endif

I'd have thought it better to test for the cpu type where you want the
'asm' variant, and then fall back on the C version for all others.
I know (well suspect) there are only two cases here.

I'd also have thought that the 64bit C version above would be generally 'good'.

	David
Scott Wood May 22, 2015, 7:32 p.m. UTC | #2
On Fri, 2015-05-22 at 15:57 +0000, David Laight wrote:
> From: Linuxppc-dev Christophe Leroy
> > Sent: 19 May 2015 16:19
> ...
> > diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
> > index 5e43d2d..e8d9ef4 100644
> > --- a/arch/powerpc/include/asm/checksum.h
> > +++ b/arch/powerpc/include/asm/checksum.h
> > @@ -130,6 +130,22 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
> >  	return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
> >  }
> > 
> > +#define HAVE_ARCH_CSUM_ADD
> > +static inline __wsum csum_add(__wsum csum, __wsum addend)
> > +{
> > +#ifdef __powerpc64__
> > +	u64 res = (__force u64)csum;
> > +
> > +	res += (__force u64)addend;
> > +	return (__force __wsum)((u32)res + (res >> 32));
> > +#else
> > +	asm("addc %0,%0,%1;"
> > +	    "addze %0,%0;"
> > +	    : "+r" (csum) : "r" (addend));
> > +	return csum;
> > +#endif
> 
> I'd have thought it better to test for the cpu type where you want the
> 'asm' variant, and then fall back on the C version for all others.
> I know (well suspect) there are only two cases here.

Usually it's more readable to see "if (x) ... else ..." than
"if (!x) ... else ...", and 64-bit is what has a symbol defined.

> I'd also have thought that the 64bit C version above would be generally 'good'.

It doesn't generate the addc/addze sequence.  At least with GCC 4.8.2,
it does something like:

	mr	tmp0, csum
	li	tmp1, 0
	li	tmp2, 0
	addc	tmp3, addend, tmp0
	adde	csum, tmp2, tmp1
	add	csum, csum, tmp3

-Scott
Segher Boessenkool May 22, 2015, 9:39 p.m. UTC | #3
On Fri, May 22, 2015 at 02:32:42PM -0500, Scott Wood wrote:
> > I'd also have thought that the 64bit C version above would be generally 'good'.
> 
> It doesn't generate the addc/addze sequence.  At least with GCC 4.8.2,
> it does something like:
> 
> 	mr	tmp0, csum
> 	li	tmp1, 0
> 	li	tmp2, 0
> 	addc	tmp3, addend, tmp0
> 	adde	csum, tmp2, tmp1
> 	add	csum, csum, tmp3

Right.  Don't expect older compilers to do sane things here.

All this begs a question...  If it is worth spending so much time
micro-optimising this, why not pick the low-hanging fruit first?
Having a 32-bit accumulator for ones' complement sums, on a 64-bit
system, is not such a great idea.


Segher
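
A minimal sketch of what Segher is suggesting, assuming a hypothetical word-summing loop (the function name and shape are illustrative, not the kernel's actual do_csum()):

#include <linux/types.h>

/* Hypothetical: accumulate the ones' complement sum in a 64-bit
 * register, letting carries pile up in the high word instead of
 * needing addc/addze on every addition, then fold at the end. */
static u32 csum_words_u64(const u32 *buf, int n)
{
	u64 sum = 0;

	while (n--)
		sum += *buf++;			/* carries collect in bits 32..63 */

	sum = (sum & 0xffffffff) + (sum >> 32);	/* fold the high word */
	sum = (sum & 0xffffffff) + (sum >> 32);	/* fold the final carry */
	return (u32)sum;
}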
Scott Wood May 22, 2015, 9:54 p.m. UTC | #4
On Fri, 2015-05-22 at 16:39 -0500, Segher Boessenkool wrote:
> On Fri, May 22, 2015 at 02:32:42PM -0500, Scott Wood wrote:
> > > I'd also have thought that the 64bit C version above would be generally 'good'.
> > 
> > It doesn't generate the addc/addze sequence.  At least with GCC 4.8.2,
> > it does something like:
> > 
> > 	mr	tmp0, csum
> > 	li	tmp1, 0
> > 	li	tmp2, 0
> > 	addc	tmp3, addend, tmp0
> > 	adde	csum, tmp2, tmp1
> > 	add	csum, csum, tmp3
> 
> Right.  Don't expect older compilers to do sane things here.
> 
> All this begs a question...  If it is worth spending so much time
> micro-optimising this, why not pick the low-hanging fruit first?
> Having a 32-bit accumulator for ones' complement sums, on a 64-bit
> system, is not such a great idea.

That would be a more intrusive change -- not (comparatively) low-hanging
fruit.  Plus, the person submitting these patches is focused on 32-bit.

-Scott
David Laight May 26, 2015, 1:57 p.m. UTC | #5
From: Scott Wood ...

> > I'd also have thought that the 64bit C version above would be generally 'good'.
> 
> It doesn't generate the addc/addze sequence.  At least with GCC 4.8.2,
> it does something like:
> 
> 	mr	tmp0, csum
> 	li	tmp1, 0
> 	li	tmp2, 0
> 	addc	tmp3, addend, tmp0
> 	adde	csum, tmp2, tmp1
> 	add	csum, csum, tmp3

I was thinking of all 64bit targets, not 32bit ones.

	David
Scott Wood May 26, 2015, 7:42 p.m. UTC | #6
On Tue, 2015-05-26 at 13:57 +0000, David Laight wrote:
> From: Scott Wood ...
> > > I'd also have thought that the 64bit C version above would be 
> > > generally 'good'.
> > 
> > It doesn't generate the addc/addze sequence.  At least with GCC 
> > 4.8.2,
> > it does something like:
> > 
> >     mr      tmp0, csum
> >     li      tmp1, 0
> >     li      tmp2, 0
> >     addc    tmp3, addend, tmp0
> >     adde    csum, tmp2, tmp1
> >     add     csum, csum, tmp3
> 
> I was thinking of all 64bit targets, not 32bit ones.

Oh, you mean move it out of arch/powerpc?  Sounds reasonable, but 
someone should probably check what the resulting code looks like on 
other common arches.  OTOH, if we're going to modify non-arch code, 
that might be a good opportunity to implement Segher's suggestion and 
move to a 64-bit accumulator.

-Scott
David Laight May 27, 2015, 8:41 a.m. UTC | #7
From: Scott Wood

> Sent: 26 May 2015 20:43
...
> > I was thinking of all 64bit targets, not 32bit ones.
> 
> Oh, you mean move it out of arch/powerpc?  Sounds reasonable, but
> someone should probably check what the resulting code looks like on
> other common arches.  OTOH, if we're going to modify non-arch code,
> that might be a good opportunity to implement Segher's suggestion and
> move to a 64-bit accumulator.

Or more likely: adding alternate 32bit words to different 64-bit
registers in order to reduce the dependency chains further.
I'm sure I've seen a version that does that somewhere.

	David
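
A rough sketch of the interleaved-accumulator idea David mentions. Everything here is illustrative: the function is not an existing kernel helper, it just shows the shape of splitting the sum across two independent carry chains:

#include <linux/types.h>

/* Hypothetical: even words feed one 64-bit accumulator, odd words the
 * other, so the two add chains have no dependency on each other and
 * can issue in parallel; merge and fold once at the end. */
static u32 csum_words_u64_x2(const u32 *buf, int n)
{
	u64 a = 0, b = 0;
	int i;

	for (i = 0; i + 1 < n; i += 2) {
		a += buf[i];			/* even-indexed words */
		b += buf[i + 1];		/* odd-indexed words */
	}
	if (i < n)
		a += buf[i];			/* leftover word when n is odd */

	a += b;					/* merge the two chains */
	a = (a & 0xffffffff) + (a >> 32);	/* fold carries */
	a = (a & 0xffffffff) + (a >> 32);	/* fold the final carry */
	return (u32)a;
}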

Patch

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 5e43d2d..e8d9ef4 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -130,6 +130,22 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
 	return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
 }
 
+#define HAVE_ARCH_CSUM_ADD
+static inline __wsum csum_add(__wsum csum, __wsum addend)
+{
+#ifdef __powerpc64__
+	u64 res = (__force u64)csum;
+
+	res += (__force u64)addend;
+	return (__force __wsum)((u32)res + (res >> 32));
+#else
+	asm("addc %0,%0,%1;"
+	    "addze %0,%0;"
+	    : "+r" (csum) : "r" (addend));
+	return csum;
+#endif
+}
+
 #endif
 #endif /* __KERNEL__ */
 #endif
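
As a usage note, csum_add() is the primitive that lets partial checksums compose. A hedged example (csum_partial() is the real kernel API; the wrapper function and the buffer sizes are invented):

#include <linux/types.h>
#include <net/checksum.h>

/* Illustration: checksum a buffer in two pieces and combine them.
 * Composing with csum_add() is valid here because ones' complement
 * addition is associative and the split offset (1024) is even, so
 * the byte lanes of the second piece line up with the first. */
static __wsum checksum_in_two_pieces(const u8 *buf)
{
	__wsum part1 = csum_partial(buf, 1024, 0);
	__wsum part2 = csum_partial(buf + 1024, 512, 0);

	return csum_add(part1, part2);	/* == csum_partial(buf, 1536, 0) */
}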