[RFC] new byteorder primitives - ..._{replace,get}_bits()

Message ID 20171211155422.GA12326@ZenIV.linux.org.uk
State RFC, archived
Delegated to: David Miller

Commit Message

Al Viro Dec. 11, 2017, 3:54 p.m. UTC
A lot of drivers are open-coding the "replace these bits in __be32 with
the following value" kind of primitives.  Let's add them to byteorder.h.

Primitives:
	{be,le}{16,32,64}_replace_bits(old, v, bit, nbits)
	{be,le}{16,32,64}_get_bits(val, bit, nbits)

Essentially, it gives helpers for working with bitfields in fixed-endian
values.  Suppose we have e.g. a little-endian 32-bit value with a fixed
layout; expressing that as a bitfield would go like
	struct foo {
		unsigned foo:4;		/* bits 0..3 */
		unsigned :2;
		unsigned bar:12;	/* bits 6..17 */
		unsigned baz:14;	/* bits 18..31 */
	}
Even for host-endian it doesn't work all that well - you end up with
ifdefs in the structure definition and the generated code stinks.  For
fixed-endian it gets really painful, and people tend to use explicit
shift-and-mask macros for accessing the fields (and often enough get the
endianness conversions wrong, at that).  With these primitives

struct foo v		<=>	__le32 v
v.foo = i ? 1 : 2	<=>	v = le32_replace_bits(v, i ? 1 : 2, 0, 4)
f(4 + v.baz)		<=>	f(4 + le32_get_bits(v, 18, 14))
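
A minimal usage sketch (frob() and the threshold are made up for
illustration):

	static void frob(__le32 *reg)
	{
		u32 bar = le32_get_bits(*reg, 6, 12);		/* bits 6..17 */

		if (bar > 100)
			*reg = le32_replace_bits(*reg, 0, 18, 14);	/* clear baz */
	}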

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

Comments

Jakub Kicinski Dec. 12, 2017, 4:02 a.m. UTC | #1
On Mon, 11 Dec 2017 15:54:22 +0000, Al Viro wrote:
> Essentially, it gives helpers for working with bitfields in fixed-endian
> values.  Suppose we have e.g. a little-endian 32-bit value with a fixed
> layout; expressing that as a bitfield would go like
> 	struct foo {
> 		unsigned foo:4;		/* bits 0..3 */
> 		unsigned :2;
> 		unsigned bar:12;	/* bits 6..17 */
> 		unsigned baz:14;	/* bits 18..31 */
> 	}
> Even for host-endian it doesn't work all that well - you end up with
> ifdefs in the structure definition and the generated code stinks.  For
> fixed-endian it gets really painful, and people tend to use explicit
> shift-and-mask macros for accessing the fields (and often enough get the
> endianness conversions wrong, at that).  With these primitives
> 
> struct foo v		<=>	__le32 v
> v.foo = i ? 1 : 2	<=>	v = le32_replace_bits(v, i ? 1 : 2, 0, 4)
> f(4 + v.baz)		<=>	f(4 + le32_get_bits(v, 18, 14))

Looks very useful.  The [start bit, size] pair may not lend itself
too nicely to creating defines, though.  Which is why in
include/linux/bitfield.h we tried to use a shifted mask and work
backwards from that single value to derive the start and size.  Commit
3e9b3112ec74 ("add basic register-field manipulation macros") has the
description.  Could a similar trick perhaps be applicable here?
Al Viro Dec. 12, 2017, 6:20 a.m. UTC | #2
On Mon, Dec 11, 2017 at 08:02:24PM -0800, Jakub Kicinski wrote:
> On Mon, 11 Dec 2017 15:54:22 +0000, Al Viro wrote:
> > Essentially, it gives helpers for working with bitfields in fixed-endian
> > values.  Suppose we have e.g. a little-endian 32-bit value with a fixed
> > layout; expressing that as a bitfield would go like
> > 	struct foo {
> > 		unsigned foo:4;		/* bits 0..3 */
> > 		unsigned :2;
> > 		unsigned bar:12;	/* bits 6..17 */
> > 		unsigned baz:14;	/* bits 18..31 */
> > 	}
> > Even for host-endian it doesn't work all that well - you end up with
> > ifdefs in the structure definition and the generated code stinks.  For
> > fixed-endian it gets really painful, and people tend to use explicit
> > shift-and-mask macros for accessing the fields (and often enough get the
> > endianness conversions wrong, at that).  With these primitives
> > 
> > struct foo v		<=>	__le32 v
> > v.foo = i ? 1 : 2	<=>	v = le32_replace_bits(v, i ? 1 : 2, 0, 4)
> > f(4 + v.baz)		<=>	f(4 + le32_get_bits(v, 18, 14))
> 
> Looks very useful.  The [start bit, size] pair may not lend itself
> too nicely to creating defines, though.  Which is why in
> include/linux/bitfield.h we tried to use a shifted mask and work
> backwards from that single value to derive the start and size.  Commit
> 3e9b3112ec74 ("add basic register-field manipulation macros") has the
> description.  Could a similar trick perhaps be applicable here?

Umm...  What's wrong with

#define FIELD_FOO 0,4
#define FIELD_BAR 6,12
#define FIELD_BAZ 18,14

A macro can bloody well expand to any sequence of tokens - le32_get_bits(v, FIELD_BAZ)
will become le32_get_bits(v, 18, 14) just fine.  What's the problem with that?
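
Concretely (a sketch, with the FIELD_* names from the example above):

	#define FIELD_BAZ	18,14

	/* le32_get_bits(v, FIELD_BAZ) expands to le32_get_bits(v, 18, 14) */
	u32 baz = le32_get_bits(v, FIELD_BAZ);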
Al Viro Dec. 12, 2017, 7:45 p.m. UTC | #3
On Tue, Dec 12, 2017 at 06:20:02AM +0000, Al Viro wrote:

> Umm...  What's wrong with
> 
> #define FIELD_FOO 0,4
> #define FIELD_BAR 6,12
> #define FIELD_BAZ 18,14
> 
> A macro can bloody well expand to any sequence of tokens - le32_get_bits(v, FIELD_BAZ)
> will become le32_get_bits(v, 18, 14) just fine.  What's the problem with that?

FWIW, if you want to use the mask, __builtin_ffsll() is not the only way to do
it - you don't need the shift.  A multiplier would do just as well, and that can
be had more easily.  If mask = (2*a + 1)<<n = ((2*a)<<n) ^ (1<<n), then
	mask - 1 = ((2*a) << n) + ((1<<n) - 1) = ((2*a) << n) ^ ((1<<n) - 1)
	mask ^ (mask - 1) = (1<<n) + ((1<<n) - 1)
and
	mask & (mask ^ (mask - 1)) = 1<<n.

IOW, with

static __always_inline u64 mask_to_multiplier(u64 mask)
{
	return mask & (mask ^ (mask - 1));
}

we could do

static __always_inline __le64 le64_replace_bits(__le64 old, u64 v, u64 mask)
{
	__le64 m = cpu_to_le64(mask);
	return (old & ~m) | (cpu_to_le64(v * mask_to_multiplier(mask)) & m);
}

static __always_inline u64 le64_get_bits(__le64 v, u64 mask)
{
	return (le64_to_cpu(v) & mask) / mask_to_multiplier(mask);
}

etc.  The compiler will turn those into shifts...  I can live with either
calling convention.
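
As a worked instance (mask value picked for illustration): mask = 0x3fc0
covers bits 6..13, and mask & (mask ^ (mask - 1)) = 0x40 = 1<<6, so

	__le64 v = cpu_to_le64(0);
	u64 field;

	v = le64_replace_bits(v, 0xab, 0x3fc0);	/* 0xab into bits 6..13 */
	field = le64_get_bits(v, 0x3fc0);	/* == (le64_to_cpu(v) >> 6) & 0xff */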

Comments?
Jakub Kicinski Dec. 12, 2017, 8:04 p.m. UTC | #4
On Tue, 12 Dec 2017 19:45:32 +0000, Al Viro wrote:
> On Tue, Dec 12, 2017 at 06:20:02AM +0000, Al Viro wrote:
> 
> > Umm...  What's wrong with
> > 
> > #define FIELD_FOO 0,4
> > #define FIELD_BAR 6,12
> > #define FIELD_BAZ 18,14
> > 
> > A macro can bloody well expand to any sequence of tokens - le32_get_bits(v, FIELD_BAZ)
> > will become le32_get_bits(v, 18, 14) just fine.  What's the problem with that?  
> 
> FWIW, if you want to use the mask, __builtin_ffsll() is not the only way to do
> it - you don't need the shift.  A multiplier would do just as well, and that can
> be had more easily.  If mask = (2*a + 1)<<n = ((2*a)<<n) ^ (1<<n), then
> 	mask - 1 = ((2*a) << n) + ((1<<n) - 1) = ((2*a) << n) ^ ((1<<n) - 1)
> 	mask ^ (mask - 1) = (1<<n) + ((1<<n) - 1)
> and
> 	mask & (mask ^ (mask - 1)) = 1<<n.
> 
> IOW, with
> 
> static __always_inline u64 mask_to_multiplier(u64 mask)
> {
> 	return mask & (mask ^ (mask - 1));
> }
> 
> we could do
> 
> static __always_inline __le64 le64_replace_bits(__le64 old, u64 v, u64 mask)
> {
> 	__le64 m = cpu_to_le64(mask);
> 	return (old & ~m) | (cpu_to_le64(v * mask_to_multiplier(mask)) & m);
> }
> 
> static __always_inline u64 le64_get_bits(__le64 v, u64 mask)
> {
> 	return (le64_to_cpu(v) & mask) / mask_to_multiplier(mask);
> }
> 
> etc.  The compiler will turn those into shifts...  I can live with either
> calling convention.
> 
> Comments?

Very nice!  The compile-time check that the value fits in the field
covered by the mask (if they're both known) did help me catch bugs
early a few times over the years, so if it could be preserved we could
maybe even drop the FIELD_* macros and just use this approach?
Al Viro Dec. 12, 2017, 11:48 p.m. UTC | #5
On Tue, Dec 12, 2017 at 12:04:09PM -0800, Jakub Kicinski wrote:

> > static __always_inline u64 mask_to_multiplier(u64 mask)
> > {
> > 	return mask & (mask ^ (mask - 1));
> > }

D'oh.  Even simpler than that, of course -

static __always_inline u64 mask_to_multiplier(u64 mask)
{
	return mask & -mask;
}
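
For a concrete instance (mask picked for illustration): mask = 0x3fc0 gives
-mask = ...ffffc040, so mask & -mask = 0x0040 = 1<<6 - the same multiplier
as before.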

> Very nice!  The compile-time check that the value fits in the field
> covered by the mask (if they're both known) did help me catch bugs
> early a few times over the years, so if it could be preserved we could
> maybe even drop the FIELD_* macros and just use this approach?

Umm...  Something like this, perhaps?  Same bunch, plus u{16,32,64}_...
variants for host-endian.  Adding a sanity check on the mask is also not
hard, but I don't know how useful it actually is...

diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h
index 451aaa0786ae..a032de9aa03d 100644
--- a/include/linux/byteorder/generic.h
+++ b/include/linux/byteorder/generic.h
@@ -187,4 +187,36 @@ static inline void be32_to_cpu_array(u32 *dst, const __be32 *src, size_t len)
 		dst[i] = be32_to_cpu(src[i]);
 }
 
+extern void __compiletime_error("value doesn't fit into mask")
+__field_overflow(void);
+static __always_inline u64 mask_to_multiplier(u64 mask)
+{
+	return mask & -mask;
+}
+
+#define ____MAKE_OP(type,base,to,from)					\
+static __always_inline __##type type##_replace_bits(__##type old,	\
+					base val, base mask)		\
+{									\
+	__##type m = to(mask);						\
+        if (__builtin_constant_p(val) &&				\
+		    (val & ~(mask/mask_to_multiplier(mask))))		\
+				    __field_overflow();			\
+	return (old & ~m) |						\
+		(to(val * mask_to_multiplier(mask)) & m);		\
+}									\
+static __always_inline base type##_get_bits(__##type v, base mask)	\
+{									\
+	return (from(v) & mask)/mask_to_multiplier(mask);		\
+}
+#define __MAKE_OP(size)							\
+	____MAKE_OP(le##size,u##size,cpu_to_le##size,le##size##_to_cpu)	\
+	____MAKE_OP(be##size,u##size,cpu_to_be##size,be##size##_to_cpu)	\
+	____MAKE_OP(u##size,u##size,,)
+__MAKE_OP(16)
+__MAKE_OP(32)
+__MAKE_OP(64)
+#undef __MAKE_OP
+#undef ____MAKE_OP
+
 #endif /* _LINUX_BYTEORDER_GENERIC_H */
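
With the __builtin_constant_p() check above, a constant value too wide for
its mask trips the build-time error (on gcc; a sketch, values illustrative):

	__le16 ctrl = cpu_to_le16(0);

	ctrl = le16_replace_bits(ctrl, 3, 0x7);	/* fine: 3 fits in bits 0..2 */
	ctrl = le16_replace_bits(ctrl, 9, 0x7);	/* __field_overflow(): 9 & ~0x7 != 0 */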
Jakub Kicinski Dec. 12, 2017, 11:59 p.m. UTC | #6
On Tue, 12 Dec 2017 23:48:56 +0000, Al Viro wrote:
> On Tue, Dec 12, 2017 at 12:04:09PM -0800, Jakub Kicinski wrote:
> 
> > > static __always_inline u64 mask_to_multiplier(u64 mask)
> > > {
> > > 	return mask & (mask ^ (mask - 1));
> > > }  
> 
> D'oh.  Even simpler than that, of course -
> 
> static __always_inline u64 mask_to_multiplier(u64 mask)
> {
>  	return mask & -mask;
> }
> 
> > Very nice!  The compile-time check that the value fits in the field
> > covered by the mask (if they're both known) did help me catch bugs
> > early a few times over the years, so if it could be preserved we could
> > maybe even drop the FIELD_* macros and just use this approach?
> 
> Umm...  Something like this, perhaps?  Same bunch, plus u{16,32,64}_...
> variants for host-endian.  Adding a sanity check on the mask is also not
> hard, but I don't know how useful it actually is...

Sanity checking the mask is not that useful in my experience.

> diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h
> index 451aaa0786ae..a032de9aa03d 100644
> --- a/include/linux/byteorder/generic.h
> +++ b/include/linux/byteorder/generic.h
> @@ -187,4 +187,36 @@ static inline void be32_to_cpu_array(u32 *dst, const __be32 *src, size_t len)
>  		dst[i] = be32_to_cpu(src[i]);
>  }
>  
> +extern void __compiletime_error("value doesn't fit into mask")
> +__field_overflow(void);
> +static __always_inline u64 mask_to_multiplier(u64 mask)
> +{
> +	return mask & -mask;
> +}
> +
> +#define ____MAKE_OP(type,base,to,from)					\
> +static __always_inline __##type type##_replace_bits(__##type old,	\
> +					base val, base mask)		\
> +{									\
> +	__##type m = to(mask);						\
> +        if (__builtin_constant_p(val) &&				\

Is the lack of a __builtin_constant_p(mask) test intentional?  Sometimes
the bitfield is a packed array and people may have a helper to which
only the mask is passed as non-constant and the value is implied by the
helper, thus constant.

> +		    (val & ~(mask/mask_to_multiplier(mask))))		\
> +				    __field_overflow();			\
> +	return (old & ~m) |						\
> +		(to(val * mask_to_multiplier(mask)) & m);		\
> +}									\
> +static __always_inline base type##_get_bits(__##type v, base mask)	\
> +{									\
> +	return (from(v) & mask)/mask_to_multiplier(mask);		\
> +}
> +#define __MAKE_OP(size)							\
> +	____MAKE_OP(le##size,u##size,cpu_to_le##size,le##size##_to_cpu)	\
> +	____MAKE_OP(be##size,u##size,cpu_to_be##size,be##size##_to_cpu)	\
> +	____MAKE_OP(u##size,u##size,,)
> +__MAKE_OP(16)
> +__MAKE_OP(32)
> +__MAKE_OP(64)
> +#undef __MAKE_OP
> +#undef ____MAKE_OP
> +
>  #endif /* _LINUX_BYTEORDER_GENERIC_H */
Al Viro Dec. 13, 2017, 12:36 a.m. UTC | #7
On Tue, Dec 12, 2017 at 03:59:33PM -0800, Jakub Kicinski wrote:
> > +static __always_inline __##type type##_replace_bits(__##type old,	\
> > +					base val, base mask)		\
> > +{									\
> > +	__##type m = to(mask);						\
> > +        if (__builtin_constant_p(val) &&				\
> 
> Is the lack of a __builtin_constant_p(mask) test intentional?  Sometimes
> the bitfield is a packed array and people may have a helper to which
> only the mask is passed as non-constant and the value is implied by the
> helper, thus constant.

If the mask is non-constant, we probably shouldn't be using that at all;
could you show a real-world example where that would be the case?
Jakub Kicinski Dec. 13, 2017, 1:04 a.m. UTC | #8
On Wed, 13 Dec 2017 00:36:59 +0000, Al Viro wrote:
> On Tue, Dec 12, 2017 at 03:59:33PM -0800, Jakub Kicinski wrote:
> > > +static __always_inline __##type type##_replace_bits(__##type old,	\
> > > +					base val, base mask)		\
> > > +{									\
> > > +	__##type m = to(mask);						\
> > > +        if (__builtin_constant_p(val) &&				\  
> > 
> > Is the lack of a __builtin_constant_p(mask) test intentional?  Sometimes
> > the bitfield is a packed array and people may have a helper to which
> > only the mask is passed as non-constant and the value is implied by the
> > helper, thus constant.  
> 
> If the mask is non-constant, we probably shouldn't be using that at all;
> could you show a real-world example where that would be the case?

FIELD_* macros explicitly forbid this, since the code would be...
suboptimal with the runtime ffsl.  A real-life example is the hackish
NFP_ETH_SET_BIT_CONFIG() macro in

drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c

I remember there was also some Renesas code...  maybe this:

https://patchwork.kernel.org/patch/9881279/

it looks like cpg_z_clk_recalc_rate() and cpg_z2_clk_recalc_rate() only
differ in the mask.
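
The shape of such a helper - a hypothetical sketch, not the actual Renesas
code, using the host-endian u32_get_bits() from the proposal above - would
be roughly:

	static unsigned long z_clk_rate(unsigned long parent_rate, u32 val,
					u32 mask)
	{
		/* mask differs per caller, so it is not a compile-time constant */
		unsigned int mult = 32 - u32_get_bits(val, mask);

		return parent_rate / 32 * mult;
	}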
Al Viro Dec. 13, 2017, 1:30 a.m. UTC | #9
On Tue, Dec 12, 2017 at 05:04:37PM -0800, Jakub Kicinski wrote:
> On Wed, 13 Dec 2017 00:36:59 +0000, Al Viro wrote:
> > On Tue, Dec 12, 2017 at 03:59:33PM -0800, Jakub Kicinski wrote:
> > > > +static __always_inline __##type type##_replace_bits(__##type old,	\
> > > > +					base val, base mask)		\
> > > > +{									\
> > > > +	__##type m = to(mask);						\
> > > > +        if (__builtin_constant_p(val) &&				\  
> > > 
> > > Is the lack of a __builtin_constant_p(mask) test intentional?  Sometimes
> > > the bitfield is a packed array and people may have a helper to which
> > > only the mask is passed as non-constant and the value is implied by the
> > > helper, thus constant.  
> > 
> > If the mask is non-constant, we probably shouldn't be using that at all;
> > could you show a real-world example where that would be the case?
> 
> FIELD_* macros explicitly forbid this, since the code would be...
> suboptimal with the runtime ffsl.  A real-life example is the hackish
> NFP_ETH_SET_BIT_CONFIG() macro in
> 
> drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c

Why not simply make nfp_eth_set_bit_config() static inline and be
done with that?  It's not that large and there are only a few callers,
so...

> I remember there was also some Renesas code...  maybe this:
> 
> https://patchwork.kernel.org/patch/9881279/
> 
> it looks like cpg_z_clk_recalc_rate() and cpg_z2_clk_recalc_rate() only
> differ in the mask.

*shrug*

That thing would expand into "reg &= 15" in one case and "reg >>= 8; reg &= 15"
in another.  Either is cheaper than a function call, and definitely cheaper
than any kind of dynamic calculation of the shift, no matter how implemented.
Jakub Kicinski Dec. 13, 2017, 1:35 a.m. UTC | #10
On Wed, 13 Dec 2017 01:30:56 +0000, Al Viro wrote:
> On Tue, Dec 12, 2017 at 05:04:37PM -0800, Jakub Kicinski wrote:
> > On Wed, 13 Dec 2017 00:36:59 +0000, Al Viro wrote:  
> > > On Tue, Dec 12, 2017 at 03:59:33PM -0800, Jakub Kicinski wrote:  
> > > > > +static __always_inline __##type type##_replace_bits(__##type old,	\
> > > > > +					base val, base mask)		\
> > > > > +{									\
> > > > > +	__##type m = to(mask);						\
> > > > > +        if (__builtin_constant_p(val) &&				\    
> > > > 
> > > > Is the lack of a __builtin_constant_p(mask) test intentional?  Sometimes
> > > > the bitfield is a packed array and people may have a helper to which
> > > > only the mask is passed as non-constant and the value is implied by the
> > > > helper, thus constant.    
> > > 
> > > If the mask is non-constant, we probably shouldn't be using that at all;
> > > could you show a real-world example where that would be the case?
> > 
> > FIELD_* macros explicitly forbid this, since the code would be...
> > suboptimal with the runtime ffsl.  A real-life example is the hackish
> > NFP_ETH_SET_BIT_CONFIG() macro in
> > 
> > drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c  
> 
> Why not simply make nfp_eth_set_bit_config() static inline and be
> done with that?  It's not that large and there are only a few callers,
> so...

It used to be __always_inline, but apparently LLVM/clang doesn't
propagate constants :(  

4e59532541c8 ("nfp: don't depend on compiler constant propagation")

> > I remember there was also some Renesas code...  maybe this:
> > 
> > https://patchwork.kernel.org/patch/9881279/
> > 
> > it looks like cpg_z_clk_recalc_rate() and cpg_z2_clk_recalc_rate() only
> > differ in the mask.
> 
> *shrug*
> 
> That thing would expand into "reg &= 15" in one case and "reg >>= 8; reg &= 15"
> in another.  Either is cheaper than a function call, and definitely cheaper
> than any kind of dynamic calculation of the shift, no matter how implemented.
Al Viro Dec. 13, 2017, 1:51 a.m. UTC | #11
On Tue, Dec 12, 2017 at 05:35:28PM -0800, Jakub Kicinski wrote:

> It used to be __always_inline, but apparently LLVM/clang doesn't
> propagate constants :(  
> 
> 4e59532541c8 ("nfp: don't depend on compiler constant propagation")

Doesn't propagate constants or doesn't have the exact same set of
rules for __builtin_constant_p()?  IOW, if you dropped that
BUILD_BUG_ON(), what would be left after optimizations?
Jakub Kicinski Dec. 13, 2017, 2:44 a.m. UTC | #12
On Wed, 13 Dec 2017 01:51:25 +0000, Al Viro wrote:
> On Tue, Dec 12, 2017 at 05:35:28PM -0800, Jakub Kicinski wrote:
> 
> > It used to be __always_inline, but apparently LLVM/clang doesn't
> > propagate constants :(  
> > 
> > 4e59532541c8 ("nfp: don't depend on compiler constant propagation")  
> 
> Doesn't propagate constants or doesn't have the exact same set of
> rules for __builtin_constant_p()?  IOW, if you dropped that
> BUILD_BUG_ON(), what would be left after optimizations?

Hm.  You're right.  It just doesn't recognize the parameter as constant
in __builtin_constant_p().  I haven't compiled the actual code, but
here is a trivial test:

18:36 ~$ clang --version
clang version 4.0.1 (tags/RELEASE_401/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

18:36 ~$ cat /tmp/test.c
#include <stdio.h>

static inline int test_thing(unsigned int a)
{
	printf("const: %d\n", __builtin_constant_p(a));
	if (a > 10)
		return a - 1;
	return a;
}

int main()
{
	printf("const: %d\n", __builtin_constant_p(0));
	return test_thing(0);
}
18:36 ~$ clang -g /tmp/test.c -O2 -Wall -W -o /tmp/test
18:36 ~$ /tmp/test 
const: 1
const: 0
18:36 ~$ objdump -S /tmp/test
...
00000000004004e0 <main>:
	return a;
}

int main()
{
	printf("const: %d\n", __builtin_constant_p(0));
  4004e0:	50                   	push   %rax
  4004e1:	bf a0 05 40 00       	mov    $0x4005a0,%edi
  4004e6:	be 01 00 00 00       	mov    $0x1,%esi
  4004eb:	31 c0                	xor    %eax,%eax
  4004ed:	e8 fe fe ff ff       	callq  4003f0 <printf@plt>
	printf("const: %d\n", __builtin_constant_p(a));
  4004f2:	bf a0 05 40 00       	mov    $0x4005a0,%edi
  4004f7:	31 f6                	xor    %esi,%esi
  4004f9:	31 c0                	xor    %eax,%eax
  4004fb:	e8 f0 fe ff ff       	callq  4003f0 <printf@plt>
	return test_thing(0);
  400500:	31 c0                	xor    %eax,%eax
  400502:	59                   	pop    %rcx
  400503:	c3                   	retq   
  400504:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40050b:	00 00 00 
  40050e:	66 90                	xchg   %ax,%ax

...
Al Viro Dec. 13, 2017, 2:22 p.m. UTC | #13
On Tue, Dec 12, 2017 at 06:44:00PM -0800, Jakub Kicinski wrote:
> On Wed, 13 Dec 2017 01:51:25 +0000, Al Viro wrote:
> > On Tue, Dec 12, 2017 at 05:35:28PM -0800, Jakub Kicinski wrote:
> > 
> > > It used to be __always_inline, but apparently LLVM/clang doesn't
> > > propagate constants :(  
> > > 
> > > 4e59532541c8 ("nfp: don't depend on compiler constant propagation")  
> > 
> > Doesn't propagate constants or doesn't have the exact same set of
> > rules for __builtin_constant_p()?  IOW, if you dropped that
> > BUILD_BUG_ON(), what would be left after optimizations?
> 
> Hm.  You're right.  It just doesn't recognize the parameter as constant
> in __builtin_constant_p().

FWIW, clang does propagate them well enough to detect and optimize
multiplication by a constant power of two with the variant I've posted.

The check on field overflow (which works on gcc builds) does nothing on
clang - __builtin_constant_p() gives a flat-out false for arguments of an
inline function there.  I can understand their reasoning, even though
it's inconvenient in cases like this one - the semantics of
__builtin_constant_p() are a fucking mess and the less you rely upon its
details, the safer you are...

Hell knows - a part of that can be recovered by moving the check into a
wrapper, as in

static __always_inline __le32 __le32_replace_bits(__le32 old, u32 v, u32 mask)
{
	__le32 m = cpu_to_le32(mask);
	return (old & ~m) | (cpu_to_le32(v * mask_to_multiplier(mask)) & m);
}

#define le32_replace_bits(old, v, mask)	({			\
	typeof(v) ____v = (v);					\
	typeof(mask) ____m = (mask);				\
	if (__builtin_constant_p(____v))			\
		if (____v & ~(____m / mask_to_multiplier(____m)))	\
			__field_overflow();			\
	__le32_replace_bits(old, ____v, ____m);			\
})

That would give that sanity check better coverage on clang builds, but...
does it really buy us enough to bother, especially since all those macros
would have to be spelled out?  You can have a macro expanding to the
definition of a static inline, but you can't have a macro expanding to
anything that would contain a preprocessor directive, including #define.
And with that kind of sanity check, the first build with gcc will catch
everything missed by clang builds anyway.

IMO it's not worth the trouble; let's go with the check inside the
static inline and accept that on clang builds it'll do nothing.

Next question: where do we put that bunch?  I've put it into
linux/byteorder/generic.h, so that anything picking up fixed-endian
primitives would pick those up as well; I hadn't thought of
linux/bitfield.h at the time.  We certainly could put it there instead -
it's never pulled in by other headers, so adding #include <asm/byteorder.h>
to linux/bitfield.h is not going to cause header order problems.  Not sure...

Linus, do you have any preferences in that area?
Al Viro Dec. 13, 2017, 5:45 p.m. UTC | #14
On Wed, Dec 13, 2017 at 02:22:12PM +0000, Al Viro wrote:

> Next question: where do we put that bunch?  I've put it into
> linux/byteorder/generic.h, so that anything picking up fixed-endian
> primitives would pick those up as well; I hadn't thought of
> linux/bitfield.h at the time.  We certainly could put it there instead -
> it's never pulled in by other headers, so adding #include <asm/byteorder.h>
> to linux/bitfield.h is not going to cause header order problems.  Not sure...
> 
> Linus, do you have any preferences in that area?

After looking at some of the callers of bitfield.h stuff: it might be useful
to add

static inline void le64p_replace_bits(__le64 *p, u64 v, u64 mask)
{
	__le64 m = cpu_to_le64(mask);
	*p = (*p & ~m) | (cpu_to_le64(v * mask_to_multiplier(mask)) & m);
}

and similar for other types.  Not sure what would be a good name for
host-endian variants - u64p_replace_bits() sounds a bit clumsy.  Suggestions?
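
Callers would then update a field in place along these lines (a sketch; the
descriptor layout is made up):

	struct desc {
		__le64 ctrl;
	};

	static void desc_set_len(struct desc *d, u64 len)
	{
		le64p_replace_bits(&d->ctrl, len, 0xffff0000ULL);	/* bits 16..31 */
	}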
Jakub Kicinski Dec. 13, 2017, 7:04 p.m. UTC | #15
On Wed, 13 Dec 2017 14:22:12 +0000, Al Viro wrote:
> IMO it's not worth the trouble; let's go with the check inside the
> static inline and accept that on clang builds it'll do nothing.

Ack, thanks for investigating!

Patch

diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h
index 451aaa0786ae..d8f169a7104a 100644
--- a/include/linux/byteorder/generic.h
+++ b/include/linux/byteorder/generic.h
@@ -187,4 +187,26 @@  static inline void be32_to_cpu_array(u32 *dst, const __be32 *src, size_t len)
 		dst[i] = be32_to_cpu(src[i]);
 }
 
+#define ____MASK(bit, nbits) ((((1ULL << ((nbits) - 1)) << 1) - 1) << (bit))
+#define ____MAKE_OP(type,base)						\
+static inline __##type type##_replace_bits(__##type old,		\
+					base val, int bit, int nbits)	\
+{									\
+	__##type mask = cpu_to_##type(____MASK(bit, nbits));		\
+	return (old & ~mask) | (cpu_to_##type(val << bit) & mask);	\
+}									\
+static inline base type##_get_bits(__##type val, int bit, int nbits)	\
+{									\
+	return (type##_to_cpu(val) >> bit) & ____MASK(0, nbits);	\
+}
+
+____MAKE_OP(le16,u16)
+____MAKE_OP(le32,u32)
+____MAKE_OP(le64,u64)
+____MAKE_OP(be16,u16)
+____MAKE_OP(be32,u32)
+____MAKE_OP(be64,u64)
+#undef ____MAKE_OP
+#undef ____MASK
+
 #endif /* _LINUX_BYTEORDER_GENERIC_H */
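
For reference, ____MASK(6, 12) evaluates to 0x3ffc0 (bits 6..17), so usage of
these primitives looks like (a sketch, values illustrative):

	__be32 reg = cpu_to_be32(0);

	reg = be32_replace_bits(reg, 0xabc, 6, 12);	/* 0xabc into bits 6..17 */
	/* be32_get_bits(reg, 6, 12) == 0xabc */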