diff mbox series

[RESEND,v4,1/8] bitops: Introduce the for_each_set_clump macro

Message ID f5250b9c78377e476667be1b616c86cf573b57dc.1538441919.git.vilhelm.gray@gmail.com
State New
Headers show
Series Introduce the for_each_set_clump macro | expand

Commit Message

William Breathitt Gray Oct. 2, 2018, 1:13 a.m. UTC
This macro iterates for each group of bits (clump) with set bits, within
a bitmap memory region. For each iteration, "clump" is set to the found
clump index, "index" is set to the word index of the bitmap containing
the found clump, and "offset" is set to the bit offset of the found
clump within the respective bitmap word.

Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
---
 include/asm-generic/bitops/find.h |  9 +++++++
 include/linux/bitops.h            |  7 ++++++
 lib/find_bit.c                    | 40 +++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+)

Comments

Rasmus Villemoes Oct. 2, 2018, 7:42 a.m. UTC | #1
On 2018-10-02 03:13, William Breathitt Gray wrote:
> This macro iterates for each group of bits (clump) with set bits, within
> a bitmap memory region. For each iteration, "clump" is set to the found
> clump index, "index" is set to the word index of the bitmap containing
> the found clump, and "offset" is set to the bit offset of the found
> clump within the respective bitmap word.

I can't say I'm a fan. It seems rather clumsy and ad-hoc - though I do
see how it matches the code you replace in drivers/gpio/. When I
initially read the cover letter, I assumed that one would get a sequence
of 4-bit values, but one has to dig the actual value out of the bitmap
afterwards using all of index, offset and a mask computed from clump_size.

> +
> +/**
> + * find_next_clump - find next clump with set bits in a memory region
> + * @index: location to store bitmap word index of found clump
> + * @offset: bits offset of the found clump within the respective bitmap word
> + * @bits: address to base the search on
> + * @size: bitmap size in number of clumps

That's a rather inconvenient unit, no? And rather easy to get wrong, I
can easily see people passing nbits instead.

I think you could reduce the number of arguments to this helper and the
macro, while getting rid of some confusion: Drop index and offset, let
clump_index be start_index and measured in bit positions (like
find_next_bit et al), and let the return value also be a bit position.
And instead of index and offset, have another unsigned long* output
parameter that gives the actual value at [return value:return
value+clump_size]. IOW, I think the prototype should be close to
find_next_bit, except that in case of "clumps", there's an extra piece
of information to return.

> + * @clump_index: clump index at which to start searching
> + * @clump_size: clump size in bits
> + *
> + * Returns the clump index for the next clump with set bits; the respective
> + * bitmap word index is stored at the location pointed by @index, and the bits
> + * offset of the found clump within the respective bitmap word is stored at the
> + * location pointed by @offset. If no bits are set, returns @size.
> + */
> +size_t find_next_clump(size_t *const index, unsigned int *const offset,
> +		       const unsigned long *const bits, const size_t size,
> +		       const size_t clump_index, const unsigned int clump_size)
> +{
> +	size_t i;
> +	unsigned int bits_offset;
> +	unsigned long word_mask;
> +	const unsigned long clump_mask = GENMASK(clump_size - 1, 0);
> +
> +	for (i = clump_index; i < size; i++) {
> +		bits_offset = i * clump_size;
> +
> +		*index = BIT_WORD(bits_offset);
> +		*offset = bits_offset % BITS_PER_LONG;
> +
> +		word_mask = bits[*index] & (clump_mask << *offset);
> +		if (!word_mask)
> +			continue;

The cover letter says

  The clump_size argument can be an arbitrary number of bits and is not
  required to be a multiple of 2.

by which I assume you mean "power of 2", but either way, the above code
does not seem to take into account the case where bits_offset +
clump_size straddles a word boundary, so it wouldn't work for a
clump_size that does not divide BITS_PER_LONG.

May I suggest another approach:

unsigned long bitmap_get_value(const unsigned long *bitmap, unsigned
start, unsigned width): Get the value of bitmap[start:start+width] for
1<=width<=BITS_PER_LONG (it's up to the caller to ensure this is within
the defined region). That can almost be an inline

bitmap_get_value(const unsigned long *bitmap, unsigned start, unsigned
width)
{
  unsigned idx = BIT_WORD(start);
  unsigned offset = start % BITS_PER_LONG;
  unsigned long lower = bitmap[idx] >> offset;
  unsigned long upper = offset <= BITS_PER_LONG - width ? 0 :
bitmap[idx+1] << (BITS_PER_LONG - offset);
  return (lower | upper) & GENMASK(width-1, 0)
}

Then you can implement the for_each_set_clump by a (IMO) more readable

  for (i = 0, start = 0; i < num_ports; i++, start += gpio_reg_size) {
    word_mask = bitmap_get_value(mask, start, gpio_reg_size);
    if (!word_mask)
      continue;
    ...
  }

Or, if you do want find_next_clump/for_each_set_clump, that can be
implemented in terms of this.

Rasmus
Andy Shevchenko Oct. 2, 2018, 8:21 a.m. UTC | #2
On Tue, Oct 02, 2018 at 09:42:48AM +0200, Rasmus Villemoes wrote:
> On 2018-10-02 03:13, William Breathitt Gray wrote:

> The cover letter says
> 
>   The clump_size argument can be an arbitrary number of bits and is not
>   required to be a multiple of 2.
> 
> by which I assume you mean "power of 2", but either way, the above code
> does not seem to take into account the case where bits_offset +
> clump_size straddles a word boundary, so it wouldn't work for a
> clump_size that does not divide BITS_PER_LONG.

E.g. 3 bits in a clump? Hmm...

Why would we need that? I mean some real use case?

> May I suggest another approach:

You may, of course, but see above and my comments below.

> unsigned long bitmap_get_value(const unsigned long *bitmap, unsigned
> start, unsigned width): Get the value of bitmap[start:start+width] for
> 1<=width<=BITS_PER_LONG (it's up to the caller to ensure this is within
> the defined region). That can almost be an inline
> 
> bitmap_get_value(const unsigned long *bitmap, unsigned start, unsigned
> width)
> {
>   unsigned idx = BIT_WORD(start);
>   unsigned offset = start % BITS_PER_LONG;
>   unsigned long lower = bitmap[idx] >> offset;
>   unsigned long upper = offset <= BITS_PER_LONG - width ? 0 :
> bitmap[idx+1] << (BITS_PER_LONG - offset);
>   return (lower | upper) & GENMASK(width-1, 0)
> }
> 
> Then you can implement the for_each_set_clump by a (IMO) more readable
> 
>   for (i = 0, start = 0; i < num_ports; i++, start += gpio_reg_size) {
>     word_mask = bitmap_get_value(mask, start, gpio_reg_size);
>     if (!word_mask)
>       continue;
>     ...
>   }

I would rather go with two prototypes to get()/set() a clump in the bitmap
in a way when it's aligned and BITS_PER_LONG % clump_size == 0.

unsigned long bitmap_get_clump(unsigned long *src, unsigned int start, unsigned int clump_size)
{
	unsigned int index = BIT_WORD(start);
	unsigned int offset = start % BITS_PER_LONG;


	/* These just for spelling the restrictions */
	WARN_ON(BITS_PER_LONG % clump_size);
	WARN_ON(offset % clump_size);

	/* TODO: take care of clump_size == 64 */
	return (bitmap[index] >> offset) & GENMASK(clump_size - 1, 0);
}

Something similar with set with additional parameter unsigned long value
which has MSB cleared till we reach [clump_size - 1 : 0].
Andy Shevchenko Oct. 3, 2018, 11:48 a.m. UTC | #3
On Tue, Oct 02, 2018 at 11:21:42AM +0300, Andy Shevchenko wrote:

> I would rather go with two prototypes to get()/set() a clump in the bitmap
> in a way when it's aligned and BITS_PER_LONG % clump_size == 0.

To make things much easier, restrict clump_size to the one
from the following set:

1, 2, 4, 8, 16, 32 even on 64-bit platforms.

If it would be simpler solution to add 64 here (implying 32-bit platform),
I would vote for that.

For the generic case we might need something like:

unsigned long bitmap_get_bits(unsigned long *src, unsigned int start, unsigned int nbits)
{
	assert(nbits > BITS_PER_LONG);

	/* Something like Rasmus proposed earlier */
}

And similar to setter.
William Breathitt Gray Oct. 4, 2018, 10:03 a.m. UTC | #4
On Tue, Oct 02, 2018 at 09:42:48AM +0200, Rasmus Villemoes wrote:
> On 2018-10-02 03:13, William Breathitt Gray wrote:
> > This macro iterates for each group of bits (clump) with set bits, within
> > a bitmap memory region. For each iteration, "clump" is set to the found
> > clump index, "index" is set to the word index of the bitmap containing
> > the found clump, and "offset" is set to the bit offset of the found
> > clump within the respective bitmap word.
> 
> I can't say I'm a fan. It seems rather clumsy and ad-hoc - though I do
> see how it matches the code you replace in drivers/gpio/. When I
> initially read the cover letter, I assumed that one would get a sequence
> of 4-bit values, but one has to dig the actual value out of the bitmap
> afterwards using all of index, offset and a mask computed from clump_size.

Yes, that is because this macro is as you noted primarily a replacement
for the repetitive code used in the GPIO drivers; the GPIO drivers
require the index and offset in order to modify and store the requested
bit values and perform port I/O.

I put this macro up in the bitops code, but perhaps I should have left
it local to the GPIO subsystem since its so specific. This is one aspect
I want to determine: whether to keep this macro here or move it.

> > +
> > +/**
> > + * find_next_clump - find next clump with set bits in a memory region
> > + * @index: location to store bitmap word index of found clump
> > + * @offset: bits offset of the found clump within the respective bitmap word
> > + * @bits: address to base the search on
> > + * @size: bitmap size in number of clumps
> 
> That's a rather inconvenient unit, no? And rather easy to get wrong, I
> can easily see people passing nbits instead.
> 
> I think you could reduce the number of arguments to this helper and the
> macro, while getting rid of some confusion: Drop index and offset, let
> clump_index be start_index and measured in bit positions (like
> find_next_bit et al), and let the return value also be a bit position.
> And instead of index and offset, have another unsigned long* output
> parameter that gives the actual value at [return value:return
> value+clump_size]. IOW, I think the prototype should be close to
> find_next_bit, except that in case of "clumps", there's an extra piece
> of information to return.

There may be benefit to develop a different macro more aligned with the
rest of the bitops code -- one where we do in fact return the direct
4-bit value for example. Essentially all the GPIO drivers need are the
index for the hardware I/O port and the index for the bitmap to store
the bits.

So we may be able to reimplement the for_each_set_clump to utilize a
simplier macro that returns the clump value, and then determine index
and offset up in the for_each_set_clump macro; that way we can keep the
generic clump value return code isolated from the code needed by the
GPIO drivers.
 
> > + * @clump_index: clump index at which to start searching
> > + * @clump_size: clump size in bits
> > + *
> > + * Returns the clump index for the next clump with set bits; the respective
> > + * bitmap word index is stored at the location pointed by @index, and the bits
> > + * offset of the found clump within the respective bitmap word is stored at the
> > + * location pointed by @offset. If no bits are set, returns @size.
> > + */
> > +size_t find_next_clump(size_t *const index, unsigned int *const offset,
> > +		       const unsigned long *const bits, const size_t size,
> > +		       const size_t clump_index, const unsigned int clump_size)
> > +{
> > +	size_t i;
> > +	unsigned int bits_offset;
> > +	unsigned long word_mask;
> > +	const unsigned long clump_mask = GENMASK(clump_size - 1, 0);
> > +
> > +	for (i = clump_index; i < size; i++) {
> > +		bits_offset = i * clump_size;
> > +
> > +		*index = BIT_WORD(bits_offset);
> > +		*offset = bits_offset % BITS_PER_LONG;
> > +
> > +		word_mask = bits[*index] & (clump_mask << *offset);
> > +		if (!word_mask)
> > +			continue;
> 
> The cover letter says
> 
>   The clump_size argument can be an arbitrary number of bits and is not
>   required to be a multiple of 2.
> 
> by which I assume you mean "power of 2", but either way, the above code
> does not seem to take into account the case where bits_offset +
> clump_size straddles a word boundary, so it wouldn't work for a
> clump_size that does not divide BITS_PER_LONG.

Ah, you are correct, if clump_size does not divide evenly into
BITS_PER_LONG then the macro skips the portion of bits that reside
across the boundary. This is an unintentional behavior that will need to
be fixed. I didn't notice this since the GPIO drivers utilizing the
macro so far have all used a clump_size that divides cleanly.

> 
> May I suggest another approach:
> 
> unsigned long bitmap_get_value(const unsigned long *bitmap, unsigned
> start, unsigned width): Get the value of bitmap[start:start+width] for
> 1<=width<=BITS_PER_LONG (it's up to the caller to ensure this is within
> the defined region). That can almost be an inline
> 
> bitmap_get_value(const unsigned long *bitmap, unsigned start, unsigned
> width)
> {
>   unsigned idx = BIT_WORD(start);
>   unsigned offset = start % BITS_PER_LONG;
>   unsigned long lower = bitmap[idx] >> offset;
>   unsigned long upper = offset <= BITS_PER_LONG - width ? 0 :
> bitmap[idx+1] << (BITS_PER_LONG - offset);
>   return (lower | upper) & GENMASK(width-1, 0)
> }
> 
> Then you can implement the for_each_set_clump by a (IMO) more readable
> 
>   for (i = 0, start = 0; i < num_ports; i++, start += gpio_reg_size) {
>     word_mask = bitmap_get_value(mask, start, gpio_reg_size);
>     if (!word_mask)
>       continue;
>     ...
>   }
> 
> Or, if you do want find_next_clump/for_each_set_clump, that can be
> implemented in terms of this.
> 
> Rasmus

This might work. All that would need to change to support the GPIO
drivers is to return BIT_WORD(start) as index and offset as (start %
BITS_PER_LONG). These sets can be performed outside of bitmap_get_value,
thus allowing it to have a simplier interface for code that does not
require index/offset.

William Breathitt Gray
William Breathitt Gray Oct. 4, 2018, 10:30 a.m. UTC | #5
On Tue, Oct 02, 2018 at 11:21:42AM +0300, Andy Shevchenko wrote:
> On Tue, Oct 02, 2018 at 09:42:48AM +0200, Rasmus Villemoes wrote:
> > On 2018-10-02 03:13, William Breathitt Gray wrote:
> 
> > The cover letter says
> > 
> >   The clump_size argument can be an arbitrary number of bits and is not
> >   required to be a multiple of 2.
> > 
> > by which I assume you mean "power of 2", but either way, the above code
> > does not seem to take into account the case where bits_offset +
> > clump_size straddles a word boundary, so it wouldn't work for a
> > clump_size that does not divide BITS_PER_LONG.
> 
> E.g. 3 bits in a clump? Hmm...
> 
> Why would we need that? I mean some real use case?

GPIOs in hardware may be routed to devices logically in groups of I/O
lines, yet must still be accessed via the word-sized registers on the
operating machine.

For example, suppose a GPIO card is used to control a set of shower
devices. The card supports 4 shower devices, each device controlled by 3
lines of I/O: enable, hot-cold selection, high-low pressure selection.
In this case, a operating machine would still have to access the GPIO
lines via the I/O registers (e.g. 8-bit port I/O); but with a macro
handling a clump size of 3-bits, we can loop logically by each shower
device which is much simpler from a driver perspective.

William Breathitt Gray
William Breathitt Gray Oct. 4, 2018, 10:36 a.m. UTC | #6
On Wed, Oct 03, 2018 at 02:48:04PM +0300, Andy Shevchenko wrote:
> On Tue, Oct 02, 2018 at 11:21:42AM +0300, Andy Shevchenko wrote:
> 
> > I would rather go with two prototypes to get()/set() a clump in the bitmap
> > in a way when it's aligned and BITS_PER_LONG % clump_size == 0.
> 
> To make things much easier, restrict clump_size to the one
> from the following set:
> 
> 1, 2, 4, 8, 16, 32 even on 64-bit platforms.
> 
> If it would be simpler solution to add 64 here (implying 32-bit platform),
> I would vote for that.
> 
> For the generic case we might need something like:
> 
> unsigned long bitmap_get_bits(unsigned long *src, unsigned int start, unsigned int nbits)
> {
> 	assert(nbits > BITS_PER_LONG);
> 
> 	/* Something like Rasmus proposed earlier */
> }
> 
> And similar to setter.
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko

I have no objections to have a simplier macro for these common clump
sizes -- afterall, I suspect most drivers will likely use clump sizes
that are powers of 2 anyway. It would be nice to have a more versatile
macro though for those drivers that would benefit from odd clump sizes,
but we can perhaps postpone that until the need arises (the GPIO drivers
in this patchset all use a power of 2).

William Breathitt Gray
Andy Shevchenko Oct. 4, 2018, 12:10 p.m. UTC | #7
On Thu, Oct 04, 2018 at 07:36:20PM +0900, William Breathitt Gray wrote:
> On Wed, Oct 03, 2018 at 02:48:04PM +0300, Andy Shevchenko wrote:
> > On Tue, Oct 02, 2018 at 11:21:42AM +0300, Andy Shevchenko wrote:
> > 
> > > I would rather go with two prototypes to get()/set() a clump in the bitmap
> > > in a way when it's aligned and BITS_PER_LONG % clump_size == 0.
> > 
> > To make things much easier, restrict clump_size to the one
> > from the following set:
> > 
> > 1, 2, 4, 8, 16, 32 even on 64-bit platforms.
> > 
> > If it would be simpler solution to add 64 here (implying 32-bit platform),
> > I would vote for that.
> > 
> > For the generic case we might need something like:
> > 
> > unsigned long bitmap_get_bits(unsigned long *src, unsigned int start, unsigned int nbits)
> > {
> > 	assert(nbits > BITS_PER_LONG);
> > 
> > 	/* Something like Rasmus proposed earlier */
> > }
> > 
> > And similar to setter.
> > 
> > 
> > -- 
> > With Best Regards,
> > Andy Shevchenko
> 
> I have no objections to have a simplier macro for these common clump
> sizes -- afterall, I suspect most drivers will likely use clump sizes
> that are powers of 2 anyway. It would be nice to have a more versatile
> macro though for those drivers that would benefit from odd clump sizes,
> but we can perhaps postpone that until the need arises (the GPIO drivers
> in this patchset all use a power of 2).

Yes, this is my point of view: don't produce additional complexity to some
which has no users (yet).

When we would really have groups out of an odd bit number, we may reconsider.
diff mbox series

Patch

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 8a1ee10014de..3d3b2fc34908 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -2,6 +2,8 @@ 
 #ifndef _ASM_GENERIC_BITOPS_FIND_H_
 #define _ASM_GENERIC_BITOPS_FIND_H_
 
+#include <linux/types.h>
+
 #ifndef find_next_bit
 /**
  * find_next_bit - find the next set bit in a memory region
@@ -80,4 +82,11 @@  extern unsigned long find_first_zero_bit(const unsigned long *addr,
 
 #endif /* CONFIG_GENERIC_FIND_FIRST_BIT */
 
+size_t find_next_clump(size_t *const index, unsigned int *const offset,
+		       const unsigned long *const bits, const size_t size,
+		       const size_t clump_index, const unsigned int clump_size);
+
+#define find_first_clump(index, offset, bits, size, clump_size) \
+	find_next_clump((index), (offset), (bits), (size), 0, (clump_size))
+
 #endif /*_ASM_GENERIC_BITOPS_FIND_H_ */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7ddb1349394d..089381017f74 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -40,6 +40,13 @@  extern unsigned long __sw_hweight64(__u64 w);
 	     (bit) < (size);					\
 	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
 
+#define for_each_set_clump(clump, index, offset, bits, size, clump_size) \
+	for ((clump) = find_first_clump(&(index), &(offset), (bits), (size), \
+					(clump_size)); \
+	     (clump) < (size); \
+	     (clump) = find_next_clump(&(index), &(offset), (bits), (size), \
+				       (clump) + 1, (clump_size)))
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/lib/find_bit.c b/lib/find_bit.c
index ee3df93ba69a..1d547fe9304f 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -218,3 +218,43 @@  EXPORT_SYMBOL(find_next_bit_le);
 #endif
 
 #endif /* __BIG_ENDIAN */
+
+/**
+ * find_next_clump - find next clump with set bits in a memory region
+ * @index: location to store bitmap word index of found clump
+ * @offset: bits offset of the found clump within the respective bitmap word
+ * @bits: address to base the search on
+ * @size: bitmap size in number of clumps
+ * @clump_index: clump index at which to start searching
+ * @clump_size: clump size in bits
+ *
+ * Returns the clump index for the next clump with set bits; the respective
+ * bitmap word index is stored at the location pointed by @index, and the bits
+ * offset of the found clump within the respective bitmap word is stored at the
+ * location pointed by @offset. If no bits are set, returns @size.
+ */
+size_t find_next_clump(size_t *const index, unsigned int *const offset,
+		       const unsigned long *const bits, const size_t size,
+		       const size_t clump_index, const unsigned int clump_size)
+{
+	size_t i;
+	unsigned int bits_offset;
+	unsigned long word_mask;
+	const unsigned long clump_mask = GENMASK(clump_size - 1, 0);
+
+	for (i = clump_index; i < size; i++) {
+		bits_offset = i * clump_size;
+
+		*index = BIT_WORD(bits_offset);
+		*offset = bits_offset % BITS_PER_LONG;
+
+		word_mask = bits[*index] & (clump_mask << *offset);
+		if (!word_mask)
+			continue;
+
+		return i;
+	}
+
+	return size;
+}
+EXPORT_SYMBOL(find_next_clump);