[3/3] perf: Use 64-bit value when comparing sample_regs

Message ID 1394080919-17957-4-git-send-email-sukadev@linux.vnet.ibm.com (mailing list archive)
State Not Applicable

Commit Message

Sukadev Bhattiprolu March 6, 2014, 4:41 a.m. UTC
When checking whether a bit representing a register is set in
sample_regs, a 64-bit mask, use a 64-bit value (1LL) for the shift.
A plain int constant (1) cannot reach bits 32 and above.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 tools/perf/util/unwind.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

David Laight March 6, 2014, 9:44 a.m. UTC | #1
From: Sukadev Bhattiprolu

> When checking whether a bit representing a register is set in
> sample_regs, a 64-bit mask, use 64-bit value (1LL).
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> ---
>  tools/perf/util/unwind.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/unwind.c b/tools/perf/util/unwind.c
> index 742f23b..2b888c6 100644
> --- a/tools/perf/util/unwind.c
> +++ b/tools/perf/util/unwind.c
> @@ -396,11 +396,11 @@ static int reg_value(unw_word_t *valp, struct regs_dump *regs, int id,
>  {
>  	int i, idx = 0;
> 
> -	if (!(sample_regs & (1 << id)))
> +	if (!(sample_regs & (1LL << id)))
>  		return -EINVAL;
> 
>  	for (i = 0; i < id; i++) {
> -		if (sample_regs & (1 << i))
> +		if (sample_regs & (1LL << i))
>  			idx++;
>  	}

There are much faster ways to count the number of set bits, especially
if you might need to check a significant number of bits.
There might even be a function defined somewhere to do it.
Basically you just add up the bits; for 16 bits it would be:
	val = (val & 0x5555) + ((val >> 1) & 0x5555);
	val = (val & 0x3333) + ((val >> 2) & 0x3333);
	val = (val & 0x0f0f) + ((val >> 4) & 0x0f0f);
	val = (val & 0x00ff) + ((val >> 8) & 0x00ff);
As the size of the word increases the improvement is more significant.
(Some of the later masking can probably be proven unnecessary.)
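
The same pairwise-sum idea extends to a 64-bit word such as sample_regs.
The following is only a sketch: the function name popcount64_swar is
hypothetical and not something from the perf tree. Each step adds
neighbouring fields of 1, 2, 4, 8, 16 and 32 bits. Because `+` binds
tighter than `&` in C, every shifted term carries its own parentheses.

```c
#include <stdint.h>

/* Hypothetical helper (not from perf): counts the set bits in a 64-bit
 * word by repeatedly summing adjacent fields of doubling width. */
static inline unsigned int popcount64_swar(uint64_t v)
{
	v = (v & 0x5555555555555555ULL) + ((v >> 1)  & 0x5555555555555555ULL);
	v = (v & 0x3333333333333333ULL) + ((v >> 2)  & 0x3333333333333333ULL);
	v = (v & 0x0f0f0f0f0f0f0f0fULL) + ((v >> 4)  & 0x0f0f0f0f0f0f0f0fULL);
	v = (v & 0x00ff00ff00ff00ffULL) + ((v >> 8)  & 0x00ff00ff00ff00ffULL);
	v = (v & 0x0000ffff0000ffffULL) + ((v >> 16) & 0x0000ffff0000ffffULL);
	v = (v & 0x00000000ffffffffULL) + ((v >> 32) & 0x00000000ffffffffULL);
	return (unsigned int)v;
}
```

Six steps cover all 64 bits, whereas a bit-by-bit loop would need up to
64 iterations.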

	David
Gabriel Paubert March 6, 2014, 11:33 a.m. UTC | #2
On Thu, Mar 06, 2014 at 09:44:47AM +0000, David Laight wrote:
> From: Sukadev Bhattiprolu
> > When checking whether a bit representing a register is set in
> > sample_regs, a 64-bit mask, use 64-bit value (1LL).
> > 
> > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> > ---
> >  tools/perf/util/unwind.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/perf/util/unwind.c b/tools/perf/util/unwind.c
> > index 742f23b..2b888c6 100644
> > --- a/tools/perf/util/unwind.c
> > +++ b/tools/perf/util/unwind.c
> > @@ -396,11 +396,11 @@ static int reg_value(unw_word_t *valp, struct regs_dump *regs, int id,
> >  {
> >  	int i, idx = 0;
> > 
> > -	if (!(sample_regs & (1 << id)))
> > +	if (!(sample_regs & (1LL << id)))
> >  		return -EINVAL;
> > 
> >  	for (i = 0; i < id; i++) {
> > -		if (sample_regs & (1 << i))
> > +		if (sample_regs & (1LL << i))
> >  			idx++;
> >  	}
> 
> There are much faster ways to count the number of set bits, especially
> if you might need to check a significant number of bits.
> There might even be a function defined somewhere to do it.

Indeed, look for Hamming weight (hweight family of functions)
in asm/hweight.h and what is included from there.

Besides that, many modern processors also have a machine instruction
to perform this task. In the processor manuals the instruction is 
described as population count and the mnemonic starts with "popcnt"
on x86 and ppc.
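
For code built with GCC or Clang, one way to reach that instruction
without writing assembly is the __builtin_popcountll builtin. This is a
sketch only; the wrapper name count_set_bits is hypothetical. Whether
the compiler emits a single instruction depends on the target flags
(for example -mpopcnt on x86); otherwise it falls back to a software
sequence.

```c
#include <stdint.h>

/* Hypothetical wrapper: number of set bits in a 64-bit word, using the
 * GCC/Clang builtin rather than a hand-written loop. */
static inline unsigned int count_set_bits(uint64_t v)
{
	return (unsigned int)__builtin_popcountll(v);
}
```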

	Gabriel

> Basically you just add up the bits; for 16 bits it would be:
> 	val = (val & 0x5555) + ((val >> 1) & 0x5555);
> 	val = (val & 0x3333) + ((val >> 2) & 0x3333);
> 	val = (val & 0x0f0f) + ((val >> 4) & 0x0f0f);
> 	val = (val & 0x00ff) + ((val >> 8) & 0x00ff);
> As the size of the word increases the improvement is more significant.
> (Some of the later masking can probably be proven unnecessary.)
> 
> 	David
Jiri Olsa March 6, 2014, 5:06 p.m. UTC | #3
On Thu, Mar 06, 2014 at 12:33:32PM +0100, Gabriel Paubert wrote:
> On Thu, Mar 06, 2014 at 09:44:47AM +0000, David Laight wrote:
> > From: Sukadev Bhattiprolu
> > > When checking whether a bit representing a register is set in
> > > sample_regs, a 64-bit mask, use 64-bit value (1LL).
> > > 
> > > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> > > ---
> > >  tools/perf/util/unwind.c |    4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/tools/perf/util/unwind.c b/tools/perf/util/unwind.c
> > > index 742f23b..2b888c6 100644
> > > --- a/tools/perf/util/unwind.c
> > > +++ b/tools/perf/util/unwind.c
> > > @@ -396,11 +396,11 @@ static int reg_value(unw_word_t *valp, struct regs_dump *regs, int id,
> > >  {
> > >  	int i, idx = 0;
> > > 
> > > -	if (!(sample_regs & (1 << id)))
> > > +	if (!(sample_regs & (1LL << id)))
> > >  		return -EINVAL;
> > > 
> > >  	for (i = 0; i < id; i++) {
> > > -		if (sample_regs & (1 << i))
> > > +		if (sample_regs & (1LL << i))
> > >  			idx++;
> > >  	}
> > 
> > There are much faster ways to count the number of set bits, especially
> > if you might need to check a significant number of bits.
> > There might even be a function defined somewhere to do it.
> 
> Indeed, look for Hamming weight (hweight family of functions)
> in asm/hweight.h and what is included from there.
> 
> Besides that, many modern processors also have a machine instruction
> to perform this task. In the processor manuals the instruction is 
> described as population count and the mnemonic starts with "popcnt"
> on x86 and ppc.
> 
> 	Gabriel
> 
> > Basically you just add up the bits; for 16 bits it would be:
> > 	val = (val & 0x5555) + ((val >> 1) & 0x5555);
> > 	val = (val & 0x3333) + ((val >> 2) & 0x3333);
> > 	val = (val & 0x0f0f) + ((val >> 4) & 0x0f0f);
> > 	val = (val & 0x00ff) + ((val >> 8) & 0x00ff);
> > As the size of the word increases the improvement is more significant.
> > (Some of the later masking can probably be proven unnecessary.)

Right, I think the loop could be replaced by:

  idx = hweight64(sample_regs & ((1ULL << id) - 1));
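
As a rough sketch of that replacement, outside the perf tree: the names
weight64, reg_index and reg_index_loop below are hypothetical, and
weight64 stands in for the hweight helper via the GCC/Clang builtin.
The mask ((1ULL << id) - 1) keeps only the bits below id, so its
population count is the position of register id among the sampled
registers. id is assumed to be in the range 0..63.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit hweight helper. */
static inline unsigned int weight64(uint64_t v)
{
	return (unsigned int)__builtin_popcountll(v);
}

/* Index of register `id` within the sampled registers, or -EINVAL if
 * bit `id` is not set in sample_regs. Assumes 0 <= id <= 63. */
static inline int reg_index(uint64_t sample_regs, int id)
{
	if (!(sample_regs & (1ULL << id)))
		return -EINVAL;

	/* Count the set bits strictly below bit `id`. */
	return (int)weight64(sample_regs & ((1ULL << id) - 1));
}

/* Loop form from the patch, kept here only for comparison. */
static inline int reg_index_loop(uint64_t sample_regs, int id)
{
	int i, idx = 0;

	if (!(sample_regs & (1ULL << id)))
		return -EINVAL;

	for (i = 0; i < id; i++) {
		if (sample_regs & (1ULL << i))
			idx++;
	}
	return idx;
}
```

With sample_regs = 0xB (bits 0, 1 and 3 set), register 3 is the third
sampled register, so both forms give index 2.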

Sukadev,
please also rebase against Arnaldo's latest perf/core;
this code changed just recently and is now in:
  util/perf_regs.c:perf_reg_value

thanks,
jirka

Patch

diff --git a/tools/perf/util/unwind.c b/tools/perf/util/unwind.c
index 742f23b..2b888c6 100644
--- a/tools/perf/util/unwind.c
+++ b/tools/perf/util/unwind.c
@@ -396,11 +396,11 @@  static int reg_value(unw_word_t *valp, struct regs_dump *regs, int id,
 {
 	int i, idx = 0;
 
-	if (!(sample_regs & (1 << id)))
+	if (!(sample_regs & (1LL << id)))
 		return -EINVAL;
 
 	for (i = 0; i < id; i++) {
-		if (sample_regs & (1 << i))
+		if (sample_regs & (1LL << i))
 			idx++;
 	}
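
To illustrate why the width of the constant matters, here is a minimal
standalone sketch; it is not the perf code, and the helper name
reg_bit_set is hypothetical. It uses 1ULL rather than the patch's 1LL,
which is an assumption made here because shifting a signed 1 into bit 63
is not well defined in C, while the unsigned shift is.

```c
#include <stdint.h>

/* Hypothetical helper (not from perf): returns nonzero if bit `id`
 * (0..63) is set in a 64-bit register mask. */
static inline int reg_bit_set(uint64_t sample_regs, int id)
{
	/* 1ULL is 64 bits wide, so the shift is defined for every id in
	 * 0..63. With a 32-bit int, a plain `1 << id` is undefined for
	 * id >= 32 and overflows into the sign bit at id == 31. */
	return (sample_regs & (1ULL << id)) != 0;
}
```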