diff mbox series

KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc

Message ID 20230407093147.3646597-1-kconsul@linux.vnet.ibm.com (mailing list archive)
State Rejected
Headers show
Series KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 24 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.

Commit Message

Kautuk Consul April 7, 2023, 9:31 a.m. UTC
I used the unlikely() macro on the return values of the k.alloc
calls and found that it changes the code generation a bit.
Optimize all return paths of k.alloc calls by improving
branch prediction on return value of k.alloc.

Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_nested.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Bagas Sanjaya April 7, 2023, 1:46 p.m. UTC | #1
On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> I used the unlikely() macro on the return values of the k.alloc
> calls and found that it changes the code generation a bit.
> Optimize all return paths of k.alloc calls by improving
> branch prediction on return value of k.alloc.

What about below?

"Improve branch prediction on kmalloc() and kzalloc() call by using
unlikely() macro to optimize their return paths."

That is, try to avoid first-person construct (I).

Thanks.
Sean Christopherson April 7, 2023, 4:01 p.m. UTC | #2
On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > I used the unlikely() macro on the return values of the k.alloc
> > calls and found that it changes the code generation a bit.
> > Optimize all return paths of k.alloc calls by improving
> > branch prediction on return value of k.alloc.

Nit, this is improving code generation, not branch prediction.

> What about below?
> 
> "Improve branch prediction on kmalloc() and kzalloc() call by using
> unlikely() macro to optimize their return paths."

Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
As above, it does often improve code generation for the happy path, but that doesn't
always equate to improved performance, e.g. if the CPU can easily predict the branch
and/or there is no impact on the cache footprint.
Kautuk Consul April 11, 2023, 4:59 a.m. UTC | #3
On 2023-04-07 09:01:29, Sean Christopherson wrote:
> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > > I used the unlikely() macro on the return values of the k.alloc
> > > calls and found that it changes the code generation a bit.
> > > Optimize all return paths of k.alloc calls by improving
> > > branch prediction on return value of k.alloc.
> 
> Nit, this is improving code generation, not branch prediction.
Sorry my mistake.
> 
> > What about below?
> > 
> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > unlikely() macro to optimize their return paths."
> 
> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> As above, it does often improve code generation for the happy path, but that doesn't
> always equate to improved performance, e.g. if the CPU can easily predict the branch
> and/or there is no impact on the cache footprint.
I see. I will submit a v2 of the patch with a better and more accurate
description. Does anyone else have any comments before I do so ?
Michael Ellerman April 11, 2023, 6:35 a.m. UTC | #4
Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> On 2023-04-07 09:01:29, Sean Christopherson wrote:
>> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
>> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
>> > > I used the unlikely() macro on the return values of the k.alloc
>> > > calls and found that it changes the code generation a bit.
>> > > Optimize all return paths of k.alloc calls by improving
>> > > branch prediction on return value of k.alloc.
>> 
>> Nit, this is improving code generation, not branch prediction.
> Sorry my mistake.
>> 
>> > What about below?
>> > 
>> > "Improve branch prediction on kmalloc() and kzalloc() call by using
>> > unlikely() macro to optimize their return paths."
>> 
>> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
>> As above, it does often improve code generation for the happy path, but that doesn't
>> always equate to improved performance, e.g. if the CPU can easily predict the branch
>> and/or there is no impact on the cache footprint.

> I see. I will submit a v2 of the patch with a better and more accurate
> description. Does anyone else have any comments before I do so ?
 
In general I think unlikely should be saved for cases where either the
compiler is generating terrible code, or the likelyness of the condition
might be surprising to a human reader.

eg. if you had some code that does a NULL check and it's *expected* that
the value is NULL, then wrapping that check in likely() actually adds
information for a human reader.
    
Also please don't use unlikely in init paths or other cold paths, it
clutters the code (only slightly but a little) and that's not worth the
possible tiny benefit for code that only runs once or infrequently.

I would expect the compilers to do the right thing in all
these cases without the unlikely. But if you can demonstrate that they
meaningfully improve the code generation with a before/after
dissassembly then I'd be interested.

cheers
Kautuk Consul April 11, 2023, 9:04 a.m. UTC | #5
Sorry, last email rejected by the mailing lists.
Can you please look at the diff file attach ?

On Tue, Apr 11, 2023 at 2:14 PM Kautuk Consul
<kautuk.consul.80@gmail.com> wrote:
>
> Hi,
>
> Sorry Im replying back using my private gmail ID as I can't figure out
> how to attach multiple files using mutt.
>
> On Tue, Apr 11, 2023 at 12:05 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >
> > Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >>
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry my mistake.
> > >>
> > >> > What about below?
> > >> >
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >>
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> > >> As above, it does often improve code generation for the happy path, but that doesn't
> > >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> > >> and/or there is no impact on the cache footprint.
> >
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so ?
> >
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likelyness of the condition
> > might be surprising to a human reader.
> >
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> >
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> >
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > dissassembly then I'd be interested.
> >
> There are surprisingly many changes to code generation before and
> after using these
> instances of the unlikely macro. I couldn't really analyze all of them
> to be able to state
> that they are indeed improving performance in some way. I assumed the compiler
> would generate optimal code for these unlikely paths.
> Please find the before and after file attached to this email.
>
> > cheers
Kautuk Consul April 12, 2023, 7:04 a.m. UTC | #6
Hi,

On 2023-04-11 16:35:10, Michael Ellerman wrote:
> Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> >> > > I used the unlikely() macro on the return values of the k.alloc
> >> > > calls and found that it changes the code generation a bit.
> >> > > Optimize all return paths of k.alloc calls by improving
> >> > > branch prediction on return value of k.alloc.
> >> 
> >> Nit, this is improving code generation, not branch prediction.
> > Sorry my mistake.
> >> 
> >> > What about below?
> >> > 
> >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> >> > unlikely() macro to optimize their return paths."
> >> 
> >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> >> As above, it does often improve code generation for the happy path, but that doesn't
> >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> >> and/or there is no impact on the cache footprint.
> 
> > I see. I will submit a v2 of the patch with a better and more accurate
> > description. Does anyone else have any comments before I do so ?
>  
> In general I think unlikely should be saved for cases where either the
> compiler is generating terrible code, or the likelyness of the condition
> might be surprising to a human reader.
> 
> eg. if you had some code that does a NULL check and it's *expected* that
> the value is NULL, then wrapping that check in likely() actually adds
> information for a human reader.
>     
> Also please don't use unlikely in init paths or other cold paths, it
> clutters the code (only slightly but a little) and that's not worth the
> possible tiny benefit for code that only runs once or infrequently.
> 
> I would expect the compilers to do the right thing in all
> these cases without the unlikely. But if you can demonstrate that they
> meaningfully improve the code generation with a before/after
> dissassembly then I'd be interested.
Just FYI, the last email by kautuk.consul.80@gmail.com was by me.
That last email contains a diff file attachment which compares 2 files:
before my changes and after my changes.
This diff file shows a lot of changes in code generation. Im assuming
all those changes are made by the compiler towards optimizing all return
paths to k.alloc calls.
Kindly review and comment.
> cheers
Kautuk Consul April 19, 2023, 10:08 a.m. UTC | #7
On 2023-04-12 12:34:13, Kautuk Consul wrote:
> Hi,
> 
> On 2023-04-11 16:35:10, Michael Ellerman wrote:
> > Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >> 
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry my mistake.
> > >> 
> > >> > What about below?
> > >> > 
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >> 
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> > >> As above, it does often improve code generation for the happy path, but that doesn't
> > >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> > >> and/or there is no impact on the cache footprint.
> > 
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so ?
> >  
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likelyness of the condition
> > might be surprising to a human reader.
> > 
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> >     
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> > 
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > dissassembly then I'd be interested.
> Just FYI, the last email by kautuk.consul.80@gmail.com was by me.
> That last email contains a diff file attachment which compares 2 files:
> before my changes and after my changes.
> This diff file shows a lot of changes in code generation. Im assuming
> all those changes are made by the compiler towards optimizing all return
> paths to k.alloc calls.
> Kindly review and comment.
Any comments on the numerous code generation changes as shown by the
files I attached to this mail chain ? Sorry I don't have concrete
figures of any type to prove that this leads to any measurable performance
improvements. I am just assuming that the compiler's modified code
generation (due to the use of the unlikely macro) would be optimal.

Thanks.
> > cheers
diff mbox series

Patch

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 5a64a1341e6f..dbf2dd073e1f 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -446,7 +446,7 @@  long kvmhv_nested_init(void)
 		ptb_order = 12;
 	pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order,
 				       GFP_KERNEL);
-	if (!pseries_partition_tb) {
+	if (unlikely(!pseries_partition_tb)) {
 		pr_err("kvm-hv: failed to allocated nested partition table\n");
 		return -ENOMEM;
 	}
@@ -575,7 +575,7 @@  long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
 		return H_PARAMETER;
 
 	buf = kzalloc(n, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf)
+	if (unlikely(!buf))
 		return H_NO_MEM;
 
 	gp = kvmhv_get_nested(vcpu->kvm, l1_lpid, false);
@@ -689,7 +689,7 @@  static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int
 	long shadow_lpid;
 
 	gp = kzalloc(sizeof(*gp), GFP_KERNEL);
-	if (!gp)
+	if (unlikely(!gp))
 		return NULL;
 	gp->l1_host = kvm;
 	gp->l1_lpid = lpid;
@@ -1633,7 +1633,7 @@  static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 	/* 4. Insert the pte into our shadow_pgtable */
 
 	n_rmap = kzalloc(sizeof(*n_rmap), GFP_KERNEL);
-	if (!n_rmap)
+	if (unlikely(!n_rmap))
 		return RESUME_GUEST; /* Let the guest try again */
 	n_rmap->rmap = (n_gpa & RMAP_NESTED_GPA_MASK) |
 		(((unsigned long) gp->l1_lpid) << RMAP_NESTED_LPID_SHIFT);