diff mbox

[4/4] powerpc/32: remove a NOP from memset()

Message ID ae2205dedfe437e4df2dd0cc7605d62189833221.1503277387.git.christophe.leroy@c-s.fr (mailing list archive)
State Accepted
Commit ad1b0122bdc96cbcfcfbc9ef59f9bf3658802a72
Headers show

Commit Message

Christophe Leroy Aug. 23, 2017, 2:54 p.m. UTC
memset() is patched after initialisation to activate the
optimised part which uses cache instructions.

Today we have a 'b 2f' to skip the optimised patch, which then gets
replaced by a NOP, implying a useless cycle consumption.
As we have a 'bne 2f' just before, we could use that instruction
for the live patching, hence removing the need to have a
dedicated 'b 2f' to be replaced by a NOP.

This patch changes the 'bne 2f' by a 'b 2f'. During init, that
'b 2f' is then replaced by 'bne 2f'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/setup_32.c | 7 ++++++-
 arch/powerpc/lib/copy_32.S     | 7 +++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

Comments

Michael Ellerman Aug. 24, 2017, 10:51 a.m. UTC | #1
Christophe Leroy <christophe.leroy@c-s.fr> writes:

> memset() is patched after initialisation to activate the
> optimised part which uses cache instructions.
>
> Today we have a 'b 2f' to skip the optimised patch, which then gets
> replaced by a NOP, implying a useless cycle consumption.
> As we have a 'bne 2f' just before, we could use that instruction
> for the live patching, hence removing the need to have a
> dedicated 'b 2f' to be replaced by a NOP.
>
> This patch changes the 'bne 2f' by a 'b 2f'. During init, that
> 'b 2f' is then replaced by 'bne 2f'

I'm not sure what the sequence is during boot for the 32-bit code, but
can you use an ALT_FTR section for this? Possibly that doesn't get done
at the right time though.

cheers
Christophe Leroy Aug. 24, 2017, 1:58 p.m. UTC | #2
Le 24/08/2017 à 12:51, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
> 
>> memset() is patched after initialisation to activate the
>> optimised part which uses cache instructions.
>>
>> Today we have a 'b 2f' to skip the optimised patch, which then gets
>> replaced by a NOP, implying a useless cycle consumption.
>> As we have a 'bne 2f' just before, we could use that instruction
>> for the live patching, hence removing the need to have a
>> dedicated 'b 2f' to be replaced by a NOP.
>>
>> This patch changes the 'bne 2f' by a 'b 2f'. During init, that
>> 'b 2f' is then replaced by 'bne 2f'
> 
> I'm not sure what the sequence is during boot for the 32-bit code, but
> can you use an ALT_FTR section for this? Possibly that doesn't get done
> at the right time though.

Unfortunately, as we discussed in 2015 
(https://lkml.org/lkml/2015/9/10/608), the ALT_FTR does things too 
early, while the cache is not enabled yet.

Christophe
Michael Ellerman Aug. 25, 2017, 12:15 a.m. UTC | #3
Christophe LEROY <christophe.leroy@c-s.fr> writes:

> Le 24/08/2017 à 12:51, Michael Ellerman a écrit :
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>> 
>>> memset() is patched after initialisation to activate the
>>> optimised part which uses cache instructions.
>>>
>>> Today we have a 'b 2f' to skip the optimised patch, which then gets
>>> replaced by a NOP, implying a useless cycle consumption.
>>> As we have a 'bne 2f' just before, we could use that instruction
>>> for the live patching, hence removing the need to have a
>>> dedicated 'b 2f' to be replaced by a NOP.
>>>
>>> This patch changes the 'bne 2f' by a 'b 2f'. During init, that
>>> 'b 2f' is then replaced by 'bne 2f'
>> 
>> I'm not sure what the sequence is during boot for the 32-bit code, but
>> can you use an ALT_FTR section for this? Possibly that doesn't get done
>> at the right time though.
>
> Unfortunately, as we discussed in 2015 
> (https://lkml.org/lkml/2015/9/10/608),

Haha, you expect me to remember things I said then! ;)

> the ALT_FTR does things too early, while the cache is not enabled yet.

OK. Ben did do some reworks to the early init since then, but I don't
think he changed that.

I notice we do setup_feature_keys() in machine_init(), which is the jump
label equivalent of apply_feature_fixups(). So I wonder if we could
actually move apply_feature_fixups() to there. But it would need some
serious review.

cheers
diff mbox

Patch

diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 2f88f6cf1a42..51ebc01fff52 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -98,6 +98,9 @@  extern unsigned int memset_nocache_branch; /* Insn to be replaced by NOP */
 
 notrace void __init machine_init(u64 dt_ptr)
 {
+	unsigned int *addr = &memset_nocache_branch;
+	unsigned long insn;
+
 	/* Configure static keys first, now that we're relocated. */
 	setup_feature_keys();
 
@@ -105,7 +108,9 @@  notrace void __init machine_init(u64 dt_ptr)
 	udbg_early_init();
 
 	patch_instruction((unsigned int *)&memcpy, PPC_INST_NOP);
-	patch_instruction(&memset_nocache_branch, PPC_INST_NOP);
+
+	insn = create_cond_branch(addr, branch_target(addr), 0x820000);
+	patch_instruction(addr, insn);	/* replace b by bne cr0 */
 
 	/* Do some early initialization based on the flat device tree */
 	early_init_devtree(__va(dt_ptr));
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index 05aaee20590f..da425bb6b369 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -103,9 +103,12 @@  _GLOBAL(memset)
 	add	r5,r0,r5
 	subf	r6,r0,r3
 	cmplwi	0,r4,0
-	bne	2f	/* Use normal procedure if r4 is not zero */
+	/*
+	 * Skip optimised bloc until cache is enabled. Will be replaced
+	 * by 'bne' during boot to use normal procedure if r4 is not zero
+	 */
 _GLOBAL(memset_nocache_branch)
-	b	2f	/* Skip optimised bloc until cache is enabled */
+	b	2f
 
 	clrlwi	r7,r6,32-LG_CACHELINE_BYTES
 	add	r8,r7,r5