diff mbox series

[RFC] powerpc/lib: Fixing use a temporary mm for code patching

Message ID c88b13ede49744d81fdab32e037a7ae10f0b241f.1585233657.git.christophe.leroy@c-s.fr (mailing list archive)
State RFC
Headers show
Series [RFC] powerpc/lib: Fixing use a temporary mm for code patching | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch warning Failed to apply on branch powerpc/merge (c6624071c338732402e8c726df6a4074473eaa0e)
snowpatch_ozlabs/apply_patch warning Failed to apply on branch powerpc/next (7074695ac6fb965d478f373b95bc5c636e9f21b0)
snowpatch_ozlabs/apply_patch warning Failed to apply on branch linus/master (1b649e0bcae71c118c1333e02249a7510ba7f70a)
snowpatch_ozlabs/apply_patch warning Failed to apply on branch powerpc/fixes (1d0c32ec3b860a32df593a22bad0d1dbc5546a59)
snowpatch_ozlabs/apply_patch warning Failed to apply on branch linux-next (89295c59c1f063b533d071ca49d0fa0c0783ca6f)
snowpatch_ozlabs/apply_patch fail Failed to apply to any branch

Commit Message

Christophe Leroy March 26, 2020, 2:42 p.m. UTC
This patch fixes the RFC series identified below.
It fixes three points:
- Failure with CONFIG_PPC_KUAP
- Failure to write do to lack of DIRTY bit set on the 8xx
- Inadequaly complex WARN post verification

However, it has an impact on the CPU load. Here is the time
needed on an 8xx to run the ftrace selftests without and
with this series:
- Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
- With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
- With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds

Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/lib/code-patching.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Christopher M. Riedl April 15, 2020, 5:16 a.m. UTC | #1
> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>  
> This patch fixes the RFC series identified below.
> It fixes three points:
> - Failure with CONFIG_PPC_KUAP
> - Failure to write do to lack of DIRTY bit set on the 8xx
> - Inadequaly complex WARN post verification
> 
> However, it has an impact on the CPU load. Here is the time
> needed on an 8xx to run the ftrace selftests without and
> with this series:
> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
> 
> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/lib/code-patching.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index f156132e8975..4ccff427592e 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
>  	}
>  
>  	pte = mk_pte(page, pgprot);
> +	pte = pte_mkdirty(pte);
>  	set_pte_at(patching_mm, patching_addr, ptep, pte);
>  
>  	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>  			(offset_in_page((unsigned long)addr) /
>  				sizeof(unsigned int));
>  
> +	allow_write_to_user(patch_addr, sizeof(instr));
>  	__patch_instruction(addr, instr, patch_addr);
> +	prevent_write_to_user(patch_addr, sizeof(instr));
> 

On radix we can map the page with PAGE_KERNEL protection which ends up
setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.

Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
because this is a temporary kernel mapping which really isn't userspace in
the usual sense.
 
>  	err = unmap_patch(&patch_mapping);
>  	if (err)
> @@ -179,7 +182,7 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>  	 * think we just wrote.
>  	 * XXX: BUG_ON() instead?
>  	 */
> -	WARN_ON(memcmp(addr, &instr, sizeof(instr)));
> +	WARN_ON(*addr != instr);
>  
>  out:
>  	local_irq_restore(flags);
> -- 
> 2.25.0
Christophe Leroy April 15, 2020, 9:12 a.m. UTC | #2
Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :
>> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>>   
>> This patch fixes the RFC series identified below.
>> It fixes three points:
>> - Failure with CONFIG_PPC_KUAP
>> - Failure to write do to lack of DIRTY bit set on the 8xx
>> - Inadequaly complex WARN post verification
>>
>> However, it has an impact on the CPU load. Here is the time
>> needed on an 8xx to run the ftrace selftests without and
>> with this series:
>> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
>> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
>> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
>>
>> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>   arch/powerpc/lib/code-patching.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
>> index f156132e8975..4ccff427592e 100644
>> --- a/arch/powerpc/lib/code-patching.c
>> +++ b/arch/powerpc/lib/code-patching.c
>> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
>>   	}
>>   
>>   	pte = mk_pte(page, pgprot);
>> +	pte = pte_mkdirty(pte);
>>   	set_pte_at(patching_mm, patching_addr, ptep, pte);
>>   
>>   	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
>> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>>   			(offset_in_page((unsigned long)addr) /
>>   				sizeof(unsigned int));
>>   
>> +	allow_write_to_user(patch_addr, sizeof(instr));
>>   	__patch_instruction(addr, instr, patch_addr);
>> +	prevent_write_to_user(patch_addr, sizeof(instr));
>>
> 
> On radix we can map the page with PAGE_KERNEL protection which ends up
> setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> 
> Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
> because this is a temporary kernel mapping which really isn't userspace in
> the usual sense.

On the 8xx, that's pretty different.

The PTE doesn't control whether a page is user page or a kernel page. 
The only thing that is set in the PTE is whether a page is linked to a 
given PID or not.
PAGE_KERNEL tells that the page can be addressed with any PID.

The user access right is given by a kind of zone, which is in the PGD 
entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. 
Every pages below PAGE_OFFSET are defined as belonging to zone 1.

By default, zone 0 can only be accessed by kernel, and zone 1 can only 
be accessed by user. When kernel wants to access zone 1, it temporarily 
changes properties of zone 1 to allow both kernel and user accesses.

So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel 
must unlock it to access it.


And this is more or less the same on hash/32. This is managed by segment 
registers. One segment register corresponds to a 256Mbytes area. Every 
pages below PAGE_OFFSET can only be read by default by kernel. Only user 
can write if the PTE allows it. When the kernel needs to write at an 
address below PAGE_OFFSET, it must change the segment properties in the 
corresponding segment register.

So, for both cases, if we want to have it local to a task while still 
allowing kernel access, it means we have to define a new special area 
between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.

That looks complex to me for a small benefit, especially as 8xx is not 
SMP and neither are most of the hash/32 targets.

Christophe
Christopher M. Riedl April 15, 2020, 4:22 p.m. UTC | #3
> On April 15, 2020 4:12 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>  
> Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :
> >> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>
> >>   
> >> This patch fixes the RFC series identified below.
> >> It fixes three points:
> >> - Failure with CONFIG_PPC_KUAP
> >> - Failure to write do to lack of DIRTY bit set on the 8xx
> >> - Inadequaly complex WARN post verification
> >>
> >> However, it has an impact on the CPU load. Here is the time
> >> needed on an 8xx to run the ftrace selftests without and
> >> with this series:
> >> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
> >> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
> >> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
> >>
> >> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
> >> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >> ---
> >>   arch/powerpc/lib/code-patching.c | 5 ++++-
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> >> index f156132e8975..4ccff427592e 100644
> >> --- a/arch/powerpc/lib/code-patching.c
> >> +++ b/arch/powerpc/lib/code-patching.c
> >> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> >>   	}
> >>   
> >>   	pte = mk_pte(page, pgprot);
> >> +	pte = pte_mkdirty(pte);
> >>   	set_pte_at(patching_mm, patching_addr, ptep, pte);
> >>   
> >>   	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> >> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
> >>   			(offset_in_page((unsigned long)addr) /
> >>   				sizeof(unsigned int));
> >>   
> >> +	allow_write_to_user(patch_addr, sizeof(instr));
> >>   	__patch_instruction(addr, instr, patch_addr);
> >> +	prevent_write_to_user(patch_addr, sizeof(instr));
> >>
> > 
> > On radix we can map the page with PAGE_KERNEL protection which ends up
> > setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> > ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> > 
> > Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> > the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
> > because this is a temporary kernel mapping which really isn't userspace in
> > the usual sense.
> 
> On the 8xx, that's pretty different.
> 
> The PTE doesn't control whether a page is user page or a kernel page. 
> The only thing that is set in the PTE is whether a page is linked to a 
> given PID or not.
> PAGE_KERNEL tells that the page can be addressed with any PID.
> 
> The user access right is given by a kind of zone, which is in the PGD 
> entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0. 
> Every pages below PAGE_OFFSET are defined as belonging to zone 1.
> 
> By default, zone 0 can only be accessed by kernel, and zone 1 can only 
> be accessed by user. When kernel wants to access zone 1, it temporarily 
> changes properties of zone 1 to allow both kernel and user accesses.
> 
> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel 
> must unlock it to access it.
> 
> 
> And this is more or less the same on hash/32. This is managed by segment 
> registers. One segment register corresponds to a 256Mbytes area. Every 
> pages below PAGE_OFFSET can only be read by default by kernel. Only user 
> can write if the PTE allows it. When the kernel needs to write at an 
> address below PAGE_OFFSET, it must change the segment properties in the 
> corresponding segment register.
> 
> So, for both cases, if we want to have it local to a task while still 
> allowing kernel access, it means we have to define a new special area 
> between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.
> 
> That looks complex to me for a small benefit, especially as 8xx is not 
> SMP and neither are most of the hash/32 targets.
> 

Agreed. So I guess the solution is to differentiate between radix/non-radix
and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
is enabled. Hmm, I need to think about this some more, especially if it's
acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?

I don't necessarily want to drop the local mm patching idea for non-radix
platforms since that means we would have to maintain two implementations.

> Christophe
Christophe Leroy April 18, 2020, 10:27 a.m. UTC | #4
Le 15/04/2020 à 18:22, Christopher M Riedl a écrit :
>> On April 15, 2020 4:12 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>>   
>> Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :
>>>> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>>
>>>>    
>>>> This patch fixes the RFC series identified below.
>>>> It fixes three points:
>>>> - Failure with CONFIG_PPC_KUAP
>>>> - Failure to write do to lack of DIRTY bit set on the 8xx
>>>> - Inadequaly complex WARN post verification
>>>>
>>>> However, it has an impact on the CPU load. Here is the time
>>>> needed on an 8xx to run the ftrace selftests without and
>>>> with this series:
>>>> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
>>>> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
>>>> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
>>>>
>>>> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>>> ---
>>>>    arch/powerpc/lib/code-patching.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
>>>> index f156132e8975..4ccff427592e 100644
>>>> --- a/arch/powerpc/lib/code-patching.c
>>>> +++ b/arch/powerpc/lib/code-patching.c
>>>> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
>>>>    	}
>>>>    
>>>>    	pte = mk_pte(page, pgprot);
>>>> +	pte = pte_mkdirty(pte);
>>>>    	set_pte_at(patching_mm, patching_addr, ptep, pte);
>>>>    
>>>>    	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
>>>> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>>>>    			(offset_in_page((unsigned long)addr) /
>>>>    				sizeof(unsigned int));
>>>>    
>>>> +	allow_write_to_user(patch_addr, sizeof(instr));
>>>>    	__patch_instruction(addr, instr, patch_addr);
>>>> +	prevent_write_to_user(patch_addr, sizeof(instr));
>>>>
>>>
>>> On radix we can map the page with PAGE_KERNEL protection which ends up
>>> setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
>>> ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
>>>
>>> Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
>>> the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
>>> because this is a temporary kernel mapping which really isn't userspace in
>>> the usual sense.
>>
>> On the 8xx, that's pretty different.
>>
>> The PTE doesn't control whether a page is user page or a kernel page.
>> The only thing that is set in the PTE is whether a page is linked to a
>> given PID or not.
>> PAGE_KERNEL tells that the page can be addressed with any PID.
>>
>> The user access right is given by a kind of zone, which is in the PGD
>> entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0.
>> Every pages below PAGE_OFFSET are defined as belonging to zone 1.
>>
>> By default, zone 0 can only be accessed by kernel, and zone 1 can only
>> be accessed by user. When kernel wants to access zone 1, it temporarily
>> changes properties of zone 1 to allow both kernel and user accesses.
>>
>> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel
>> must unlock it to access it.
>>
>>
>> And this is more or less the same on hash/32. This is managed by segment
>> registers. One segment register corresponds to a 256Mbytes area. Every
>> pages below PAGE_OFFSET can only be read by default by kernel. Only user
>> can write if the PTE allows it. When the kernel needs to write at an
>> address below PAGE_OFFSET, it must change the segment properties in the
>> corresponding segment register.
>>
>> So, for both cases, if we want to have it local to a task while still
>> allowing kernel access, it means we have to define a new special area
>> between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.
>>
>> That looks complex to me for a small benefit, especially as 8xx is not
>> SMP and neither are most of the hash/32 targets.
>>
> 
> Agreed. So I guess the solution is to differentiate between radix/non-radix
> and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
> is enabled. Hmm, I need to think about this some more, especially if it's
> acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
> you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?

No it shouldn't be a problem AFAICS, except maybe the CPU overhead it 
brings as I mentioned previously (ftrace selftests going from 40 to 43 
seconds ie 8% overhead.

> 
> I don't necessarily want to drop the local mm patching idea for non-radix
> platforms since that means we would have to maintain two implementations.
> 

What's the problem with RADIX, why can't PAGE_SHARED be used on radix ?

Christophe
Christopher M. Riedl April 21, 2020, 4:22 a.m. UTC | #5
On Sat Apr 18, 2020 at 12:27 PM, Christophe Leroy wrote:
>
> 
>
> 
> Le 15/04/2020 à 18:22, Christopher M Riedl a écrit :
> >> On April 15, 2020 4:12 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>
> >>   
> >> Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :
> >>>> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>>>
> >>>>    
> >>>> This patch fixes the RFC series identified below.
> >>>> It fixes three points:
> >>>> - Failure with CONFIG_PPC_KUAP
> >>>> - Failure to write do to lack of DIRTY bit set on the 8xx
> >>>> - Inadequaly complex WARN post verification
> >>>>
> >>>> However, it has an impact on the CPU load. Here is the time
> >>>> needed on an 8xx to run the ftrace selftests without and
> >>>> with this series:
> >>>> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
> >>>> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
> >>>> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
> >>>>
> >>>> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
> >>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >>>> ---
> >>>>    arch/powerpc/lib/code-patching.c | 5 ++++-
> >>>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> >>>> index f156132e8975..4ccff427592e 100644
> >>>> --- a/arch/powerpc/lib/code-patching.c
> >>>> +++ b/arch/powerpc/lib/code-patching.c
> >>>> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> >>>>    	}
> >>>>    
> >>>>    	pte = mk_pte(page, pgprot);
> >>>> +	pte = pte_mkdirty(pte);
> >>>>    	set_pte_at(patching_mm, patching_addr, ptep, pte);
> >>>>    
> >>>>    	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> >>>> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
> >>>>    			(offset_in_page((unsigned long)addr) /
> >>>>    				sizeof(unsigned int));
> >>>>    
> >>>> +	allow_write_to_user(patch_addr, sizeof(instr));
> >>>>    	__patch_instruction(addr, instr, patch_addr);
> >>>> +	prevent_write_to_user(patch_addr, sizeof(instr));
> >>>>
> >>>
> >>> On radix we can map the page with PAGE_KERNEL protection which ends up
> >>> setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> >>> ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> >>>
> >>> Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> >>> the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
> >>> because this is a temporary kernel mapping which really isn't userspace in
> >>> the usual sense.
> >>
> >> On the 8xx, that's pretty different.
> >>
> >> The PTE doesn't control whether a page is user page or a kernel page.
> >> The only thing that is set in the PTE is whether a page is linked to a
> >> given PID or not.
> >> PAGE_KERNEL tells that the page can be addressed with any PID.
> >>
> >> The user access right is given by a kind of zone, which is in the PGD
> >> entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0.
> >> Every pages below PAGE_OFFSET are defined as belonging to zone 1.
> >>
> >> By default, zone 0 can only be accessed by kernel, and zone 1 can only
> >> be accessed by user. When kernel wants to access zone 1, it temporarily
> >> changes properties of zone 1 to allow both kernel and user accesses.
> >>
> >> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel
> >> must unlock it to access it.
> >>
> >>
> >> And this is more or less the same on hash/32. This is managed by segment
> >> registers. One segment register corresponds to a 256Mbytes area. Every
> >> pages below PAGE_OFFSET can only be read by default by kernel. Only user
> >> can write if the PTE allows it. When the kernel needs to write at an
> >> address below PAGE_OFFSET, it must change the segment properties in the
> >> corresponding segment register.
> >>
> >> So, for both cases, if we want to have it local to a task while still
> >> allowing kernel access, it means we have to define a new special area
> >> between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.
> >>
> >> That looks complex to me for a small benefit, especially as 8xx is not
> >> SMP and neither are most of the hash/32 targets.
> >>
> > 
> > Agreed. So I guess the solution is to differentiate between radix/non-radix
> > and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
> > is enabled. Hmm, I need to think about this some more, especially if it's
> > acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
> > you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?
>
> 
> No it shouldn't be a problem AFAICS, except maybe the CPU overhead it
> brings as I mentioned previously (ftrace selftests going from 40 to 43
> seconds ie 8% overhead.
>
Ok great. I will have some performance numbers for POWER8 and POWER9 with
the next spin of the RFC
> 
> > 
> > I don't necessarily want to drop the local mm patching idea for non-radix
> > platforms since that means we would have to maintain two implementations.
> > 
>
> 
> What's the problem with RADIX, why can't PAGE_SHARED be used on radix ?
>
It's not a problem. I would actually prefer to use PAGE_KERNEL since the
mapping is really for a kernel page. On radix using PAGE_KERNEL allows us
to avoid the KUAP functions due to the HW implementation (AMR and EAA).
> 
> Christophe
>
> 
>
>
diff mbox series

Patch

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f156132e8975..4ccff427592e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -97,6 +97,7 @@  static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 	}
 
 	pte = mk_pte(page, pgprot);
+	pte = pte_mkdirty(pte);
 	set_pte_at(patching_mm, patching_addr, ptep, pte);
 
 	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
@@ -168,7 +169,9 @@  static int do_patch_instruction(unsigned int *addr, unsigned int instr)
 			(offset_in_page((unsigned long)addr) /
 				sizeof(unsigned int));
 
+	allow_write_to_user(patch_addr, sizeof(instr));
 	__patch_instruction(addr, instr, patch_addr);
+	prevent_write_to_user(patch_addr, sizeof(instr));
 
 	err = unmap_patch(&patch_mapping);
 	if (err)
@@ -179,7 +182,7 @@  static int do_patch_instruction(unsigned int *addr, unsigned int instr)
 	 * think we just wrote.
 	 * XXX: BUG_ON() instead?
 	 */
-	WARN_ON(memcmp(addr, &instr, sizeof(instr)));
+	WARN_ON(*addr != instr);
 
 out:
 	local_irq_restore(flags);