[v3,1/3] powerpc/mm: prepare kernel for KAsan on PPC32
diff mbox series

Message ID 0c854dd6b110ac2b81ef1681f6e097f59f84af8b.1547289808.git.christophe.leroy@c-s.fr
State Superseded
Headers show
Series
  • KASAN for powerpc/32
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied

Commit Message

Christophe Leroy Jan. 12, 2019, 11:16 a.m. UTC
In kernel/cputable.c, explicitly use memcpy() in order
to allow GCC to replace it with __memcpy() when KASAN is
selected.

Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
enabled"), memset() can be used before activation of the cache,
so no need to use memset_io() for zeroing the BSS.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/cputable.c | 4 ++--
 arch/powerpc/kernel/setup_32.c | 6 ++----
 2 files changed, 4 insertions(+), 6 deletions(-)

Comments

Dmitry Vyukov Jan. 14, 2019, 9:34 a.m. UTC | #1
On Sat, Jan 12, 2019 at 12:16 PM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
&gt;
&gt; In kernel/cputable.c, explicitly use memcpy() in order
&gt; to allow GCC to replace it with __memcpy() when KASAN is
&gt; selected.
&gt;
&gt; Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
&gt; enabled"), memset() can be used before activation of the cache,
&gt; so no need to use memset_io() for zeroing the BSS.
&gt;
&gt; Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
&gt; ---
&gt;  arch/powerpc/kernel/cputable.c | 4 ++--
&gt;  arch/powerpc/kernel/setup_32.c | 6 ++----
&gt;  2 files changed, 4 insertions(+), 6 deletions(-)
&gt;
&gt; diff --git a/arch/powerpc/kernel/cputable.c
b/arch/powerpc/kernel/cputable.c
&gt; index 1eab54bc6ee9..84814c8d1bcb 100644
&gt; --- a/arch/powerpc/kernel/cputable.c
&gt; +++ b/arch/powerpc/kernel/cputable.c
&gt; @@ -2147,7 +2147,7 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
&gt;         struct cpu_spec *t = &amp;the_cpu_spec;
&gt;
&gt;         t = PTRRELOC(t);
&gt; -       *t = *s;
&gt; +       memcpy(t, s, sizeof(*t));

Hi Christophe,

I understand why you are doing this, but this looks a bit fragile and
non-scalable. This may not work with the next version of compiler,
just different than yours version of compiler, clang, etc.

Does using -ffreestanding and/or -fno-builtin-memcpy (-memset) help?
If it helps, perhaps it makes sense to add these flags to
KASAN_SANITIZE := n files.


>         *PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
>  }
> @@ -2162,7 +2162,7 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
>         old = *t;
>
>         /* Copy everything, then do fixups */
> -       *t = *s;
> +       memcpy(t, s, sizeof(*t));
>
>         /*
>          * If we are overriding a previous value derived from the real
> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
> index 947f904688b0..5e761eb16a6d 100644
> --- a/arch/powerpc/kernel/setup_32.c
> +++ b/arch/powerpc/kernel/setup_32.c
> @@ -73,10 +73,8 @@ notrace unsigned long __init early_init(unsigned long dt_ptr)
>  {
>         unsigned long offset = reloc_offset();
>
> -       /* First zero the BSS -- use memset_io, some platforms don't have
> -        * caches on yet */
> -       memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
> -                       __bss_stop - __bss_start);
> +       /* First zero the BSS */
> +       memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
>
>         /*
>          * Identify the CPU type and fix up code sections
> --
> 2.13.3
>
Christophe Leroy Jan. 15, 2019, 7:27 a.m. UTC | #2
On 01/14/2019 09:34 AM, Dmitry Vyukov wrote:
> On Sat, Jan 12, 2019 at 12:16 PM Christophe Leroy
> <christophe.leroy@c-s.fr> wrote:
> &gt;
> &gt; In kernel/cputable.c, explicitly use memcpy() in order
> &gt; to allow GCC to replace it with __memcpy() when KASAN is
> &gt; selected.
> &gt;
> &gt; Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
> &gt; enabled"), memset() can be used before activation of the cache,
> &gt; so no need to use memset_io() for zeroing the BSS.
> &gt;
> &gt; Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> &gt; ---
> &gt;  arch/powerpc/kernel/cputable.c | 4 ++--
> &gt;  arch/powerpc/kernel/setup_32.c | 6 ++----
> &gt;  2 files changed, 4 insertions(+), 6 deletions(-)
> &gt;
> &gt; diff --git a/arch/powerpc/kernel/cputable.c
> b/arch/powerpc/kernel/cputable.c
> &gt; index 1eab54bc6ee9..84814c8d1bcb 100644
> &gt; --- a/arch/powerpc/kernel/cputable.c
> &gt; +++ b/arch/powerpc/kernel/cputable.c
> &gt; @@ -2147,7 +2147,7 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
> &gt;         struct cpu_spec *t = &amp;the_cpu_spec;
> &gt;
> &gt;         t = PTRRELOC(t);
> &gt; -       *t = *s;
> &gt; +       memcpy(t, s, sizeof(*t));
> 
> Hi Christophe,
> 
> I understand why you are doing this, but this looks a bit fragile and
> non-scalable. This may not work with the next version of compiler,
> just different than yours version of compiler, clang, etc.

My felling would be that this change makes it more solid.

My understanding is that when you do *t = *s, the compiler can use 
whatever way it wants to do the copy.
When you do memcpy(), you ensure it will do it that way and not another 
way, don't you ?

My problem is that when using *t = *s, the function set_cur_cpu_spec() 
always calls memcpy(), not taking into account the following define 
which is in arch/powerpc/include/asm/string.h (other arches do the same):

#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
/*
  * For files that are not instrumented (e.g. mm/slub.c) we
  * should use not instrumented version of mem* functions.
  */
#define memcpy(dst, src, len) __memcpy(dst, src, len)
#define memmove(dst, src, len) __memmove(dst, src, len)
#define memset(s, c, n) __memset(s, c, n)
#endif

void __init set_cur_cpu_spec(struct cpu_spec *s)
{
	struct cpu_spec *t = &the_cpu_spec;

	t = PTRRELOC(t);
	*t = *s;

	*PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
}

00000000 <set_cur_cpu_spec>:
    0:   94 21 ff f0     stwu    r1,-16(r1)
    4:   7c 08 02 a6     mflr    r0
    8:   bf c1 00 08     stmw    r30,8(r1)
    c:   3f e0 00 00     lis     r31,0
                         e: R_PPC_ADDR16_HA      .data..read_mostly
   10:   3b ff 00 00     addi    r31,r31,0
                         12: R_PPC_ADDR16_LO     .data..read_mostly
   14:   7c 7e 1b 78     mr      r30,r3
   18:   7f e3 fb 78     mr      r3,r31
   1c:   90 01 00 14     stw     r0,20(r1)
   20:   48 00 00 01     bl      20 <set_cur_cpu_spec+0x20>
                         20: R_PPC_REL24 add_reloc_offset
   24:   7f c4 f3 78     mr      r4,r30
   28:   38 a0 00 58     li      r5,88
   2c:   48 00 00 01     bl      2c <set_cur_cpu_spec+0x2c>
                         2c: R_PPC_REL24 memcpy
   30:   38 7f 00 58     addi    r3,r31,88
   34:   48 00 00 01     bl      34 <set_cur_cpu_spec+0x34>
                         34: R_PPC_REL24 add_reloc_offset
   38:   93 e3 00 00     stw     r31,0(r3)
   3c:   80 01 00 14     lwz     r0,20(r1)
   40:   bb c1 00 08     lmw     r30,8(r1)
   44:   7c 08 03 a6     mtlr    r0
   48:   38 21 00 10     addi    r1,r1,16
   4c:   4e 80 00 20     blr


When replacing *t = *s by memcpy(t, s, sizeof(*t)), GCC replace it by 
__memcpy() as expected.

> 
> Does using -ffreestanding and/or -fno-builtin-memcpy (-memset) help?

No it doesn't and to be honest I can't see how it would. My 
understanding is that it could be even worse because it would mean 
adding calls to memcpy() also in all trivial places where GCC does the 
copy itself by default.

Do you see any alternative ?

Christophe

> If it helps, perhaps it makes sense to add these flags to
> KASAN_SANITIZE := n files.
> 
> 
>>          *PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
>>   }
>> @@ -2162,7 +2162,7 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
>>          old = *t;
>>
>>          /* Copy everything, then do fixups */
>> -       *t = *s;
>> +       memcpy(t, s, sizeof(*t));
>>
>>          /*
>>           * If we are overriding a previous value derived from the real
>> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
>> index 947f904688b0..5e761eb16a6d 100644
>> --- a/arch/powerpc/kernel/setup_32.c
>> +++ b/arch/powerpc/kernel/setup_32.c
>> @@ -73,10 +73,8 @@ notrace unsigned long __init early_init(unsigned long dt_ptr)
>>   {
>>          unsigned long offset = reloc_offset();
>>
>> -       /* First zero the BSS -- use memset_io, some platforms don't have
>> -        * caches on yet */
>> -       memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
>> -                       __bss_stop - __bss_start);
>> +       /* First zero the BSS */
>> +       memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
>>
>>          /*
>>           * Identify the CPU type and fix up code sections
>> --
>> 2.13.3
>>
Dmitry Vyukov Jan. 15, 2019, 11:14 a.m. UTC | #3
On Tue, Jan 15, 2019 at 8:27 AM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
>
>
>
> On 01/14/2019 09:34 AM, Dmitry Vyukov wrote:
> > On Sat, Jan 12, 2019 at 12:16 PM Christophe Leroy
> > <christophe.leroy@c-s.fr> wrote:
> > &gt;
> > &gt; In kernel/cputable.c, explicitly use memcpy() in order
> > &gt; to allow GCC to replace it with __memcpy() when KASAN is
> > &gt; selected.
> > &gt;
> > &gt; Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
> > &gt; enabled"), memset() can be used before activation of the cache,
> > &gt; so no need to use memset_io() for zeroing the BSS.
> > &gt;
> > &gt; Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > &gt; ---
> > &gt;  arch/powerpc/kernel/cputable.c | 4 ++--
> > &gt;  arch/powerpc/kernel/setup_32.c | 6 ++----
> > &gt;  2 files changed, 4 insertions(+), 6 deletions(-)
> > &gt;
> > &gt; diff --git a/arch/powerpc/kernel/cputable.c
> > b/arch/powerpc/kernel/cputable.c
> > &gt; index 1eab54bc6ee9..84814c8d1bcb 100644
> > &gt; --- a/arch/powerpc/kernel/cputable.c
> > &gt; +++ b/arch/powerpc/kernel/cputable.c
> > &gt; @@ -2147,7 +2147,7 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
> > &gt;         struct cpu_spec *t = &amp;the_cpu_spec;
> > &gt;
> > &gt;         t = PTRRELOC(t);
> > &gt; -       *t = *s;
> > &gt; +       memcpy(t, s, sizeof(*t));
> >
> > Hi Christophe,
> >
> > I understand why you are doing this, but this looks a bit fragile and
> > non-scalable. This may not work with the next version of compiler,
> > just different than yours version of compiler, clang, etc.
>
> My felling would be that this change makes it more solid.
>
> My understanding is that when you do *t = *s, the compiler can use
> whatever way it wants to do the copy.
> When you do memcpy(), you ensure it will do it that way and not another
> way, don't you ?

It makes this single line more deterministic wrt code-gen (though,
strictly saying compiler can turn memcpy back into inlines
instructions, it knows memcpy semantics anyway).
But the problem I meant is that the set of places that are subject to
this problem is not deterministic. So if we go with this solution,
after this change it's in the status "works on your machine" and we
either need to commit to not using struct copies and zeroing
throughout kernel code or potentially have a long tail of other
similar cases, and since they can be triggered by another compiler
version, we may need to backport these changes to previous releases
too. Whereas if we would go with compiler flags, it would prevent the
problem in all current and future places and with other past/future
versions of compilers.


> My problem is that when using *t = *s, the function set_cur_cpu_spec()
> always calls memcpy(), not taking into account the following define
> which is in arch/powerpc/include/asm/string.h (other arches do the same):
>
> #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
> /*
>   * For files that are not instrumented (e.g. mm/slub.c) we
>   * should use not instrumented version of mem* functions.
>   */
> #define memcpy(dst, src, len) __memcpy(dst, src, len)
> #define memmove(dst, src, len) __memmove(dst, src, len)
> #define memset(s, c, n) __memset(s, c, n)
> #endif
>
> void __init set_cur_cpu_spec(struct cpu_spec *s)
> {
>         struct cpu_spec *t = &the_cpu_spec;
>
>         t = PTRRELOC(t);
>         *t = *s;
>
>         *PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
> }
>
> 00000000 <set_cur_cpu_spec>:
>     0:   94 21 ff f0     stwu    r1,-16(r1)
>     4:   7c 08 02 a6     mflr    r0
>     8:   bf c1 00 08     stmw    r30,8(r1)
>     c:   3f e0 00 00     lis     r31,0
>                          e: R_PPC_ADDR16_HA      .data..read_mostly
>    10:   3b ff 00 00     addi    r31,r31,0
>                          12: R_PPC_ADDR16_LO     .data..read_mostly
>    14:   7c 7e 1b 78     mr      r30,r3
>    18:   7f e3 fb 78     mr      r3,r31
>    1c:   90 01 00 14     stw     r0,20(r1)
>    20:   48 00 00 01     bl      20 <set_cur_cpu_spec+0x20>
>                          20: R_PPC_REL24 add_reloc_offset
>    24:   7f c4 f3 78     mr      r4,r30
>    28:   38 a0 00 58     li      r5,88
>    2c:   48 00 00 01     bl      2c <set_cur_cpu_spec+0x2c>
>                          2c: R_PPC_REL24 memcpy
>    30:   38 7f 00 58     addi    r3,r31,88
>    34:   48 00 00 01     bl      34 <set_cur_cpu_spec+0x34>
>                          34: R_PPC_REL24 add_reloc_offset
>    38:   93 e3 00 00     stw     r31,0(r3)
>    3c:   80 01 00 14     lwz     r0,20(r1)
>    40:   bb c1 00 08     lmw     r30,8(r1)
>    44:   7c 08 03 a6     mtlr    r0
>    48:   38 21 00 10     addi    r1,r1,16
>    4c:   4e 80 00 20     blr
>
>
> When replacing *t = *s by memcpy(t, s, sizeof(*t)), GCC replace it by
> __memcpy() as expected.
>
> >
> > Does using -ffreestanding and/or -fno-builtin-memcpy (-memset) help?
>
> No it doesn't and to be honest I can't see how it would. My
> understanding is that it could be even worse because it would mean
> adding calls to memcpy() also in all trivial places where GCC does the
> copy itself by default.

The idea was that with -ffreestanding compiler must not assume
presence of any runtime support library, so it must not emit any calls
that are not explicitly present in the source code. However, after
reading more docs, it seems that even with -ffreestanding gcc and
clang still assume presence of a runtime library that provides at
least memcpy,  memmove, memset and memcmp. There does not seem to be a
way to prevent clang and gcc from doing it. So I guess this approach
is our only option:

Acked-by: Dmitry Vyukov <dvyukov@google.com>

Though, a comment may be useful so that a next person does not try to
revert it back.


> Do you see any alternative ?
>
> Christophe
>
> > If it helps, perhaps it makes sense to add these flags to
> > KASAN_SANITIZE := n files.
> >
> >
> >>          *PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
> >>   }
> >> @@ -2162,7 +2162,7 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
> >>          old = *t;
> >>
> >>          /* Copy everything, then do fixups */
> >> -       *t = *s;
> >> +       memcpy(t, s, sizeof(*t));
> >>
> >>          /*
> >>           * If we are overriding a previous value derived from the real
> >> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
> >> index 947f904688b0..5e761eb16a6d 100644
> >> --- a/arch/powerpc/kernel/setup_32.c
> >> +++ b/arch/powerpc/kernel/setup_32.c
> >> @@ -73,10 +73,8 @@ notrace unsigned long __init early_init(unsigned long dt_ptr)
> >>   {
> >>          unsigned long offset = reloc_offset();
> >>
> >> -       /* First zero the BSS -- use memset_io, some platforms don't have
> >> -        * caches on yet */
> >> -       memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
> >> -                       __bss_stop - __bss_start);
> >> +       /* First zero the BSS */
> >> +       memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
> >>
> >>          /*
> >>           * Identify the CPU type and fix up code sections
> >> --
> >> 2.13.3
> >>
Andrey Ryabinin Jan. 15, 2019, 5:07 p.m. UTC | #4
On 1/15/19 2:14 PM, Dmitry Vyukov wrote:
> On Tue, Jan 15, 2019 at 8:27 AM Christophe Leroy
> <christophe.leroy@c-s.fr> wrote:
>> On 01/14/2019 09:34 AM, Dmitry Vyukov wrote:
>>> On Sat, Jan 12, 2019 at 12:16 PM Christophe Leroy
>>> <christophe.leroy@c-s.fr> wrote:
>>> &gt;
>>> &gt; In kernel/cputable.c, explicitly use memcpy() in order
>>> &gt; to allow GCC to replace it with __memcpy() when KASAN is
>>> &gt; selected.
>>> &gt;
>>> &gt; Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
>>> &gt; enabled"), memset() can be used before activation of the cache,
>>> &gt; so no need to use memset_io() for zeroing the BSS.
>>> &gt;
>>> &gt; Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>> &gt; ---
>>> &gt;  arch/powerpc/kernel/cputable.c | 4 ++--
>>> &gt;  arch/powerpc/kernel/setup_32.c | 6 ++----
>>> &gt;  2 files changed, 4 insertions(+), 6 deletions(-)
>>> &gt;
>>> &gt; diff --git a/arch/powerpc/kernel/cputable.c
>>> b/arch/powerpc/kernel/cputable.c
>>> &gt; index 1eab54bc6ee9..84814c8d1bcb 100644
>>> &gt; --- a/arch/powerpc/kernel/cputable.c
>>> &gt; +++ b/arch/powerpc/kernel/cputable.c
>>> &gt; @@ -2147,7 +2147,7 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
>>> &gt;         struct cpu_spec *t = &amp;the_cpu_spec;
>>> &gt;
>>> &gt;         t = PTRRELOC(t);
>>> &gt; -       *t = *s;
>>> &gt; +       memcpy(t, s, sizeof(*t));
>>>
>>> Hi Christophe,
>>>
>>> I understand why you are doing this, but this looks a bit fragile and
>>> non-scalable. This may not work with the next version of compiler,
>>> just different than yours version of compiler, clang, etc.
>>
>> My felling would be that this change makes it more solid.
>>
>> My understanding is that when you do *t = *s, the compiler can use
>> whatever way it wants to do the copy.
>> When you do memcpy(), you ensure it will do it that way and not another
>> way, don't you ?
> 
> It makes this single line more deterministic wrt code-gen (though,
> strictly saying compiler can turn memcpy back into inlines
> instructions, it knows memcpy semantics anyway).
> But the problem I meant is that the set of places that are subject to
> this problem is not deterministic. So if we go with this solution,
> after this change it's in the status "works on your machine" and we
> either need to commit to not using struct copies and zeroing
> throughout kernel code or potentially have a long tail of other
> similar cases, and since they can be triggered by another compiler
> version, we may need to backport these changes to previous releases
> too. Whereas if we would go with compiler flags, it would prevent the
> problem in all current and future places and with other past/future
> versions of compilers.
> 

The patch will work for any compiler. The point of this patch is to make
memcpy() visible to the preprocessor which will replace it with __memcpy().

After preprocessor's work, compiler will see just __memcpy() call here.
Dmitry Vyukov Jan. 15, 2019, 5:10 p.m. UTC | #5
On Tue, Jan 15, 2019 at 6:06 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
>
>
> On 1/15/19 2:14 PM, Dmitry Vyukov wrote:
> > On Tue, Jan 15, 2019 at 8:27 AM Christophe Leroy
> > <christophe.leroy@c-s.fr> wrote:
> >> On 01/14/2019 09:34 AM, Dmitry Vyukov wrote:
> >>> On Sat, Jan 12, 2019 at 12:16 PM Christophe Leroy
> >>> <christophe.leroy@c-s.fr> wrote:
> >>> &gt;
> >>> &gt; In kernel/cputable.c, explicitly use memcpy() in order
> >>> &gt; to allow GCC to replace it with __memcpy() when KASAN is
> >>> &gt; selected.
> >>> &gt;
> >>> &gt; Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
> >>> &gt; enabled"), memset() can be used before activation of the cache,
> >>> &gt; so no need to use memset_io() for zeroing the BSS.
> >>> &gt;
> >>> &gt; Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >>> &gt; ---
> >>> &gt;  arch/powerpc/kernel/cputable.c | 4 ++--
> >>> &gt;  arch/powerpc/kernel/setup_32.c | 6 ++----
> >>> &gt;  2 files changed, 4 insertions(+), 6 deletions(-)
> >>> &gt;
> >>> &gt; diff --git a/arch/powerpc/kernel/cputable.c
> >>> b/arch/powerpc/kernel/cputable.c
> >>> &gt; index 1eab54bc6ee9..84814c8d1bcb 100644
> >>> &gt; --- a/arch/powerpc/kernel/cputable.c
> >>> &gt; +++ b/arch/powerpc/kernel/cputable.c
> >>> &gt; @@ -2147,7 +2147,7 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
> >>> &gt;         struct cpu_spec *t = &amp;the_cpu_spec;
> >>> &gt;
> >>> &gt;         t = PTRRELOC(t);
> >>> &gt; -       *t = *s;
> >>> &gt; +       memcpy(t, s, sizeof(*t));
> >>>
> >>> Hi Christophe,
> >>>
> >>> I understand why you are doing this, but this looks a bit fragile and
> >>> non-scalable. This may not work with the next version of compiler,
> >>> just different than yours version of compiler, clang, etc.
> >>
> >> My felling would be that this change makes it more solid.
> >>
> >> My understanding is that when you do *t = *s, the compiler can use
> >> whatever way it wants to do the copy.
> >> When you do memcpy(), you ensure it will do it that way and not another
> >> way, don't you ?
> >
> > It makes this single line more deterministic wrt code-gen (though,
> > strictly saying compiler can turn memcpy back into inlines
> > instructions, it knows memcpy semantics anyway).
> > But the problem I meant is that the set of places that are subject to
> > this problem is not deterministic. So if we go with this solution,
> > after this change it's in the status "works on your machine" and we
> > either need to commit to not using struct copies and zeroing
> > throughout kernel code or potentially have a long tail of other
> > similar cases, and since they can be triggered by another compiler
> > version, we may need to backport these changes to previous releases
> > too. Whereas if we would go with compiler flags, it would prevent the
> > problem in all current and future places and with other past/future
> > versions of compilers.
> >
>
> The patch will work for any compiler. The point of this patch is to make
> memcpy() visible to the preprocessor which will replace it with __memcpy().

For this single line, yes. But it does not mean that KASAN will work.

> After preprocessor's work, compiler will see just __memcpy() call here.
Christophe Leroy Jan. 15, 2019, 5:25 p.m. UTC | #6
Le 15/01/2019 à 18:10, Dmitry Vyukov a écrit :
> On Tue, Jan 15, 2019 at 6:06 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>>
>>
>>
>> On 1/15/19 2:14 PM, Dmitry Vyukov wrote:
>>> On Tue, Jan 15, 2019 at 8:27 AM Christophe Leroy
>>> <christophe.leroy@c-s.fr> wrote:
>>>> On 01/14/2019 09:34 AM, Dmitry Vyukov wrote:
>>>>> On Sat, Jan 12, 2019 at 12:16 PM Christophe Leroy
>>>>> <christophe.leroy@c-s.fr> wrote:
>>>>> &gt;
>>>>> &gt; In kernel/cputable.c, explicitly use memcpy() in order
>>>>> &gt; to allow GCC to replace it with __memcpy() when KASAN is
>>>>> &gt; selected.
>>>>> &gt;
>>>>> &gt; Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
>>>>> &gt; enabled"), memset() can be used before activation of the cache,
>>>>> &gt; so no need to use memset_io() for zeroing the BSS.
>>>>> &gt;
>>>>> &gt; Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>>>> &gt; ---
>>>>> &gt;  arch/powerpc/kernel/cputable.c | 4 ++--
>>>>> &gt;  arch/powerpc/kernel/setup_32.c | 6 ++----
>>>>> &gt;  2 files changed, 4 insertions(+), 6 deletions(-)
>>>>> &gt;
>>>>> &gt; diff --git a/arch/powerpc/kernel/cputable.c
>>>>> b/arch/powerpc/kernel/cputable.c
>>>>> &gt; index 1eab54bc6ee9..84814c8d1bcb 100644
>>>>> &gt; --- a/arch/powerpc/kernel/cputable.c
>>>>> &gt; +++ b/arch/powerpc/kernel/cputable.c
>>>>> &gt; @@ -2147,7 +2147,7 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
>>>>> &gt;         struct cpu_spec *t = &amp;the_cpu_spec;
>>>>> &gt;
>>>>> &gt;         t = PTRRELOC(t);
>>>>> &gt; -       *t = *s;
>>>>> &gt; +       memcpy(t, s, sizeof(*t));
>>>>>
>>>>> Hi Christophe,
>>>>>
>>>>> I understand why you are doing this, but this looks a bit fragile and
>>>>> non-scalable. This may not work with the next version of compiler,
>>>>> just different than yours version of compiler, clang, etc.
>>>>
>>>> My felling would be that this change makes it more solid.
>>>>
>>>> My understanding is that when you do *t = *s, the compiler can use
>>>> whatever way it wants to do the copy.
>>>> When you do memcpy(), you ensure it will do it that way and not another
>>>> way, don't you ?
>>>
>>> It makes this single line more deterministic wrt code-gen (though,
>>> strictly saying compiler can turn memcpy back into inlines
>>> instructions, it knows memcpy semantics anyway).
>>> But the problem I meant is that the set of places that are subject to
>>> this problem is not deterministic. So if we go with this solution,
>>> after this change it's in the status "works on your machine" and we
>>> either need to commit to not using struct copies and zeroing
>>> throughout kernel code or potentially have a long tail of other
>>> similar cases, and since they can be triggered by another compiler
>>> version, we may need to backport these changes to previous releases
>>> too. Whereas if we would go with compiler flags, it would prevent the
>>> problem in all current and future places and with other past/future
>>> versions of compilers.
>>>
>>
>> The patch will work for any compiler. The point of this patch is to make
>> memcpy() visible to the preprocessor which will replace it with __memcpy().
> 
> For this single line, yes. But it does not mean that KASAN will work.
> 
>> After preprocessor's work, compiler will see just __memcpy() call here.

This problem can affect any arch I believe. Maybe the 'solution' would 
be to run a generic script similar to 
arch/powerpc/kernel/prom_init_check.sh on all objects compiled with 
KASAN_SANITIZE_object.o := n don't include any reference to memcpy() 
memset() or memmove() ?

Christophe
Dmitry Vyukov Jan. 16, 2019, 10:03 a.m. UTC | #7
On Tue, Jan 15, 2019 at 6:25 PM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
>
> Le 15/01/2019 à 18:10, Dmitry Vyukov a écrit :
> > On Tue, Jan 15, 2019 at 6:06 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >>
> >> On 1/15/19 2:14 PM, Dmitry Vyukov wrote:
> >>> On Tue, Jan 15, 2019 at 8:27 AM Christophe Leroy
> >>> <christophe.leroy@c-s.fr> wrote:
> >>>> On 01/14/2019 09:34 AM, Dmitry Vyukov wrote:
> >>>>> On Sat, Jan 12, 2019 at 12:16 PM Christophe Leroy
> >>>>> <christophe.leroy@c-s.fr> wrote:
> >>>>> &gt;
> >>>>> &gt; In kernel/cputable.c, explicitly use memcpy() in order
> >>>>> &gt; to allow GCC to replace it with __memcpy() when KASAN is
> >>>>> &gt; selected.
> >>>>> &gt;
> >>>>> &gt; Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
> >>>>> &gt; enabled"), memset() can be used before activation of the cache,
> >>>>> &gt; so no need to use memset_io() for zeroing the BSS.
> >>>>> &gt;
> >>>>> &gt; Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >>>>> &gt; ---
> >>>>> &gt;  arch/powerpc/kernel/cputable.c | 4 ++--
> >>>>> &gt;  arch/powerpc/kernel/setup_32.c | 6 ++----
> >>>>> &gt;  2 files changed, 4 insertions(+), 6 deletions(-)
> >>>>> &gt;
> >>>>> &gt; diff --git a/arch/powerpc/kernel/cputable.c
> >>>>> b/arch/powerpc/kernel/cputable.c
> >>>>> &gt; index 1eab54bc6ee9..84814c8d1bcb 100644
> >>>>> &gt; --- a/arch/powerpc/kernel/cputable.c
> >>>>> &gt; +++ b/arch/powerpc/kernel/cputable.c
> >>>>> &gt; @@ -2147,7 +2147,7 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
> >>>>> &gt;         struct cpu_spec *t = &amp;the_cpu_spec;
> >>>>> &gt;
> >>>>> &gt;         t = PTRRELOC(t);
> >>>>> &gt; -       *t = *s;
> >>>>> &gt; +       memcpy(t, s, sizeof(*t));
> >>>>>
> >>>>> Hi Christophe,
> >>>>>
> >>>>> I understand why you are doing this, but this looks a bit fragile and
> >>>>> non-scalable. This may not work with the next version of compiler,
> >>>>> just different than yours version of compiler, clang, etc.
> >>>>
> >>>> My felling would be that this change makes it more solid.
> >>>>
> >>>> My understanding is that when you do *t = *s, the compiler can use
> >>>> whatever way it wants to do the copy.
> >>>> When you do memcpy(), you ensure it will do it that way and not another
> >>>> way, don't you ?
> >>>
> >>> It makes this single line more deterministic wrt code-gen (though,
> >>> strictly saying compiler can turn memcpy back into inlines
> >>> instructions, it knows memcpy semantics anyway).
> >>> But the problem I meant is that the set of places that are subject to
> >>> this problem is not deterministic. So if we go with this solution,
> >>> after this change it's in the status "works on your machine" and we
> >>> either need to commit to not using struct copies and zeroing
> >>> throughout kernel code or potentially have a long tail of other
> >>> similar cases, and since they can be triggered by another compiler
> >>> version, we may need to backport these changes to previous releases
> >>> too. Whereas if we would go with compiler flags, it would prevent the
> >>> problem in all current and future places and with other past/future
> >>> versions of compilers.
> >>>
> >>
> >> The patch will work for any compiler. The point of this patch is to make
> >> memcpy() visible to the preprocessor which will replace it with __memcpy().
> >
> > For this single line, yes. But it does not mean that KASAN will work.
> >
> >> After preprocessor's work, compiler will see just __memcpy() call here.
>
> This problem can affect any arch I believe. Maybe the 'solution' would
> be to run a generic script similar to
> arch/powerpc/kernel/prom_init_check.sh on all objects compiled with
> KASAN_SANITIZE_object.o := n don't include any reference to memcpy()
> memset() or memmove() ?


We do this when building user-space sanitizers runtime. There all code
always runs with sanitizer enabled, but at the same time must not be
instrumented. So we committed to changing all possible memcpy/memset
injection points and have a script that checks that we indeed have no
such calls at any paths. There problem is a bit simpler as we don't
have gazillion combinations of configs and the runtime is usually
self-hosted (as it is bundled with compiler), so we know what compiler
is used to build it. And that all is checked on CI.
I don't know how much work it is to do the same for kernel, though.
Adding -ffreestanding, if worked, looked like a cheap option to
achieve the same.

Another option is to insert checks into KASAN's memcpy/memset that at
least some early init has completed. If early init hasn't finished
yet, then they could skip all additional work besides just doing
memcpy/memset. We can't afford this for memory access instrumentation
for performance reasons, but it should be bearable for memcpy/memset.

Patch
diff mbox series

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 1eab54bc6ee9..84814c8d1bcb 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2147,7 +2147,7 @@  void __init set_cur_cpu_spec(struct cpu_spec *s)
 	struct cpu_spec *t = &the_cpu_spec;
 
 	t = PTRRELOC(t);
-	*t = *s;
+	memcpy(t, s, sizeof(*t));
 
 	*PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
 }
@@ -2162,7 +2162,7 @@  static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
 	old = *t;
 
 	/* Copy everything, then do fixups */
-	*t = *s;
+	memcpy(t, s, sizeof(*t));
 
 	/*
 	 * If we are overriding a previous value derived from the real
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 947f904688b0..5e761eb16a6d 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -73,10 +73,8 @@  notrace unsigned long __init early_init(unsigned long dt_ptr)
 {
 	unsigned long offset = reloc_offset();
 
-	/* First zero the BSS -- use memset_io, some platforms don't have
-	 * caches on yet */
-	memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
-			__bss_stop - __bss_start);
+	/* First zero the BSS */
+	memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
 
 	/*
 	 * Identify the CPU type and fix up code sections