Message ID | 87r3mb4603.fsf@steelpick.2x.cz (mailing list archive) |
---|---|
State | Not Applicable |
Hi Michal,

Thanks for finding the problem.

On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
> I found the problem. The compiler replaces an assignment with a call to
> memcpy. The following patch fixes the problem for me. However, I'm not
> sure whether this is the real solution. I guess the compiler is free to
> generate a call to memcpy wherever it wants so other compilers or other
> optimization levels may need fixes at other places. What do others
> think?

I think you're right that it's not a good solution, the compiler could generate
other calls to memcpy depending on various factors, and people will add new
code that causes memcpy to get called and it will break your platform.

Christophe, am I right that the problem here is that your new memcpy() doesn't
work until later in boot when caches are enabled?

cheers

Hi Michael

On 07/09/2015 03:14, Michael Ellerman wrote:
> Hi Michal,
>
> Thanks for finding the problem.
>
> On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
>> I found the problem. The compiler replaces an assignment with a call to
>> memcpy. The following patch fixes the problem for me. However, I'm not
>> sure whether this is the real solution. I guess the compiler is free to
>> generate a call to memcpy wherever it wants so other compilers or other
>> optimization levels may need fixes at other places. What do others
>> think?
> I think you're right that it's not a good solution, the compiler could generate
> other calls to memcpy depending on various factors, and people will add new
> code that causes memcpy to get called and it will break your platform.
>
> Christophe, am I right that the problem here is that your new memcpy() doesn't
> work until later in boot when caches are enabled?

That's right, memset() and memcpy() are for setting/copying data into
cacheable RAM.
They are using the dcbz instruction in order to avoid wasting time loading
the cacheline with data that will be overwritten.

memset_io() and memcpy_toio() are the functions to use when using not
cacheable memory.

The issue identified by Michal is in function setup_cpu_spec() which is
called by identify_cpu(). identify_cpu() is called from early_init().
In the beginning of early_init(), there is (code from Paul in 2005)

	/* First zero the BSS -- use memset_io, some platforms don't have
	 * caches on yet */
	memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
			__bss_stop - __bss_start);

It shows that it is already expected that the cache is not active yet
and standard memset() shall not be used yet. That's the same with memcpy().

I think GCC uses memcpy() in well known situations like initialising
structures or copying structures.
Shouldn't we just avoid this kind of action in the very few early init
functions?

Christophe

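As an illustration of the point above (not code from this thread), here is a minimal user-space sketch of a copy that the compiler should not turn into a memcpy() call. It assumes that going through volatile pointers is enough to keep the accesses as written, which is the usual reason I/O copy helpers are built this way. The names cpu_spec_demo, early_copy(), copy_spec_early() and early_copy_selftest() are hypothetical; the real struct cpu_spec has many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct cpu_spec. */
struct cpu_spec_demo {
	unsigned int pvr_mask;
	unsigned int pvr_value;
	char name[32];
};

/*
 * Copy n bytes one at a time through volatile pointers. Every volatile
 * access has to be performed as written, so the compiler is not expected
 * to merge this loop into a call to memcpy().
 */
static void early_copy(void *dst, const void *src, size_t n)
{
	volatile unsigned char *d = dst;
	const volatile unsigned char *s = src;

	while (n--)
		*d++ = *s++;
}

/* Copy a whole structure without writing "*t = *s". */
static void copy_spec_early(struct cpu_spec_demo *t,
			    const struct cpu_spec_demo *s)
{
	early_copy(t, s, sizeof(*t));
}

/* Small self-check: the copy must reproduce every field. */
static void early_copy_selftest(void)
{
	struct cpu_spec_demo src = { 0xffff0000u, 0x00500000u, "demo" };
	struct cpu_spec_demo dst = { 0, 0, "" };

	copy_spec_early(&dst, &src);
	assert(dst.pvr_mask == src.pvr_mask);
	assert(dst.pvr_value == src.pvr_value);
	assert(dst.name[0] == 'd' && dst.name[4] == '\0');
}
```

A byte-wise loop like this is slow, which only matters little if it is confined to the handful of copies done before the caches are on. It does not address the concern raised in the thread that the compiler may emit memcpy() calls at other places.
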
On Mon, 2015-09-07 at 09:08 +0200, Christophe LEROY wrote:
> Hi Michael
>
> On 07/09/2015 03:14, Michael Ellerman wrote:
> > On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
> >> I found the problem. The compiler replaces an assignment with a call to
> >> memcpy. The following patch fixes the problem for me. However, I'm not
> >> sure whether this is the real solution. I guess the compiler is free to
> >> generate a call to memcpy wherever it wants so other compilers or other
> >> optimization levels may need fixes at other places. What do others
> >> think?
> > I think you're right that it's not a good solution, the compiler could generate
> > other calls to memcpy depending on various factors, and people will add new
> > code that causes memcpy to get called and it will break your platform.
> >
> > Christophe, am I right that the problem here is that your new memcpy() doesn't
> > work until later in boot when caches are enabled?
>
> That's right, memset() and memcpy() are for setting/copying data into
> cacheable RAM.
> They are using the dcbz instruction in order to avoid wasting time loading
> the cacheline with data that will be overwritten.
>
> memset_io() and memcpy_toio() are the functions to use when using not
> cacheable memory.
>
> The issue identified by Michal is in function setup_cpu_spec() which is
> called by identify_cpu(). identify_cpu() is called from early_init().
> In the beginning of early_init(), there is (code from Paul in 2005)
>
> 	/* First zero the BSS -- use memset_io, some platforms don't have
> 	 * caches on yet */
> 	memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
> 			__bss_stop - __bss_start);
>
> It shows that it is already expected that the cache is not active yet
> and standard memset() shall not be used yet. That's the same with memcpy().

Thanks for the explanation.

> I think GCC uses memcpy() in well known situations like initialising
> structures or copying structures.
> Shouldn't we just avoid this kind of action in the very few early init
> functions?

Which are the "very few" early init functions? Can you make a list, for 32-bit
and 64-bit? And can we keep it updated over time and not introduce regressions?

cheers

On 7.9.2015 10:40, Michael Ellerman wrote:
> On Mon, 2015-09-07 at 09:08 +0200, Christophe LEROY wrote:
>> Hi Michael
>>
>> On 07/09/2015 03:14, Michael Ellerman wrote:
>>> On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
>>>> I found the problem. The compiler replaces an assignment with a call to
>>>> memcpy. The following patch fixes the problem for me. However, I'm not
>>>> sure whether this is the real solution. I guess the compiler is free to
>>>> generate a call to memcpy wherever it wants so other compilers or other
>>>> optimization levels may need fixes at other places. What do others
>>>> think?
>>> I think you're right that it's not a good solution, the compiler could generate
>>> other calls to memcpy depending on various factors, and people will add new
>>> code that causes memcpy to get called and it will break your platform.
>>>
>>> Christophe, am I right that the problem here is that your new memcpy() doesn't
>>> work until later in boot when caches are enabled?
>> That's right, memset() and memcpy() are for setting/copying data into
>> cacheable RAM.
>> They are using the dcbz instruction in order to avoid wasting time loading
>> the cacheline with data that will be overwritten.
>>
>> memset_io() and memcpy_toio() are the functions to use when using not
>> cacheable memory.
>>
>> The issue identified by Michal is in function setup_cpu_spec() which is
>> called by identify_cpu(). identify_cpu() is called from early_init().
>> In the beginning of early_init(), there is (code from Paul in 2005)
>>
>> 	/* First zero the BSS -- use memset_io, some platforms don't have
>> 	 * caches on yet */
>> 	memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
>> 			__bss_stop - __bss_start);
>>
>> It shows that it is already expected that the cache is not active yet
>> and standard memset() shall not be used yet. That's the same with memcpy().
> Thanks for the explanation.
>
>> I think GCC uses memcpy() in well known situations like initialising
>> structures or copying structures.
>> Shouldn't we just avoid this kind of action in the very few early init
>> functions?
> Which are the "very few" early init functions? Can you make a list, for 32-bit
> and 64-bit? And can we keep it updated over time and not introduce regressions?
>

If the code that runs without caches is concentrated in a few files, we
may either modify the build system to check whether there is a call to
memcpy from these files (e.g. by using nm) or these files can be
"prelinked" with a special version of memcpy that doesn't require caches.
Would any of these be acceptable?

-Michal

From: Michal Sojka
> >> I think GCC uses memcpy() in well known situations like initialising
> >> structures or copying structures.
> >> Shouldn't we just avoid this kind of action in the very few early init
> >> functions?
> > Which are the "very few" early init functions? Can you make a list, for 32-bit
> > and 64-bit? And can we keep it updated over time and not introduce regressions?
> >
> If the code that runs without caches is concentrated in a few files, we
> may either modify the build system to check whether there is a call to
> memcpy from these files (e.g. by using nm) or these files can be
> "prelinked" with a special version of memcpy that doesn't require caches.
> Would any of these be acceptable?

What about run-time patching memcpy() after the caches are initialised?

David

On Mon, 2015-09-07 at 10:59 +0000, David Laight wrote:
> From: Michal Sojka
> > >> I think GCC uses memcpy() in well known situations like initialising
> > >> structures or copying structures.
> > >> Shouldn't we just avoid this kind of action in the very few early init
> > >> functions?
> > > Which are the "very few" early init functions? Can you make a list, for 32-bit
> > > and 64-bit? And can we keep it updated over time and not introduce regressions?
> > >
> > If the code that runs without caches is concentrated in a few files, we
> > may either modify the build system to check whether there is a call to
> > memcpy from these files (e.g. by using nm) or these files can be
> > "prelinked" with a special version of memcpy that doesn't require caches.
> > Would any of these be acceptable?
>
> What about run-time patching memcpy() after the caches are initialised?

Yeah, that's the solution we use on 64-bit.

It also means you can have cpu specific optimisations, which can be patched in
or out using the cpu feature patching.

cheers

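To make the idea of switching implementations once the caches are up more concrete, here is a simplified user-space sketch. It uses a function pointer as a stand-in; the kernel's cpu feature patching rewrites instructions in place instead, so this shows only the control flow, not the mechanism. All names here (copy_bytewise(), copy_fast(), demo_memcpy(), caches_enabled(), fast_calls) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Safe before caches are on: plain byte accesses, no cache-line tricks. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
	volatile unsigned char *d = dst;
	const volatile unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Counts how often the "optimised" path was taken (for the demo only). */
static unsigned int fast_calls;

/*
 * Placeholder for a cache-optimised copy. A real one would use dcbz;
 * this sketch just reuses the byte-wise loop so it stays portable.
 */
static void *copy_fast(void *dst, const void *src, size_t n)
{
	fast_calls++;
	return copy_bytewise(dst, src, n);
}

/* Early boot starts out with the implementation that is always safe. */
static void *(*copy_impl)(void *, const void *, size_t) = copy_bytewise;

/* Every caller goes through this entry point. */
static void *demo_memcpy(void *dst, const void *src, size_t n)
{
	return copy_impl(dst, src, n);
}

/* Called once when the caches are known to be enabled. */
static void caches_enabled(void)
{
	copy_impl = copy_fast;
}
```

The benefit of this shape is that compiler-generated memcpy() calls in early code stay correct without anyone having to maintain a list of early init functions; the cost is that copies are slow until the switch happens.
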
> > What about run-time patching memcpy() after the caches are initialised?
>
> Yeah, that's the solution we use on 64-bit.
>
> It also means you can have cpu specific optimisations, which can be patched in
> or out using the cpu feature patching.

I've noticed x86 doing that.
For newer Intel parts it patches in 'rep movsb' but unfortunately
memcpy_io is always #defined to memcpy.
For uncached targets the hardware can't optimise rep movsb - so you end
up with byte accesses. These can be rather slower than expected.

This also affects userspace copies to mmap()ed PCIe space.

David

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 7d80bfd..c2f1fba 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2121,7 +2121,7 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
 	old = *t;
 
 	/* Copy everything, then do fixups */
-	*t = *s;
+	memcpy_toio(t, s, sizeof(struct cpu_spec));
 
 	/*
 	 * If we are overriding a previous value derived from the real