diff mbox

memcpy regression

Message ID 87r3mb4603.fsf@steelpick.2x.cz (mailing list archive)
State Not Applicable
Headers show

Commit Message

Michal Sojka Sept. 6, 2015, 9:01 p.m. UTC
On Sun, Sep 06 2015, Michal Sojka wrote:
> On Sun, Sep 06 2015, christophe leroy wrote:
>> Le 05/09/2015 02:08, Michal Sojka a écrit :
>>> On 4.9.2015 21:49, Michal Sojka wrote:
>>>> On 4.9.2015 20:10, christophe leroy wrote:
>>>>>
>>>>>
>>>>> Le 04/09/2015 16:35, Michal Sojka a écrit :
>>>>>> On Fri, Sep 04 2015, Christophe LEROY wrote:
>>>>>>> Le 04/09/2015 15:33, Michal Sojka a écrit :
>>>>>>>> Dear Christophe,
>>>>>>>>
>>>>>>>> my MPC5200-based system stopped booting recently. I bisected the 
>>>>>>>> problem
>>>>>>>> to your commit below. If I revert that commit (on top of
>>>>>>>> 807249d3ada1ff28a47c4054ca4edd479421b671 = v4.2-6663-g807249d), my
>>>>>>>> system boots again.
>>>>>>>>
>>>>>>>>
>>>>>>> Do you use mainline code only, or do you have home-made code ?
>>>>>> I use mainline only sources with non-mainline device-tree.
>>>>>>
>>>>>>> memcpy() is not supposed to be used on non-cacheable memory.
>>>>>>> memcpy_toio() is the function to use when copying to non-cacheble 
>>>>>>> area.
>>>>>>>
>>>>>>> When I submitted the patch, I looked for erroneous use of memcpy() 
>>>>>>> and
>>>>>>> memset().
>>>>>>> I found one wrong use of memset() that I changed to memset_io() but I
>>>>>>> didn't find any misuse of memcpy().
>>>>>>> But I may have missed one.
>>>>>> I attach my .config, if it helps. I have there
>>>>>>
>>>>>> CONFIG_PPC_MPC52xx=y
>>>>>> CONFIG_PPC_MPC5200_SIMPLE=y
>>>>>>
>>>>>> so arch/powerpc/platforms/52xx is probably the directory to look. 
>>>>>> Do you
>>>>>> see any mempcy misuse there?
>>>>> I only found one suspect use of memcpy() in 
>>>>> arch/powerpc/platforms/52xx/
>>>>> It is in mpc52xx_pm.c but it's linked to CONFIG_PM which is not 
>>>>> selected by your .config
>>>>> I'll check in the drivers selected by your .config
>>>>>
>>>>> In parallele, are you able to try with CONFIG_PPC_EARLY_DEBUG in 
>>>>> order to try and locate the blocking point ?
>>>> I don't get any output from the system even with CONFIG_PPC_EARLY_DEBUG.
>>>
>>> Hmm, there is no udbg console for MPC5200. I hacked something up and 
>>> the earliest place I was able to initialize it is after 
>>> early_init_devtree() in setup_32.c. Even with this console, I got no 
>>> output when the problematic patch was applied. So the problem is 
>>> somewhere earlier.
>>>
>>
>> In early_init() in setup_32.c, there is the following comment:
>> /* First zero the BSS -- use memset_io, some platforms don't have caches 
>> on yet */
>>
>> In that case, when does cache get activated ?
>>
>> In move_device_tree(), called from early_init_devtree(), there is a call 
>> to memcpy().
>> Can you try replacing it by memcpy_io() ?
>
> I tried replacing it by memcpy_toio(), memcpy_fromio() and by
> generic_memcpy(). Nothing helped :(

I found the problem. The compiler replaces an assignment with a call to
memcpy. The following patch fixes the problem for me. However, I'm not
sure whether this is the real solution. I guess the compiler is free to
generate a call to memcpy wherever it wants so other compilers or other
optimization levels may need fixes at other places. What do others
think?

-Michal

---------------8<----------------------
From: Michal Sojka <sojkam1@fel.cvut.cz>
Date: Sun, 6 Sep 2015 22:44:55 +0200
Subject: [PATCH] powerpc: Fix unbootable system after memcpy optimization

This fixes the problem caused by commit
0b05e2d671c40cfb57e66e4e402320d6e056b2f8. On MPC5200, when using the
following version of GCC:

    powerpc-603e-linux-gnu-gcc (OSELAS.Toolchain-2012.12.1) 4.7.2

the system was ubootable.

Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
---
 arch/powerpc/kernel/cputable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michael Ellerman Sept. 7, 2015, 1:14 a.m. UTC | #1
Hi Michal,

Thanks for finding the problem.

On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
> 
> I found the problem. The compiler replaces an assignment with a call to
> memcpy. The following patch fixes the problem for me. However, I'm not
> sure whether this is the real solution. I guess the compiler is free to
> generate a call to memcpy wherever it wants so other compilers or other
> optimization levels may need fixes at other places. What do others
> think?

I think you're right that it's not a good solution, the compiler could generate
other calls to memcpy depending on various factors, and people will add new
code that causes memcpy to get called and it will break your platform.

Christophe, am I right that the problem here is that your new memcpy() doesn't
work until later in boot when caches are enabled?

cheers
Christophe Leroy Sept. 7, 2015, 7:08 a.m. UTC | #2
Hi Michael

Le 07/09/2015 03:14, Michael Ellerman a écrit :
> Hi Michal,
>
> Thanks for finding the problem.
>
> On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
>> I found the problem. The compiler replaces an assignment with a call to
>> memcpy. The following patch fixes the problem for me. However, I'm not
>> sure whether this is the real solution. I guess the compiler is free to
>> generate a call to memcpy wherever it wants so other compilers or other
>> optimization levels may need fixes at other places. What do others
>> think?
> I think you're right that it's not a good solution, the compiler could generate
> other calls to memcpy depending on various factors, and people will add new
> code that causes memcpy to get called and it will break your platform.
>
> Christophe, am I right that the problem here is that your new memcpy() doesn't
> work until later in boot when caches are enabled?
>
>

That's right, memset() and memcpy() are for setting/copying data into 
cacheable RAM.
They are using dczb instruction in order to avoid wasting time loading 
the cacheline with data that will be overwritten.

memset_io() and memcpy_toio() are the functions to use when using not 
cacheable memory.

The issue identified by Michal is in function setup_cpu_spec() which is 
called by identify_cpu(). identify_cpu() is called from early_init().
In the begining of early_init(), there is (code from Paul in 2005)

	/* First zero the BSS -- use memset_io, some platforms don't have
	 * caches on yet */
	memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
			__bss_stop - __bss_start);

It shows that it is already expected that the cache is not active yet 
and standard memset() shall not be used yet. That's the same with memcpy().

I think GCC uses memcpy() in well known situations like initialising 
structures or copying structures.
Shouldn't we just avoid this kind of actions in the very few early init 
functions ?

Christophe
Michael Ellerman Sept. 7, 2015, 8:40 a.m. UTC | #3
On Mon, 2015-09-07 at 09:08 +0200, Christophe LEROY wrote:
> Hi Michael
> 
> Le 07/09/2015 03:14, Michael Ellerman a écrit :
> > On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
> >> I found the problem. The compiler replaces an assignment with a call to
> >> memcpy. The following patch fixes the problem for me. However, I'm not
> >> sure whether this is the real solution. I guess the compiler is free to
> >> generate a call to memcpy wherever it wants so other compilers or other
> >> optimization levels may need fixes at other places. What do others
> >> think?
> > I think you're right that it's not a good solution, the compiler could generate
> > other calls to memcpy depending on various factors, and people will add new
> > code that causes memcpy to get called and it will break your platform.
> >
> > Christophe, am I right that the problem here is that your new memcpy() doesn't
> > work until later in boot when caches are enabled?
> 
> That's right, memset() and memcpy() are for setting/copying data into 
> cacheable RAM.
> They are using dczb instruction in order to avoid wasting time loading 
> the cacheline with data that will be overwritten.
> 
> memset_io() and memcpy_toio() are the functions to use when using not 
> cacheable memory.
> 
> The issue identified by Michal is in function setup_cpu_spec() which is 
> called by identify_cpu(). identify_cpu() is called from early_init().
> In the begining of early_init(), there is (code from Paul in 2005)
> 
> 	/* First zero the BSS -- use memset_io, some platforms don't have
> 	 * caches on yet */
> 	memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
> 			__bss_stop - __bss_start);
> 
> It shows that it is already expected that the cache is not active yet 
> and standard memset() shall not be used yet. That's the same with memcpy().

Thanks for the explanation.

> I think GCC uses memcpy() in well known situations like initialising 
> structures or copying structures.
> Shouldn't we just avoid this kind of actions in the very few early init 
> functions ?

Which are the "very few" early init functions? Can you make a list, for 32-bit
and 64-bit? And can we keep it updated over time and not introduce regressions?

cheers
Michal Sojka Sept. 7, 2015, 9:45 a.m. UTC | #4
On 7.9.2015 10:40, Michael Ellerman wrote:
> On Mon, 2015-09-07 at 09:08 +0200, Christophe LEROY wrote:
>> Hi Michael
>>
>> Le 07/09/2015 03:14, Michael Ellerman a écrit :
>>> On Sun, 2015-09-06 at 23:01 +0200, Michal Sojka wrote:
>>>> I found the problem. The compiler replaces an assignment with a call to
>>>> memcpy. The following patch fixes the problem for me. However, I'm not
>>>> sure whether this is the real solution. I guess the compiler is free to
>>>> generate a call to memcpy wherever it wants so other compilers or other
>>>> optimization levels may need fixes at other places. What do others
>>>> think?
>>> I think you're right that it's not a good solution, the compiler could generate
>>> other calls to memcpy depending on various factors, and people will add new
>>> code that causes memcpy to get called and it will break your platform.
>>>
>>> Christophe, am I right that the problem here is that your new memcpy() doesn't
>>> work until later in boot when caches are enabled?
>> That's right, memset() and memcpy() are for setting/copying data into
>> cacheable RAM.
>> They are using dczb instruction in order to avoid wasting time loading
>> the cacheline with data that will be overwritten.
>>
>> memset_io() and memcpy_toio() are the functions to use when using not
>> cacheable memory.
>>
>> The issue identified by Michal is in function setup_cpu_spec() which is
>> called by identify_cpu(). identify_cpu() is called from early_init().
>> In the begining of early_init(), there is (code from Paul in 2005)
>>
>> 	/* First zero the BSS -- use memset_io, some platforms don't have
>> 	 * caches on yet */
>> 	memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
>> 			__bss_stop - __bss_start);
>>
>> It shows that it is already expected that the cache is not active yet
>> and standard memset() shall not be used yet. That's the same with memcpy().
> Thanks for the explanation.
>
>> I think GCC uses memcpy() in well known situations like initialising
>> structures or copying structures.
>> Shouldn't we just avoid this kind of actions in the very few early init
>> functions ?
> Which are the "very few" early init functions? Can you make a list, for 32-bit
> and 64-bit? And can we keep it updated over time and not introduce regressions?
>
If the code that runs without caches is concentrated in few files, we 
may either modify the buildsystem to check whether there is a call to 
memcpy from these files (e.g. by using nm) or these files can be 
"prelinked" with special version of memcpy that doesn't require caches. 
Would any of these be acceptable?

-Michal
David Laight Sept. 7, 2015, 10:59 a.m. UTC | #5
From: Michal Sojka

> >> I think GCC uses memcpy() in well known situations like initialising

> >> structures or copying structures.

> >> Shouldn't we just avoid this kind of actions in the very few early init

> >> functions ?

> > Which are the "very few" early init functions? Can you make a list, for 32-bit

> > and 64-bit? And can we keep it updated over time and not introduce regressions?

> >

> If the code that runs without caches is concentrated in few files, we

> may either modify the buildsystem to check whether there is a call to

> memcpy from these files (e.g. by using nm) or these files can be

> "prelinked" with special version of memcpy that doesn't require caches.

> Would any of these be acceptable?


What about run-time patching memcpy() after the caches are initialised?

	David
Michael Ellerman Sept. 8, 2015, 3:54 a.m. UTC | #6
On Mon, 2015-09-07 at 10:59 +0000, David Laight wrote:
> From: Michal Sojka
> > >> I think GCC uses memcpy() in well known situations like initialising
> > >> structures or copying structures.
> > >> Shouldn't we just avoid this kind of actions in the very few early init
> > >> functions ?
> > > Which are the "very few" early init functions? Can you make a list, for 32-bit
> > > and 64-bit? And can we keep it updated over time and not introduce regressions?
> > >
> > If the code that runs without caches is concentrated in few files, we
> > may either modify the buildsystem to check whether there is a call to
> > memcpy from these files (e.g. by using nm) or these files can be
> > "prelinked" with special version of memcpy that doesn't require caches.
> > Would any of these be acceptable?
> 
> What about run-time patching memcpy() after the caches are initialised?

Yeah, that's the solution we use on 64-bit.

It also means you can have cpu specific optimisations, which can be patched in
or out using the cpu feature patching.

cheers
David Laight Sept. 8, 2015, 8:59 a.m. UTC | #7
> > What about run-time patching memcpy() after the caches are initialised?

> 

> Yeah, that's the solution we use on 64-bit.

> 

> It also means you can have cpu specific optimisations, which can be patched in

> or out using the cpu feature patching.


I've noticed x86 doing that.
For newer Intel parts it patches in 'rep movsb' but unfortunately
memcpy_io is always #defined to memcpy.

For uncached targets the hardware can't optimise rep movsb - so you
end up with byte accesses.
These work can be rather slower than expected.

This also affects userspace copies to mmap()ed PCIe space.

	David
diff mbox

Patch

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 7d80bfd..c2f1fba 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2121,7 +2121,7 @@  static struct cpu_spec * __init setup_cpu_spec(unsigned long offset,
 	old = *t;
 
 	/* Copy everything, then do fixups */
-	*t = *s;
+	memcpy_toio(t, s, sizeof(struct cpu_spec));
 
 	/*
 	 * If we are overriding a previous value derived from the real