Patchwork Query about: ARM11 MPCore: preemption/task migration cache coherency

login
register
mail settings
Submitter bill4carson@gmail.com
Date May 29, 2012, 5:28 a.m.
Message ID <4FC45E6B.2070202@gmail.com>
Download mbox | patch
Permalink /patch/161658/
State New
Headers show

Comments

bill4carson@gmail.com - May 29, 2012, 5:28 a.m.
On 2012年05月11日 16:51, Will Deacon wrote:
> Bill,
>
> On Wed, May 09, 2012 at 10:11:05AM +0100, bill4carson wrote:
>> I'm using ARM11 MPCore on linux-2.6.34, unfortunately I have random
>> panic/segment fault with task migration. I noticed a patch set
>> [ARM11 MPCore: preemption/task migration cache coherency fixups] to fix
>> such issues in here:
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html
>>
>> It seems there is no follow ups, is there official patch to fix such
>> issues?
>
> Let's be honest: you haven't given us a lot to go on here. Perhaps you could
> answer the following?
>
> (1) Do you experience the same issues with a more recent kernel?
> (2) If you apply the patches linked to above, does it fix your problem?
> (3) If you can reproduce on current mainline, do you have a testcase?
> (4) Does disabling CONFIG_PREEMPT make the problem disappear?
>
> That should provide us with some information about the problem.
>



Based on the limitation of CP15 of ARM11 MPCore:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0228a/index.html

The ARM11 MPCore SCU does not handle coherency consequences of CP15 
cache operations like clean and invalidate. If these operations are 
performed on one CPU, they do not affect the state of a cache line on a 
different CPU.  This can result in unexpected behavior if, say, a line 
is cleaned/invalidated but a subsequent access hits a stale copy in 
another CPU’s L1 through snooping the ‘coherency domain’.

Kernel option CONFIG_DMA_CACHE_RWFO was introduced to fix it. details 
see arch/arm/mm/cache-v6.S:
...
#ifdef CONFIG_DMA_CACHE_RWFO
         ldr     r2, [r0]                        @ read for ownership
         str     r2, [r0]                        @ write for ownership
#endif
...

I think:
1) The similar protection was not added on data cache handlers like 
v6_coherent_kern_range and v6_flush_kern_cache_all.


Here I modified v6_coherent_kern_range as:

But I have no idea on how to accomplish the v6_flush_kern_cache_all, 
maybe IPI is needed?

Any opinions?

And, with this little patch above, a strange issue on my side did 
*disappeared*. The issue is when enable task migration, some strange 
error may occurred like Segmentation fault and:

*** glibc detected *** cat: munmap_chunk(): invalid pointer: 0xbeaa0f13 ***

[10870.314465] Unhandled fault: alignment exception (0x821) at 0xebfffee6
[10870.315347] Unhandled fault: alignment exception (0x821) at 0xebfffee6
[10870.354917] Unhandled fault: alignment exception (0x821) at 0xebfffee6

cat: invalid number 'cat'
cat: invalid number 'cat'
cat: invalid number 'cat'
cat: invalid number 'cat'

cat: ����: No such file or directory
cat: �˾�˾: No such file or directory

cat: can't open ' *D_fo|�����': No such file or directory
cat: applet not found

cat: (null): Invalid argument

cat: (null): Bad address
cat: (null): Bad address

cat: o��o��: No such file or directory
cat: applet not found
cat: ������: No such file or directory

cat: unknown user /busybox

cat: ?��?��: No such file or directory
cat: applet not found

cat: ���$���: Bad address
cat: ������: Bad address
cat: ����: Bad address

[13758.255564] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.256459] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.256522] Unhandled fault: alignment exception (0x811) at 0x0019f1aa
[13758.319272] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.340860] Unhandled fault: alignment exception (0x811) at 0x0019f1aa
[13758.361612] Unhandled fault: alignment exception (0x811) at 0x0019f1aa
Will Deacon - May 30, 2012, 6:38 a.m.
On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>   ENTRY(v6_flush_kern_dcache_area)
>          add     r1, r0, r1
>   1:
> +#ifdef CONFIG_SMP
> +       ldr     r2, [r0]                        @ read for ownership
> +       str     r2, [r0]                        @ write for ownership
> +#endif /* CONFIG_SMP */
>   #ifdef HARVARD_CACHE
>          mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
>   #else

I don't think the invalidation is needed here, so you probably don't need to
hack this function at all.

> But I have no idea on how to accomplish the v6_flush_kern_cache_all, 
> maybe IPI is needed?

We could add an IPI to invalidate the I-caches on the other cores, however
I haven't checked to see if we could instead do something on the CPU
migration path which avoid the need for the broadcasting.

Will
bill4carson@gmail.com - May 30, 2012, 10:01 a.m.
Hi, Will

First of all, Thanks your attentions for this issue.


On 2012年05月30日 14:38, Will Deacon wrote:
> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
>> --- a/arch/arm/mm/cache-v6.S
>> +++ b/arch/arm/mm/cache-v6.S
>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>>    ENTRY(v6_flush_kern_dcache_area)
>>           add     r1, r0, r1
>>    1:
>> +#ifdef CONFIG_SMP
>> +       ldr     r2, [r0]                        @ read for ownership
>> +       str     r2, [r0]                        @ write for ownership
>> +#endif /* CONFIG_SMP */
>>    #ifdef HARVARD_CACHE
>>           mcr     p15, 0, r0, c7, c14, 1          @ clean&  invalidate D line
>>    #else
>
> I don't think the invalidation is needed here, so you probably don't need to
> hack this function at all.


     CPU0                        CPU1
+--------------------+        +--------------------+   +----------+
| L1 Dcache   part_a |        | L1 Dcache   part_b |   |  Icache  |
+--------------------+        +--------------------+   +----------+
             ^                                            |
             |                   +------------------------+
             |                   |
             |         SCU       V
     +-------|-------------------|---------+
     |       +------- ??? -------+         |
     |                 |                   |
     +-----------------|-------------------+
                       |
                       V
              Main   Memory
     +-------------------------------------+
     |       stale part_a       part_b     |
     +-------------------------------------+


1. task t runs on CPU 0, it executes one program in nand flash,
    so first task t read *part* of this program into its local Dcache,
    let's call it part_a;

2. task t migrates from CPU0 onto CPU1, in there reads the rest of
    program into its local Dcache, label it part_b;

3. on CPU1, task t flush the whole address range of this program,
    As with ARM11 MPCore, this flush is locally effective, after flush
    part_a still resides in CPU0 L1 Dcache.

4. When this program gets executed, CPU 1 fetch its instructions from
    Icache, SCU will notice this action,

    Question:
    At this point, is stale part_a in main memory *or* L1 Dcache part_a
    in CPU0 routed into CPU1 as instruction?



>
>> But I have no idea on how to accomplish the v6_flush_kern_cache_all,
>> maybe IPI is needed?
>
> We could add an IPI to invalidate the I-caches on the other cores, however
> I haven't checked to see if we could instead do something on the CPU
> migration path which avoid the need for the broadcasting.
>


Another workaround is mark the task migrations in function "pull_task",
while in function "context_switch", check it to see if any tasks
migrated into current CPU, if there do be such tasks, flush entire data
cache on remote core(the source core of task migration) and wait the
operation accomplished. This is also verified. But from my point of
view, this is just a temporary workaround instead of resolution.
The little patch above should be the right one that fix this bug:
Make the flush_kern_dcache_area global affective.


It's very pleased if you could give your insights in this.


> Will
>
Catalin Marinas - May 31, 2012, 3 a.m.
Hi,

On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
> On 2012年05月30日 14:38, Will Deacon wrote:
> > On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
> >> --- a/arch/arm/mm/cache-v6.S
> >> +++ b/arch/arm/mm/cache-v6.S
> >> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
> >>    ENTRY(v6_flush_kern_dcache_area)
> >>           add     r1, r0, r1
> >>    1:
> >> +#ifdef CONFIG_SMP
> >> +       ldr     r2, [r0]                        @ read for ownership
> >> +       str     r2, [r0]                        @ write for ownership
> >> +#endif /* CONFIG_SMP */
> >>    #ifdef HARVARD_CACHE
> >>           mcr     p15, 0, r0, c7, c14, 1          @ clean&  invalidate D line
> >>    #else
> >
> > I don't think the invalidation is needed here, so you probably don't need to
> > hack this function at all.
...
> 1. task t runs on CPU 0, it executes one program in nand flash,
>     so first task t read *part* of this program into its local Dcache,
>     let's call it part_a;
> 
> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
>     program into its local Dcache, label it part_b;
> 
> 3. on CPU1, task t flush the whole address range of this program,
>     As with ARM11 MPCore, this flush is locally effective, after flush
>     part_a still resides in CPU0 L1 Dcache.
> 
> 4. When this program gets executed, CPU 1 fetch its instructions from
>     Icache, SCU will notice this action,
> 
>     Question:
>     At this point, is stale part_a in main memory *or* L1 Dcache part_a
>     in CPU0 routed into CPU1 as instruction?

This has been discussed in the past. The first thing is to disable
CONFIG_PREEMPT to make sure you are not preempted between page copying
and task execution. The code doing the page copy needs to call
flush_dcache_page() on the CPU where the copy occurred and Linux does a
non-lazy D-cache flush. But there are many situations where Linux
doesn't do this (you can google for flush_dcache_page and my name to
find a few places where I tried to fix this).

AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full
I-cache invalidation during task migration.
bill4carson@gmail.com - May 31, 2012, 3:11 a.m.
On 2012年05月31日 11:00, Catalin Marinas wrote:
> Hi,
>
> On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
>> On 2012年05月30日 14:38, Will Deacon wrote:
>>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
>>>> --- a/arch/arm/mm/cache-v6.S
>>>> +++ b/arch/arm/mm/cache-v6.S
>>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>>>>     ENTRY(v6_flush_kern_dcache_area)
>>>>            add     r1, r0, r1
>>>>     1:
>>>> +#ifdef CONFIG_SMP
>>>> +       ldr     r2, [r0]                        @ read for ownership
>>>> +       str     r2, [r0]                        @ write for ownership
>>>> +#endif /* CONFIG_SMP */
>>>>     #ifdef HARVARD_CACHE
>>>>            mcr     p15, 0, r0, c7, c14, 1          @ clean&   invalidate D line
>>>>     #else
>>>
>>> I don't think the invalidation is needed here, so you probably don't need to
>>> hack this function at all.
> ...
>> 1. task t runs on CPU 0, it executes one program in nand flash,
>>      so first task t read *part* of this program into its local Dcache,
>>      let's call it part_a;
>>
>> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
>>      program into its local Dcache, label it part_b;
>>
>> 3. on CPU1, task t flush the whole address range of this program,
>>      As with ARM11 MPCore, this flush is locally effective, after flush
>>      part_a still resides in CPU0 L1 Dcache.
>>
>> 4. When this program gets executed, CPU 1 fetch its instructions from
>>      Icache, SCU will notice this action,
>>
>>      Question:
>>      At this point, is stale part_a in main memory *or* L1 Dcache part_a
>>      in CPU0 routed into CPU1 as instruction?
>
> This has been discussed in the past. The first thing is to disable
> CONFIG_PREEMPT to make sure you are not preempted between page copying
> and task execution. The code doing the page copy needs to call
> flush_dcache_page() on the CPU where the copy occurred and Linux does a
> non-lazy D-cache flush. But there are many situations where Linux
> doesn't do this (you can google for flush_dcache_page and my name to
> find a few places where I tried to fix this).
>
Thanks:)


> AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full
> I-cache invalidation during task migration.

                             ^^^^^^^^^^^^
Could you please point it to me?
Catalin Marinas - May 31, 2012, 3:12 a.m.
On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote:
> On 2012年05月31日 11:00, Catalin Marinas wrote:
> > AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full
> > I-cache invalidation during task migration.
> 
>                              ^^^^^^^^^^^^
> Could you please point it to me?

There is a check in switch_mm() in the asm/mmu_context.h file.
Catalin Marinas - May 31, 2012, 3:19 a.m.
On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote:
> On 2012年05月31日 11:00, Catalin Marinas wrote:
> > On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
> >> On 2012年05月30日 14:38, Will Deacon wrote:
> >>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
> >>>> --- a/arch/arm/mm/cache-v6.S
> >>>> +++ b/arch/arm/mm/cache-v6.S
> >>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
> >>>>     ENTRY(v6_flush_kern_dcache_area)
> >>>>            add     r1, r0, r1
> >>>>     1:
> >>>> +#ifdef CONFIG_SMP
> >>>> +       ldr     r2, [r0]                        @ read for ownership
> >>>> +       str     r2, [r0]                        @ write for ownership
> >>>> +#endif /* CONFIG_SMP */
> >>>>     #ifdef HARVARD_CACHE
> >>>>            mcr     p15, 0, r0, c7, c14, 1          @ clean&   invalidate D line
> >>>>     #else
> >>>
> >>> I don't think the invalidation is needed here, so you probably don't need to
> >>> hack this function at all.
> > ...
> >> 1. task t runs on CPU 0, it executes one program in nand flash,
> >>      so first task t read *part* of this program into its local Dcache,
> >>      let's call it part_a;
> >>
> >> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
> >>      program into its local Dcache, label it part_b;
> >>
> >> 3. on CPU1, task t flush the whole address range of this program,
> >>      As with ARM11 MPCore, this flush is locally effective, after flush
> >>      part_a still resides in CPU0 L1 Dcache.
> >>
> >> 4. When this program gets executed, CPU 1 fetch its instructions from
> >>      Icache, SCU will notice this action,
> >>
> >>      Question:
> >>      At this point, is stale part_a in main memory *or* L1 Dcache part_a
> >>      in CPU0 routed into CPU1 as instruction?
> >
> > This has been discussed in the past. The first thing is to disable
> > CONFIG_PREEMPT to make sure you are not preempted between page copying
> > and task execution. The code doing the page copy needs to call
> > flush_dcache_page() on the CPU where the copy occurred and Linux does a
> > non-lazy D-cache flush. But there are many situations where Linux
> > doesn't do this (you can google for flush_dcache_page and my name to
> > find a few places where I tried to fix this).
>
> Thanks:)

BTW, see this as a starting point (and a hack):

http://article.gmane.org/gmane.linux.ports.arm.kernel/51556
bill4carson@gmail.com - May 31, 2012, 3:38 a.m.
On 2012年05月31日 11:19, Catalin Marinas wrote:
> On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote:
>> On 2012年05月31日 11:00, Catalin Marinas wrote:
>>> On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
>>>> On 2012年05月30日 14:38, Will Deacon wrote:
>>>>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
>>>>>> --- a/arch/arm/mm/cache-v6.S
>>>>>> +++ b/arch/arm/mm/cache-v6.S
>>>>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>>>>>>      ENTRY(v6_flush_kern_dcache_area)
>>>>>>             add     r1, r0, r1
>>>>>>      1:
>>>>>> +#ifdef CONFIG_SMP
>>>>>> +       ldr     r2, [r0]                        @ read for ownership
>>>>>> +       str     r2, [r0]                        @ write for ownership
>>>>>> +#endif /* CONFIG_SMP */
>>>>>>      #ifdef HARVARD_CACHE
>>>>>>             mcr     p15, 0, r0, c7, c14, 1          @ clean&    invalidate D line
>>>>>>      #else
>>>>>
>>>>> I don't think the invalidation is needed here, so you probably don't need to
>>>>> hack this function at all.
>>> ...
>>>> 1. task t runs on CPU 0, it executes one program in nand flash,
>>>>       so first task t read *part* of this program into its local Dcache,
>>>>       let's call it part_a;
>>>>
>>>> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
>>>>       program into its local Dcache, label it part_b;
>>>>
>>>> 3. on CPU1, task t flush the whole address range of this program,
>>>>       As with ARM11 MPCore, this flush is locally effective, after flush
>>>>       part_a still resides in CPU0 L1 Dcache.
>>>>
>>>> 4. When this program gets executed, CPU 1 fetch its instructions from
>>>>       Icache, SCU will notice this action,
>>>>
>>>>       Question:
>>>>       At this point, is stale part_a in main memory *or* L1 Dcache part_a
>>>>       in CPU0 routed into CPU1 as instruction?
>>>
>>> This has been discussed in the past. The first thing is to disable
>>> CONFIG_PREEMPT to make sure you are not preempted between page copying
>>> and task execution. The code doing the page copy needs to call
>>> flush_dcache_page() on the CPU where the copy occurred and Linux does a
>>> non-lazy D-cache flush. But there are many situations where Linux
>>> doesn't do this (you can google for flush_dcache_page and my name to
>>> find a few places where I tried to fix this).
>>
>> Thanks:)
>
> BTW, see this as a starting point (and a hack):
>
> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556


I think we has some mis-understanding here :(

As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all
these two hooks is supposed to be globally effective, *not*
locally!

Hence, there should below fix to make it as globally effective.

+#ifdef CONFIG_SMP
+       ldr     r2, [r0]                        @ read for ownership
+       str     r2, [r0]                        @ write for ownership
+#endif /* CONFIG_SMP */

Or there maybe some other better solution for this issue.


thanks
Catalin Marinas - May 31, 2012, 3:58 a.m.
On 31 May 2012 11:38, bill4carson <bill4carson@gmail.com> wrote:
> On 2012年05月31日 11:19, Catalin Marinas wrote:
>> BTW, see this as a starting point (and a hack):
>>
>> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556
>
> I think we has some mis-understanding here :(
>
> As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all
> these two hooks is supposed to be globally effective, *not*
> locally!
>
> Hence, there should below fix to make it as globally effective.
>
>
> +#ifdef CONFIG_SMP
> +       ldr     r2, [r0]                        @ read for ownership
> +       str     r2, [r0]                        @ write for ownership
> +#endif /* CONFIG_SMP */
>
> Or there maybe some other better solution for this issue.

I still didn't fully understand what the problem is. So, to make sure,
if you run some applications from flash using a yaffs filesystem, you
get random crashes. Is this correct? If yes, a solution is to actually
call flush_dcache_page() on the CPU that does the page copying from
flash into RAM, which could be the yaffs filesystem.
bill4carson@gmail.com - May 31, 2012, 5:06 a.m.
On 2012年05月31日 11:58, Catalin Marinas wrote:
> On 31 May 2012 11:38, bill4carson<bill4carson@gmail.com>  wrote:
>> On 2012年05月31日 11:19, Catalin Marinas wrote:
>>> BTW, see this as a starting point (and a hack):
>>>
>>> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556
>>
>> I think we has some mis-understanding here :(
>>
>> As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all
>> these two hooks is supposed to be globally effective, *not*
>> locally!
>>
>> Hence, there should below fix to make it as globally effective.
>>
>>
>> +#ifdef CONFIG_SMP
>> +       ldr     r2, [r0]                        @ read for ownership
>> +       str     r2, [r0]                        @ write for ownership
>> +#endif /* CONFIG_SMP */
>>
>> Or there maybe some other better solution for this issue.
>
> I still didn't fully understand what the problem is. So, to make sure,
> if you run some applications from flash using a yaffs filesystem, you
> get random crashes. Is this correct? If yes, a solution is to actually
> call flush_dcache_page() on the CPU that does the page copying from
> flash into RAM, which could be the yaffs filesystem.
>

The story goes like this:
function "flush_dcache_page" should be global effective
but in ARMv6 MPCore, it was not, it was just local effective due
to hardware design. This may cause error in some cases for example:

1) Task running on Core-0 loading text section into memory.
    It was preempted and then migrate into Core-1;
2) On Core-1, this task continue loading it and then
    "flush_dcache_page" to make sure the loaded text section write
    into main memory.
3) Task tend to the loaded text section and running it.

If the "flush_dcache_page" was not global effective,
there maybe data still in Core-0's data cache, not write
into main memory. Thus in step 3, error instruction maybe
fetched thus cause strange error.

If I'm missing something here, please let me know.

thanks
Catalin Marinas - May 31, 2012, 5:19 a.m.
On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
> On 2012年05月31日 11:58, Catalin Marinas wrote:
>> I still didn't fully understand what the problem is. So, to make sure,
>> if you run some applications from flash using a yaffs filesystem, you
>> get random crashes. Is this correct? If yes, a solution is to actually
>> call flush_dcache_page() on the CPU that does the page copying from
>> flash into RAM, which could be the yaffs filesystem.
>
> The story goes like this:
> function "flush_dcache_page" should be global effective
> but in ARMv6 MPCore, it was not, it was just local effective due
> to hardware design.

Yes, I know this.

> This may cause error in some cases for example:
>
> 1) Task running on Core-0 loading text section into memory.
>   It was preempted and then migrate into Core-1;

BTW, do you have CONFIG_PREEMPT enabled?

To be clear - is your application reading some data from flash and
trying to execute or it's the kernel doing the load via the
page/prefetch abort mechanism?

If the latter, task running on core 0 gets a prefetch abort when
trying to execute some code. The kernel reads the page from flash (via
mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
long as it calls flush_dcache_page on the same CPU that copied the
data.

No matter where the task was running or migrated to, if the code doing
the copy also called flush_dcache_page() on the same core, there is no
data left in the D-cache for that page.

> 2) On Core-1, this task continue loading it and then
>   "flush_dcache_page" to make sure the loaded text section write
>   into main memory.

The flush_dcache_page() must be called by the code doing the copy. If
that copy happened on core 0, the call is done there and not where the
task migrated. We don't do lazy flushing on ARM11MPCore.

> 3) Task tend to the loaded text section and running it.
>
> If the "flush_dcache_page" was not global effective,
> there maybe data still in Core-0's data cache, not write
> into main memory. Thus in step 3, error instruction maybe
> fetched thus cause strange error.

This can only happen if you have either preempt enabled (so that the
kernel code doing the copy is migrated) or the mtd driver or fs do not
call flush_dcache_page().
bill4carson@gmail.com - May 31, 2012, 5:51 a.m.
On 2012年05月31日 13:19, Catalin Marinas wrote:
> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com>  wrote:
>> On 2012年05月31日 11:58, Catalin Marinas wrote:
>>> I still didn't fully understand what the problem is. So, to make sure,
>>> if you run some applications from flash using a yaffs filesystem, you
>>> get random crashes. Is this correct? If yes, a solution is to actually
>>> call flush_dcache_page() on the CPU that does the page copying from
>>> flash into RAM, which could be the yaffs filesystem.
>>
>> The story goes like this:
>> function "flush_dcache_page" should be global effective
>> but in ARMv6 MPCore, it was not, it was just local effective due
>> to hardware design.
>
> Yes, I know this.
>
Then, why not fix "flush_dcache_page" to make it globally effective?


>> This may cause error in some cases for example:
>>
>> 1) Task running on Core-0 loading text section into memory.
>>    It was preempted and then migrate into Core-1;
>
> BTW, do you have CONFIG_PREEMPT enabled?
>
Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-)


> To be clear - is your application reading some data from flash and
> trying to execute or it's the kernel doing the load via the
> page/prefetch abort mechanism?
>

In my current case, it is yaffs root file system.

> If the latter, task running on core 0 gets a prefetch abort when
> trying to execute some code. The kernel reads the page from flash (via
> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
> long as it calls flush_dcache_page on the same CPU that copied the
> data.
>
> No matter where the task was running or migrated to, if the code doing
> the copy also called flush_dcache_page() on the same core, there is no
> data left in the D-cache for that page.

Yes, I agree with it.
But how to flush the data cache on the same core with PREEMPT enabled?
And, I think according to the design, there is no such operation that
guarantee it.


>
>> 2) On Core-1, this task continue loading it and then
>>    "flush_dcache_page" to make sure the loaded text section write
>>    into main memory.
>
> The flush_dcache_page() must be called by the code doing the copy. If
> that copy happened on core 0, the call is done there and not where the
> task migrated. We don't do lazy flushing on ARM11MPCore.
>
WHY?

>> 3) Task tend to the loaded text section and running it.
>>
>> If the "flush_dcache_page" was not global effective,
>> there maybe data still in Core-0's data cache, not write
>> into main memory. Thus in step 3, error instruction maybe
>> fetched thus cause strange error.
>
> This can only happen if you have either preempt enabled (so that the
> kernel code doing the copy is migrated) or the mtd driver or fs do not
> call flush_dcache_page().
>


PREEMPT + MTD root file system. :-)

But any way, I think flush_dcache_page should be global effective.
If ARMv6 MPCore didn't make it, we should try to accomplish it.
Catalin Marinas - May 31, 2012, 6:56 a.m.
On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
> On 2012年05月31日 13:19, Catalin Marinas wrote:
> > On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com>  wrote:
> >> On 2012年05月31日 11:58, Catalin Marinas wrote:
> >>> I still didn't fully understand what the problem is. So, to make sure,
> >>> if you run some applications from flash using a yaffs filesystem, you
> >>> get random crashes. Is this correct? If yes, a solution is to actually
> >>> call flush_dcache_page() on the CPU that does the page copying from
> >>> flash into RAM, which could be the yaffs filesystem.
> >>
> >> The story goes like this:
> >> function "flush_dcache_page" should be global effective
> >> but in ARMv6 MPCore, it was not, it was just local effective due
> >> to hardware design.
> >
> > Yes, I know this.
>
> Then, why not fix "flush_dcache_page" to make it globally effective?

Performance?

And it's also just ARM11MPCore microarchitecture specific.

> >> This may cause error in some cases for example:
> >>
> >> 1) Task running on Core-0 loading text section into memory.
> >>    It was preempted and then migrate into Core-1;
> >
> > BTW, do you have CONFIG_PREEMPT enabled?
> >
> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-)

I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :).

> > To be clear - is your application reading some data from flash and
> > trying to execute or it's the kernel doing the load via the
> > page/prefetch abort mechanism?
> 
> In my current case, it is yaffs root file system.
> 
> > If the latter, task running on core 0 gets a prefetch abort when
> > trying to execute some code. The kernel reads the page from flash (via
> > mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
> > long as it calls flush_dcache_page on the same CPU that copied the
> > data.
> >
> > No matter where the task was running or migrated to, if the code doing
> > the copy also called flush_dcache_page() on the same core, there is no
> > data left in the D-cache for that page.
> 
> Yes, I agree with it.
> But how to flush the data cache on the same core with PREEMPT enabled?

That's not easily possible. But you may get better results with
VOLUNTARY_PREEMPT.

> And, I think according to the design, there is no such operation that
> guarantee it.

RFO/WFO tricks only work on ARM11MPCore.

> But any way, I think flush_dcache_page should be global effective.
> If ARMv6 MPCore didn't make it, we should try to accomplish it.

Do some performance tests first.
bill4carson@gmail.com - May 31, 2012, 7:21 a.m.
On 2012年05月31日 14:56, Catalin Marinas wrote:
> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>> On 2012年05月31日 13:19, Catalin Marinas wrote:
>>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com>   wrote:
>>>> On 2012年05月31日 11:58, Catalin Marinas wrote:
>>>>> I still didn't fully understand what the problem is. So, to make sure,
>>>>> if you run some applications from flash using a yaffs filesystem, you
>>>>> get random crashes. Is this correct? If yes, a solution is to actually
>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>
>>>> The story goes like this:
>>>> function "flush_dcache_page" should be global effective
>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>> to hardware design.
>>>
>>> Yes, I know this.
>>
>> Then, why not fix "flush_dcache_page" to make it globally effective?
>
> Performance?
>
> And it's also just ARM11MPCore microarchitecture specific.
>
>>>> This may cause error in some cases for example:
>>>>
>>>> 1) Task running on Core-0 loading text section into memory.
>>>>     It was preempted and then migrate into Core-1;
>>>
>>> BTW, do you have CONFIG_PREEMPT enabled?
>>>
>> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-)
>
> I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :).

Point!

Is it better to add comment, such as "PREEMPT is not supported for
ARM11MPCore" in somewhere(for now, I don't find such place)?
then custom will be alerted with such notice when they trying change
to PREEMPT.

And again, thanks for your patience with me :)


>
>>> To be clear - is your application reading some data from flash and
>>> trying to execute or it's the kernel doing the load via the
>>> page/prefetch abort mechanism?
>>
>> In my current case, it is yaffs root file system.
>>
>>> If the latter, task running on core 0 gets a prefetch abort when
>>> trying to execute some code. The kernel reads the page from flash (via
>>> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
>>> long as it calls flush_dcache_page on the same CPU that copied the
>>> data.
>>>
>>> No matter where the task was running or migrated to, if the code doing
>>> the copy also called flush_dcache_page() on the same core, there is no
>>> data left in the D-cache for that page.
>>
>> Yes, I agree with it.
>> But how to flush the data cache on the same core with PREEMPT enabled?
>
> That's not easily possible. But you may get better results with
> VOLUNTARY_PREEMPT.
>
>> And, I think according to the design, there is no such operation that
>> guarantee it.
>
> RFO/WFO tricks only work on ARM11MPCore.
>
>> But any way, I think flush_dcache_page should be global effective.
>> If ARMv6 MPCore didn't make it, we should try to accomplish it.
>
> Do some performance tests first.
>
snakky.zhang@gmail.com - May 31, 2012, 7:46 a.m.
On 2012年05月31日 15:21, bill4carson wrote:
>
>
> On 2012年05月31日 14:56, Catalin Marinas wrote:
>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>>> On 2012年05月31日 13:19, Catalin Marinas wrote:
>>>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com>   wrote:
>>>>> On 2012年05月31日 11:58, Catalin Marinas wrote:
>>>>>> I still didn't fully understand what the problem is. So, to make 
>>>>>> sure,
>>>>>> if you run some applications from flash using a yaffs filesystem, 
>>>>>> you
>>>>>> get random crashes. Is this correct? If yes, a solution is to 
>>>>>> actually
>>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>>
>>>>> The story goes like this:
>>>>> function "flush_dcache_page" should be global effective
>>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>>> to hardware design.
>>>>
>>>> Yes, I know this.
>>>
>>> Then, why not fix "flush_dcache_page" to make it globally effective?
>>
>> Performance?
>>
>> And it's also just ARM11MPCore microarchitecture specific.
>>
Yes, seems newer CPUs has no such limitation thus this function is 
global effective naturally. :-)

And , I find Mips's c-r4k also has this issue but it use IPI to make it. 
Details in arch/mips/mm/c-r4k.c.

 From my point of view, PREEMPT should not related to CPU type. So if 
this type of CPU does not support PREEMPT for performance reason, can we 
add something in Documentation and related Kconfig to make/mark it?

Or maybe disable task migration is also a choice. :-)

Thanks
Xiao
>>>>> This may cause error in some cases for example:
>>>>>
>>>>> 1) Task running on Core-0 loading text section into memory.
>>>>>     It was preempted and then migrate into Core-1;
>>>>
>>>> BTW, do you have CONFIG_PREEMPT enabled?
>>>>
>>> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-)
>>
>> I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :).
>
> Point!
>
> Is it better to add comment, such as "PREEMPT is not supported for
> ARM11MPCore" in somewhere(for now, I don't find such place)?
> then custom will be alerted with such notice when they trying change
> to PREEMPT.
>
> And again, thanks for your patience with me :)
>
>
>>
>>>> To be clear - is your application reading some data from flash and
>>>> trying to execute or it's the kernel doing the load via the
>>>> page/prefetch abort mechanism?
>>>
>>> In my current case, it is yaffs root file system.
>>>
>>>> If the latter, task running on core 0 gets a prefetch abort when
>>>> trying to execute some code. The kernel reads the page from flash (via
>>>> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
>>>> long as it calls flush_dcache_page on the same CPU that copied the
>>>> data.
>>>>
>>>> No matter where the task was running or migrated to, if the code doing
>>>> the copy also called flush_dcache_page() on the same core, there is no
>>>> data left in the D-cache for that page.
>>>
>>> Yes, I agree with it.
>>> But how to flush the data cache on the same core with PREEMPT enabled?
>>
>> That's not easily possible. But you may get better results with
>> VOLUNTARY_PREEMPT.
>>
>>> And, I think according to the design, there is no such operation that
>>> guarantee it.
>>
>> RFO/WFO tricks only work on ARM11MPCore.
>>
>>> But any way, I think flush_dcache_page should be global effective.
>>> If ARMv6 MPCore didn't make it, we should try to accomplish it.
>>
>> Do some performance tests first.
>>
>
Catalin Marinas - May 31, 2012, 4:04 p.m.
On 31 May 2012 15:46,  <snakky.zhang@gmail.com> wrote:
> On 2012年05月31日 15:21, bill4carson wrote:
>> On 2012年05月31日 14:56, Catalin Marinas wrote:
>>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>>>> On 2012年05月31日 13:19, Catalin Marinas wrote:
>>>>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com>   wrote:
>>>>>> On 2012年05月31日 11:58, Catalin Marinas wrote:
>>>>>>> I still didn't fully understand what the problem is. So, to make
>>>>>>> sure,
>>>>>>> if you run some applications from flash using a yaffs filesystem, you
>>>>>>> get random crashes. Is this correct? If yes, a solution is to
>>>>>>> actually
>>>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>>>
>>>>>>
>>>>>> The story goes like this:
>>>>>> function "flush_dcache_page" should be global effective
>>>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>>>> to hardware design.
>>>>>
>>>>>
>>>>> Yes, I know this.
>>>>
>>>>
>>>> Then, why not fix "flush_dcache_page" to make it globally effective?
>>>
>>>
>>> Performance?
>>>
>>> And it's also just ARM11MPCore microarchitecture specific.
>>>
> Yes, seems newer CPUs has no such limitation thus this function is global
> effective naturally. :-)
>
> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
> Details in arch/mips/mm/c-r4k.c.

Rather than IPI we would better use the read-for-ownership trick like
in this patch to make flush_dcache_page global (no need for
write-for-ownership):

http://dchs.spinics.net/lists/arm-kernel/msg125075.html

(it may no longer apply, I haven't checked it for some time).

That's the first thing. Secondly you still need preemption disable so
that it is not preempted between RFO and the actual cache cleaning.
snakky.zhang@gmail.com - June 1, 2012, 1:11 a.m.
On 2012年06月01日 00:04, Catalin Marinas wrote:
> On 31 May 2012 15:46,<snakky.zhang@gmail.com>  wrote:
>> On 2012年05月31日 15:21, bill4carson wrote:
>>> On 2012年05月31日 14:56, Catalin Marinas wrote:
>>>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>>>>> On 2012年05月31日 13:19, Catalin Marinas wrote:
>>>>>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com>     wrote:
>>>>>>> On 2012年05月31日 11:58, Catalin Marinas wrote:
>>>>>>>> I still didn't fully understand what the problem is. So, to make
>>>>>>>> sure,
>>>>>>>> if you run some applications from flash using a yaffs filesystem, you
>>>>>>>> get random crashes. Is this correct? If yes, a solution is to
>>>>>>>> actually
>>>>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>>>>
>>>>>>> The story goes like this:
>>>>>>> function "flush_dcache_page" should be global effective
>>>>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>>>>> to hardware design.
>>>>>>
>>>>>> Yes, I know this.
>>>>>
>>>>> Then, why not fix "flush_dcache_page" to make it globally effective?
>>>>
>>>> Performance?
>>>>
>>>> And it's also just ARM11MPCore microarchitecture specific.
>>>>
>> Yes, seems newer CPUs has no such limitation thus this function is global
>> effective naturally. :-)
>>
>> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
>> Details in arch/mips/mm/c-r4k.c.
> Rather than IPI we would better use the read-for-ownership trick like
> in this patch to make flush_dcache_page global (no need for
> write-for-ownership):

I think write for ownership is necessary for flush_dcache_xxx: Read 
guarantee local data cache get
newest data, at the same time write guarantee the data can be flushed 
into memory.

See Section 7.1 of ARM11 MPCore. Processor Technical Reference 
Manual(Revision: r2p0) P146/728:
======
Clean Applies to write-back data caches. If the cache line targeted by 
the Clean
operation contains stored data that has not yet been written out to main
memory, it is written to main memory, and the line is marked as clean.
======

So I am afraid without the write action, the "clean & invalidate" action 
later will not write data back to main
memory.

Another question here:  Why the flush_kern_dcache_xxx in 
arch/arm/mm/cache-v6 use "clean & invalidate"
progress instead of "clean"? Seems clean is enough here.

Please fix me if I mis-understand something.

>
> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>
> (it may no longer apply, I haven't checked it for some time).
>
> That's the first thing. Secondly you still need preemption disable so
> that it is not preempted between RFO and the actual cache cleaning.
>
PREEMPT. :-)

Get it. But currently, I can't find anything related to ARMv6 MPCore 
conflict with PREEMPT. So if it is also
necessary to add something in Documentation and related Kconfig to 
describe it and make sure PREEMPT
can't been enabled on such CPUs?

Thanks
Xiao
snakky.zhang@gmail.com - June 1, 2012, 1:34 a.m.
>> Yes, seems newer CPUs has no such limitation thus this function is global
>> effective naturally. :-)
>>
>> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
>> Details in arch/mips/mm/c-r4k.c.
> Rather than IPI we would better use the read-for-ownership trick like
> in this patch to make flush_dcache_page global (no need for
> write-for-ownership):
>
> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>
> (it may no longer apply, I haven't checked it for some time).
>
> That's the first thing. Secondly you still need preemption disable so
> that it is not preempted between RFO and the actual cache cleaning.
>
And, another confusion for PREEMPT: Even if we disable preempt, with locally
effective flush_dcache_xxx, there is still possibility to reproduce such
issue(Similar with the previous case):

1) Task running on Core-0 loading text section into memory.
     It was schedule out and then migrate into Core-1;
2) On Core-1, this task continue loading it and then
     "flush_dcache_page" to make sure the loaded text section write
     into main memory.
3) Task tend to the loaded text section and running it.

Similar as the previous case, the difference lies in step1 that the task was
interrupted by timer interrupt. Thus it still can be switch out and then 
been
migrate to another core. Thus in step2 and 3, this issue may still been 
reproduced.
So, disable preempt can only lower the possibility of this issue but 
can't avoid it.

If I mis-understand something, please correct me.

Thanks
Xiao
Catalin Marinas - June 1, 2012, 3:25 a.m.
On Fri, Jun 01, 2012 at 02:11:59AM +0100, snakky.zhang@gmail.com wrote:
> On 2012年06月01日 00:04, Catalin Marinas wrote:
> > Rather than IPI we would better use the read-for-ownership trick like
> > in this patch to make flush_dcache_page global (no need for
> > write-for-ownership):
> 
> I think write for ownership is necessary for flush_dcache_xxx: Read
> guarantee local data cache get newest data, at the same time write
> guarantee the data can be flushed into memory.
> 
> See Section 7.1 of ARM11 MPCore. Processor Technical Reference 
> Manual(Revision: r2p0) P146/728:
> ======
> Clean Applies to write-back data caches. If the cache line targeted by
> the Clean operation contains stored data that has not yet been written
> out to main memory, it is written to main memory, and the line is
> marked as clean.
> ======
> 
> So I am afraid without the write action, the "clean & invalidate"
> action later will not write data back to main memory.

If there is a dirty cache line, it will be written to memory by the
clean&invalidate operation. If the data in the cache line is in a clean
state, it means that it is identical to the main memory (or L2 if
present).

With just a read, clean&invalidate would not invalidate (remove) the
cache lines from the other CPUs. Doing a write forces the cache line to
only be present on the current CPU (though automatically invalidating it
on the other CPUs).

> Another question here:  Why the flush_kern_dcache_xxx in
> arch/arm/mm/cache-v6 use "clean & invalidate" progress instead of
> "clean"? Seems clean is enough here.

I think in the context of VIPT caches clean would be enough.

> > http://dchs.spinics.net/lists/arm-kernel/msg125075.html
> >
> > (it may no longer apply, I haven't checked it for some time).
> >
> > That's the first thing. Secondly you still need preemption disable so
> > that it is not preempted between RFO and the actual cache cleaning.
>
> PREEMPT. :-)
> 
> Get it. But currently, I can't find anything related to ARMv6 MPCore
> conflict with PREEMPT. So if it is also necessary to add something in
> Documentation and related Kconfig to describe it and make sure PREEMPT
> can't been enabled on such CPUs?

Well, we either get it to work or, if not possible, we add a comment.
Let's try the former option first :)
Catalin Marinas - June 1, 2012, 3:29 a.m.
On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang@gmail.com wrote:
> >> Yes, seems newer CPUs has no such limitation thus this function is global
> >> effective naturally. :-)
> >>
> >> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
> >> Details in arch/mips/mm/c-r4k.c.
> > Rather than IPI we would better use the read-for-ownership trick like
> > in this patch to make flush_dcache_page global (no need for
> > write-for-ownership):
> >
> > http://dchs.spinics.net/lists/arm-kernel/msg125075.html
> >
> > (it may no longer apply, I haven't checked it for some time).
> >
> > That's the first thing. Secondly you still need preemption disable so
> > that it is not preempted between RFO and the actual cache cleaning.
> >
> And, another confusion for PREEMPT: Even if we disable preempt, with locally
> effective flush_dcache_xxx, there is still possibility to reproduce such
> issue(Similar with the previous case):
> 
> 1) Task running on Core-0 loading text section into memory.
>      It was schedule out and then migrate into Core-1;
> 2) On Core-1, this task continue loading it and then
>      "flush_dcache_page" to make sure the loaded text section write
>      into main memory.
> 3) Task tend to the loaded text section and running it.
> 
> Similar as the previous case, the difference lies in step1 that the
> task was interrupted by timer interrupt. Thus it still can be switch
> out and then been migrate to another core. Thus in step2 and 3, this
> issue may still been reproduced.  So, disable preempt can only lower
> the possibility of this issue but can't avoid it.

It would work as long as the data copying into the text area (done by
the driver and VFS layer) and the flush_dcache_page() sequence are not
preemptible. A timer interrupt between data copying and
flush_dcache_page() would interrupt a kernel routine which is not
preemptible.
snakky.zhang@gmail.com - June 1, 2012, 5:21 a.m.
On 2012年06月01日 11:25, Catalin Marinas wrote:
> On Fri, Jun 01, 2012 at 02:11:59AM +0100, snakky.zhang@gmail.com wrote:
>> On 2012年06月01日 00:04, Catalin Marinas wrote:
>>> Rather than IPI we would better use the read-for-ownership trick like
>>> in this patch to make flush_dcache_page global (no need for
>>> write-for-ownership):
>> I think write for ownership is necessary for flush_dcache_xxx: Read
>> guarantee local data cache get newest data, at the same time write
>> guarantee the data can be flushed into memory.
>>
>> See Section 7.1 of ARM11 MPCore. Processor Technical Reference
>> Manual(Revision: r2p0) P146/728:
>> ======
>> Clean Applies to write-back data caches. If the cache line targeted by
>> the Clean operation contains stored data that has not yet been written
>> out to main memory, it is written to main memory, and the line is
>> marked as clean.
>> ======
>>
>> So I am afraid without the write action, the "clean&  invalidate"
>> action later will not write data back to main memory.
> If there is a dirty cache line, it will be written to memory by the
> clean&invalidate operation. If the data in the cache line is in a clean
> state, it means that it is identical to the main memory (or L2 if
> present).
>
> With just a read, clean&invalidate would not invalidate (remove) the
> cache lines from the other CPUs. Doing a write forces the cache line to
> only be present on the current CPU (though automatically invalidating it
> on the other CPUs).
>
>> Another question here:  Why the flush_kern_dcache_xxx in
>> arch/arm/mm/cache-v6 use "clean&  invalidate" progress instead of
>> "clean"? Seems clean is enough here.
> I think in the context of VIPT caches clean would be enough.
>
>>> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>>>
>>> (it may no longer apply, I haven't checked it for some time).
>>>
>>> That's the first thing. Secondly you still need preemption disable so
>>> that it is not preempted between RFO and the actual cache cleaning.
>> PREEMPT. :-)
>>
>> Get it. But currently, I can't find anything related to ARMv6 MPCore
>> conflict with PREEMPT. So if it is also necessary to add something in
>> Documentation and related Kconfig to describe it and make sure PREEMPT
>> can't been enabled on such CPUs?
> Well, we either get it to work or, if not possible, we add a comment.
> Let's try the former option first :)
>
Thanks for your patient explanation. :-)

Thanks a lot!
Xiao
Russell King - ARM Linux - June 3, 2012, 11:34 a.m.
On Fri, Jun 01, 2012 at 11:29:13AM +0800, Catalin Marinas wrote:
> On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang@gmail.com wrote:
> > >> Yes, seems newer CPUs has no such limitation thus this function is global
> > >> effective naturally. :-)
> > >>
> > >> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
> > >> Details in arch/mips/mm/c-r4k.c.
> > > Rather than IPI we would better use the read-for-ownership trick like
> > > in this patch to make flush_dcache_page global (no need for
> > > write-for-ownership):
> > >
> > > http://dchs.spinics.net/lists/arm-kernel/msg125075.html
> > >
> > > (it may no longer apply, I haven't checked it for some time).
> > >
> > > That's the first thing. Secondly you still need preemption disable so
> > > that it is not preempted between RFO and the actual cache cleaning.
> > >
> > And, another confusion for PREEMPT: Even if we disable preempt, with locally
> > effective flush_dcache_xxx, there is still possibility to reproduce such
> > issue(Similar with the previous case):
> > 
> > 1) Task running on Core-0 loading text section into memory.
> >      It was schedule out and then migrate into Core-1;
> > 2) On Core-1, this task continue loading it and then
> >      "flush_dcache_page" to make sure the loaded text section write
> >      into main memory.
> > 3) Task tend to the loaded text section and running it.
> > 
> > Similar as the previous case, the difference lies in step1 that the
> > task was interrupted by timer interrupt. Thus it still can be switch
> > out and then been migrate to another core. Thus in step2 and 3, this
> > issue may still been reproduced.  So, disable preempt can only lower
> > the possibility of this issue but can't avoid it.
> 
> It would work as long as the data copying into the text area (done by
> the driver and VFS layer) and the flush_dcache_page() sequence are not
> preemptible. A timer interrupt between data copying and
> flush_dcache_page() would interrupt a kernel routine which is not
> preemptible.

And that doesn't matter because on a non-preemptible kernel, timer ticks
do _not_ cause the threads to be rescheduled while in kernel mode.  If
they did, it would be a _preemptible_ kernel.

The only things on a non-preemptible kernel which cause a schedule point
are functions which may sleep, so semaphores, waiting for events, etc.
snakky.zhang@gmail.com - June 4, 2012, 9:20 a.m.
On 2012年06月03日 19:34, Russell King - ARM Linux wrote:
> On Fri, Jun 01, 2012 at 11:29:13AM +0800, Catalin Marinas wrote:
>> On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang@gmail.com wrote:
>>>>> Yes, seems newer CPUs has no such limitation thus this function is global
>>>>> effective naturally. :-)
>>>>>
>>>>> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
>>>>> Details in arch/mips/mm/c-r4k.c.
>>>> Rather than IPI we would better use the read-for-ownership trick like
>>>> in this patch to make flush_dcache_page global (no need for
>>>> write-for-ownership):
>>>>
>>>> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>>>>
>>>> (it may no longer apply, I haven't checked it for some time).
>>>>
>>>> That's the first thing. Secondly you still need preemption disable so
>>>> that it is not preempted between RFO and the actual cache cleaning.
>>>>
>>> And, another confusion for PREEMPT: Even if we disable preempt, with locally
>>> effective flush_dcache_xxx, there is still possibility to reproduce such
>>> issue(Similar with the previous case):
>>>
>>> 1) Task running on Core-0 loading text section into memory.
>>>       It was schedule out and then migrate into Core-1;
>>> 2) On Core-1, this task continue loading it and then
>>>       "flush_dcache_page" to make sure the loaded text section write
>>>       into main memory.
>>> 3) Task tend to the loaded text section and running it.
>>>
>>> Similar as the previous case, the difference lies in step1 that the
>>> task was interrupted by timer interrupt. Thus it still can be switch
>>> out and then been migrate to another core. Thus in step2 and 3, this
>>> issue may still been reproduced.  So, disable preempt can only lower
>>> the possibility of this issue but can't avoid it.
>> It would work as long as the data copying into the text area (done by
>> the driver and VFS layer) and the flush_dcache_page() sequence are not
>> preemptible. A timer interrupt between data copying and
>> flush_dcache_page() would interrupt a kernel routine which is not
>> preemptible.
> And that doesn't matter because on a non-preemptible kernel, timer ticks
> do _not_ cause the threads to be rescheduled while in kernel mode.  If
> they did, it would be a _preemptible_ kernel.
>
> The only things on a non-preemptible kernel which cause a schedule point
> are functions which may sleep, so semaphores, waiting for events, etc.
>
>
Thanks for your description. :-)

Thanks
Xiao

Patch

--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -170,6 +170,10 @@  ENDPROC(v6_coherent_kern_range)
  ENTRY(v6_flush_kern_dcache_area)
         add     r1, r0, r1
  1:
+#ifdef CONFIG_SMP
+       ldr     r2, [r0]                        @ read for ownership
+       str     r2, [r0]                        @ write for ownership
+#endif /* CONFIG_SMP */
  #ifdef HARVARD_CACHE
         mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
  #else