Message ID | 4FC45E6B.2070202@gmail.com |
---|---|
State | New |
Headers | show |
On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote: > --- a/arch/arm/mm/cache-v6.S > +++ b/arch/arm/mm/cache-v6.S > @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range) > ENTRY(v6_flush_kern_dcache_area) > add r1, r0, r1 > 1: > +#ifdef CONFIG_SMP > + ldr r2, [r0] @ read for ownership > + str r2, [r0] @ write for ownership > +#endif /* CONFIG_SMP */ > #ifdef HARVARD_CACHE > mcr p15, 0, r0, c7, c14, 1 @ clean & invalidate D line > #else I don't think the invalidation is needed here, so you probably don't need to hack this function at all. > But I have no idea on how to accomplish the v6_flush_kern_cache_all, > maybe IPI is needed? We could add an IPI to invalidate the I-caches on the other cores, however I haven't checked to see if we could instead do something on the CPU migration path which avoid the need for the broadcasting. Will
Hi, Will First of all, Thanks your attentions for this issue. On 2012年05月30日 14:38, Will Deacon wrote: > On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote: >> --- a/arch/arm/mm/cache-v6.S >> +++ b/arch/arm/mm/cache-v6.S >> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range) >> ENTRY(v6_flush_kern_dcache_area) >> add r1, r0, r1 >> 1: >> +#ifdef CONFIG_SMP >> + ldr r2, [r0] @ read for ownership >> + str r2, [r0] @ write for ownership >> +#endif /* CONFIG_SMP */ >> #ifdef HARVARD_CACHE >> mcr p15, 0, r0, c7, c14, 1 @ clean& invalidate D line >> #else > > I don't think the invalidation is needed here, so you probably don't need to > hack this function at all. CPU0 CPU1 +--------------------+ +--------------------+ +----------+ | L1 Dcache part_a | | L1 Dcache part_b | | Icache | +--------------------+ +--------------------+ +----------+ ^ | | +------------------------+ | | | SCU V +-------|-------------------|---------+ | +------- ??? -------+ | | | | +-----------------|-------------------+ | V Main Memory +-------------------------------------+ | stale part_a part_b | +-------------------------------------+ 1. task t runs on CPU 0, it executes one program in nand flash, so first task t read *part* of this program into its local Dcache, let's call it part_a; 2. task t migrates from CPU0 onto CPU1, in there reads the rest of program into its local Dcache, label it part_b; 3. on CPU1, task t flush the whole address range of this program, As with ARM11 MPCore, this flush is locally effective, after flush part_a still resides in CPU0 L1 Dcache. 4. When this program gets executed, CPU 1 fetch its instructions from Icache, SCU will notice this action, Question: At this point, is stale part_a in main memory *or* L1 Dcache part_a in CPU0 routed into CPU1 as instruction? > >> But I have no idea on how to accomplish the v6_flush_kern_cache_all, >> maybe IPI is needed? > > We could add an IPI to invalidate the I-caches on the other cores, however > I haven't checked to see if we could instead do something on the CPU > migration path which avoid the need for the broadcasting. > Another workaround is mark the task migrations in function "pull_task", while in function "context_switch", check it to see if any tasks migrated into current CPU, if there do be such tasks, flush entire data cache on remote core(the source core of task migration) and wait the operation accomplished. This is also verified. But from my point of view, this is just a temporary workaround instead of resolution. The little patch above should be the right one that fix this bug: Make the flush_kern_dcache_area global affective. It's very pleased if you could give your insights in this. > Will >
Hi, On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote: > On 2012年05月30日 14:38, Will Deacon wrote: > > On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote: > >> --- a/arch/arm/mm/cache-v6.S > >> +++ b/arch/arm/mm/cache-v6.S > >> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range) > >> ENTRY(v6_flush_kern_dcache_area) > >> add r1, r0, r1 > >> 1: > >> +#ifdef CONFIG_SMP > >> + ldr r2, [r0] @ read for ownership > >> + str r2, [r0] @ write for ownership > >> +#endif /* CONFIG_SMP */ > >> #ifdef HARVARD_CACHE > >> mcr p15, 0, r0, c7, c14, 1 @ clean& invalidate D line > >> #else > > > > I don't think the invalidation is needed here, so you probably don't need to > > hack this function at all. ... > 1. task t runs on CPU 0, it executes one program in nand flash, > so first task t read *part* of this program into its local Dcache, > let's call it part_a; > > 2. task t migrates from CPU0 onto CPU1, in there reads the rest of > program into its local Dcache, label it part_b; > > 3. on CPU1, task t flush the whole address range of this program, > As with ARM11 MPCore, this flush is locally effective, after flush > part_a still resides in CPU0 L1 Dcache. > > 4. When this program gets executed, CPU 1 fetch its instructions from > Icache, SCU will notice this action, > > Question: > At this point, is stale part_a in main memory *or* L1 Dcache part_a > in CPU0 routed into CPU1 as instruction? This has been discussed in the past. The first thing is to disable CONFIG_PREEMPT to make sure you are not preempted between page copying and task execution. The code doing the page copy needs to call flush_dcache_page() on the CPU where the copy occurred and Linux does a non-lazy D-cache flush. But there are many situations where Linux doesn't do this (you can google for flush_dcache_page and my name to find a few places where I tried to fix this). AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full I-cache invalidation during task migration.
On 2012年05月31日 11:00, Catalin Marinas wrote: > Hi, > > On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote: >> On 2012年05月30日 14:38, Will Deacon wrote: >>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote: >>>> --- a/arch/arm/mm/cache-v6.S >>>> +++ b/arch/arm/mm/cache-v6.S >>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range) >>>> ENTRY(v6_flush_kern_dcache_area) >>>> add r1, r0, r1 >>>> 1: >>>> +#ifdef CONFIG_SMP >>>> + ldr r2, [r0] @ read for ownership >>>> + str r2, [r0] @ write for ownership >>>> +#endif /* CONFIG_SMP */ >>>> #ifdef HARVARD_CACHE >>>> mcr p15, 0, r0, c7, c14, 1 @ clean& invalidate D line >>>> #else >>> >>> I don't think the invalidation is needed here, so you probably don't need to >>> hack this function at all. > ... >> 1. task t runs on CPU 0, it executes one program in nand flash, >> so first task t read *part* of this program into its local Dcache, >> let's call it part_a; >> >> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of >> program into its local Dcache, label it part_b; >> >> 3. on CPU1, task t flush the whole address range of this program, >> As with ARM11 MPCore, this flush is locally effective, after flush >> part_a still resides in CPU0 L1 Dcache. >> >> 4. When this program gets executed, CPU 1 fetch its instructions from >> Icache, SCU will notice this action, >> >> Question: >> At this point, is stale part_a in main memory *or* L1 Dcache part_a >> in CPU0 routed into CPU1 as instruction? > > This has been discussed in the past. The first thing is to disable > CONFIG_PREEMPT to make sure you are not preempted between page copying > and task execution. The code doing the page copy needs to call > flush_dcache_page() on the CPU where the copy occurred and Linux does a > non-lazy D-cache flush. But there are many situations where Linux > doesn't do this (you can google for flush_dcache_page and my name to > find a few places where I tried to fix this). > Thanks:) > AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full > I-cache invalidation during task migration. ^^^^^^^^^^^^ Could you please point it to me?
On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote: > On 2012年05月31日 11:00, Catalin Marinas wrote: > > AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full > > I-cache invalidation during task migration. > > ^^^^^^^^^^^^ > Could you please point it to me? There is a check in switch_mm() in the asm/mmu_context.h file.
On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote: > On 2012年05月31日 11:00, Catalin Marinas wrote: > > On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote: > >> On 2012年05月30日 14:38, Will Deacon wrote: > >>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote: > >>>> --- a/arch/arm/mm/cache-v6.S > >>>> +++ b/arch/arm/mm/cache-v6.S > >>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range) > >>>> ENTRY(v6_flush_kern_dcache_area) > >>>> add r1, r0, r1 > >>>> 1: > >>>> +#ifdef CONFIG_SMP > >>>> + ldr r2, [r0] @ read for ownership > >>>> + str r2, [r0] @ write for ownership > >>>> +#endif /* CONFIG_SMP */ > >>>> #ifdef HARVARD_CACHE > >>>> mcr p15, 0, r0, c7, c14, 1 @ clean& invalidate D line > >>>> #else > >>> > >>> I don't think the invalidation is needed here, so you probably don't need to > >>> hack this function at all. > > ... > >> 1. task t runs on CPU 0, it executes one program in nand flash, > >> so first task t read *part* of this program into its local Dcache, > >> let's call it part_a; > >> > >> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of > >> program into its local Dcache, label it part_b; > >> > >> 3. on CPU1, task t flush the whole address range of this program, > >> As with ARM11 MPCore, this flush is locally effective, after flush > >> part_a still resides in CPU0 L1 Dcache. > >> > >> 4. When this program gets executed, CPU 1 fetch its instructions from > >> Icache, SCU will notice this action, > >> > >> Question: > >> At this point, is stale part_a in main memory *or* L1 Dcache part_a > >> in CPU0 routed into CPU1 as instruction? > > > > This has been discussed in the past. The first thing is to disable > > CONFIG_PREEMPT to make sure you are not preempted between page copying > > and task execution. The code doing the page copy needs to call > > flush_dcache_page() on the CPU where the copy occurred and Linux does a > > non-lazy D-cache flush. But there are many situations where Linux > > doesn't do this (you can google for flush_dcache_page and my name to > > find a few places where I tried to fix this). > > Thanks:) BTW, see this as a starting point (and a hack): http://article.gmane.org/gmane.linux.ports.arm.kernel/51556
On 2012年05月31日 11:19, Catalin Marinas wrote: > On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote: >> On 2012年05月31日 11:00, Catalin Marinas wrote: >>> On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote: >>>> On 2012年05月30日 14:38, Will Deacon wrote: >>>>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote: >>>>>> --- a/arch/arm/mm/cache-v6.S >>>>>> +++ b/arch/arm/mm/cache-v6.S >>>>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range) >>>>>> ENTRY(v6_flush_kern_dcache_area) >>>>>> add r1, r0, r1 >>>>>> 1: >>>>>> +#ifdef CONFIG_SMP >>>>>> + ldr r2, [r0] @ read for ownership >>>>>> + str r2, [r0] @ write for ownership >>>>>> +#endif /* CONFIG_SMP */ >>>>>> #ifdef HARVARD_CACHE >>>>>> mcr p15, 0, r0, c7, c14, 1 @ clean& invalidate D line >>>>>> #else >>>>> >>>>> I don't think the invalidation is needed here, so you probably don't need to >>>>> hack this function at all. >>> ... >>>> 1. task t runs on CPU 0, it executes one program in nand flash, >>>> so first task t read *part* of this program into its local Dcache, >>>> let's call it part_a; >>>> >>>> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of >>>> program into its local Dcache, label it part_b; >>>> >>>> 3. on CPU1, task t flush the whole address range of this program, >>>> As with ARM11 MPCore, this flush is locally effective, after flush >>>> part_a still resides in CPU0 L1 Dcache. >>>> >>>> 4. When this program gets executed, CPU 1 fetch its instructions from >>>> Icache, SCU will notice this action, >>>> >>>> Question: >>>> At this point, is stale part_a in main memory *or* L1 Dcache part_a >>>> in CPU0 routed into CPU1 as instruction? >>> >>> This has been discussed in the past. The first thing is to disable >>> CONFIG_PREEMPT to make sure you are not preempted between page copying >>> and task execution. The code doing the page copy needs to call >>> flush_dcache_page() on the CPU where the copy occurred and Linux does a >>> non-lazy D-cache flush. But there are many situations where Linux >>> doesn't do this (you can google for flush_dcache_page and my name to >>> find a few places where I tried to fix this). >> >> Thanks:) > > BTW, see this as a starting point (and a hack): > > http://article.gmane.org/gmane.linux.ports.arm.kernel/51556 I think we has some mis-understanding here :( As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all these two hooks is supposed to be globally effective, *not* locally! Hence, there should below fix to make it as globally effective. +#ifdef CONFIG_SMP + ldr r2, [r0] @ read for ownership + str r2, [r0] @ write for ownership +#endif /* CONFIG_SMP */ Or there maybe some other better solution for this issue. thanks
On 31 May 2012 11:38, bill4carson <bill4carson@gmail.com> wrote: > On 2012年05月31日 11:19, Catalin Marinas wrote: >> BTW, see this as a starting point (and a hack): >> >> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556 > > I think we has some mis-understanding here :( > > As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all > these two hooks is supposed to be globally effective, *not* > locally! > > Hence, there should below fix to make it as globally effective. > > > +#ifdef CONFIG_SMP > + ldr r2, [r0] @ read for ownership > + str r2, [r0] @ write for ownership > +#endif /* CONFIG_SMP */ > > Or there maybe some other better solution for this issue. I still didn't fully understand what the problem is. So, to make sure, if you run some applications from flash using a yaffs filesystem, you get random crashes. Is this correct? If yes, a solution is to actually call flush_dcache_page() on the CPU that does the page copying from flash into RAM, which could be the yaffs filesystem.
On 2012年05月31日 11:58, Catalin Marinas wrote: > On 31 May 2012 11:38, bill4carson<bill4carson@gmail.com> wrote: >> On 2012年05月31日 11:19, Catalin Marinas wrote: >>> BTW, see this as a starting point (and a hack): >>> >>> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556 >> >> I think we has some mis-understanding here :( >> >> As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all >> these two hooks is supposed to be globally effective, *not* >> locally! >> >> Hence, there should below fix to make it as globally effective. >> >> >> +#ifdef CONFIG_SMP >> + ldr r2, [r0] @ read for ownership >> + str r2, [r0] @ write for ownership >> +#endif /* CONFIG_SMP */ >> >> Or there maybe some other better solution for this issue. > > I still didn't fully understand what the problem is. So, to make sure, > if you run some applications from flash using a yaffs filesystem, you > get random crashes. Is this correct? If yes, a solution is to actually > call flush_dcache_page() on the CPU that does the page copying from > flash into RAM, which could be the yaffs filesystem. > The story goes like this: function "flush_dcache_page" should be global effective but in ARMv6 MPCore, it was not, it was just local effective due to hardware design. This may cause error in some cases for example: 1) Task running on Core-0 loading text section into memory. It was preempted and then migrate into Core-1; 2) On Core-1, this task continue loading it and then "flush_dcache_page" to make sure the loaded text section write into main memory. 3) Task tend to the loaded text section and running it. If the "flush_dcache_page" was not global effective, there maybe data still in Core-0's data cache, not write into main memory. Thus in step 3, error instruction maybe fetched thus cause strange error. If I'm missing something here, please let me know. thanks
On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote: > On 2012年05月31日 11:58, Catalin Marinas wrote: >> I still didn't fully understand what the problem is. So, to make sure, >> if you run some applications from flash using a yaffs filesystem, you >> get random crashes. Is this correct? If yes, a solution is to actually >> call flush_dcache_page() on the CPU that does the page copying from >> flash into RAM, which could be the yaffs filesystem. > > The story goes like this: > function "flush_dcache_page" should be global effective > but in ARMv6 MPCore, it was not, it was just local effective due > to hardware design. Yes, I know this. > This may cause error in some cases for example: > > 1) Task running on Core-0 loading text section into memory. > It was preempted and then migrate into Core-1; BTW, do you have CONFIG_PREEMPT enabled? To be clear - is your application reading some data from flash and trying to execute or it's the kernel doing the load via the page/prefetch abort mechanism? If the latter, task running on core 0 gets a prefetch abort when trying to execute some code. The kernel reads the page from flash (via mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as long as it calls flush_dcache_page on the same CPU that copied the data. No matter where the task was running or migrated to, if the code doing the copy also called flush_dcache_page() on the same core, there is no data left in the D-cache for that page. > 2) On Core-1, this task continue loading it and then > "flush_dcache_page" to make sure the loaded text section write > into main memory. The flush_dcache_page() must be called by the code doing the copy. If that copy happened on core 0, the call is done there and not where the task migrated. We don't do lazy flushing on ARM11MPCore. > 3) Task tend to the loaded text section and running it. > > If the "flush_dcache_page" was not global effective, > there maybe data still in Core-0's data cache, not write > into main memory. Thus in step 3, error instruction maybe > fetched thus cause strange error. This can only happen if you have either preempt enabled (so that the kernel code doing the copy is migrated) or the mtd driver or fs do not call flush_dcache_page().
On 2012年05月31日 13:19, Catalin Marinas wrote: > On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com> wrote: >> On 2012年05月31日 11:58, Catalin Marinas wrote: >>> I still didn't fully understand what the problem is. So, to make sure, >>> if you run some applications from flash using a yaffs filesystem, you >>> get random crashes. Is this correct? If yes, a solution is to actually >>> call flush_dcache_page() on the CPU that does the page copying from >>> flash into RAM, which could be the yaffs filesystem. >> >> The story goes like this: >> function "flush_dcache_page" should be global effective >> but in ARMv6 MPCore, it was not, it was just local effective due >> to hardware design. > > Yes, I know this. > Then, why not fix "flush_dcache_page" to make it globally effective? >> This may cause error in some cases for example: >> >> 1) Task running on Core-0 loading text section into memory. >> It was preempted and then migrate into Core-1; > > BTW, do you have CONFIG_PREEMPT enabled? > Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-) > To be clear - is your application reading some data from flash and > trying to execute or it's the kernel doing the load via the > page/prefetch abort mechanism? > In my current case, it is yaffs root file system. > If the latter, task running on core 0 gets a prefetch abort when > trying to execute some code. The kernel reads the page from flash (via > mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as > long as it calls flush_dcache_page on the same CPU that copied the > data. > > No matter where the task was running or migrated to, if the code doing > the copy also called flush_dcache_page() on the same core, there is no > data left in the D-cache for that page. Yes, I agree with it. But how to flush the data cache on the same core with PREEMPT enabled? And, I think according to the design, there is no such operation that guarantee it. > >> 2) On Core-1, this task continue loading it and then >> "flush_dcache_page" to make sure the loaded text section write >> into main memory. > > The flush_dcache_page() must be called by the code doing the copy. If > that copy happened on core 0, the call is done there and not where the > task migrated. We don't do lazy flushing on ARM11MPCore. > WHY? >> 3) Task tend to the loaded text section and running it. >> >> If the "flush_dcache_page" was not global effective, >> there maybe data still in Core-0's data cache, not write >> into main memory. Thus in step 3, error instruction maybe >> fetched thus cause strange error. > > This can only happen if you have either preempt enabled (so that the > kernel code doing the copy is migrated) or the mtd driver or fs do not > call flush_dcache_page(). > PREEMPT + MTD root file system. :-) But any way, I think flush_dcache_page should be global effective. If ARMv6 MPCore didn't make it, we should try to accomplish it.
On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote: > On 2012年05月31日 13:19, Catalin Marinas wrote: > > On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com> wrote: > >> On 2012年05月31日 11:58, Catalin Marinas wrote: > >>> I still didn't fully understand what the problem is. So, to make sure, > >>> if you run some applications from flash using a yaffs filesystem, you > >>> get random crashes. Is this correct? If yes, a solution is to actually > >>> call flush_dcache_page() on the CPU that does the page copying from > >>> flash into RAM, which could be the yaffs filesystem. > >> > >> The story goes like this: > >> function "flush_dcache_page" should be global effective > >> but in ARMv6 MPCore, it was not, it was just local effective due > >> to hardware design. > > > > Yes, I know this. > > Then, why not fix "flush_dcache_page" to make it globally effective? Performance? And it's also just ARM11MPCore microarchitecture specific. > >> This may cause error in some cases for example: > >> > >> 1) Task running on Core-0 loading text section into memory. > >> It was preempted and then migrate into Core-1; > > > > BTW, do you have CONFIG_PREEMPT enabled? > > > Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-) I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :). > > To be clear - is your application reading some data from flash and > > trying to execute or it's the kernel doing the load via the > > page/prefetch abort mechanism? > > In my current case, it is yaffs root file system. > > > If the latter, task running on core 0 gets a prefetch abort when > > trying to execute some code. The kernel reads the page from flash (via > > mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as > > long as it calls flush_dcache_page on the same CPU that copied the > > data. > > > > No matter where the task was running or migrated to, if the code doing > > the copy also called flush_dcache_page() on the same core, there is no > > data left in the D-cache for that page. > > Yes, I agree with it. > But how to flush the data cache on the same core with PREEMPT enabled? That's not easily possible. But you may get better results with VOLUNTARY_PREEMPT. > And, I think according to the design, there is no such operation that > guarantee it. RFO/WFO tricks only work on ARM11MPCore. > But any way, I think flush_dcache_page should be global effective. > If ARMv6 MPCore didn't make it, we should try to accomplish it. Do some performance tests first.
On 2012年05月31日 14:56, Catalin Marinas wrote: > On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote: >> On 2012年05月31日 13:19, Catalin Marinas wrote: >>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com> wrote: >>>> On 2012年05月31日 11:58, Catalin Marinas wrote: >>>>> I still didn't fully understand what the problem is. So, to make sure, >>>>> if you run some applications from flash using a yaffs filesystem, you >>>>> get random crashes. Is this correct? If yes, a solution is to actually >>>>> call flush_dcache_page() on the CPU that does the page copying from >>>>> flash into RAM, which could be the yaffs filesystem. >>>> >>>> The story goes like this: >>>> function "flush_dcache_page" should be global effective >>>> but in ARMv6 MPCore, it was not, it was just local effective due >>>> to hardware design. >>> >>> Yes, I know this. >> >> Then, why not fix "flush_dcache_page" to make it globally effective? > > Performance? > > And it's also just ARM11MPCore microarchitecture specific. > >>>> This may cause error in some cases for example: >>>> >>>> 1) Task running on Core-0 loading text section into memory. >>>> It was preempted and then migrate into Core-1; >>> >>> BTW, do you have CONFIG_PREEMPT enabled? >>> >> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-) > > I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :). Point! Is it better to add comment, such as "PREEMPT is not supported for ARM11MPCore" in somewhere(for now, I don't find such place)? then custom will be alerted with such notice when they trying change to PREEMPT. And again, thanks for your patience with me :) > >>> To be clear - is your application reading some data from flash and >>> trying to execute or it's the kernel doing the load via the >>> page/prefetch abort mechanism? >> >> In my current case, it is yaffs root file system. >> >>> If the latter, task running on core 0 gets a prefetch abort when >>> trying to execute some code. The kernel reads the page from flash (via >>> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as >>> long as it calls flush_dcache_page on the same CPU that copied the >>> data. >>> >>> No matter where the task was running or migrated to, if the code doing >>> the copy also called flush_dcache_page() on the same core, there is no >>> data left in the D-cache for that page. >> >> Yes, I agree with it. >> But how to flush the data cache on the same core with PREEMPT enabled? > > That's not easily possible. But you may get better results with > VOLUNTARY_PREEMPT. > >> And, I think according to the design, there is no such operation that >> guarantee it. > > RFO/WFO tricks only work on ARM11MPCore. > >> But any way, I think flush_dcache_page should be global effective. >> If ARMv6 MPCore didn't make it, we should try to accomplish it. > > Do some performance tests first. >
On 2012年05月31日 15:21, bill4carson wrote: > > > On 2012年05月31日 14:56, Catalin Marinas wrote: >> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote: >>> On 2012年05月31日 13:19, Catalin Marinas wrote: >>>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com> wrote: >>>>> On 2012年05月31日 11:58, Catalin Marinas wrote: >>>>>> I still didn't fully understand what the problem is. So, to make >>>>>> sure, >>>>>> if you run some applications from flash using a yaffs filesystem, >>>>>> you >>>>>> get random crashes. Is this correct? If yes, a solution is to >>>>>> actually >>>>>> call flush_dcache_page() on the CPU that does the page copying from >>>>>> flash into RAM, which could be the yaffs filesystem. >>>>> >>>>> The story goes like this: >>>>> function "flush_dcache_page" should be global effective >>>>> but in ARMv6 MPCore, it was not, it was just local effective due >>>>> to hardware design. >>>> >>>> Yes, I know this. >>> >>> Then, why not fix "flush_dcache_page" to make it globally effective? >> >> Performance? >> >> And it's also just ARM11MPCore microarchitecture specific. >> Yes, seems newer CPUs has no such limitation thus this function is global effective naturally. :-) And , I find Mips's c-r4k also has this issue but it use IPI to make it. Details in arch/mips/mm/c-r4k.c. From my point of view, PREEMPT should not related to CPU type. So if this type of CPU does not support PREEMPT for performance reason, can we add something in Documentation and related Kconfig to make/mark it? Or maybe disable task migration is also a choice. :-) Thanks Xiao >>>>> This may cause error in some cases for example: >>>>> >>>>> 1) Task running on Core-0 loading text section into memory. >>>>> It was preempted and then migrate into Core-1; >>>> >>>> BTW, do you have CONFIG_PREEMPT enabled? >>>> >>> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-) >> >> I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :). > > Point! > > Is it better to add comment, such as "PREEMPT is not supported for > ARM11MPCore" in somewhere(for now, I don't find such place)? > then custom will be alerted with such notice when they trying change > to PREEMPT. > > And again, thanks for your patience with me :) > > >> >>>> To be clear - is your application reading some data from flash and >>>> trying to execute or it's the kernel doing the load via the >>>> page/prefetch abort mechanism? >>> >>> In my current case, it is yaffs root file system. >>> >>>> If the latter, task running on core 0 gets a prefetch abort when >>>> trying to execute some code. The kernel reads the page from flash (via >>>> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as >>>> long as it calls flush_dcache_page on the same CPU that copied the >>>> data. >>>> >>>> No matter where the task was running or migrated to, if the code doing >>>> the copy also called flush_dcache_page() on the same core, there is no >>>> data left in the D-cache for that page. >>> >>> Yes, I agree with it. >>> But how to flush the data cache on the same core with PREEMPT enabled? >> >> That's not easily possible. But you may get better results with >> VOLUNTARY_PREEMPT. >> >>> And, I think according to the design, there is no such operation that >>> guarantee it. >> >> RFO/WFO tricks only work on ARM11MPCore. >> >>> But any way, I think flush_dcache_page should be global effective. >>> If ARMv6 MPCore didn't make it, we should try to accomplish it. >> >> Do some performance tests first. >> >
On 31 May 2012 15:46, <snakky.zhang@gmail.com> wrote: > On 2012年05月31日 15:21, bill4carson wrote: >> On 2012年05月31日 14:56, Catalin Marinas wrote: >>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote: >>>> On 2012年05月31日 13:19, Catalin Marinas wrote: >>>>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com> wrote: >>>>>> On 2012年05月31日 11:58, Catalin Marinas wrote: >>>>>>> I still didn't fully understand what the problem is. So, to make >>>>>>> sure, >>>>>>> if you run some applications from flash using a yaffs filesystem, you >>>>>>> get random crashes. Is this correct? If yes, a solution is to >>>>>>> actually >>>>>>> call flush_dcache_page() on the CPU that does the page copying from >>>>>>> flash into RAM, which could be the yaffs filesystem. >>>>>> >>>>>> >>>>>> The story goes like this: >>>>>> function "flush_dcache_page" should be global effective >>>>>> but in ARMv6 MPCore, it was not, it was just local effective due >>>>>> to hardware design. >>>>> >>>>> >>>>> Yes, I know this. >>>> >>>> >>>> Then, why not fix "flush_dcache_page" to make it globally effective? >>> >>> >>> Performance? >>> >>> And it's also just ARM11MPCore microarchitecture specific. >>> > Yes, seems newer CPUs has no such limitation thus this function is global > effective naturally. :-) > > And , I find Mips's c-r4k also has this issue but it use IPI to make it. > Details in arch/mips/mm/c-r4k.c. Rather than IPI we would better use the read-for-ownership trick like in this patch to make flush_dcache_page global (no need for write-for-ownership): http://dchs.spinics.net/lists/arm-kernel/msg125075.html (it may no longer apply, I haven't checked it for some time). That's the first thing. Secondly you still need preemption disable so that it is not preempted between RFO and the actual cache cleaning.
On 2012年06月01日 00:04, Catalin Marinas wrote: > On 31 May 2012 15:46,<snakky.zhang@gmail.com> wrote: >> On 2012年05月31日 15:21, bill4carson wrote: >>> On 2012年05月31日 14:56, Catalin Marinas wrote: >>>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote: >>>>> On 2012年05月31日 13:19, Catalin Marinas wrote: >>>>>> On 31 May 2012 13:06, bill4carson<bill4carson@gmail.com> wrote: >>>>>>> On 2012年05月31日 11:58, Catalin Marinas wrote: >>>>>>>> I still didn't fully understand what the problem is. So, to make >>>>>>>> sure, >>>>>>>> if you run some applications from flash using a yaffs filesystem, you >>>>>>>> get random crashes. Is this correct? If yes, a solution is to >>>>>>>> actually >>>>>>>> call flush_dcache_page() on the CPU that does the page copying from >>>>>>>> flash into RAM, which could be the yaffs filesystem. >>>>>>> >>>>>>> The story goes like this: >>>>>>> function "flush_dcache_page" should be global effective >>>>>>> but in ARMv6 MPCore, it was not, it was just local effective due >>>>>>> to hardware design. >>>>>> >>>>>> Yes, I know this. >>>>> >>>>> Then, why not fix "flush_dcache_page" to make it globally effective? >>>> >>>> Performance? >>>> >>>> And it's also just ARM11MPCore microarchitecture specific. >>>> >> Yes, seems newer CPUs has no such limitation thus this function is global >> effective naturally. :-) >> >> And , I find Mips's c-r4k also has this issue but it use IPI to make it. >> Details in arch/mips/mm/c-r4k.c. > Rather than IPI we would better use the read-for-ownership trick like > in this patch to make flush_dcache_page global (no need for > write-for-ownership): I think write for ownership is necessary for flush_dcache_xxx: Read guarantee local data cache get newest data, at the same time write guarantee the data can be flushed into memory. See Section 7.1 of ARM11 MPCore. Processor Technical Reference Manual(Revision: r2p0) P146/728: ====== Clean Applies to write-back data caches. If the cache line targeted by the Clean operation contains stored data that has not yet been written out to main memory, it is written to main memory, and the line is marked as clean. ====== So I am afraid without the write action, the "clean & invalidate" action later will not write data back to main memory. Another question here: Why the flush_kern_dcache_xxx in arch/arm/mm/cache-v6 use "clean & invalidate" progress instead of "clean"? Seems clean is enough here. Please fix me if I mis-understand something. > > http://dchs.spinics.net/lists/arm-kernel/msg125075.html > > (it may no longer apply, I haven't checked it for some time). > > That's the first thing. Secondly you still need preemption disable so > that it is not preempted between RFO and the actual cache cleaning. > PREEMPT. :-) Get it. But currently, I can't find anything related to ARMv6 MPCore conflict with PREEMPT. So if it is also necessary to add something in Documentation and related Kconfig to describe it and make sure PREEMPT can't been enabled on such CPUs? Thanks Xiao
>> Yes, seems newer CPUs has no such limitation thus this function is global >> effective naturally. :-) >> >> And , I find Mips's c-r4k also has this issue but it use IPI to make it. >> Details in arch/mips/mm/c-r4k.c. > Rather than IPI we would better use the read-for-ownership trick like > in this patch to make flush_dcache_page global (no need for > write-for-ownership): > > http://dchs.spinics.net/lists/arm-kernel/msg125075.html > > (it may no longer apply, I haven't checked it for some time). > > That's the first thing. Secondly you still need preemption disable so > that it is not preempted between RFO and the actual cache cleaning. > And, another confusion for PREEMPT: Even if we disable preempt, with locally effective flush_dcache_xxx, there is still possibility to reproduce such issue(Similar with the previous case): 1) Task running on Core-0 loading text section into memory. It was schedule out and then migrate into Core-1; 2) On Core-1, this task continue loading it and then "flush_dcache_page" to make sure the loaded text section write into main memory. 3) Task tend to the loaded text section and running it. Similar as the previous case, the difference lies in step1 that the task was interrupted by timer interrupt. Thus it still can be switch out and then been migrate to another core. Thus in step2 and 3, this issue may still been reproduced. So, disable preempt can only lower the possibility of this issue but can't avoid it. If I mis-understand something, please correct me. Thanks Xiao
On Fri, Jun 01, 2012 at 02:11:59AM +0100, snakky.zhang@gmail.com wrote: > On 2012年06月01日 00:04, Catalin Marinas wrote: > > Rather than IPI we would better use the read-for-ownership trick like > > in this patch to make flush_dcache_page global (no need for > > write-for-ownership): > > I think write for ownership is necessary for flush_dcache_xxx: Read > guarantee local data cache get newest data, at the same time write > guarantee the data can be flushed into memory. > > See Section 7.1 of ARM11 MPCore. Processor Technical Reference > Manual(Revision: r2p0) P146/728: > ====== > Clean Applies to write-back data caches. If the cache line targeted by > the Clean operation contains stored data that has not yet been written > out to main memory, it is written to main memory, and the line is > marked as clean. > ====== > > So I am afraid without the write action, the "clean & invalidate" > action later will not write data back to main memory. If there is a dirty cache line, it will be written to memory by the clean&invalidate operation. If the data in the cache line is in a clean state, it means that it is identical to the main memory (or L2 if present). With just a read, clean&invalidate would not invalidate (remove) the cache lines from the other CPUs. Doing a write forces the cache line to only be present on the current CPU (though automatically invalidating it on the other CPUs). > Another question here: Why the flush_kern_dcache_xxx in > arch/arm/mm/cache-v6 use "clean & invalidate" progress instead of > "clean"? Seems clean is enough here. I think in the context of VIPT caches clean would be enough. > > http://dchs.spinics.net/lists/arm-kernel/msg125075.html > > > > (it may no longer apply, I haven't checked it for some time). > > > > That's the first thing. Secondly you still need preemption disable so > > that it is not preempted between RFO and the actual cache cleaning. > > PREEMPT. :-) > > Get it. But currently, I can't find anything related to ARMv6 MPCore > conflict with PREEMPT. So if it is also necessary to add something in > Documentation and related Kconfig to describe it and make sure PREEMPT > can't been enabled on such CPUs? Well, we either get it to work or, if not possible, we add a comment. Let's try the former option first :)
On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang@gmail.com wrote: > >> Yes, seems newer CPUs has no such limitation thus this function is global > >> effective naturally. :-) > >> > >> And , I find Mips's c-r4k also has this issue but it use IPI to make it. > >> Details in arch/mips/mm/c-r4k.c. > > Rather than IPI we would better use the read-for-ownership trick like > > in this patch to make flush_dcache_page global (no need for > > write-for-ownership): > > > > http://dchs.spinics.net/lists/arm-kernel/msg125075.html > > > > (it may no longer apply, I haven't checked it for some time). > > > > That's the first thing. Secondly you still need preemption disable so > > that it is not preempted between RFO and the actual cache cleaning. > > > And, another confusion for PREEMPT: Even if we disable preempt, with locally > effective flush_dcache_xxx, there is still possibility to reproduce such > issue(Similar with the previous case): > > 1) Task running on Core-0 loading text section into memory. > It was schedule out and then migrate into Core-1; > 2) On Core-1, this task continue loading it and then > "flush_dcache_page" to make sure the loaded text section write > into main memory. > 3) Task tend to the loaded text section and running it. > > Similar as the previous case, the difference lies in step1 that the > task was interrupted by timer interrupt. Thus it still can be switch > out and then been migrate to another core. Thus in step2 and 3, this > issue may still been reproduced. So, disable preempt can only lower > the possibility of this issue but can't avoid it. It would work as long as the data copying into the text area (done by the driver and VFS layer) and the flush_dcache_page() sequence are not preemptible. A timer interrupt between data copying and flush_dcache_page() would interrupt a kernel routine which is not preemptible.
On 2012年06月01日 11:25, Catalin Marinas wrote: > On Fri, Jun 01, 2012 at 02:11:59AM +0100, snakky.zhang@gmail.com wrote: >> On 2012年06月01日 00:04, Catalin Marinas wrote: >>> Rather than IPI we would better use the read-for-ownership trick like >>> in this patch to make flush_dcache_page global (no need for >>> write-for-ownership): >> I think write for ownership is necessary for flush_dcache_xxx: Read >> guarantee local data cache get newest data, at the same time write >> guarantee the data can be flushed into memory. >> >> See Section 7.1 of ARM11 MPCore. Processor Technical Reference >> Manual(Revision: r2p0) P146/728: >> ====== >> Clean Applies to write-back data caches. If the cache line targeted by >> the Clean operation contains stored data that has not yet been written >> out to main memory, it is written to main memory, and the line is >> marked as clean. >> ====== >> >> So I am afraid without the write action, the "clean& invalidate" >> action later will not write data back to main memory. > If there is a dirty cache line, it will be written to memory by the > clean&invalidate operation. If the data in the cache line is in a clean > state, it means that it is identical to the main memory (or L2 if > present). > > With just a read, clean&invalidate would not invalidate (remove) the > cache lines from the other CPUs. Doing a write forces the cache line to > only be present on the current CPU (though automatically invalidating it > on the other CPUs). > >> Another question here: Why the flush_kern_dcache_xxx in >> arch/arm/mm/cache-v6 use "clean& invalidate" progress instead of >> "clean"? Seems clean is enough here. > I think in the context of VIPT caches clean would be enough. > >>> http://dchs.spinics.net/lists/arm-kernel/msg125075.html >>> >>> (it may no longer apply, I haven't checked it for some time). >>> >>> That's the first thing. Secondly you still need preemption disable so >>> that it is not preempted between RFO and the actual cache cleaning. >> PREEMPT. :-) >> >> Get it. But currently, I can't find anything related to ARMv6 MPCore >> conflict with PREEMPT. So if it is also necessary to add something in >> Documentation and related Kconfig to describe it and make sure PREEMPT >> can't been enabled on such CPUs? > Well, we either get it to work or, if not possible, we add a comment. > Let's try the former option first :) > Thanks for your patient explanation. :-) Thanks a lot! Xiao
On Fri, Jun 01, 2012 at 11:29:13AM +0800, Catalin Marinas wrote: > On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang@gmail.com wrote: > > >> Yes, seems newer CPUs has no such limitation thus this function is global > > >> effective naturally. :-) > > >> > > >> And , I find Mips's c-r4k also has this issue but it use IPI to make it. > > >> Details in arch/mips/mm/c-r4k.c. > > > Rather than IPI we would better use the read-for-ownership trick like > > > in this patch to make flush_dcache_page global (no need for > > > write-for-ownership): > > > > > > http://dchs.spinics.net/lists/arm-kernel/msg125075.html > > > > > > (it may no longer apply, I haven't checked it for some time). > > > > > > That's the first thing. Secondly you still need preemption disable so > > > that it is not preempted between RFO and the actual cache cleaning. > > > > > And, another confusion for PREEMPT: Even if we disable preempt, with locally > > effective flush_dcache_xxx, there is still possibility to reproduce such > > issue(Similar with the previous case): > > > > 1) Task running on Core-0 loading text section into memory. > > It was schedule out and then migrate into Core-1; > > 2) On Core-1, this task continue loading it and then > > "flush_dcache_page" to make sure the loaded text section write > > into main memory. > > 3) Task tend to the loaded text section and running it. > > > > Similar as the previous case, the difference lies in step1 that the > > task was interrupted by timer interrupt. Thus it still can be switch > > out and then been migrate to another core. Thus in step2 and 3, this > > issue may still been reproduced. So, disable preempt can only lower > > the possibility of this issue but can't avoid it. > > It would work as long as the data copying into the text area (done by > the driver and VFS layer) and the flush_dcache_page() sequence are not > preemptible. A timer interrupt between data copying and > flush_dcache_page() would interrupt a kernel routine which is not > preemptible. And that doesn't matter because on a non-preemptible kernel, timer ticks do _not_ cause the threads to be rescheduled while in kernel mode. If they did, it would be a _preemptible_ kernel. The only things on a non-preemptible kernel which cause a schedule point are functions which may sleep, so semaphores, waiting for events, etc.
On 2012年06月03日 19:34, Russell King - ARM Linux wrote: > On Fri, Jun 01, 2012 at 11:29:13AM +0800, Catalin Marinas wrote: >> On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang@gmail.com wrote: >>>>> Yes, seems newer CPUs has no such limitation thus this function is global >>>>> effective naturally. :-) >>>>> >>>>> And , I find Mips's c-r4k also has this issue but it use IPI to make it. >>>>> Details in arch/mips/mm/c-r4k.c. >>>> Rather than IPI we would better use the read-for-ownership trick like >>>> in this patch to make flush_dcache_page global (no need for >>>> write-for-ownership): >>>> >>>> http://dchs.spinics.net/lists/arm-kernel/msg125075.html >>>> >>>> (it may no longer apply, I haven't checked it for some time). >>>> >>>> That's the first thing. Secondly you still need preemption disable so >>>> that it is not preempted between RFO and the actual cache cleaning. >>>> >>> And, another confusion for PREEMPT: Even if we disable preempt, with locally >>> effective flush_dcache_xxx, there is still possibility to reproduce such >>> issue(Similar with the previous case): >>> >>> 1) Task running on Core-0 loading text section into memory. >>> It was schedule out and then migrate into Core-1; >>> 2) On Core-1, this task continue loading it and then >>> "flush_dcache_page" to make sure the loaded text section write >>> into main memory. >>> 3) Task tend to the loaded text section and running it. >>> >>> Similar as the previous case, the difference lies in step1 that the >>> task was interrupted by timer interrupt. Thus it still can be switch >>> out and then been migrate to another core. Thus in step2 and 3, this >>> issue may still been reproduced. So, disable preempt can only lower >>> the possibility of this issue but can't avoid it. >> It would work as long as the data copying into the text area (done by >> the driver and VFS layer) and the flush_dcache_page() sequence are not >> preemptible. A timer interrupt between data copying and >> flush_dcache_page() would interrupt a kernel routine which is not >> preemptible. > And that doesn't matter because on a non-preemptible kernel, timer ticks > do _not_ cause the threads to be rescheduled while in kernel mode. If > they did, it would be a _preemptible_ kernel. > > The only things on a non-preemptible kernel which cause a schedule point > are functions which may sleep, so semaphores, waiting for events, etc. > > Thanks for your description. :-) Thanks Xiao
--- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range) ENTRY(v6_flush_kern_dcache_area) add r1, r0, r1 1: +#ifdef CONFIG_SMP + ldr r2, [r0] @ read for ownership + str r2, [r0] @ write for ownership +#endif /* CONFIG_SMP */ #ifdef HARVARD_CACHE mcr p15, 0, r0, c7, c14, 1 @ clean & invalidate D line #else