[RFC] powerpc/mm: honor O_SYNC flag for memory map

Submitted by Yang Li on Nov. 17, 2009, 7:10 a.m.

Details

Message ID 1258441832-20133-1-git-send-email-leoli@freescale.com
State Superseded
Headers show

Commit Message

Yang Li Nov. 17, 2009, 7:10 a.m.
Rather than the original intelligent way, we grant user more freedom.
This enables user to map cacheable memory not managed by Linux.

Signed-off-by: Li Yang <leoli@freescale.com>
---
The only direct users of this function is fb_mmap() and /dev/mem mmap.
Although I'm not sure if anything is depending on the intelligent setting of
cacheability.

 arch/powerpc/mm/mem.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Kumar Gala Nov. 19, 2009, 1:55 p.m.
On Nov 17, 2009, at 1:10 AM, Li Yang wrote:

> Rather than the original intelligent way, we grant user more freedom.
> This enables user to map cacheable memory not managed by Linux.
>
> Signed-off-by: Li Yang <leoli@freescale.com>
> ---
> The only direct users of this function is fb_mmap() and /dev/mem mmap.
> Although I'm not sure if anything is depending on the intelligent  
> setting of
> cacheability.

is there some reason to change this?

- k

>
> arch/powerpc/mm/mem.c |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 579382c..0fd267e 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -101,7 +101,7 @@ pgprot_t phys_mem_access_prot(struct file *file,  
> unsigned long pfn,
> 	if (ppc_md.phys_mem_access_prot)
> 		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
>
> -	if (!page_is_ram(pfn))
> +	if (file->f_flags & O_SYNC)
> 		vma_prot = pgprot_noncached(vma_prot);
>
> 	return vma_prot;
> -- 
> 1.6.4
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
Li Yang-R58472 Nov. 20, 2009, 3 a.m.
>-----Original Message-----
>From: Kumar Gala [mailto:galak@kernel.crashing.org] 
>
>On Nov 17, 2009, at 1:10 AM, Li Yang wrote:
>
>> Rather than the original intelligent way, we grant user more freedom.
>> This enables user to map cacheable memory not managed by Linux.
>>
>> Signed-off-by: Li Yang <leoli@freescale.com>
>> ---
>> The only direct users of this function is fb_mmap() and 
>/dev/mem mmap.
>> Although I'm not sure if anything is depending on the intelligent 
>> setting of cacheability.
>
>is there some reason to change this?

Because there is no way to set mapped memory as cacheable if the memory
is not managed by Linux kernel.  While, it's not rare in real system to
allocate some dedicated memory to a certain application which is not
managed by kernel and then mmap'ed the memory to the application.  The
memory should be cacheable but we can't map it to be cacheable due to
this intelligent setting.  And it is a big hit to the performance.
Moreover, the standard O_SYNC flag suggest that user has the control
over cacheablity, but actually we had not.

- Leo

>
>- k
>
>>
>> arch/powerpc/mm/mem.c |    2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 
>> 579382c..0fd267e 100644
>> --- a/arch/powerpc/mm/mem.c
>> +++ b/arch/powerpc/mm/mem.c
>> @@ -101,7 +101,7 @@ pgprot_t phys_mem_access_prot(struct file *file, 
>> unsigned long pfn,
>> 	if (ppc_md.phys_mem_access_prot)
>> 		return ppc_md.phys_mem_access_prot(file, pfn, 
>size, vma_prot);
>>
>> -	if (!page_is_ram(pfn))
>> +	if (file->f_flags & O_SYNC)
>> 		vma_prot = pgprot_noncached(vma_prot);
>>
>> 	return vma_prot;
>> --
>> 1.6.4
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>
>
Benjamin Herrenschmidt Nov. 20, 2009, 9:03 a.m.
On Fri, 2009-11-20 at 11:00 +0800, Li Yang-R58472 wrote:
> Because there is no way to set mapped memory as cacheable if the
> memory
> is not managed by Linux kernel.  While, it's not rare in real system
> to
> allocate some dedicated memory to a certain application which is not
> managed by kernel and then mmap'ed the memory to the application.  The
> memory should be cacheable but we can't map it to be cacheable due to
> this intelligent setting.  And it is a big hit to the performance.
> Moreover, the standard O_SYNC flag suggest that user has the control
> over cacheablity, but actually we had not.

You need to be a bit more careful tho. You must not allow RAM managed by
the kernel to be mapped non-cachable.

Cheers,
Ben.
Yang Li Nov. 20, 2009, 9:23 a.m.
On Fri, Nov 20, 2009 at 5:03 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Fri, 2009-11-20 at 11:00 +0800, Li Yang-R58472 wrote:
>> Because there is no way to set mapped memory as cacheable if the
>> memory
>> is not managed by Linux kernel.  While, it's not rare in real system
>> to
>> allocate some dedicated memory to a certain application which is not
>> managed by kernel and then mmap'ed the memory to the application.  The
>> memory should be cacheable but we can't map it to be cacheable due to
>> this intelligent setting.  And it is a big hit to the performance.
>> Moreover, the standard O_SYNC flag suggest that user has the control
>> over cacheablity, but actually we had not.
>
> You need to be a bit more careful tho. You must not allow RAM managed by
> the kernel to be mapped non-cachable.

Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
the application if it uses O_SYNC on main memory to be mmap'ed later.
And we don't need to cover up the bug.

- Leo
Segher Boessenkool Nov. 21, 2009, 8:01 p.m.
>> You need to be a bit more careful tho. You must not allow RAM  
>> managed by
>> the kernel to be mapped non-cachable.
>
> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
> the application if it uses O_SYNC on main memory to be mmap'ed later.
> And we don't need to cover up the bug.

Is that "embedded thinking"?  Conflicts like this cause machine  
checks or
checkstops on many PowerPC implementations, we do not normally allow  
such
to be caused by userland.


Segher
Yang Li Nov. 25, 2009, 8:07 a.m.
On Sun, Nov 22, 2009 at 4:01 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>>> You need to be a bit more careful tho. You must not allow RAM managed by
>>> the kernel to be mapped non-cachable.
>>
>> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
>> the application if it uses O_SYNC on main memory to be mmap'ed later.
>> And we don't need to cover up the bug.
>
> Is that "embedded thinking"?  Conflicts like this cause machine checks or
> checkstops on many PowerPC implementations, we do not normally allow such
> to be caused by userland.

So what you are saying is that if the kernel has mapped a physical
page as cacheable while user application is trying to map it as
non-cacheable, there will be machine checks and checkstops rather than
just performance drop?  This is new to me.  Could you elaborate a bit?
 Thanks.

- Leo
Gabriel Paubert Nov. 25, 2009, 11:30 a.m.
On Wed, Nov 25, 2009 at 04:07:46PM +0800, Li Yang wrote:
> On Sun, Nov 22, 2009 at 4:01 AM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >>> You need to be a bit more careful tho. You must not allow RAM managed by
> >>> the kernel to be mapped non-cachable.
> >>
> >> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
> >> the application if it uses O_SYNC on main memory to be mmap'ed later.
> >> And we don't need to cover up the bug.
> >
> > Is that "embedded thinking"?  Conflicts like this cause machine checks or
> > checkstops on many PowerPC implementations, we do not normally allow such
> > to be caused by userland.
> 
> So what you are saying is that if the kernel has mapped a physical
> page as cacheable while user application is trying to map it as
> non-cacheable, there will be machine checks and checkstops rather than
> just performance drop?  This is new to me.  Could you elaborate a bit?

That's called cache paradoxes. And yes they may be a problem. 

Besides that, existing  application may have used mmap without O_SYNC on
I/O devices, knowing that the kernel would map them uncached. Your
patch would break them by using cached accesses (and it can cause
really hard to debug lockups, I've seen this, probably caused by 
infinite retries on the PCI bus).

	Gabriel
Yang Li Nov. 26, 2009, 7:43 a.m.
On Wed, Nov 25, 2009 at 7:30 PM, Gabriel Paubert <paubert@iram.es> wrote:
> On Wed, Nov 25, 2009 at 04:07:46PM +0800, Li Yang wrote:
>> On Sun, Nov 22, 2009 at 4:01 AM, Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> >>> You need to be a bit more careful tho. You must not allow RAM managed by
>> >>> the kernel to be mapped non-cachable.
>> >>
>> >> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
>> >> the application if it uses O_SYNC on main memory to be mmap'ed later.
>> >> And we don't need to cover up the bug.
>> >
>> > Is that "embedded thinking"?  Conflicts like this cause machine checks or
>> > checkstops on many PowerPC implementations, we do not normally allow such
>> > to be caused by userland.
>>
>> So what you are saying is that if the kernel has mapped a physical
>> page as cacheable while user application is trying to map it as
>> non-cacheable, there will be machine checks and checkstops rather than
>> just performance drop?  This is new to me.  Could you elaborate a bit?
>
> That's called cache paradoxes. And yes they may be a problem.

Thanks.  I will prevent this from happening.

>
> Besides that, existing  application may have used mmap without O_SYNC on
> I/O devices, knowing that the kernel would map them uncached. Your
> patch would break them by using cached accesses (and it can cause
> really hard to debug lockups, I've seen this, probably caused by
> infinite retries on the PCI bus).

That's my concern too.  But after all mmap without O_SYNC on I/O
devices should be deprecated.  A warning message in deprecation period
could reduce potential problem caused by this change.

- Leo
Segher Boessenkool Nov. 26, 2009, 9:26 p.m.
> So what you are saying is that if the kernel has mapped a physical
> page as cacheable while user application is trying to map it as
> non-cacheable, there will be machine checks and checkstops rather than
> just performance drop?  This is new to me.  Could you elaborate a bit?

If some data is in cache at a certain physical address, and you then
do an uncached read from that address, on at least some CPUs both the
bus interface and the cache will reply.  Bang, machine check.

Writes are problematic as well.


Segher
Paul Mackerras Nov. 27, 2009, 2:30 a.m.
Li Yang writes:

> That's my concern too.  But after all mmap without O_SYNC on I/O
> devices should be deprecated.

It should?  Why?

Shouldn't the onus rather be on those proposing an incompatible change
to the kernel ABI, such as this is, to show why the change is
absolutely essential?

>  A warning message in deprecation period
> could reduce potential problem caused by this change.

Not making the change at all would reduce the potential problems even
further. :)

Paul.
Yang Li Nov. 30, 2009, 10 a.m.
On Fri, Nov 27, 2009 at 10:30 AM, Paul Mackerras <paulus@samba.org> wrote:
> Li Yang writes:
>
>> That's my concern too.  But after all mmap without O_SYNC on I/O
>> devices should be deprecated.
>
> It should?  Why?
>
> Shouldn't the onus rather be on those proposing an incompatible change
> to the kernel ABI, such as this is, to show why the change is
> absolutely essential?

In addition to the performance issue I stated earlier in this thread.
There was also cache paradoxes problem too.  If a memory region is
shared between two cores/OS'es and Linux can't map the region with the
same cacheable property as the other OS had mapped, there will be a
problem.  The best solution for this problem is to make the
cacheablity controllable.

And it seems to be generic issue which does not only exist on powerpc
platforms.  So it might be better dealt with the already existed
O_SYNC flag which was meant to deal with this kind of thing.  I agree
with your comment in another email that device tree could be used, but
that solution is not generic enough though.

- Leo

Patch hide | download patch | download mbox

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 579382c..0fd267e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -101,7 +101,7 @@  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 	if (ppc_md.phys_mem_access_prot)
 		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
 
-	if (!page_is_ram(pfn))
+	if (file->f_flags & O_SYNC)
 		vma_prot = pgprot_noncached(vma_prot);
 
 	return vma_prot;