Patchwork [RFC] powerpc/mm: honor O_SYNC flag for memory map

login
register
mail settings
Submitter Yang Li
Date Nov. 17, 2009, 7:10 a.m.
Message ID <1258441832-20133-1-git-send-email-leoli@freescale.com>
Download mbox | patch
Permalink /patch/38581/
State Superseded
Headers show

Comments

Yang Li - Nov. 17, 2009, 7:10 a.m.
Rather than the original intelligent way, we grant user more freedom.
This enables user to map cacheable memory not managed by Linux.

Signed-off-by: Li Yang <leoli@freescale.com>
---
The only direct users of this function is fb_mmap() and /dev/mem mmap.
Although I'm not sure if anything is depending on the intelligent setting of
cacheability.

 arch/powerpc/mm/mem.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
Kumar Gala - Nov. 19, 2009, 1:55 p.m.
On Nov 17, 2009, at 1:10 AM, Li Yang wrote:

> Rather than the original intelligent way, we grant user more freedom.
> This enables user to map cacheable memory not managed by Linux.
>
> Signed-off-by: Li Yang <leoli@freescale.com>
> ---
> The only direct users of this function is fb_mmap() and /dev/mem mmap.
> Although I'm not sure if anything is depending on the intelligent  
> setting of
> cacheability.

is there some reason to change this?

- k

>
> arch/powerpc/mm/mem.c |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 579382c..0fd267e 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -101,7 +101,7 @@ pgprot_t phys_mem_access_prot(struct file *file,  
> unsigned long pfn,
> 	if (ppc_md.phys_mem_access_prot)
> 		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
>
> -	if (!page_is_ram(pfn))
> +	if (file->f_flags & O_SYNC)
> 		vma_prot = pgprot_noncached(vma_prot);
>
> 	return vma_prot;
> -- 
> 1.6.4
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
Li Yang-R58472 - Nov. 20, 2009, 3 a.m.
>-----Original Message-----
>From: Kumar Gala [mailto:galak@kernel.crashing.org] 
>
>On Nov 17, 2009, at 1:10 AM, Li Yang wrote:
>
>> Rather than the original intelligent way, we grant user more freedom.
>> This enables user to map cacheable memory not managed by Linux.
>>
>> Signed-off-by: Li Yang <leoli@freescale.com>
>> ---
>> The only direct users of this function is fb_mmap() and 
>/dev/mem mmap.
>> Although I'm not sure if anything is depending on the intelligent 
>> setting of cacheability.
>
>is there some reason to change this?

Because there is no way to set mapped memory as cacheable if the memory
is not managed by Linux kernel.  While, it's not rare in real system to
allocate some dedicated memory to a certain application which is not
managed by kernel and then mmap'ed the memory to the application.  The
memory should be cacheable but we can't map it to be cacheable due to
this intelligent setting.  And it is a big hit to the performance.
Moreover, the standard O_SYNC flag suggest that user has the control
over cacheablity, but actually we had not.

- Leo

>
>- k
>
>>
>> arch/powerpc/mm/mem.c |    2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 
>> 579382c..0fd267e 100644
>> --- a/arch/powerpc/mm/mem.c
>> +++ b/arch/powerpc/mm/mem.c
>> @@ -101,7 +101,7 @@ pgprot_t phys_mem_access_prot(struct file *file, 
>> unsigned long pfn,
>> 	if (ppc_md.phys_mem_access_prot)
>> 		return ppc_md.phys_mem_access_prot(file, pfn, 
>size, vma_prot);
>>
>> -	if (!page_is_ram(pfn))
>> +	if (file->f_flags & O_SYNC)
>> 		vma_prot = pgprot_noncached(vma_prot);
>>
>> 	return vma_prot;
>> --
>> 1.6.4
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>
>
Benjamin Herrenschmidt - Nov. 20, 2009, 9:03 a.m.
On Fri, 2009-11-20 at 11:00 +0800, Li Yang-R58472 wrote:
> Because there is no way to set mapped memory as cacheable if the
> memory
> is not managed by Linux kernel.  While, it's not rare in real system
> to
> allocate some dedicated memory to a certain application which is not
> managed by kernel and then mmap'ed the memory to the application.  The
> memory should be cacheable but we can't map it to be cacheable due to
> this intelligent setting.  And it is a big hit to the performance.
> Moreover, the standard O_SYNC flag suggest that user has the control
> over cacheablity, but actually we had not.

You need to be a bit more careful tho. You must not allow RAM managed by
the kernel to be mapped non-cachable.

Cheers,
Ben.
Yang Li - Nov. 20, 2009, 9:23 a.m.
On Fri, Nov 20, 2009 at 5:03 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Fri, 2009-11-20 at 11:00 +0800, Li Yang-R58472 wrote:
>> Because there is no way to set mapped memory as cacheable if the
>> memory
>> is not managed by Linux kernel.  While, it's not rare in real system
>> to
>> allocate some dedicated memory to a certain application which is not
>> managed by kernel and then mmap'ed the memory to the application.  The
>> memory should be cacheable but we can't map it to be cacheable due to
>> this intelligent setting.  And it is a big hit to the performance.
>> Moreover, the standard O_SYNC flag suggest that user has the control
>> over cacheablity, but actually we had not.
>
> You need to be a bit more careful tho. You must not allow RAM managed by
> the kernel to be mapped non-cachable.

Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
the application if it uses O_SYNC on main memory to be mmap'ed later.
And we don't need to cover up the bug.

- Leo
Segher Boessenkool - Nov. 21, 2009, 8:01 p.m.
>> You need to be a bit more careful tho. You must not allow RAM  
>> managed by
>> the kernel to be mapped non-cachable.
>
> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
> the application if it uses O_SYNC on main memory to be mmap'ed later.
> And we don't need to cover up the bug.

Is that "embedded thinking"?  Conflicts like this cause machine  
checks or
checkstops on many PowerPC implementations, we do not normally allow  
such
to be caused by userland.


Segher
Yang Li - Nov. 25, 2009, 8:07 a.m.
On Sun, Nov 22, 2009 at 4:01 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>>> You need to be a bit more careful tho. You must not allow RAM managed by
>>> the kernel to be mapped non-cachable.
>>
>> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
>> the application if it uses O_SYNC on main memory to be mmap'ed later.
>> And we don't need to cover up the bug.
>
> Is that "embedded thinking"?  Conflicts like this cause machine checks or
> checkstops on many PowerPC implementations, we do not normally allow such
> to be caused by userland.

So what you are saying is that if the kernel has mapped a physical
page as cacheable while user application is trying to map it as
non-cacheable, there will be machine checks and checkstops rather than
just performance drop?  This is new to me.  Could you elaborate a bit?
 Thanks.

- Leo
Gabriel Paubert - Nov. 25, 2009, 11:30 a.m.
On Wed, Nov 25, 2009 at 04:07:46PM +0800, Li Yang wrote:
> On Sun, Nov 22, 2009 at 4:01 AM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >>> You need to be a bit more careful tho. You must not allow RAM managed by
> >>> the kernel to be mapped non-cachable.
> >>
> >> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
> >> the application if it uses O_SYNC on main memory to be mmap'ed later.
> >> And we don't need to cover up the bug.
> >
> > Is that "embedded thinking"?  Conflicts like this cause machine checks or
> > checkstops on many PowerPC implementations, we do not normally allow such
> > to be caused by userland.
> 
> So what you are saying is that if the kernel has mapped a physical
> page as cacheable while user application is trying to map it as
> non-cacheable, there will be machine checks and checkstops rather than
> just performance drop?  This is new to me.  Could you elaborate a bit?

That's called cache paradoxes. And yes they may be a problem. 

Besides that, existing  application may have used mmap without O_SYNC on
I/O devices, knowing that the kernel would map them uncached. Your
patch would break them by using cached accesses (and it can cause
really hard to debug lockups, I've seen this, probably caused by 
infinite retries on the PCI bus).

	Gabriel
Yang Li - Nov. 26, 2009, 7:43 a.m.
On Wed, Nov 25, 2009 at 7:30 PM, Gabriel Paubert <paubert@iram.es> wrote:
> On Wed, Nov 25, 2009 at 04:07:46PM +0800, Li Yang wrote:
>> On Sun, Nov 22, 2009 at 4:01 AM, Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> >>> You need to be a bit more careful tho. You must not allow RAM managed by
>> >>> the kernel to be mapped non-cachable.
>> >>
>> >> Even if the user explicitly sets the O_SYNC flag?  IMHO, it's a bug of
>> >> the application if it uses O_SYNC on main memory to be mmap'ed later.
>> >> And we don't need to cover up the bug.
>> >
>> > Is that "embedded thinking"?  Conflicts like this cause machine checks or
>> > checkstops on many PowerPC implementations, we do not normally allow such
>> > to be caused by userland.
>>
>> So what you are saying is that if the kernel has mapped a physical
>> page as cacheable while user application is trying to map it as
>> non-cacheable, there will be machine checks and checkstops rather than
>> just performance drop?  This is new to me.  Could you elaborate a bit?
>
> That's called cache paradoxes. And yes they may be a problem.

Thanks.  I will prevent this from happening.

>
> Besides that, existing  application may have used mmap without O_SYNC on
> I/O devices, knowing that the kernel would map them uncached. Your
> patch would break them by using cached accesses (and it can cause
> really hard to debug lockups, I've seen this, probably caused by
> infinite retries on the PCI bus).

That's my concern too.  But after all mmap without O_SYNC on I/O
devices should be deprecated.  A warning message in deprecation period
could reduce potential problem caused by this change.

- Leo
Segher Boessenkool - Nov. 26, 2009, 9:26 p.m.
> So what you are saying is that if the kernel has mapped a physical
> page as cacheable while user application is trying to map it as
> non-cacheable, there will be machine checks and checkstops rather than
> just performance drop?  This is new to me.  Could you elaborate a bit?

If some data is in cache at a certain physical address, and you then
do an uncached read from that address, on at least some CPUs both the
bus interface and the cache will reply.  Bang, machine check.

Writes are problematic as well.


Segher
Paul Mackerras - Nov. 27, 2009, 2:30 a.m.
Li Yang writes:

> That's my concern too.  But after all mmap without O_SYNC on I/O
> devices should be deprecated.

It should?  Why?

Shouldn't the onus rather be on those proposing an incompatible change
to the kernel ABI, such as this is, to show why the change is
absolutely essential?

>  A warning message in deprecation period
> could reduce potential problem caused by this change.

Not making the change at all would reduce the potential problems even
further. :)

Paul.
Yang Li - Nov. 30, 2009, 10 a.m.
On Fri, Nov 27, 2009 at 10:30 AM, Paul Mackerras <paulus@samba.org> wrote:
> Li Yang writes:
>
>> That's my concern too.  But after all mmap without O_SYNC on I/O
>> devices should be deprecated.
>
> It should?  Why?
>
> Shouldn't the onus rather be on those proposing an incompatible change
> to the kernel ABI, such as this is, to show why the change is
> absolutely essential?

In addition to the performance issue I stated earlier in this thread.
There was also cache paradoxes problem too.  If a memory region is
shared between two cores/OS'es and Linux can't map the region with the
same cacheable property as the other OS had mapped, there will be a
problem.  The best solution for this problem is to make the
cacheablity controllable.

And it seems to be generic issue which does not only exist on powerpc
platforms.  So it might be better dealt with the already existed
O_SYNC flag which was meant to deal with this kind of thing.  I agree
with your comment in another email that device tree could be used, but
that solution is not generic enough though.

- Leo

Patch

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 579382c..0fd267e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -101,7 +101,7 @@  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 	if (ppc_md.phys_mem_access_prot)
 		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
 
-	if (!page_is_ram(pfn))
+	if (file->f_flags & O_SYNC)
 		vma_prot = pgprot_noncached(vma_prot);
 
 	return vma_prot;