ARC: mm: PAE40: Cast pfn to pte_t in pfn_pte() macro
diff mbox

Message ID 1480306037-15415-1-git-send-email-yuriy.kolerov@synopsys.com
State New
Headers show

Commit Message

Yuriy Kolerov Nov. 28, 2016, 4:07 a.m. UTC
Originally pfn_pte(pfn, prot) macro had this definition:

    __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))

The value of pfn (Page Frame Number) is shifted to the left to get the
value of pte (Page Table Entry). Usually a 4-byte value is passed to
this macro as value of pfn. However if Linux is configured with support
of PAE40 then value of pte has 8-byte type because it must contain
additional 8 bits of the physical address. Thus if value of pfn
represents a physical page frame above of 4GB boundary then
shifting of pfn to the left by PAGE_SHIFT wipes most significant
bits of the 40-bit physical address.

As a result all physical addresses above of 4GB boundary in systems
with PAE40 are mapped to virtual address incorrectly. An error may
occur when the kernel tries to unmap such bad pages:

    [ECR   ]: 0x00050100 => Invalid Read @ 0x41414144 by insn @ 0x801644c6
    [EFA   ]: 0x41414144
    [BLINK ]: unmap_page_range+0x134/0x700
    [ERET  ]: unmap_page_range+0x17a/0x700
    [STAT32]: 0x8008021e : IE K
    BTA: 0x801644c6	 SP: 0x901a5e84	 FP: 0x5ff35de8
    LPS: 0x8026462c	LPE: 0x80264630	LPC: 0x00000000
    r00: 0x8fcc4fc0	r01: 0x2fe68000	r02: 0x41414140
    r03: 0x2c05c000	r04: 0x2fe6a000	r05: 0x0009ffff
    r06: 0x901b6898	r07: 0x2fe68000	r08: 0x00000001
    r09: 0x804a807c	r10: 0x0000067e	r11: 0xffffffff
    r12: 0x80164480
    Stack Trace:
      unmap_page_range+0x17a/0x700
      unmap_vmas+0x46/0x64
      do_munmap+0x210/0x450
      SyS_munmap+0x2c/0x50
      EV_Trap+0xfc/0x100

So the value of pfn must be casted to pte_t before shifting to
ensure that 40-bit address will not be truncated:

    __pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))

Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
---
 arch/arc/include/asm/pgtable.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Alexey Brodkin Nov. 28, 2016, 11:43 a.m. UTC | #1
Hi Yuriy,

Really nice catch!
Though a couple of nitpicks below.

On Mon, 2016-11-28 at 07:07 +0300, Yuriy Kolerov wrote:
> Originally pfn_pte(pfn, prot) macro had this definition:

> 

>     __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))

> 

> The value of pfn (Page Frame Number) is shifted to the left to get the

> value of pte (Page Table Entry). Usually a 4-byte value is passed to

> this macro as value of pfn. However if Linux is configured with support

> of PAE40 then value of pte has 8-byte type because it must contain

> additional 8 bits of the physical address. Thus if value of pfn

> represents a physical page frame above of 4GB boundary then

> shifting of pfn to the left by PAGE_SHIFT wipes most significant

> bits of the 40-bit physical address.

> 

> As a result all physical addresses above of 4GB boundary in systems

> with PAE40 are mapped to virtual address incorrectly. An error may

> occur when the kernel tries to unmap such bad pages:

> 

>     [ECR   ]: 0x00050100 => Invalid Read @ 0x41414144 by insn @ 0x801644c6

>     [EFA   ]: 0x41414144

>     [BLINK ]: unmap_page_range+0x134/0x700

>     [ERET  ]: unmap_page_range+0x17a/0x700

>     [STAT32]: 0x8008021e : IE K

>     BTA: 0x801644c6	 SP: 0x901a5e84	 FP: 0x5ff35de8

>     LPS: 0x8026462c	LPE: 0x80264630	LPC: 0x00000000

>     r00: 0x8fcc4fc0	r01: 0x2fe68000	r02: 0x41414140

>     r03: 0x2c05c000	r04: 0x2fe6a000	r05: 0x0009ffff

>     r06: 0x901b6898	r07: 0x2fe68000	r08: 0x00000001

>     r09: 0x804a807c	r10: 0x0000067e	r11: 0xffffffff

>     r12: 0x80164480

>     Stack Trace:

>       unmap_page_range+0x17a/0x700

>       unmap_vmas+0x46/0x64

>       do_munmap+0x210/0x450

>       SyS_munmap+0x2c/0x50

>       EV_Trap+0xfc/0x100


This example makes not much sense in its current form.
I'd like to see how mentioned above problem leads to this
failure. I.e. pfn = 0xXXX gave pte = 0xYYY and at truncated to 0xYYY
address there's no data we expected thus reading from 0x41414144
end up in exception etc.

> So the value of pfn must be casted to pte_t before shifting to

> ensure that 40-bit address will not be truncated:

> 

>     __pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))

> 

> Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>

> ---

>  arch/arc/include/asm/pgtable.h | 3 ++-

>  1 file changed, 2 insertions(+), 1 deletion(-)

> 

> diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h

> index 89eeb37..77bc51c 100644

> --- a/arch/arc/include/asm/pgtable.h

> +++ b/arch/arc/include/asm/pgtable.h

> @@ -280,7 +280,8 @@ static inline void pmd_set(pmd_t *pmdp, pte_t *ptep)

>  

>  #define pte_page(pte)		pfn_to_page(pte_pfn(pte))

>  #define mk_pte(page, prot)	pfn_pte(page_to_pfn(page), prot)

> -#define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))

> +#define pfn_pte(pfn, prot) \

> +	__pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))


I think it's better to split it in a bit different manner like:
--------------------------------->8-----------------------------
#define pfn_pte(pfn, prot)	__pte(((pte_t) (pfn) << PAGE_SHIFT) | \
				      pgprot_val(prot))
--------------------------------->8-----------------------------

Also see how this macro is implemented for example on ARM:
http://lxr.free-electrons.com/source/arch/arm/include/asm/pgtable.h#L211
-------------------->8------------------
#define pfn_pte(pfn,prot)       __pte(__pfn_to_phys(pfn) | pgprot_val(prot))
-------------------->8------------------

Where __pfn_to_phys() is:
http://lxr.free-electrons.com/source/include/asm-generic/memory_model.h#L78
-------------------->8------------------
#define __pfn_to_phys(pfn)      PFN_PHYS(pfn)
-------------------->8------------------

PFN_PHYS() is:
http://lxr.free-electrons.com/source/include/linux/pfn.h#L20
-------------------->8------------------
#define PFN_PHYS(x)     ((phys_addr_t)(x) << PAGE_SHIFT)
-------------------->8------------------

And finally phys_addr_t is:
http://lxr.free-electrons.com/source/include/linux/types.h#L161
-------------------->8------------------
#ifdef CONFIG_PHYS_ADDR_T_64BIT
typedef u64 phys_addr_t;
#else
typedef u32 phys_addr_t;
#endif
-------------------->8------------------

Not really sure though which implementation is better.
I like your approach because its simplicity instead of another
couple of layers of definitions but maybe there's a reason for this
kind of complication. Funny enough other arches take their own approaches
ranging from the same you did to this __pfn_to_phys() to casting to "long long".

-Alexey
Vineet Gupta Nov. 28, 2016, 4:39 p.m. UTC | #2
Hi Yuriy,

Indeed good find. My nits and not-quite-nits below :-)

On 11/27/2016 08:07 PM, Yuriy Kolerov wrote:
> Originally pfn_pte(pfn, prot) macro had this definition:
> 
>     __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
> 
> The value of pfn (Page Frame Number) is shifted to the left to get the
> value of pte (Page Table Entry). Usually a 4-byte value is passed to
> this macro as value of pfn. However if Linux is configured with support
> of PAE40 then value of pte has 8-byte type because it must contain
> additional 8 bits of the physical address. Thus if value of pfn
> represents a physical page frame above of 4GB boundary then
> shifting of pfn to the left by PAGE_SHIFT wipes most significant
> bits of the 40-bit physical address.
> 
> As a result all physical addresses above of 4GB boundary in systems
> with PAE40 are mapped to virtual address incorrectly. An error may
> occur when the kernel tries to unmap such bad pages:
> 
>     [ECR   ]: 0x00050100 => Invalid Read @ 0x41414144 by insn @ 0x801644c6
>     [EFA   ]: 0x41414144
>     [BLINK ]: unmap_page_range+0x134/0x700
>     [ERET  ]: unmap_page_range+0x17a/0x700
>     [STAT32]: 0x8008021e : IE K
>     BTA: 0x801644c6	 SP: 0x901a5e84	 FP: 0x5ff35de8
>     LPS: 0x8026462c	LPE: 0x80264630	LPC: 0x00000000
>     r00: 0x8fcc4fc0	r01: 0x2fe68000	r02: 0x41414140
>     r03: 0x2c05c000	r04: 0x2fe6a000	r05: 0x0009ffff
>     r06: 0x901b6898	r07: 0x2fe68000	r08: 0x00000001
>     r09: 0x804a807c	r10: 0x0000067e	r11: 0xffffffff
>     r12: 0x80164480
>     Stack Trace:
>       unmap_page_range+0x17a/0x700
>       unmap_vmas+0x46/0x64
>       do_munmap+0x210/0x450
>       SyS_munmap+0x2c/0x50
>       EV_Trap+0xfc/0x100

This is all too verbose. You need to describe the problem in fewer words

 - Also subject needs to say "Fix kernel crash with PAE40..." and possibly include
   the small test case which triggered it. It needs to describe the "why",
   "how" is already explained in patch. And if that is not sufficient,
   please add comments...

> So the value of pfn must be casted to pte_t before shifting to
> ensure that 40-bit address will not be truncated:
> 
>     __pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))

Why are you duplicating the patch contents in changelog.

> Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
> ---
>  arch/arc/include/asm/pgtable.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
> index 89eeb37..77bc51c 100644
> --- a/arch/arc/include/asm/pgtable.h
> +++ b/arch/arc/include/asm/pgtable.h
> @@ -280,7 +280,8 @@ static inline void pmd_set(pmd_t *pmdp, pte_t *ptep)
>  
>  #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
>  #define mk_pte(page, prot)	pfn_pte(page_to_pfn(page), prot)
> -#define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
> +#define pfn_pte(pfn, prot) \
> +	__pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))

The idea is right, and will work - but it breaks something else. Did you do a git
log on this file as this is exactly what we used to have before commit
1c3c90930392 "ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS". Technically
your patch should have been a simple Revert of the above commit.

But as Alexey suggested in his reply, we can use the ARM approach:

+#define pfn_pte(pfn, prot)     __pte(__pfn_to_phys(pfn) | pgprot_val(prot))

-Vineet
Yuriy Kolerov Nov. 29, 2016, 11:37 a.m. UTC | #3
> -----Original Message-----

> From: Alexey Brodkin [mailto:abrodkin@synopsys.com]

> Sent: Monday, November 28, 2016 2:43 PM

> To: yuriy.kolerov@synopsys.com

> Cc: linux-kernel@vger.kernel.org; Alexey.Brodkin@synopsys.com; Vineet

> Gupta <vgupta@synopsys.com>; linux-snps-arc@lists.infradead.org

> Subject: Re: [PATCH] ARC: mm: PAE40: Cast pfn to pte_t in pfn_pte() macro

> 

> Hi Yuriy,

> 

> Really nice catch!

> Though a couple of nitpicks below.

> 

> On Mon, 2016-11-28 at 07:07 +0300, Yuriy Kolerov wrote:

> > Originally pfn_pte(pfn, prot) macro had this definition:

> >

> >     __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))

> >

> > The value of pfn (Page Frame Number) is shifted to the left to get the

> > value of pte (Page Table Entry). Usually a 4-byte value is passed to

> > this macro as value of pfn. However if Linux is configured with

> > support of PAE40 then value of pte has 8-byte type because it must

> > contain additional 8 bits of the physical address. Thus if value of

> > pfn represents a physical page frame above of 4GB boundary then

> > shifting of pfn to the left by PAGE_SHIFT wipes most significant bits

> > of the 40-bit physical address.

> >

> > As a result all physical addresses above of 4GB boundary in systems

> > with PAE40 are mapped to virtual address incorrectly. An error may

> > occur when the kernel tries to unmap such bad pages:

> >

> >     [ECR   ]: 0x00050100 => Invalid Read @ 0x41414144 by insn @

> > 0x801644c6

> >     [EFA   ]: 0x41414144

> >     [BLINK ]: unmap_page_range+0x134/0x700

> >     [ERET  ]: unmap_page_range+0x17a/0x700

> >     [STAT32]: 0x8008021e : IE K

> >     BTA: 0x801644c6	 SP: 0x901a5e84	 FP: 0x5ff35de8

> >     LPS: 0x8026462c	LPE: 0x80264630	LPC: 0x00000000

> >     r00: 0x8fcc4fc0	r01: 0x2fe68000	r02: 0x41414140

> >     r03: 0x2c05c000	r04: 0x2fe6a000	r05: 0x0009ffff

> >     r06: 0x901b6898	r07: 0x2fe68000	r08: 0x00000001

> >     r09: 0x804a807c	r10: 0x0000067e	r11: 0xffffffff

> >     r12: 0x80164480

> >     Stack Trace:

> >       unmap_page_range+0x17a/0x700

> >       unmap_vmas+0x46/0x64

> >       do_munmap+0x210/0x450

> >       SyS_munmap+0x2c/0x50

> >       EV_Trap+0xfc/0x100

> 

> This example makes not much sense in its current form.

> I'd like to see how mentioned above problem leads to this failure. I.e. pfn =

> 0xXXX gave pte = 0xYYY and at truncated to 0xYYY address there's no data we

> expected thus reading from 0x41414144 end up in exception etc.

> 

> > So the value of pfn must be casted to pte_t before shifting to ensure

> > that 40-bit address will not be truncated:

> >

> >     __pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))

> >

> > Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>

> > ---

> >  arch/arc/include/asm/pgtable.h | 3 ++-

> >  1 file changed, 2 insertions(+), 1 deletion(-)

> >

> > diff --git a/arch/arc/include/asm/pgtable.h

> > b/arch/arc/include/asm/pgtable.h index 89eeb37..77bc51c 100644

> > --- a/arch/arc/include/asm/pgtable.h

> > +++ b/arch/arc/include/asm/pgtable.h

> > @@ -280,7 +280,8 @@ static inline void pmd_set(pmd_t *pmdp, pte_t

> > *ptep)

> >

> >  #define pte_page(pte)		pfn_to_page(pte_pfn(pte))

> >  #define mk_pte(page, prot)	pfn_pte(page_to_pfn(page), prot)

> > -#define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) |

> pgprot_val(prot))

> > +#define pfn_pte(pfn, prot) \

> > +	__pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))

> 

> I think it's better to split it in a bit different manner like:

> --------------------------------->8-----------------------------

> #define pfn_pte(pfn, prot)	__pte(((pte_t) (pfn) << PAGE_SHIFT) | \

> 				      pgprot_val(prot))

> --------------------------------->8-----------------------------

> 

> Also see how this macro is implemented for example on ARM:

> http://lxr.free-electrons.com/source/arch/arm/include/asm/pgtable.h#L211

> -------------------->8------------------

> #define pfn_pte(pfn,prot)       __pte(__pfn_to_phys(pfn) |

> pgprot_val(prot))

> -------------------->8------------------

> 

> Where __pfn_to_phys() is:

> http://lxr.free-electrons.com/source/include/asm-

> generic/memory_model.h#L78

> -------------------->8------------------

> #define __pfn_to_phys(pfn)      PFN_PHYS(pfn)

> -------------------->8------------------

> 

> PFN_PHYS() is:

> http://lxr.free-electrons.com/source/include/linux/pfn.h#L20

> -------------------->8------------------

> #define PFN_PHYS(x)     ((phys_addr_t)(x) << PAGE_SHIFT)

> -------------------->8------------------

> 

> And finally phys_addr_t is:

> http://lxr.free-electrons.com/source/include/linux/types.h#L161

> -------------------->8------------------

> #ifdef CONFIG_PHYS_ADDR_T_64BIT

> typedef u64 phys_addr_t;

> #else

> typedef u32 phys_addr_t;

> #endif

> -------------------->8------------------

> 

> Not really sure though which implementation is better.

> I like your approach because its simplicity instead of another couple of layers

> of definitions but maybe there's a reason for this kind of complication. Funny

> enough other arches take their own approaches ranging from the same you

> did to this __pfn_to_phys() to casting to "long long".


Yes, you are right. __pfn_to_phys() does what is needed.

> -Alexey

Patch
diff mbox

diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 89eeb37..77bc51c 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -280,7 +280,8 @@  static inline void pmd_set(pmd_t *pmdp, pte_t *ptep)
 
 #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
 #define mk_pte(page, prot)	pfn_pte(page_to_pfn(page), prot)
-#define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
+#define pfn_pte(pfn, prot) \
+	__pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 /* Don't use virt_to_pfn for macros below: could cause truncations for PAE40*/
 #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)