[U-Boot,RESEND,0/3] arm: Introduce writel/readl_relaxed accessors

Message ID 20190210161726.5454-1-andre.przywara@arm.com

Message

Andre Przywara Feb. 10, 2019, 4:17 p.m. UTC
Hi, this is a resend of what I posted some weeks ago, just adding the
missing Signed-off-by: in patch 2/3, as pointed out by Philipp. I used
the opportunity to add his Reviewed-by: tags on the first two patches.
(Many thanks for that!) The rest is unchanged.
-------------------

Admittedly this is the long way round to solve some nasty SPL code size
problem, but it looked beneficial to others as well, so here we go:

arch/arm/include/asm/io.h looks like it's been around since the dawn of
time, and was more or less blindly copied from Linux.
We don't use and don't need most of the definitions, and mainline Linux
got rid of them anyway, so patch 1/3 cleans up this header file to
just contain what we need in U-Boot.

Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
but more importantly save one (barrier) instruction per accessor. This
helps to bring down code size, since especially DRAM controller inits in
SPLs tend to do a lot of MMIO.
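
To illustrate the barrier saving described above, here is a minimal host-side
sketch. It is not the code from patch 2/3: all demo_* names are hypothetical,
and demo_wmb() is only a counting stand-in for what would be a single barrier
instruction on real hardware. It assumes an ordered write issues its barrier
before the store, and it assumes the relaxed sequence ends with one explicit
barrier, which is a choice made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts how often the stand-in barrier runs (hypothetical helper). */
static unsigned int demo_barrier_count;

/* Stand-in for a write barrier; on real hardware this is one instruction. */
static inline void demo_wmb(void)
{
	demo_barrier_count++;
}

/* Relaxed write: only the volatile store, no barrier. */
static inline void demo_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Ordered write: barrier first, then the store (assumed ordering). */
static inline void demo_writel(uint32_t val, volatile uint32_t *addr)
{
	demo_wmb();
	demo_writel_relaxed(val, addr);
}

/* Program n registers with ordered writes; returns barriers executed. */
static unsigned int demo_init_ordered(volatile uint32_t *regs,
				      const uint32_t *vals, size_t n)
{
	unsigned int before = demo_barrier_count;

	for (size_t i = 0; i < n; i++)
		demo_writel(vals[i], &regs[i]);

	return demo_barrier_count - before;
}

/* Same sequence with relaxed writes and one barrier at the very end. */
static unsigned int demo_init_relaxed(volatile uint32_t *regs,
				      const uint32_t *vals, size_t n)
{
	unsigned int before = demo_barrier_count;

	for (size_t i = 0; i < n; i++)
		demo_writel_relaxed(vals[i], &regs[i]);
	demo_wmb();

	return demo_barrier_count - before;
}
```

With n register writes the ordered variant executes n barriers and the relaxed
variant executes one, which is where the per-accessor instruction saving comes
from. Whether dropping the barriers is safe depends on the device and on the
ordering the surrounding code relies on.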

Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
which reduces the SPL size by a whopping 2KB, due to a twist:
The AArch64 exception table needs to be 2KB aligned, but we don't do
anything special about it in the linker script. So depending on where the
code before the vectors ends, we have potentially large padding:
At the moment this last address is 0x1824 for the H6, so the vectors can
only start at 0x2000. By reducing the code size before the vectors by at
least 9 instructions, the vectors start at 0x1800 and we save most of
the padding.
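
The padding arithmetic above can be checked with a small sketch. The helper
names and constants are hypothetical and exist only for this example; the
addresses 0x1824, 0x1800 and 0x2000 and the 2KB alignment are taken from the
text, and 4 bytes per instruction is the fixed AArch64 instruction size.

```c
#include <stdint.h>

#define DEMO_VECTOR_ALIGN	0x800u	/* 2KB alignment of the vector table */
#define DEMO_INSN_SIZE		4u	/* bytes per AArch64 instruction */

/* Round addr up to the next multiple of align (align is a power of two). */
static uint32_t demo_align_up(uint32_t addr, uint32_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Bytes of padding between the end of the code and the aligned vectors. */
static uint32_t demo_padding(uint32_t code_end)
{
	return demo_align_up(code_end, DEMO_VECTOR_ALIGN) - code_end;
}
```

With the code ending at 0x1824, demo_align_up() yields 0x2000 and the padding
is 0x7dc (2012) bytes. Removing 9 instructions (36 bytes) moves the end to
0x1800, which is already 2KB aligned, so the padding drops to zero and the
image shrinks by 36 + 2012 = 2048 bytes in total.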

I understand that the proper solution is to fill the gap before the vectors
with code instead of NOPs, but I couldn't find any obvious way of doing this
in the linker script. If anyone has any idea here, I am all ears.

Cheers,
Andre.

Andre Przywara (3):
  arm: clean up asm/io.h
  arm: introduce _relaxed MMIO accessors
  sunxi: H6: use writel_relaxed for DRAM timing register accesses

 arch/arm/include/asm/io.h            | 164 +++--------------------------------
 arch/arm/mach-sunxi/dram_sun50i_h6.c |  79 +++++++++--------
 2 files changed, 54 insertions(+), 189 deletions(-)

Comments

Jagan Teki April 15, 2019, 6:07 a.m. UTC | #1
On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
>
> Patch 2/3 introduces readl/writel_relaxed accessors, which are cheaper,
> but more importantly save one (barrier) instruction per accessor. This
> helps to bring down code size, since especially DRAM controller inits in
> SPLs tend to do a lot of MMIO.
>
> Consequently patch 3/3 introduces them in the Allwinner H6 DRAM driver,
> which reduces the SPL size by a whopping 2KB, due to a twist:
> The AArch64 exception table needs to be 2KB aligned, but we don't do
> anything special about it in the linker script. So depending on where the
> code before the vectors ends, we have potentially large padding:
> At the moment this last address is 0x1824 for the H6, so the vectors can
> only start at 0x2000. By reducing the code size before the vectors by at
> least 9 instructions, the vectors start at 0x1800 and we save most of
> the padding.

How come it reduces by 2KB? I only see a size difference of 160 bytes with
gcc-6.3.1.

₹ aarch64-linux-gnu-size spl/u-boot-spl*
   text       data        bss        dec        hex    filename
  28376        408        504      29288       7268    spl/u-boot-spl

₹ aarch64-linux-gnu-size spl/u-boot-spl*
   text       data        bss        dec        hex    filename
  28216        408        504      29128       71c8    spl/u-boot-spl
Chen-Yu Tsai April 15, 2019, 6:10 a.m. UTC | #2
On Mon, Apr 15, 2019 at 2:07 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
> How come it reduces by 2KB? I only see a size difference of 160 bytes with
> gcc-6.3.1.
>
> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>    text       data        bss        dec        hex    filename
>   28376        408        504      29288       7268    spl/u-boot-spl
>
> ₹ aarch64-linux-gnu-size spl/u-boot-spl*
>    text       data        bss        dec        hex    filename
>   28216        408        504      29128       71c8    spl/u-boot-spl

Because of section alignment issues? I believe Andre is referring to the
size of the whole file. Since it gets loaded as a whole, the total size
is what matters, not the size of the individual sections.

ChenYu
Jagan Teki April 15, 2019, 6:22 a.m. UTC | #3
On Mon, Apr 15, 2019 at 11:40 AM Chen-Yu Tsai <wens@kernel.org> wrote:
> Because of section alignment issues? I believe Andre is referring to the
> size of the whole file. Since it gets loaded as a whole, the total size
> is what matters, not the size of the individual sections.

Well, the input for the final sunxi-spl.bin is u-boot-spl, and the output
above shows the file size as well: 29128 bytes, which is 160 bytes less
than 29288.

Since the size of sunxi-spl.bin is truncated to 32K, I couldn't see
any difference either.
Andre Przywara April 15, 2019, 7:48 a.m. UTC | #4
On 15/04/2019 07:22, Jagan Teki wrote:

Hi,

> Well, the input for the final sunxi-spl.bin is u-boot-spl, and the output
> above shows the file size as well: 29128 bytes, which is 160 bytes less
> than 29288.
> 
> Since the size of sunxi-spl.bin is truncated to 32K, I couldn't see
> any difference either.

As mentioned in the commit message, this is a fragile topic. Since
commit ef331e3685fe ("armv8: Disable exception vectors in SPL by
default") we disable the SPL exception vectors by default now, so the
numbers are now different.
You should be able to see the 2K saving with the SPL exception vectors
explicitly enabled in menuconfig.

Cheers,
Andre.
Jagan Teki April 17, 2019, noon UTC | #5
On Mon, Apr 15, 2019 at 1:21 PM André Przywara <andre.przywara@arm.com> wrote:
> As mentioned in the commit message, this is a fragile topic. Since
> commit ef331e3685fe ("armv8: Disable exception vectors in SPL by
> default") we disable the SPL exception vectors by default now, so the
> numbers are now different.

Sorry, I overlooked the commit messages.

> You should be able to see the 2K saving with the SPL exception vectors
> explicitly enabled in menuconfig.

Oh. So I enabled the vectors via ARMV8_SPL_EXCEPTION_VECTORS=y, but it
seems to have increased the SPL size.

₹ aarch64-linux-gnu-size spl/u-boot-spl
   text       data        bss        dec        hex    filename
  30130        408        504      31042       7942    spl/u-boot-spl
Andre Przywara April 18, 2019, 12:37 a.m. UTC | #6
On 17/04/2019 13:00, Jagan Teki wrote:
> Oh. So I enabled the vectors via ARMV8_SPL_EXCEPTION_VECTORS=y, but it
> seems to have increased the SPL size.
> 
> ₹ aarch64-linux-gnu-size spl/u-boot-spl
>    text       data        bss        dec        hex    filename
>   30130        408        504      31042       7942    spl/u-boot-spl

Sure, I meant you should see an effect with vs. without these patches,
both with the vectors enabled. But as mentioned, this is somewhat
random, depending on other code changes.
With current mainline for pine_h64_defconfig I get 29344 vs. 29504 bytes
without the vectors, and 31202 vs. "will not fit" with the vectors.

Cheers,
Andre.
Jagan Teki April 18, 2019, 4:59 p.m. UTC | #7
On Thu, Apr 18, 2019 at 6:07 AM André Przywara <andre.przywara@arm.com> wrote:
> Sure, I meant you should see an effect with vs. without these patches,
> both with the vectors enabled. But as mentioned, this is somewhat
> random, depending on other code changes.
> With current mainline for pine_h64_defconfig I get 29344 vs. 29504 bytes
> without the vectors, and 31202 vs. "will not fit" with the vectors.

Okay.
Jagan Teki April 18, 2019, 5 p.m. UTC | #8
On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
> Andre Przywara (3):
>   arm: clean up asm/io.h
>   arm: introduce _relaxed MMIO accessors
>   sunxi: H6: use writel_relaxed for DRAM timing register accesses

Does anyone have any further comments on this? I would like to pick this up
before the merge window.

Jagan.
Jagan Teki April 25, 2019, 6:01 p.m. UTC | #9
On Thu, Apr 18, 2019 at 10:30 PM Jagan Teki <jagan@amarulasolutions.com> wrote:
> Does anyone have any further comments on this? I would like to pick this up
> before the merge window.

Applied to u-boot-sunxi/master
Jagan Teki April 29, 2019, 5:16 p.m. UTC | #10
On Sun, Feb 10, 2019 at 9:49 PM Andre Przywara <andre.przywara@arm.com> wrote:
> Andre Przywara (3):
>   arm: clean up asm/io.h
>   arm: introduce _relaxed MMIO accessors
>   sunxi: H6: use writel_relaxed for DRAM timing register accesses

These have build issues with arm32, please send another series.
Andre Przywara April 30, 2019, 10:06 p.m. UTC | #11
On 29/04/2019 18:16, Jagan Teki wrote:

Hi,

> These have build issues with arm32, please send another series.

Thanks for the elaborate error report ;-)

There is commit 6478848d165b63293f7021db9b70ce25a1e1062c, which does
basically the same thing as patch 2/3 in this series and was merged by
Tom already. This causes the double definition.
So just dropping the middle patch from this series should do the trick.

Cheers,
Andre.