diff mbox series

[U-Boot] common/memsize.c: restore content of the base address

Message ID 1512575263-23010-1-git-send-email-patrick.delaunay@st.com
State Superseded
Delegated to: Tom Rini
Headers show
Series [U-Boot] common/memsize.c: restore content of the base address | expand

Commit Message

Patrick DELAUNAY Dec. 6, 2017, 3:47 p.m. UTC
In function get_ram_size() and for 2 last cases the content of
the base address (*base)  is not restored even it is
correctly saved in stack (in save[i]).

This patch solved this issue.
The content of the base address is saved in new variable
in stack (save_base) to avoid the need of other information
(value of i) and restored in all the cases.

Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
Reviewed-by: CITOOLS <smet-aci-reviews@lists.codex.cro.st.com>
Reviewed-by: CIBUILD <smet-aci-builds@lists.codex.cro.st.com>
---
issue detected on my custom board with
- base = 0xC0000000
- size = maxsize = 0x4000000

The loop is completely executed and the size is correctly
detected, but content in 0xC0000000 is not restored
So I have 0xC0000000 = 0x0 at the end of the function.

That cause issue in my use-case because U-Boot in loaded
at  0xC1000000 and some information can be saved by
ATF in DDR at 0xC0000000 address.
And the content of DDR is modified by this function.

 common/memsize.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Wolfgang Denk Dec. 6, 2017, 9:06 p.m. UTC | #1
Dear Patrick,

In message <1512575263-23010-1-git-send-email-patrick.delaunay@st.com> you wrote:
> In function get_ram_size() and for 2 last cases the content of
> the base address (*base)  is not restored even it is
> correctly saved in stack (in save[i]).
> 
> This patch solved this issue.
> The content of the base address is saved in new variable
> in stack (save_base) to avoid the need of other information
> (value of i) and restored in all the cases.

What exactly is the problem you are trying to fix?  How exactly does
it manifest for you?

On which boards/architectures did you observe this problem, and on
which did you actually test your patch?

How exactly is your memory mapped and tested on the boards where
your patch fixes a problem?

The thing is, that this "fix" comes up again and again wevery coplu
of months / years, and IIRC so far all these patches broke some
system, while the code as is has been working fine of many systems.

See for example commit b8496cce and revert in 3ab270d5 in 2012, or
commit 8e7cba04 and revert in cc8d698f in 2016.

See also the threads starting at

Subject: [U-Boot-Users] memsize.c patch
From: "Sangmoon Kim" <dogoil@etinsys.com>
Date: Fri, 2 Apr 2004 13:08:50 +0900

and

Subject: [PATCH v2] memsize: Fix for bug in memory sizing code
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue, 21 Oct 2014 18:49:13 +0200

Thanks.

Best regards,

Wolfgang Denk
Patrick DELAUNAY Dec. 7, 2017, 3:48 p.m. UTC | #2
Dear Wolfgang,

> 

> Dear Patrick,

> 

> In message <1512575263-23010-1-git-send-email-patrick.delaunay@st.com>

> you wrote:

> > In function get_ram_size() and for 2 last cases the content of the

> > base address (*base)  is not restored even it is correctly saved in

> > stack (in save[i]).

> >

> > This patch solved this issue.

> > The content of the base address is saved in new variable in stack

> > (save_base) to avoid the need of other information (value of i) and

> > restored in all the cases.

> 

> What exactly is the problem you are trying to fix?  How exactly does it manifest

> for you?


My issue is that the base address (*base)  in not restored at the end of the test.
I will explain the sequence in details after.

I know that this code is sensible because I already debug a spurious issue on this function some month ago.
I had abort during get_ram_size() execution but problem disappear when I change just a little the code.

After investigation, I found an potential issue when the current code of get_ram_size() 
is loaded near of power of 2 offset (just before an address modified by the code)... 
In fact the content of the RAM is modified during the code execution just after the
PC address, and the code is not yet in instruction cache, so this the code crash.
I try to found a good solution (limit the min offset to be tested, 
to avoid to modify the address used by U-Boot code)

You can found analysis in :
  [U-Boot] questions about get_ram_size usage in U-Boot = DDR modification during code execution
  From Patrick DELAUNAY Mon Apr 24 16:11:41 UTC 2017
  https://lists.denx.de/pipermail/u-boot/2017-April/288709.html

Without answer, I prefer don't modify the code and finally I avoid the issue just by loading U-boot at 1MB offset 
from the beginning of the DDR, so get_ram_size() is no more loaded at power of 2 offset...

But I suspect this case can also occur on other board when U-Boot is loaded at address==base, 
when the address of get_ram_size() change (even a little) after any modification.

> 

> On which boards/architectures did you observe this problem, and on which did

> you actually test your patch?


I use a custom board, with armv7 cortex A7 chip....
But it is not yet up-streamed, I plan to upstream this board next year when the validation
will be finished and when this board will be available outside ST.

Unfortunately I test my patch only on my board:
The preloader (I don't use not SPL for this test but a custom loader) load a file with checksum 
at RAM start and U-Boot at 1MiB offset:
- without my patch, the loaded file is corrupted, memory dump indicate than *base = 0x0
   and not the magic for the loaded file, all the  other DDR content is OK
- with the patch, I have no more issue (magic is OK and checksum is OK)

> 

> How exactly is your memory mapped and tested on the boards where your patch

> fixes a problem?


I am on ARM platform, before relocation, so dcache not yet activated.

RAM start at 0xC0000000, available size in 1 GB = 0x40000000

U-Boot loaded /executed on 0xC1000000

In first loop U-Boot save the RAM and the update the power of 2 address
	cnt = 0x8000000 => 0xE0000000 = 0xE0000000 (save[0])
	cnt = 0x4000000 => 0xD0000000 = 0xD0000000 (save[1])
	cnt = 0x2000000 => 0xC8000000 = 0xC8000000 (save[2]) 
	cnt = 0x1000000 => 0xC4000000 = 0xC4000000 (save[3]) 
	....
	cnt = 0x0000040 => 0xC4000100 = 0xC0000100 (save[21]) 
	cnt = 0x0000020 => 0xC4000080 = 0xC0000080 (save[22]) 
	cnt = 0x0000010 => 0xC4000040 = 0xC0000040 (save[23]) 
	cnt = 0x0000008 => 0xC4000020 = 0xC0000020 (save[24]) 
	cnt = 0x0000004 => 0xC4000010 = 0xC0000010 (save[25]) 
	cnt = 0x0000002 => 0xC4000008 = 0xC0000008 (save[26]) 
	cnt = 0x0000001 => 0xC4000004 = 0xC0000004 (save[27])
	
Then update base to 0, i = 28
	0xC0000000 = 0x00000000 (save[28])

Then the second loop is executed, the content of DDR is restored of each power of 2 offset
As the test if (val != ~cnt)  is always false

	cnt = 0x0000001 => 0xC4000004 = save[27] 
	cnt = 0x0000002 => 0xC4000008 = save[26]
	... 
	cnt = 0x4000000 => 0xD0000000 = save[1]
	cnt = 0x8000000 => 0xE0000000 = save[0]

and finally the function return max size

at this point the saved value in save[28) is NOT restored and 
0xC0000000 is still stay at 0x00000000

> 

> The thing is, that this "fix" comes up again and again wevery coplu of months /

> years, and IIRC so far all these patches broke some system, while the code as is

> has been working fine of many systems.

> 

> See for example commit b8496cce and revert in 3ab270d5 in 2012, or commit

> 8e7cba04 and revert in cc8d698f in 2016.

> 

> See also the threads starting at

> 

> Subject: [U-Boot-Users] memsize.c patch

> From: "Sangmoon Kim" <dogoil@etinsys.com>

> Date: Fri, 2 Apr 2004 13:08:50 +0900

> 

> and

> 

> Subject: [PATCH v2] memsize: Fix for bug in memory sizing code

> From: Gerd Hoffmann <kraxel@redhat.com>

> Date: Tue, 21 Oct 2014 18:49:13 +0200


I agree than the second patch is exactly the same and I don't understood why this patch broke after board .
The only difference is that I also reduce the size of the array save[] (as base in no more saved in this array), 
so I don't increase the stack usage. 

Perhaps other board failed because size of function code increase (not fully inside instruction cache page) 
or it  is the same issue indicated previously: the address of function change and so code change during execution.

If the same patch come again I think that this function have really a problem and we could
solve the issue (with my patch for exmale) and then understood the regression on other the boards.


> Thanks.

> 

> Best regards,

> 

> Wolfgang Denk

> 

> --

> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk

> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de A

> supercomputer is a machine that runs an endless loop in 2 seconds.


If you think it is this risk is too high, I can also manage the issue on by board :
	save *base before to call and restore it after
But it only a workaround, so I prefer a generic correction.

Regards,

Patrick Delaunay
Wolfgang Denk Dec. 7, 2017, 5:26 p.m. UTC | #3
Dear Patrick,

In message <6daf1478e4284b8590c2862c6a504e9c@SFHDAG6NODE3.st.com> you wrote:
> 
> After investigation, I found an potential issue when the current code of get_ram_size() 
> is loaded near of power of 2 offset (just before an address modified by the code)... 
> In fact the content of the RAM is modified during the code execution just after the
> PC address, and the code is not yet in instruction cache, so this the code crash.
> I try to found a good solution (limit the min offset to be tested, 
> to avoid to modify the address used by U-Boot code)

You mean you are running this code from the very memory you are
sizing?  This is fundamentally broken.  You must not do this!

get_ram_size() is supposed to run from some _different_ memory (at
least a different memopry bank, usually better from a different
memory type).

> You can found analysis in :
>   [U-Boot] questions about get_ram_size usage in U-Boot = DDR modification during code execution
>   From Patrick DELAUNAY Mon Apr 24 16:11:41 UTC 2017
>   https://lists.denx.de/pipermail/u-boot/2017-April/288709.html

You write there:

| Today, I found a issue with  get_ram_size() usage in U-Boot, when
| U-Boot is executed in the tested memory.

You must not do this.  The memory under test is supposed to be
otherwise idle.

> and finally the function return max size

For the test to work correctly, max test size should be chosen at
least twice as big as the maximum possible memory configuration.

> Perhaps other board failed because size of function code increase (not fully inside instruction cache page) 

No.  We are talking of boards that were implemented properly, i. e.
the tested RAM was NOT the memory where get_ram_size() was running
from.

> or it  is the same issue indicated previously: the address of function change and so code change during execution.

Code location or size does not play a role here.  The code was
running from parallel NOR flash, and testing the actual RAM size, so
thsi is totally independent.

> If the same patch come again I think that this function have really a problem and we could
> solve the issue (with my patch for exmale) and then understood the regression on other the boards.

I agree there appears to be a problem,  but it should be
understood..

In your case you use the code in an illegal way, so I tend to
argument you are provoking undefined behaviour.

Best regards,

Wolfgang Denk
Patrick DELAUNAY Dec. 8, 2017, 3:12 p.m. UTC | #4
> From: Wolfgang Denk [mailto:wd@denx.de]

> Subject: Re: [U-Boot] [PATCH] common/memsize.c: restore content of the base

> address

> 

> Dear Patrick,

> 

> In message <6daf1478e4284b8590c2862c6a504e9c@SFHDAG6NODE3.st.com>

> you wrote:

> >

> > After investigation, I found an potential issue when the current code

> > of get_ram_size() is loaded near of power of 2 offset (just before an address

> modified by the code)...

> > In fact the content of the RAM is modified during the code execution

> > just after the PC address, and the code is not yet in instruction cache, so this

> the code crash.

> > I try to found a good solution (limit the min offset to be tested, to

> > avoid to modify the address used by U-Boot code)

> 

> You mean you are running this code from the very memory you are sizing?  This

> is fundamentally broken.  You must not do this!

>


Yes I do it, sorry if it is a error.

But it is recommended in ./doc/README.arm-relocation:
     At board level:

	dram_init(): bd pointer is now at this point not accessible, so only
	detect the real dramsize, and store it in gd->ram_size. Bst detected
	with get_ram_size().

And before to setup by board I check on many device with ARM architecture.
Most the time U-Boot loaded at beginning of the RAM, relocated at the end of the RAM 
and dram_init use get_ram_size
(I read also many debate about the end of relocation even when U-Boot in loaded in RAM on U-Boot mainling list)

Example1 :
  ./board/toradex/colibri_imx6/colibri_imx6.c
  int dram_init(void)
  {
	/* use the DDR controllers configured size */
	gd->ram_size = get_ram_size((void *)CONFIG_SYS_SDRAM_BASE,
				    (ulong)imx_ddr_size());

	return 0;
  }

./arch/arm/include/asm/arch-mx6/imx-regs.h:118:#define MMDC0_ARB_BASE_ADDR             0x10000000
./include/configs/colibri_imx6.h:249:#define PHYS_SDRAM			MMDC0_ARB_BASE_ADDR
./include/configs/colibri_imx6.h:251:#define CONFIG_SYS_SDRAM_BASE		PHYS_SDRAM
./include/configs/colibri_imx6.h:126:#define CONFIG_SYS_TEXT_BASE		0x17800000

Example 2
./arch/arm/mach-omap2/am33xx/board.c: int dram_init(void)
	/* dram_init must store complete ramsize in gd->ram_size */
	gd->ram_size = get_ram_size(
			(void *)CONFIG_SYS_SDRAM_BASE,
			CONFIG_MAX_RAM_BANK_SIZE);

./include/configs/bur_am335x_common.h:67:#define CONFIG_SYS_SDRAM_BASE		0x80000000
./include/configs/siemens-am33x-common.h:155:#define CONFIG_SYS_TEXT_BASE		0x80100000

So I just do the same for my arm/armv7 platform.

On ARM device device, I think we don't have the banks...

> get_ram_size() is supposed to run from some _different_ memory (at least a

> different memopry bank, usually better from a different memory type).


 I agree that the function is not safe in this case.
So I will change my board to remove this call....

> > You can found analysis in :

> >   [U-Boot] questions about get_ram_size usage in U-Boot = DDR modification

> during code execution

> >   From Patrick DELAUNAY Mon Apr 24 16:11:41 UTC 2017

> >   https://lists.denx.de/pipermail/u-boot/2017-April/288709.html

> 

> You write there:

> 

> | Today, I found a issue with  get_ram_size() usage in U-Boot, when

> | U-Boot is executed in the tested memory.

> 

> You must not do this.  The memory under test is supposed to be otherwise idle.

> 

> > and finally the function return max size

> 

> For the test to work correctly, max test size should be chosen at least twice as

> big as the maximum possible memory configuration.


This requirement seems strange to me

I have 32 bits ARM platform, with addressable RAM area 1GiB
DDR start => 0xC0000000
DDR end => 0xFFFFFFFF (size = 0x40000000)

If I need to test with 2Gib, the code will test access to 0xC0000000 + max_size / 2 = 0x100000000 
=> greater than 32bits access to 0x0 address and crash (not allowed to access to 0x0 = bootrom)

For me the function is working correctly with  max test size chosen at the maximum memory size
=> no overlap detected, so retrun max_size

> 

> > Perhaps other board failed because size of function code increase (not

> > fully inside instruction cache page)

> 

> No.  We are talking of boards that were implemented properly, i. e.

> the tested RAM was NOT the memory where get_ram_size() was running from.

> 

> > or it  is the same issue indicated previously: the address of function change and

> so code change during execution.

> 

> Code location or size does not play a role here.  The code was running from

> parallel NOR flash, and testing the actual RAM size, so thsi is totally

> independent.


OK, sorry
> 

> > If the same patch come again I think that this function have really a

> > problem and we could solve the issue (with my patch for exmale) and then

> understood the regression on other the boards.

> 

> I agree there appears to be a problem,  but it should be understood..

> 

> In your case you use the code in an illegal way, so I tend to argument you are

> provoking undefined behaviour.


> 

> Best regards,

> 

> Wolfgang Denk

>

Ok, I will remove the call of the code on my board, even if over arm platform use this function in the same context.
So it should be no more a problem for me.

But I was still convince that the test don't restore properly the DDR properly, 
So I dig deeper on the algorithm when size < max size and in fact, 
it is working with overlap (I miss it in first review):

For example :
RAM start at 0xC0000000, max available size in 1 GB = 0x40000000

Effective size = 256Mib

In first loop U-Boot save the RAM and the update the power of 2 address
	cnt = 0x8000000 => 0xE0000000 = 0xE0000000 (save[0])
	cnt = 0x4000000 => 0xD0000000 = 0xD0000000 (save[1])
	cnt = 0x2000000 => 0xC8000000 = 0xC8000000 (save[2]) 
	cnt = 0x1000000 => 0xC4000000 = 0xC4000000 (save[3]) 
	....
	cnt = 0x0000040 => 0xC4000100 = 0xC0000100 (save[21]) 
	cnt = 0x0000020 => 0xC4000080 = 0xC0000080 (save[22]) 
	cnt = 0x0000010 => 0xC4000040 = 0xC0000040 (save[23]) 
	cnt = 0x0000008 => 0xC4000020 = 0xC0000020 (save[24]) 
	cnt = 0x0000004 => 0xC4000010 = 0xC0000010 (save[25]) 
	cnt = 0x0000002 => 0xC4000008 = 0xC0000008 (save[26]) 
	cnt = 0x0000001 => 0xC4000004 = 0xC0000004 (save[27])
	
Then update base to 0, i = 28
	0xC0000000 = 0x00000000 (save[28])

Then the second loop is executed, the content of DDR is restored of each power of 2 offset As the test if (val != ~cnt)  is always false

	cnt = 0x0000001 => 0xC4000004 = save[27] 
	cnt = 0x0000002 => 0xC4000008 = save[26]
	... 
	cnt = 0x1000000 => 0xC4000000 = save[3]
	cnt = 0x2000000 => 0xC8000000 = save[2]
	cnt = 0x4000000 => 0xD0000000 = 0 overlap detected !
		size = cnt * 4 = 0x10000000
		and restore  original data
		cnt = 0x4000000 => 0xD0000000 = 0x00000000 = save[1] (with overlap)
		cnt = 0x8000000 => 0xE0000000 = 0x10000000 = save[0] ](with overlap)


So the function don't restore *base only if no overlap is detected , for the case return (maxsize)

But it is the case that you indicate it is not working:
	max test size should be chosen at least twice as big as the maximum possible memory configuration.

Even it is not indicated in function header  and if the size is correctly detected :
 * Check memory range for valid RAM. A simple memory test determines
 * the actually available RAM size between addresses `base' and
 * `base + maxsize'

Here it I not indicated maxsize / 2
I now, I am doubtfully...

Do you think this behavior is correct ?
	When no overlap is detected, return maxsize but don't restore *base

Regards

Patrick
Wolfgang Denk Dec. 9, 2017, 8:04 p.m. UTC | #5
Dear Patrick,

In message <885aaa3abdfe440591ea271f92ab42ff@SFHDAG6NODE3.st.com> you wrote:
> 
> > You mean you are running this code from the very memory you are sizing?  This
> > is fundamentally broken.  You must not do this!
> 
> Yes I do it, sorry if it is a error.
> 
> But it is recommended in ./doc/README.arm-relocation:

Is it?  I don't think so.

>      At board level:
> 
> 	dram_init(): bd pointer is now at this point not accessible, so only
> 	detect the real dramsize, and store it in gd->ram_size. Bst detected
> 	with get_ram_size().

Where do  you read that this allows runing the code from the same
memory which is sized/tested?  I cannot see any such claim.  It is
always supposed that you run from a totally different, independent
memory type - in the olden days usually parallel NOR flash, nor some
on-chip-memory, but never ever the RAM you are testing!

> And before to setup by board I check on many device with ARM architecture.
> Most the time U-Boot loaded at beginning of the RAM, relocated at the end of the RAM 
> and dram_init use get_ram_size

But definitely not in this order!  The sequence should always be:
initialize the RAM and determine it's size; load U-Boot and relocate
it to the (now known) end of RAM (resp. to some lower address,
depending on any reservations made for shared display buffer, shared
log buffer, PRAM, etc.).

> Example1 :
>   ./board/toradex/colibri_imx6/colibri_imx6.c
>   int dram_init(void)
>   {
> 	/* use the DDR controllers configured size */
> 	gd->ram_size = get_ram_size((void *)CONFIG_SYS_SDRAM_BASE,
> 				    (ulong)imx_ddr_size());
> 
> 	return 0;
>   }

What do you mean to show by this?  It just means that dram_init()
must be running from some other memory (OCM), too.

> So I just do the same for my arm/armv7 platform.

No, apparently not.

> On ARM device device, I think we don't have the banks...

So you must run this (typically int he SPL) fom some on-chip memory.

>  I agree that the function is not safe in this case.
> So I will change my board to remove this call....

That's the wrong conclusion. get_ram_size() is not only useful to
measure the size of the RAM, it also catches 95% of the typical
in-field memory errors.  When you see a board stuk or crashing after
reporting a memory size that differs from what you expect, you don't
have to search long for the culprit.

> > For the test to work correctly, max test size should be chosen at least twice as
> > big as the maximum possible memory configuration.
> 
> This requirement seems strange to me

Then you have not thought about what the code does in this case,
especially on memory controllers where the initialization of the
cntroller includes setting the maximum access size.  Especially on
board which can be shipped with different memory types / sizes
installed.  Here you can find the real end of the memory only by
verifying that there is no memory any more "above" the expected
range, sothe test range must be (at least) twice as big. [And things
get funny when you _do_ find memory at such locations - then again
it's time to call the hardware guys to fixz the board.]

> I have 32 bits ARM platform, with addressable RAM area 1GiB

Not all memory controllers are so primitive.

> Ok, I will remove the call of the code on my board, even if over arm platform use this function in the same context.

You should keep the functionality, but move it to where it belongs,
i. e. to the SPL running from OCM.

> But it is the case that you indicate it is not working:
> 	max test size should be chosen at least twice as big as the maximum possible memory configuration.

But when we tested it all the many times in the past, no such
problem could be found.  And the supposed "fix" caused boards to
hang/crash.

> 	When no overlap is detected, return maxsize but don't restore *base

get_ram_size() is supposed to be non-destructive, i. e. _all_ memory
locations should be the same before and after running this code. In
our tests this was the case (but I admit that this has not been
tested for a long time - probably not since the last discussion
about this "fix").

Best regards,

Wolfgang Denk
Patrick DELAUNAY Dec. 13, 2017, 1:45 p.m. UTC | #6
Dear Wolfgang,

> From: Wolfgang Denk [mailto:wd@denx.de]

> 

> Dear Patrick,

> 

> In message <885aaa3abdfe440591ea271f92ab42ff@SFHDAG6NODE3.st.com>

> you wrote:

>>

> > But it is recommended in ./doc/README.arm-relocation:

> 

> 

> Where do  you read that this allows runing the code from the same memory

> which is sized/tested?  I cannot see any such claim.  It is always supposed that

> you run from a totally different, independent memory type - in the olden days

> usually parallel NOR flash, nor some on-chip-memory, but never ever the RAM

> you are testing!


Fine, so I misunderstood the sentence.
 
> > And before to setup by board I check on many device with ARM architecture.

> > Most the time U-Boot loaded at beginning of the RAM, relocated at the

> > end of the RAM and dram_init use get_ram_size

> 

> But definitely not in this order!  The sequence should always be:

> initialize the RAM and determine it's size; load U-Boot and relocate it to the

> (now known) end of RAM (resp. to some lower address, depending on any

> reservations made for shared display buffer, shared log buffer, PRAM, etc.).


OK.
I will modified my board to have
- SPL => get_ram_size on DDR / chacke size of DDR is correct (value from DTB) 
               + load U-boot at beginning of DDR
- U-Boot => relocated at the end of the DDR (get_ram_size no more use), size found in DTB

> > Example1 :

> >   ./board/toradex/colibri_imx6/colibri_imx6.c

> 

> What do you mean to show by this?  It just means that dram_init() must be

> running from some other memory (OCM), too.


I thought that it is an example of ARM platform which do the same sequence 
=>  dram_init is called in U-Boot before relocation (common/board_f.c), 
      it is using get_ram_size() and U-Boot link address (CONFIG_SYS_TEXT_BASE = 0x17800000) is the same value than
      the base address used in get_ram_size
      (CONFIG_SYS_SDRAM_BASE = PHYS_SDRAM = MMDC0_ARB_BASE_ADDR = 0x10000000)

But perhaps I make a error in the code analysis (I have no other board),
 sorry  for that....
I will no more argue with other ARM target, aas I accept that use get_ram_size in DDR is a really bad idea.

> > So I just do the same for my arm/armv7 platform.

> 

> No, apparently not.


So I will remove it
 
> > On ARM device device, I think we don't have the banks...

> 

> So you must run this (typically int he SPL) fom some on-chip memory.


Yes

 
> > Ok, I will remove the call of the code on my board, even if over arm platform

> use this function in the same context.

> 

> You should keep the functionality, but move it to where it belongs, i. e. to the

> SPL running from OCM.

> 


I remove it in U-Boot and I call it only in SPL,
executed in onchip RAM, juste after DDR controller initalisation

So for my board, I have no more issue with the function.

> get_ram_size() is supposed to be non-destructive, i. e. _all_ memory locations

> should be the same before and after running this code. In our tests this was the

> case (but I admit that this has not been tested for a long time - probably not

> since the last discussion about this "fix").


For the last case (return (maxsize)),
when no overlap is detect, the content of base is not restored.
So for me, the function get_ram_size is not safe for the DDR content.

Do you think, that I need to continue with the patchset,
(I need a v2 to remove the uncessary restore in the second loop after code analysis)
even it is no more usefull for me.

Or do you prefer that I drop the patchset to avoid regression ?

> Best regards,

> 

> Wolfgang Denk

> 


Best Regards

Patrick Delaunay
Wolfgang Denk Dec. 14, 2017, 11:30 a.m. UTC | #7
Dear Patrick,

In message <10532397af3d416f9a1b30f0b09a94f1@SFHDAG6NODE3.st.com> you wrote:
> 
> > You should keep the functionality, but move it to where it belongs, i. e. to the
> > SPL running from OCM.
> 
> I remove it in U-Boot and I call it only in SPL,
> executed in onchip RAM, juste after DDR controller initalisation
> 
> So for my board, I have no more issue with the function.

Can you please do me the favour and verify that with this change no
memmory corruption / modification happens?

> For the last case (return (maxsize)),
> when no overlap is detect, the content of base is not restored.
> So for me, the function get_ram_size is not safe for the DDR content.

What exactly do you mean by "overlap" here?

> Do you think, that I need to continue with the patchset,
> (I need a v2 to remove the uncessary restore in the second loop after code analysis)
> even it is no more usefull for me.
> 
> Or do you prefer that I drop the patchset to avoid regression ?

I would really like to finally find an explanation why this code is
(or has been) working well on so many boards, while on a few boards
this problem gets reported - and the supposed fix (which looks
reasonable when lookingon it) breaks other boards.

So I would really appreciate if you could continue to work on this.

Thanks!

Best regards,

Wolfgang Denk
Patrick DELAUNAY Dec. 14, 2017, 1:17 p.m. UTC | #8
Hi Wolfgang,

> -----Original Message-----

> From: Wolfgang Denk [mailto:wd@denx.de]

> 

> Dear Patrick,

> 

> In message <10532397af3d416f9a1b30f0b09a94f1@SFHDAG6NODE3.st.com>

> you wrote:

> >

> > > You should keep the functionality, but move it to where it belongs,

> > > i. e. to the SPL running from OCM.

> >

> > I remove it in U-Boot and I call it only in SPL, executed in onchip

> > RAM, juste after DDR controller initalisation

> >

> > So for my board, I have no more issue with the function.

> 

> Can you please do me the favour and verify that with this change no memmory

> corruption / modification happens?


OK I will check, I will indicated the test done in V2 comment.
And potentially I will check if I can implement a test in sandbox.

> 

> > For the last case (return (maxsize)),

> > when no overlap is detect, the content of base is not restored.

> > So for me, the function get_ram_size is not safe for the DDR content.

> 

> What exactly do you mean by "overlap" here?


Perhaps I don't use the correct term...
When the higher bit of address are not used physical for ram access 
(when size is lower than the max size)
access to  2 address access to the same physical
=> I use overlap to indicate this case

example for physical size limited to 128MB = bit 0 to 26 mapped to the memory, bit 27 used
access to 0x0000000 => physical access to 0x00000000
acesss to 0x7FFFFFFF => physical access to 0x7FFFFFFF
access to 0x8000000 => physical access to 0x00000000 overlap detected here by the code !


> > Do you think, that I need to continue with the patchset, (I need a v2

> > to remove the uncessary restore in the second loop after code

> > analysis) even it is no more usefull for me.

> >

> > Or do you prefer that I drop the patchset to avoid regression ?

> 

> I would really like to finally find an explanation why this code is (or has been)

> working well on so many boards, while on a few boards this problem gets

> reported - and the supposed fix (which looks reasonable when lookingon it)

> breaks other boards.

> 

> So I would really appreciate if you could continue to work on this.


Ok, I will do it but only in background...
I will propose a V2 and I will crosscheck with previously broken board if it is still the case.

> 

> Thanks!

> 

> Best regards,

> 

> Wolfgang Denk

> 


Best Regard,

Patrick Delaunay
Wolfgang Denk Dec. 21, 2017, 9:14 a.m. UTC | #9
Dear Patrick,

In message <94c2391ea4e943fb9f65e7b0943c9a96@SFHDAG6NODE3.st.com> you wrote:
> 
> example for physical size limited to 128MB = bit 0 to 26 mapped to the memory, bit 27 used
> access to 0x0000000 => physical access to 0x00000000
> acesss to 0x7FFFFFFF => physical access to 0x7FFFFFFF
> access to 0x8000000 => physical access to 0x00000000 overlap detected here by the code !

Correct - this is the way how we detect the real memory size of
devices which can have miltiple configurations - the memory
controller gets programmed for maxium size, and then we run
get_ram_size() (eventually repeatedly for differept possible
row/column combinations).

> Ok, I will do it but only in background...
> I will propose a V2 and I will crosscheck with previously broken board if it is still the case.

Thanks a lot.  Please ping me (if needed PM or on jabber) if I
should not react.

Best regards,

Wolfgang Denk
Patrick DELAUNAY Jan. 25, 2018, 5:06 p.m. UTC | #10
Hi Wolfgang,

> From: Wolfgang Denk [mailto:wd@denx.de]

> Subject: Re: [U-Boot] [PATCH] common/memsize.c: restore content of the base

> 

> Dear Patrick,

> 

> In message <94c2391ea4e943fb9f65e7b0943c9a96@SFHDAG6NODE3.st.com>

> you wrote:

> 

> > Ok, I will do it but only in background...

> > I will propose a V2 and I will crosscheck with previously broken board if it is still

> the case.

> 

> Thanks a lot.  Please ping me (if needed PM or on jabber) if I should not react.


Sorry for the delay  (Christmas break and other project priority) but I have finally sent a V2 version of the patch.
I split the patchset in 2 to better understood the possible regression.

Don't hesitate to ask me if I need to ask some board maintainer to test these 2 patches...

> Best regards,

> 

> Wolfgang Denk


Best regards

Patrick
diff mbox series

Patch

diff --git a/common/memsize.c b/common/memsize.c
index 0fb9ba5..d632293 100644
--- a/common/memsize.c
+++ b/common/memsize.c
@@ -27,7 +27,8 @@  DECLARE_GLOBAL_DATA_PTR;
 long get_ram_size(long *base, long maxsize)
 {
 	volatile long *addr;
-	long           save[32];
+	long           save[31];
+	long           save_base;
 	long           cnt;
 	long           val;
 	long           size;
@@ -43,7 +44,7 @@  long get_ram_size(long *base, long maxsize)
 
 	addr = base;
 	sync();
-	save[i] = *addr;
+	save_base = *addr;
 	sync();
 	*addr = 0;
 
@@ -51,7 +52,7 @@  long get_ram_size(long *base, long maxsize)
 	if ((val = *addr) != 0) {
 		/* Restore the original data before leaving the function. */
 		sync();
-		*addr = save[i];
+		*base = save_base;
 		for (cnt = 1; cnt < maxsize / sizeof(long); cnt <<= 1) {
 			addr  = base + cnt;
 			sync();
@@ -76,9 +77,11 @@  long get_ram_size(long *base, long maxsize)
 				addr  = base + cnt;
 				*addr = save[--i];
 			}
+			*base = save_base;
 			return (size);
 		}
 	}
+	*base = save_base;
 
 	return (maxsize);
 }