diff mbox

[U-Boot] kbuild: move ARCH, CPU, etc. to top Makefile to fix random build error

Message ID 1427803341-28286-1-git-send-email-yamada.masahiro@socionext.com
State Rejected
Delegated to: Tom Rini
Headers show

Commit Message

Masahiro Yamada March 31, 2015, 12:02 p.m. UTC
Since the Kconfig conversion, some developers have reported that
Kbuild sometimes fails completely at random.  According to the error
reports, it seems to occur for any target board, but only on very
fast computers.

The log message for the fail case is like this:

  make[1]: *** No rule to make target `../arch//cpu/u-boot.lds',
  needed by `u-boot.lds'.  Stop.

It looks like the top config.mk has not been included for *some*
reason, and $(ARCH) has been left blank.

I suspect "autoconf_is_current" is not working in some situation.

This commit moves the definition of ARCH, CPU, SOC, etc. to the
top Makefile, so they are surely set.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reported-by: Tom Rini <trini@konsulko.com>>
Reported-by: York Sun <yorksun@freescale.com>
Reported-by: Stephen Warren <swarren@nvidia.com>
---

Sorry for leaving this problem so long.

I have never been able to reproduce this bug on my computer,
so I am not sure this patch can fix the problem.
I wrote this patch based on my guess.
(I just tested this patch has no bad impact.)

Tom, York, Stephen,

Could you test this patch fixes the problem?


 Makefile                  |  8 ++++++++
 arch/arm/config.mk        |  7 +++++++
 config.mk                 | 33 ++-------------------------------
 scripts/Makefile.autoconf |  8 ++++++++
 4 files changed, 25 insertions(+), 31 deletions(-)

Comments

York Sun March 31, 2015, 3:37 p.m. UTC | #1
On 03/31/2015 05:02 AM, Masahiro Yamada wrote:
> Since the Kconfig conversion, some developers have reported that
> Kbuild sometimes fails completely at random.  According to the error
> reports, it seems to occur for any target board, but only on very
> fast computers.
> 
> The log message for the fail case is like this:
> 
>   make[1]: *** No rule to make target `../arch//cpu/u-boot.lds',
>   needed by `u-boot.lds'.  Stop.
> 
> It looks like the top config.mk has not been included for *some*
> reason, and $(ARCH) has been left blank.
> 
> I suspect "autoconf_is_current" is not working in some situation.
> 
> This commit moves the definition of ARCH, CPU, SOC, etc. to the
> top Makefile, so they are surely set.
> 
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> Reported-by: Tom Rini <trini@konsulko.com>>
> Reported-by: York Sun <yorksun@freescale.com>
> Reported-by: Stephen Warren <swarren@nvidia.com>
> ---
> 
> Sorry for leaving this problem so long.
> 
> I have never been able to reproduce this bug on my computer,
> so I am not sure this patch can fix the problem.
> I wrote this patch based on my guess.
> (I just tested this patch has no bad impact.)
> 
> Tom, York, Stephen,
> 
> Could you test this patch fixes the problem?
> 

Yes I will try. It seems to related to specific OS. I have seen it on old CentOS
(fast machine though) when running by script. But I haven't been able to
reproduce it reliably.

York
Stephen Warren March 31, 2015, 3:45 p.m. UTC | #2
On 03/31/2015 06:02 AM, Masahiro Yamada wrote:
> Since the Kconfig conversion, some developers have reported that
> Kbuild sometimes fails completely at random.  According to the error
> reports, it seems to occur for any target board, but only on very
> fast computers.
>
> The log message for the fail case is like this:
>
>    make[1]: *** No rule to make target `../arch//cpu/u-boot.lds',
>    needed by `u-boot.lds'.  Stop.
>
> It looks like the top config.mk has not been included for *some*
> reason, and $(ARCH) has been left blank.
>
> I suspect "autoconf_is_current" is not working in some situation.
>
> This commit moves the definition of ARCH, CPU, SOC, etc. to the
> top Makefile, so they are surely set.

This doesn't solve the issue for me, although it improves the chance of 
success and changes the error I get.

With and without this patch, I ran:

CROSS_COMPILE=arm-linux-gnueabi- ./MAKEALL rpi_2

... until whichever of success/fail had the fewest "hits" had at least 5 
"hits".

Without this patch, the build failed 6 out of 11 runs with:

======================================================================
boards.cfg is up to date. Nothing to do.
Building rpi_2 board...
arm-linux-gnueabi-size: './u-boot': No such file
make: *** No rule to make target `arch//cpu/u-boot.lds', needed by 
`u-boot.lds'.  Stop.
make: *** Waiting for unfinished jobs....

--------------------- SUMMARY ----------------------------
Boards compiled: 1
Boards with warnings but no errors: 1 ( rpi_2 )
----------------------------------------------------------
======================================================================

With this patch, the build failed 5 out of 28 runs with:

======================================================================
boards.cfg is up to date. Nothing to do.
Building rpi_2 board...
arm-linux-gnueabi-size: './u-boot': No such file
make: *** [prepare1] Error 1
   Your architecture does not support generic board.
   Please undefine CONFIG_SYS_GENERIC_BOARD in your board config file.
make: *** [prepare1] Error 1

--------------------- SUMMARY ----------------------------
Boards compiled: 1
Boards with errors: 1 ( rpi_2 )
----------------------------------------------------------
======================================================================

This feels more like a missing dependency or similar timing issue than 
an issue with the order of variable setup in the makefile.

I tried to repro this on a couple of VM services (Linode 8192, and EC2 
c3.2xlarge) with the idea you could debug on those systems, but could 
not repro on those. Unfortunately the machine I have which repros the 
issue is on my work network, so I can't give you access.

One comment on the patch below:

> diff --git a/Makefile b/Makefile

> +ARCH		= $(CONFIG_SYS_ARCH:"%"=%)
> +CPU		= $(CONFIG_SYS_CPU:"%"=%)
> +SOC		= $(CONFIG_SYS_SOC:"%"=%)
> +VENDOR		= $(CONFIG_SYS_VENDOR:"%"=%)
> +BOARD		= $(CONFIG_SYS_BOARD:"%"=%)
> +CPUDIR		= arch/$(ARCH)/cpu$(if $(CPU),/$(CPU),)
> +BOARDDIR	= $(if $(BOARD),$(if $(VENDOR),$(VENDOR)/)$(BOARD))

> diff --git a/scripts/Makefile.autoconf b/scripts/Makefile.autoconf

> +ARCH		= $(CONFIG_SYS_ARCH:"%"=%)
> +CPU		= $(CONFIG_SYS_CPU:"%"=%)
> +SOC		= $(CONFIG_SYS_SOC:"%"=%)
> +VENDOR		= $(CONFIG_SYS_VENDOR:"%"=%)
> +BOARD		= $(CONFIG_SYS_BOARD:"%"=%)
> +CPUDIR		= arch/$(ARCH)/cpu$(if $(CPU),/$(CPU),)
> +BOARDDIR	= $(if $(BOARD),$(if $(VENDOR),$(VENDOR)))

Duplicating that doesn't seem like a good idea. Could it at least be 
included from a shared foo.mk file?
York Sun March 31, 2015, 3:57 p.m. UTC | #3
On 03/31/2015 08:45 AM, Stephen Warren wrote:
> On 03/31/2015 06:02 AM, Masahiro Yamada wrote:
>> Since the Kconfig conversion, some developers have reported that
>> Kbuild sometimes fails completely at random.  According to the error
>> reports, it seems to occur for any target board, but only on very
>> fast computers.
>>
>> The log message for the fail case is like this:
>>
>>    make[1]: *** No rule to make target `../arch//cpu/u-boot.lds',
>>    needed by `u-boot.lds'.  Stop.
>>
>> It looks like the top config.mk has not been included for *some*
>> reason, and $(ARCH) has been left blank.
>>
>> I suspect "autoconf_is_current" is not working in some situation.
>>
>> This commit moves the definition of ARCH, CPU, SOC, etc. to the
>> top Makefile, so they are surely set.
> 
> This doesn't solve the issue for me, although it improves the chance of 
> success and changes the error I get.
> 
> With and without this patch, I ran:
> 
> CROSS_COMPILE=arm-linux-gnueabi- ./MAKEALL rpi_2
> 
> ... until whichever of success/fail had the fewest "hits" had at least 5 
> "hits".
> 
> Without this patch, the build failed 6 out of 11 runs with:
> 

Stephen is so fast, saved me time to setup and test.

Let me know if I need to run test from my side.

York
Stephen Warren April 1, 2015, 10:33 p.m. UTC | #4
On 03/31/2015 06:02 AM, Masahiro Yamada wrote:
> Since the Kconfig conversion, some developers have reported that
> Kbuild sometimes fails completely at random.  According to the error
> reports, it seems to occur for any target board, but only on very
> fast computers.
>
> The log message for the fail case is like this:
>
>    make[1]: *** No rule to make target `../arch//cpu/u-boot.lds',
>    needed by `u-boot.lds'.  Stop.
>
> It looks like the top config.mk has not been included for *some*
> reason, and $(ARCH) has been left blank.
>
> I suspect "autoconf_is_current" is not working in some situation.

That's certainly true. The following code doesn't end up including 
config.mk in the bad case:

> # We want to include arch/$(ARCH)/config.mk only when include/config/auto.conf
> # is up-to-date. When we switch to a different board configuration, old CONFIG
> # macros are still remaining in include/config/auto.conf. Without the following
> # gimmick, wrong config.mk would be included leading nasty warnings/errors.
> autoconf_is_current := $(if $(wildcard $(KCONFIG_CONFIG)),$(shell find . \
> 		-path ./include/config/auto.conf -newer $(KCONFIG_CONFIG)))
> ifneq ($(autoconf_is_current),)
> include $(srctree)/config.mk
> include $(srctree)/arch/$(ARCH)/Makefile
> endif

That's because:

> [swarren@swarren-lx1 tegra-uboot-flasher]$ ls -l --full-time u-boot-*/{.config,include/config/auto.conf}|cat
> -rw-rw-r-- 1 swarren swarren 9219 2015-04-01 15:50:08.000000000 -0600 u-boot-bad/.config
> -rw-rw-r-- 1 swarren swarren  928 2015-04-01 15:50:08.000000000 -0600 u-boot-bad/include/config/auto.conf
> -rw-rw-r-- 1 swarren swarren 9219 2015-04-01 15:51:25.000000000 -0600 u-boot-ok/.config
> -rw-rw-r-- 1 swarren swarren  928 2015-04-01 15:51:26.000000000 -0600 u-boot-ok/include/config/auto.conf

In the bad case, the timestamps are equal (and hence the -newer check 
fails), whereas in the good case they're different. Recall ext* 
filesystems have a 1s timestamp resolution.

Note that this is state left over from "make xxx_defconfig"; If I just 
manually run "make all" in a tree in this state, it'll stay in this 
state forever, and vice-versa for a working tree.

I expect that simulating this condition with some judicious manually 
executed touch commands would be extremely easy.

Possible solutions are:

Is there a -newer-or-equal that could be used in the find command rather 
than -newer?

When running "make xxx_defconfig", can the code there compare the 
timestamp of those two files, and keep looping and touching the 
auto.conf file until the timestamps differ.

Something else entirely? I couldn't see anything relating to 
autoconf_is_current in the kernel's makefiles.
Masahiro Yamada April 3, 2015, 3:54 a.m. UTC | #5
Hi Stephen,


Thanks for your analisys!

2015-04-02 7:33 GMT+09:00 Stephen Warren <swarren@wwwdotorg.org>:
> On 03/31/2015 06:02 AM, Masahiro Yamada wrote:
>>
>> Since the Kconfig conversion, some developers have reported that
>> Kbuild sometimes fails completely at random.  According to the error
>> reports, it seems to occur for any target board, but only on very
>> fast computers.
>>
>> The log message for the fail case is like this:
>>
>>    make[1]: *** No rule to make target `../arch//cpu/u-boot.lds',
>>    needed by `u-boot.lds'.  Stop.
>>
>> It looks like the top config.mk has not been included for *some*
>> reason, and $(ARCH) has been left blank.
>>
>> I suspect "autoconf_is_current" is not working in some situation.
>
>
> That's certainly true. The following code doesn't end up including config.mk
> in the bad case:
>
>> # We want to include arch/$(ARCH)/config.mk only when
>> include/config/auto.conf
>> # is up-to-date. When we switch to a different board configuration, old
>> CONFIG
>> # macros are still remaining in include/config/auto.conf. Without the
>> following
>> # gimmick, wrong config.mk would be included leading nasty
>> warnings/errors.
>> autoconf_is_current := $(if $(wildcard $(KCONFIG_CONFIG)),$(shell find . \
>>                 -path ./include/config/auto.conf -newer
>> $(KCONFIG_CONFIG)))
>> ifneq ($(autoconf_is_current),)
>> include $(srctree)/config.mk
>> include $(srctree)/arch/$(ARCH)/Makefile
>> endif
>
>
> That's because:
>
>> [swarren@swarren-lx1 tegra-uboot-flasher]$ ls -l --full-time
>> u-boot-*/{.config,include/config/auto.conf}|cat
>> -rw-rw-r-- 1 swarren swarren 9219 2015-04-01 15:50:08.000000000 -0600
>> u-boot-bad/.config
>> -rw-rw-r-- 1 swarren swarren  928 2015-04-01 15:50:08.000000000 -0600
>> u-boot-bad/include/config/auto.conf
>> -rw-rw-r-- 1 swarren swarren 9219 2015-04-01 15:51:25.000000000 -0600
>> u-boot-ok/.config
>> -rw-rw-r-- 1 swarren swarren  928 2015-04-01 15:51:26.000000000 -0600
>> u-boot-ok/include/config/auto.conf
>
>
> In the bad case, the timestamps are equal (and hence the -newer check
> fails), whereas in the good case they're different. Recall ext* filesystems
> have a 1s timestamp resolution.


It makes sense that this problem happens more frequently on faster computers.


> Note that this is state left over from "make xxx_defconfig"; If I just
> manually run "make all" in a tree in this state, it'll stay in this state
> forever, and vice-versa for a working tree.
>
> I expect that simulating this condition with some judicious manually
> executed touch commands would be extremely easy.
>
> Possible solutions are:
>
> Is there a -newer-or-equal that could be used in the find command rather
> than -newer?

I checked the "man find", but I could only find a -newer option,
but it works enough for our purpose.

Please check if this patch solves the problem:
http://patchwork.ozlabs.org/patch/457838/


> When running "make xxx_defconfig", can the code there compare the timestamp
> of those two files, and keep looping and touching the auto.conf file until
> the timestamps differ.
>
> Something else entirely? I couldn't see anything relating to
> autoconf_is_current in the kernel's makefiles.


Right. The kernel does not need this hack.

When we build the kernel, ARCH= is given from the command line,
i.e. the correct arch/${ARCH}/Makefile is always included.
(In other words, it is users who are responsible
for passing the correct ARCH from the command line.)

For U-boot, on the other hand, ARCH is decided by the board configuration.
(It was done by mkconfig+boards.cfg, and it is by Kconfig now.)

If we build ARM board after PowerPC board, for example,
the .config points to ARM, but include/config/auto.conf to PowerPC,
it causes a wrong  arch/${ARCH}/{config.mk,Makefile} to be included at
the first run.

I know the autoconf_is_current is an ugly hack,
but it requires more changes to the build system to rip it off.
diff mbox

Patch

diff --git a/Makefile b/Makefile
index 0d160c9..10af554 100644
--- a/Makefile
+++ b/Makefile
@@ -236,6 +236,14 @@  export	HOSTARCH HOSTOS
 
 #########################################################################
 
+ARCH		= $(CONFIG_SYS_ARCH:"%"=%)
+CPU		= $(CONFIG_SYS_CPU:"%"=%)
+SOC		= $(CONFIG_SYS_SOC:"%"=%)
+VENDOR		= $(CONFIG_SYS_VENDOR:"%"=%)
+BOARD		= $(CONFIG_SYS_BOARD:"%"=%)
+CPUDIR		= arch/$(ARCH)/cpu$(if $(CPU),/$(CPU),)
+BOARDDIR	= $(if $(BOARD),$(if $(VENDOR),$(VENDOR)/)$(BOARD))
+
 # set default to nothing for native builds
 ifeq ($(HOSTARCH),$(ARCH))
 CROSS_COMPILE ?=
diff --git a/arch/arm/config.mk b/arch/arm/config.mk
index c005ce4..2400c16 100644
--- a/arch/arm/config.mk
+++ b/arch/arm/config.mk
@@ -13,6 +13,13 @@  CONFIG_STANDALONE_LOAD_ADDR = 0xc100000
 endif
 endif
 
+ifdef CONFIG_SPL_BUILD
+ifdef CONFIG_TEGRA
+CPU := arm720t
+CPUDIR := arch/arm/cpu/arm720t
+endif
+endif
+
 LDFLAGS_FINAL += --gc-sections
 PLATFORM_RELFLAGS += -ffunction-sections -fdata-sections \
 		     -fno-common -ffixed-r9
diff --git a/config.mk b/config.mk
index 6282919..61bf255 100644
--- a/config.mk
+++ b/config.mk
@@ -18,45 +18,16 @@  PLATFORM_LDFLAGS :=
 LDFLAGS :=
 LDFLAGS_FINAL :=
 OBJCOPYFLAGS :=
-# clear VENDOR for tcsh
-VENDOR :=
 #########################################################################
 
-ARCH := $(CONFIG_SYS_ARCH:"%"=%)
-CPU := $(CONFIG_SYS_CPU:"%"=%)
-ifdef CONFIG_SPL_BUILD
-ifdef CONFIG_TEGRA
-CPU := arm720t
-endif
-endif
-BOARD := $(CONFIG_SYS_BOARD:"%"=%)
-ifneq ($(CONFIG_SYS_VENDOR),)
-VENDOR := $(CONFIG_SYS_VENDOR:"%"=%)
-endif
-ifneq ($(CONFIG_SYS_SOC),)
-SOC := $(CONFIG_SYS_SOC:"%"=%)
-endif
-
-# Some architecture config.mk files need to know what CPUDIR is set to,
-# so calculate CPUDIR before including ARCH/SOC/CPU config.mk files.
-# Check if arch/$ARCH/cpu/$CPU exists, otherwise assume arch/$ARCH/cpu contains
-# CPU-specific code.
-CPUDIR=arch/$(ARCH)/cpu$(if $(CPU),/$(CPU),)
-
 sinclude $(srctree)/arch/$(ARCH)/config.mk	# include architecture dependend rules
 sinclude $(srctree)/$(CPUDIR)/config.mk		# include  CPU	specific rules
 
-ifdef	SOC
+ifneq ($(SOC),)
 sinclude $(srctree)/$(CPUDIR)/$(SOC)/config.mk	# include  SoC	specific rules
 endif
+
 ifneq ($(BOARD),)
-ifdef	VENDOR
-BOARDDIR = $(VENDOR)/$(BOARD)
-else
-BOARDDIR = $(BOARD)
-endif
-endif
-ifdef	BOARD
 sinclude $(srctree)/board/$(BOARDDIR)/config.mk	# include board specific rules
 endif
 
diff --git a/scripts/Makefile.autoconf b/scripts/Makefile.autoconf
index f054081..66bc1ea 100644
--- a/scripts/Makefile.autoconf
+++ b/scripts/Makefile.autoconf
@@ -27,6 +27,14 @@  include scripts/Kbuild.include
 CC		= $(CROSS_COMPILE)gcc
 CPP		= $(CC) -E
 
+ARCH		= $(CONFIG_SYS_ARCH:"%"=%)
+CPU		= $(CONFIG_SYS_CPU:"%"=%)
+SOC		= $(CONFIG_SYS_SOC:"%"=%)
+VENDOR		= $(CONFIG_SYS_VENDOR:"%"=%)
+BOARD		= $(CONFIG_SYS_BOARD:"%"=%)
+CPUDIR		= arch/$(ARCH)/cpu$(if $(CPU),/$(CPU),)
+BOARDDIR	= $(if $(BOARD),$(if $(VENDOR),$(VENDOR)))
+
 include config.mk
 
 UBOOTINCLUDE    := \