diff mbox series

kconfig: default to zero if int/hex symbol lacks default property

Message ID 20240505135353.1507200-1-yoann.congal@smile.fr
State Deferred
Delegated to: Tom Rini
Headers show
Series kconfig: default to zero if int/hex symbol lacks default property | expand

Commit Message

Yoann Congal May 5, 2024, 1:53 p.m. UTC
From: Masahiro Yamada <masahiroy@kernel.org>

This is a cherry-pick from the kernel commit:
6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)

When a default property is missing in an int or hex symbol, it defaults
to an empty string, which is not a valid symbol value.

It results in an incorrect .config, and can also lead to an infinite
loop in scripting.

Use "0" for int and "0x0" for hex as a default value.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Yoann Congal <yoann.congal@smile.fr>

Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
---
Added context that was not in the upstream commit:
The infinite loop case happens with a configuration defined like this
(a hex config without a valid default value):
  config HEX_TEST
  	hex "Hex config without default"

And using:
  $ make oldconfig < /dev/null
  scripts/kconfig/conf  --oldconfig Kconfig
  *
  * General setup
  *

  Error in reading or end of file.

  Error in reading or end of file.
  Hex config without default (HEX_TEST) [] (NEW)

  Error in reading or end of file.
  Hex config without default (HEX_TEST) [] (NEW)
  # This loops forever

NB: Scripted config manipulation often call make with /dev/null as
stdin (Yocto recipe, CI build, ...)

This was discovered when working on Yocto bug:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136
---
 scripts/kconfig/symbol.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Comments

Tom Rini May 6, 2024, 5:43 p.m. UTC | #1
On Sun, May 05, 2024 at 03:53:53PM +0200, Yoann Congal wrote:

> From: Masahiro Yamada <masahiroy@kernel.org>
> 
> This is a cherry-pick from the kernel commit:
> 6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)
> 
> When a default property is missing in an int or hex symbol, it defaults
> to an empty string, which is not a valid symbol value.
> 
> It results in an incorrect .config, and can also lead to an infinite
> loop in scripting.
> 
> Use "0" for int and "0x0" for hex as a default value.
> 
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
> 
> Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
> ---
> Added context that was not in the upstream commit:
> The infinite loop case happens with a configuration defined like this
> (a hex config without a valid default value):
>   config HEX_TEST
>   	hex "Hex config without default"
> 
> And using:
>   $ make oldconfig < /dev/null
>   scripts/kconfig/conf  --oldconfig Kconfig
>   *
>   * General setup
>   *
> 
>   Error in reading or end of file.
> 
>   Error in reading or end of file.
>   Hex config without default (HEX_TEST) [] (NEW)
> 
>   Error in reading or end of file.
>   Hex config without default (HEX_TEST) [] (NEW)
>   # This loops forever
> 
> NB: Scripted config manipulation often call make with /dev/null as
> stdin (Yocto recipe, CI build, ...)
> 
> This was discovered when working on Yocto bug:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136

I'm surprised this was accepted. In the past I've wanted to avoid this
kind of change in Kconfig because while the empty string can be easily
be checked in the code as "user didn't really configure this, do
nothing" a value of zero is a valid option in these cases and so then in
the code we need a bool symbol to decide if the hex/int symbol is set or
not.

Today this is less of an issue than it used to be in U-Boot with
everything CONFIG-related migrated to Kconfig and so there's no longer
the question of if we missed migrating a file that defined the value but
there's still places we have in the code where hex symbol is undefined
is not the same thing as hex symbol is 0x0.

Is there a specific use case you have for this in U-Boot? It's been a
while, but it's also been cases of newly introduced symbols in Kconfig
files with incorrect dependencies, where the infinite loop in kconfig
happened, CI failed and we caught the problem.
Yoann Congal May 7, 2024, 7:17 a.m. UTC | #2
Le 06/05/2024 à 19:43, Tom Rini a écrit :
> On Sun, May 05, 2024 at 03:53:53PM +0200, Yoann Congal wrote:
> 
>> From: Masahiro Yamada <masahiroy@kernel.org>
>>
>> This is a cherry-pick from the kernel commit:
>> 6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)
>>
>> When a default property is missing in an int or hex symbol, it defaults
>> to an empty string, which is not a valid symbol value.
>>
>> It results in an incorrect .config, and can also lead to an infinite
>> loop in scripting.
>>
>> Use "0" for int and "0x0" for hex as a default value.
>>
>> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
>> Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
>>
>> Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
>> ---
>> Added context that was not in the upstream commit:
>> The infinite loop case happens with a configuration defined like this
>> (a hex config without a valid default value):
>>   config HEX_TEST
>>   	hex "Hex config without default"
>>
>> And using:
>>   $ make oldconfig < /dev/null
>>   scripts/kconfig/conf  --oldconfig Kconfig
>>   *
>>   * General setup
>>   *
>>
>>   Error in reading or end of file.
>>
>>   Error in reading or end of file.
>>   Hex config without default (HEX_TEST) [] (NEW)
>>
>>   Error in reading or end of file.
>>   Hex config without default (HEX_TEST) [] (NEW)
>>   # This loops forever
>>
>> NB: Scripted config manipulation often call make with /dev/null as
>> stdin (Yocto recipe, CI build, ...)
>>
>> This was discovered when working on Yocto bug:
>> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136

Hi Tom,

> I'm surprised this was accepted. In the past I've wanted to avoid this
> kind of change in Kconfig because while the empty string can be easily
> be checked in the code as "user didn't really configure this, do
> nothing" a value of zero is a valid option in these cases and so then in
> the code we need a bool symbol to decide if the hex/int symbol is set or
> not.

For context (and what it's worth), before this patch was merged, I've tried to fix this problem with a patch of my own which only exited the config process when the infinite loop would start:
"[PATCH v4] kconfig: avoid an infinite loop in oldconfig/syncconfig" [0]
... the v5 version was a bit more severe and exited as soon as an error was hit, it was then removed from -next because it triggered a build failure [1].

> Today this is less of an issue than it used to be in U-Boot with
> everything CONFIG-related migrated to Kconfig and so there's no longer
> the question of if we missed migrating a file that defined the value but
> there's still places we have in the code where hex symbol is undefined
> is not the same thing as hex symbol is 0x0.
> 
> Is there a specific use case you have for this in U-Boot? It's been a
> while, but it's also been cases of newly introduced symbols in Kconfig
> files with incorrect dependencies, where the infinite loop in kconfig
> happened, CI failed and we caught the problem.

For a specific example, one can trigger this with CMD_PSTORE_MEM_ADDR like this (tested on today's master):
  make qemu_arm64_defconfig
  make menuconfig
   # activate CONFIG_CMD_PSTORE=y but forget to fill CMD_PSTORE_MEM_ADDR
  make < /dev/null # This emulates a Yocto/scripted/CI build and loops infinitely

But, my more general use case is the Yocto dev trying to change the U-Boot (or any kconfig based software) config and accidentally trigger this.
In Yocto, what they will see is a do_configure task taking a very long time but they won't see the looping logs the detect the problem (This is caught later by filing the RAM and the build process is killed by OOM, or the disk run out of space filed multi-GB log files).

I'm sadly aware that defining a default value for this kind of config (fixed addresses) is not really sensible but the build time infinite loop is not good either.

Thanks!

Regards,

[0]: https://lore.kernel.org/lkml/20231031111647.111093-1-yoann.congal@smile.fr/
[1]: https://lore.kernel.org/lkml/20231107210004.GA2065849@dev-arch.thelio-3990X/
Tom Rini May 7, 2024, 5:23 p.m. UTC | #3
On Tue, May 07, 2024 at 09:17:46AM +0200, Yoann Congal wrote:
> Le 06/05/2024 à 19:43, Tom Rini a écrit :
> > On Sun, May 05, 2024 at 03:53:53PM +0200, Yoann Congal wrote:
> > 
> >> From: Masahiro Yamada <masahiroy@kernel.org>
> >>
> >> This is a cherry-pick from the kernel commit:
> >> 6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)
> >>
> >> When a default property is missing in an int or hex symbol, it defaults
> >> to an empty string, which is not a valid symbol value.
> >>
> >> It results in an incorrect .config, and can also lead to an infinite
> >> loop in scripting.
> >>
> >> Use "0" for int and "0x0" for hex as a default value.
> >>
> >> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> >> Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
> >>
> >> Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
> >> ---
> >> Added context that was not in the upstream commit:
> >> The infinite loop case happens with a configuration defined like this
> >> (a hex config without a valid default value):
> >>   config HEX_TEST
> >>   	hex "Hex config without default"
> >>
> >> And using:
> >>   $ make oldconfig < /dev/null
> >>   scripts/kconfig/conf  --oldconfig Kconfig
> >>   *
> >>   * General setup
> >>   *
> >>
> >>   Error in reading or end of file.
> >>
> >>   Error in reading or end of file.
> >>   Hex config without default (HEX_TEST) [] (NEW)
> >>
> >>   Error in reading or end of file.
> >>   Hex config without default (HEX_TEST) [] (NEW)
> >>   # This loops forever
> >>
> >> NB: Scripted config manipulation often call make with /dev/null as
> >> stdin (Yocto recipe, CI build, ...)
> >>
> >> This was discovered when working on Yocto bug:
> >> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136
> 
> Hi Tom,
> 
> > I'm surprised this was accepted. In the past I've wanted to avoid this
> > kind of change in Kconfig because while the empty string can be easily
> > be checked in the code as "user didn't really configure this, do
> > nothing" a value of zero is a valid option in these cases and so then in
> > the code we need a bool symbol to decide if the hex/int symbol is set or
> > not.
> 
> For context (and what it's worth), before this patch was merged, I've tried to fix this problem with a patch of my own which only exited the config process when the infinite loop would start:
> "[PATCH v4] kconfig: avoid an infinite loop in oldconfig/syncconfig" [0]
> ... the v5 version was a bit more severe and exited as soon as an error was hit, it was then removed from -next because it triggered a build failure [1].
> 
> > Today this is less of an issue than it used to be in U-Boot with
> > everything CONFIG-related migrated to Kconfig and so there's no longer
> > the question of if we missed migrating a file that defined the value but
> > there's still places we have in the code where hex symbol is undefined
> > is not the same thing as hex symbol is 0x0.
> > 
> > Is there a specific use case you have for this in U-Boot? It's been a
> > while, but it's also been cases of newly introduced symbols in Kconfig
> > files with incorrect dependencies, where the infinite loop in kconfig
> > happened, CI failed and we caught the problem.
> 
> For a specific example, one can trigger this with CMD_PSTORE_MEM_ADDR like this (tested on today's master):
>   make qemu_arm64_defconfig
>   make menuconfig
>    # activate CONFIG_CMD_PSTORE=y but forget to fill CMD_PSTORE_MEM_ADDR
>   make < /dev/null # This emulates a Yocto/scripted/CI build and loops infinitely
> 
> But, my more general use case is the Yocto dev trying to change the U-Boot (or any kconfig based software) config and accidentally trigger this.
> In Yocto, what they will see is a do_configure task taking a very long time but they won't see the looping logs the detect the problem (This is caught later by filing the RAM and the build process is killed by OOM, or the disk run out of space filed multi-GB log files).
> 
> I'm sadly aware that defining a default value for this kind of config (fixed addresses) is not really sensible but the build time infinite loop is not good either.

Exactly, but to me it's worse to cover up the build issue and introduce
a runtime issue instead. Something builds but then crashes at run time
is worse to me than builds fail / get stuck. Using your example,
CMD_PSTORE_MEM_ADDR depends on CMD_PSTORE and so if someone enables
CMD_PSTORE in a config fragment but doesn't define the address, the
build should not complete. Using 0x0 means that oops, now it fails to
work and might not be obvious why either.

I won't revert this change when in the future we're able to sync up with
the kernel again, but I'm not eager to bring this in by itself as it
feels like the wrong way to deal with the problem. Thanks for explaining
the history here.
Yoann Congal May 13, 2024, 8:02 a.m. UTC | #4
Le 07/05/2024 à 19:23, Tom Rini a écrit :
> On Tue, May 07, 2024 at 09:17:46AM +0200, Yoann Congal wrote:
>> Le 06/05/2024 à 19:43, Tom Rini a écrit :
>>> On Sun, May 05, 2024 at 03:53:53PM +0200, Yoann Congal wrote:
>>>
>>>> From: Masahiro Yamada <masahiroy@kernel.org>
>>>>
>>>> This is a cherry-pick from the kernel commit:
>>>> 6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)
>>>>
>>>> When a default property is missing in an int or hex symbol, it defaults
>>>> to an empty string, which is not a valid symbol value.
>>>>
>>>> It results in an incorrect .config, and can also lead to an infinite
>>>> loop in scripting.
>>>>
>>>> Use "0" for int and "0x0" for hex as a default value.
>>>>
>>>> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
>>>> Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
>>>>
>>>> Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
>>>> ---
>>>> Added context that was not in the upstream commit:
>>>> The infinite loop case happens with a configuration defined like this
>>>> (a hex config without a valid default value):
>>>>   config HEX_TEST
>>>>   	hex "Hex config without default"
>>>>
>>>> And using:
>>>>   $ make oldconfig < /dev/null
>>>>   scripts/kconfig/conf  --oldconfig Kconfig
>>>>   *
>>>>   * General setup
>>>>   *
>>>>
>>>>   Error in reading or end of file.
>>>>
>>>>   Error in reading or end of file.
>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>
>>>>   Error in reading or end of file.
>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>   # This loops forever
>>>>
>>>> NB: Scripted config manipulation often call make with /dev/null as
>>>> stdin (Yocto recipe, CI build, ...)
>>>>
>>>> This was discovered when working on Yocto bug:
>>>> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136
>>
>> Hi Tom,
>>
>>> I'm surprised this was accepted. In the past I've wanted to avoid this
>>> kind of change in Kconfig because while the empty string can be easily
>>> be checked in the code as "user didn't really configure this, do
>>> nothing" a value of zero is a valid option in these cases and so then in
>>> the code we need a bool symbol to decide if the hex/int symbol is set or
>>> not.
>>
>> For context (and what it's worth), before this patch was merged, I've tried to fix this problem with a patch of my own which only exited the config process when the infinite loop would start:
>> "[PATCH v4] kconfig: avoid an infinite loop in oldconfig/syncconfig" [0]
>> ... the v5 version was a bit more severe and exited as soon as an error was hit, it was then removed from -next because it triggered a build failure [1].
>>
>>> Today this is less of an issue than it used to be in U-Boot with
>>> everything CONFIG-related migrated to Kconfig and so there's no longer
>>> the question of if we missed migrating a file that defined the value but
>>> there's still places we have in the code where hex symbol is undefined
>>> is not the same thing as hex symbol is 0x0.
>>>
>>> Is there a specific use case you have for this in U-Boot? It's been a
>>> while, but it's also been cases of newly introduced symbols in Kconfig
>>> files with incorrect dependencies, where the infinite loop in kconfig
>>> happened, CI failed and we caught the problem.
>>
>> For a specific example, one can trigger this with CMD_PSTORE_MEM_ADDR like this (tested on today's master):
>>   make qemu_arm64_defconfig
>>   make menuconfig
>>    # activate CONFIG_CMD_PSTORE=y but forget to fill CMD_PSTORE_MEM_ADDR
>>   make < /dev/null # This emulates a Yocto/scripted/CI build and loops infinitely
>>
>> But, my more general use case is the Yocto dev trying to change the U-Boot (or any kconfig based software) config and accidentally trigger this.
>> In Yocto, what they will see is a do_configure task taking a very long time but they won't see the looping logs the detect the problem (This is caught later by filing the RAM and the build process is killed by OOM, or the disk run out of space filed multi-GB log files).
>>
>> I'm sadly aware that defining a default value for this kind of config (fixed addresses) is not really sensible but the build time infinite loop is not good either.
> 
> Exactly, but to me it's worse to cover up the build issue and introduce
> a runtime issue instead. Something builds but then crashes at run time
> is worse to me than builds fail / get stuck. Using your example,
> CMD_PSTORE_MEM_ADDR depends on CMD_PSTORE and so if someone enables
> CMD_PSTORE in a config fragment but doesn't define the address, the
> build should not complete. Using 0x0 means that oops, now it fails to
> work and might not be obvious why either.
> 
> I won't revert this change when in the future we're able to sync up with
> the kernel again, but I'm not eager to bring this in by itself as it
> feels like the wrong way to deal with the problem. Thanks for explaining
> the history here.

Ok, I understand your position (I agree that runtime failure is bad)

Thanks for looking at this. I need to find another solution.

Regards,
Rasmus Villemoes May 13, 2024, 8:28 a.m. UTC | #5
On 07/05/2024 19.23, Tom Rini wrote:
> On Tue, May 07, 2024 at 09:17:46AM +0200, Yoann Congal wrote:
>> Le 06/05/2024 à 19:43, Tom Rini a écrit :
>>> On Sun, May 05, 2024 at 03:53:53PM +0200, Yoann Congal wrote:
>>>
>>>> From: Masahiro Yamada <masahiroy@kernel.org>
>>>>
>>>> This is a cherry-pick from the kernel commit:
>>>> 6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)
>>>>
>>>> When a default property is missing in an int or hex symbol, it defaults
>>>> to an empty string, which is not a valid symbol value.
>>>>
>>>> It results in an incorrect .config, and can also lead to an infinite
>>>> loop in scripting.
>>>>
>>>> Use "0" for int and "0x0" for hex as a default value.
>>>>
>>>> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
>>>> Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
>>>>
>>>> Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
>>>> ---
>>>> Added context that was not in the upstream commit:
>>>> The infinite loop case happens with a configuration defined like this
>>>> (a hex config without a valid default value):
>>>>   config HEX_TEST
>>>>   	hex "Hex config without default"
>>>>
>>>> And using:
>>>>   $ make oldconfig < /dev/null
>>>>   scripts/kconfig/conf  --oldconfig Kconfig
>>>>   *
>>>>   * General setup
>>>>   *
>>>>
>>>>   Error in reading or end of file.
>>>>
>>>>   Error in reading or end of file.
>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>
>>>>   Error in reading or end of file.
>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>   # This loops forever
>>>>
>>>> NB: Scripted config manipulation often call make with /dev/null as
>>>> stdin (Yocto recipe, CI build, ...)
>>>>
>>>> This was discovered when working on Yocto bug:
>>>> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136
>>
>> Hi Tom,
>>
>>> I'm surprised this was accepted. In the past I've wanted to avoid this
>>> kind of change in Kconfig because while the empty string can be easily
>>> be checked in the code as "user didn't really configure this, do
>>> nothing" a value of zero is a valid option in these cases and so then in
>>> the code we need a bool symbol to decide if the hex/int symbol is set or
>>> not.
>>
>> For context (and what it's worth), before this patch was merged, I've tried to fix this problem with a patch of my own which only exited the config process when the infinite loop would start:
>> "[PATCH v4] kconfig: avoid an infinite loop in oldconfig/syncconfig" [0]
>> ... the v5 version was a bit more severe and exited as soon as an error was hit, it was then removed from -next because it triggered a build failure [1].
>>
>>> Today this is less of an issue than it used to be in U-Boot with
>>> everything CONFIG-related migrated to Kconfig and so there's no longer
>>> the question of if we missed migrating a file that defined the value but
>>> there's still places we have in the code where hex symbol is undefined
>>> is not the same thing as hex symbol is 0x0.
>>>
>>> Is there a specific use case you have for this in U-Boot? It's been a
>>> while, but it's also been cases of newly introduced symbols in Kconfig
>>> files with incorrect dependencies, where the infinite loop in kconfig
>>> happened, CI failed and we caught the problem.
>>
>> For a specific example, one can trigger this with CMD_PSTORE_MEM_ADDR like this (tested on today's master):
>>   make qemu_arm64_defconfig
>>   make menuconfig
>>    # activate CONFIG_CMD_PSTORE=y but forget to fill CMD_PSTORE_MEM_ADDR
>>   make < /dev/null # This emulates a Yocto/scripted/CI build and loops infinitely
>>
>> But, my more general use case is the Yocto dev trying to change the U-Boot (or any kconfig based software) config and accidentally trigger this.
>> In Yocto, what they will see is a do_configure task taking a very long time but they won't see the looping logs the detect the problem (This is caught later by filing the RAM and the build process is killed by OOM, or the disk run out of space filed multi-GB log files).
>>
>> I'm sadly aware that defining a default value for this kind of config (fixed addresses) is not really sensible but the build time infinite loop is not good either.
> 
> Exactly, but to me it's worse to cover up the build issue and introduce
> a runtime issue instead. Something builds but then crashes at run time
> is worse to me than builds fail / get stuck. Using your example,
> CMD_PSTORE_MEM_ADDR depends on CMD_PSTORE and so if someone enables
> CMD_PSTORE in a config fragment but doesn't define the address, the
> build should not complete. Using 0x0 means that oops, now it fails to
> work and might not be obvious why either.

I agree with Tom. Yes, we've also hit those yocto failures with a
multi-GB do_configure log file, and that's pretty annoying when one just
starts a build cooking over night and then the next day one's laptop has
a filled disk. So clearly there is/was a problem to solve.

However, blindly adding a "default default" of 0 is not the right thing
to do. It seems much better that, when there's no configured default and
kconfig gets EOF when asking the user for input (as in the 'make
oldconfig < /dev/null' case), kconfig should exit with an error
explicitly saying "no value given for CONFIG_XYZ which also doesn't have
a default value". That stops the build, and leaves precisely an
indication what the problem is. Then the developer must either provide a
value for CONFIG_XYZ in the fragments, or send a patch to add a
reasonable default when that is at all possible (not so for e.g. PSTORE).

Rasmus
Yoann Congal May 13, 2024, 9:03 a.m. UTC | #6
Le 13/05/2024 à 10:28, Rasmus Villemoes a écrit :
> On 07/05/2024 19.23, Tom Rini wrote:
>> On Tue, May 07, 2024 at 09:17:46AM +0200, Yoann Congal wrote:
>>> Le 06/05/2024 à 19:43, Tom Rini a écrit :
>>>> On Sun, May 05, 2024 at 03:53:53PM +0200, Yoann Congal wrote:
>>>>
>>>>> From: Masahiro Yamada <masahiroy@kernel.org>
>>>>>
>>>>> This is a cherry-pick from the kernel commit:
>>>>> 6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)
>>>>>
>>>>> When a default property is missing in an int or hex symbol, it defaults
>>>>> to an empty string, which is not a valid symbol value.
>>>>>
>>>>> It results in an incorrect .config, and can also lead to an infinite
>>>>> loop in scripting.
>>>>>
>>>>> Use "0" for int and "0x0" for hex as a default value.
>>>>>
>>>>> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
>>>>> Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
>>>>>
>>>>> Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
>>>>> ---
>>>>> Added context that was not in the upstream commit:
>>>>> The infinite loop case happens with a configuration defined like this
>>>>> (a hex config without a valid default value):
>>>>>   config HEX_TEST
>>>>>   	hex "Hex config without default"
>>>>>
>>>>> And using:
>>>>>   $ make oldconfig < /dev/null
>>>>>   scripts/kconfig/conf  --oldconfig Kconfig
>>>>>   *
>>>>>   * General setup
>>>>>   *
>>>>>
>>>>>   Error in reading or end of file.
>>>>>
>>>>>   Error in reading or end of file.
>>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>>
>>>>>   Error in reading or end of file.
>>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>>   # This loops forever
>>>>>
>>>>> NB: Scripted config manipulation often call make with /dev/null as
>>>>> stdin (Yocto recipe, CI build, ...)
>>>>>
>>>>> This was discovered when working on Yocto bug:
>>>>> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136
>>>
>>> Hi Tom,
>>>
>>>> I'm surprised this was accepted. In the past I've wanted to avoid this
>>>> kind of change in Kconfig because while the empty string can be easily
>>>> be checked in the code as "user didn't really configure this, do
>>>> nothing" a value of zero is a valid option in these cases and so then in
>>>> the code we need a bool symbol to decide if the hex/int symbol is set or
>>>> not.
>>>
>>> For context (and what it's worth), before this patch was merged, I've tried to fix this problem with a patch of my own which only exited the config process when the infinite loop would start:
>>> "[PATCH v4] kconfig: avoid an infinite loop in oldconfig/syncconfig" [0]
>>> ... the v5 version was a bit more severe and exited as soon as an error was hit, it was then removed from -next because it triggered a build failure [1].
>>>
>>>> Today this is less of an issue than it used to be in U-Boot with
>>>> everything CONFIG-related migrated to Kconfig and so there's no longer
>>>> the question of if we missed migrating a file that defined the value but
>>>> there's still places we have in the code where hex symbol is undefined
>>>> is not the same thing as hex symbol is 0x0.
>>>>
>>>> Is there a specific use case you have for this in U-Boot? It's been a
>>>> while, but it's also been cases of newly introduced symbols in Kconfig
>>>> files with incorrect dependencies, where the infinite loop in kconfig
>>>> happened, CI failed and we caught the problem.
>>>
>>> For a specific example, one can trigger this with CMD_PSTORE_MEM_ADDR like this (tested on today's master):
>>>   make qemu_arm64_defconfig
>>>   make menuconfig
>>>    # activate CONFIG_CMD_PSTORE=y but forget to fill CMD_PSTORE_MEM_ADDR
>>>   make < /dev/null # This emulates a Yocto/scripted/CI build and loops infinitely
>>>
>>> But, my more general use case is the Yocto dev trying to change the U-Boot (or any kconfig based software) config and accidentally trigger this.
>>> In Yocto, what they will see is a do_configure task taking a very long time but they won't see the looping logs the detect the problem (This is caught later by filing the RAM and the build process is killed by OOM, or the disk run out of space filed multi-GB log files).
>>>
>>> I'm sadly aware that defining a default value for this kind of config (fixed addresses) is not really sensible but the build time infinite loop is not good either.
>>
>> Exactly, but to me it's worse to cover up the build issue and introduce
>> a runtime issue instead. Something builds but then crashes at run time
>> is worse to me than builds fail / get stuck. Using your example,
>> CMD_PSTORE_MEM_ADDR depends on CMD_PSTORE and so if someone enables
>> CMD_PSTORE in a config fragment but doesn't define the address, the
>> build should not complete. Using 0x0 means that oops, now it fails to
>> work and might not be obvious why either.
> 
> I agree with Tom. Yes, we've also hit those yocto failures with a
> multi-GB do_configure log file, and that's pretty annoying when one just
> starts a build cooking over night and then the next day one's laptop has
> a filled disk. So clearly there is/was a problem to solve.
> 
> However, blindly adding a "default default" of 0 is not the right thing
> to do. It seems much better that, when there's no configured default and
> kconfig gets EOF when asking the user for input (as in the 'make
> oldconfig < /dev/null' case), kconfig should exit with an error
> explicitly saying "no value given for CONFIG_XYZ which also doesn't have
> a default value". That stops the build, and leaves precisely an
> indication what the problem is. Then the developer must either provide a
> value for CONFIG_XYZ in the fragments, or send a patch to add a
> reasonable default when that is at all possible (not so for e.g. PSTORE).


Agreed, that was the spirit of my first attempt at a fix[0] which was then made broader[1] and finally rejected from "-next" because it triggered a build failure[2].

I still think my first approach (exit with an error instead of starting the infinite loop) was the right one (at least for U-Boot). The problem is that this approach was rejected by the Linux kconfig maintainer (Masahiro Yamada CC'd). I'm willing to clean it up for U-boot but this would diverge from linux kconfig... Is that acceptable?

[0]: https://lore.kernel.org/all/20231031111647.111093-1-yoann.congal@smile.fr/
[1]: https://lore.kernel.org/all/20231104222715.3967791-1-yoann.congal@smile.fr/
[2]: https://lore.kernel.org/all/20231107210004.GA2065849@dev-arch.thelio-3990X/

Regards,
Rasmus Villemoes May 13, 2024, 9:07 a.m. UTC | #7
On 13/05/2024 10.28, Rasmus Villemoes wrote:
> On 07/05/2024 19.23, Tom Rini wrote:
>> On Tue, May 07, 2024 at 09:17:46AM +0200, Yoann Congal wrote:
>>> Le 06/05/2024 à 19:43, Tom Rini a écrit :
>>>> On Sun, May 05, 2024 at 03:53:53PM +0200, Yoann Congal wrote:
>>>>
>>>>> From: Masahiro Yamada <masahiroy@kernel.org>
>>>>>
>>>>> This is a cherry-pick from the kernel commit:
>>>>> 6262afa10ef7c (kconfig: default to zero if int/hex symbol lacks default property, 2023-11-26)
>>>>>
>>>>> When a default property is missing in an int or hex symbol, it defaults
>>>>> to an empty string, which is not a valid symbol value.
>>>>>
>>>>> It results in an incorrect .config, and can also lead to an infinite
>>>>> loop in scripting.
>>>>>
>>>>> Use "0" for int and "0x0" for hex as a default value.
>>>>>
>>>>> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
>>>>> Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
>>>>>
>>>>> Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
>>>>> ---
>>>>> Added context that was not in the upstream commit:
>>>>> The infinite loop case happens with a configuration defined like this
>>>>> (a hex config without a valid default value):
>>>>>   config HEX_TEST
>>>>>   	hex "Hex config without default"
>>>>>
>>>>> And using:
>>>>>   $ make oldconfig < /dev/null
>>>>>   scripts/kconfig/conf  --oldconfig Kconfig
>>>>>   *
>>>>>   * General setup
>>>>>   *
>>>>>
>>>>>   Error in reading or end of file.
>>>>>
>>>>>   Error in reading or end of file.
>>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>>
>>>>>   Error in reading or end of file.
>>>>>   Hex config without default (HEX_TEST) [] (NEW)
>>>>>   # This loops forever
>>>>>
>>>>> NB: Scripted config manipulation often call make with /dev/null as
>>>>> stdin (Yocto recipe, CI build, ...)
>>>>>
>>>>> This was discovered when working on Yocto bug:
>>>>> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14136
>>>
>>> Hi Tom,
>>>
>>>> I'm surprised this was accepted. In the past I've wanted to avoid this
>>>> kind of change in Kconfig because while the empty string can be easily
>>>> be checked in the code as "user didn't really configure this, do
>>>> nothing" a value of zero is a valid option in these cases and so then in
>>>> the code we need a bool symbol to decide if the hex/int symbol is set or
>>>> not.
>>>
>>> For context (and what it's worth), before this patch was merged, I've tried to fix this problem with a patch of my own which only exited the config process when the infinite loop would start:
>>> "[PATCH v4] kconfig: avoid an infinite loop in oldconfig/syncconfig" [0]
>>> ... the v5 version was a bit more severe and exited as soon as an error was hit, it was then removed from -next because it triggered a build failure [1].
>>>
>>>> Today this is less of an issue than it used to be in U-Boot with
>>>> everything CONFIG-related migrated to Kconfig and so there's no longer
>>>> the question of if we missed migrating a file that defined the value but
>>>> there's still places we have in the code where hex symbol is undefined
>>>> is not the same thing as hex symbol is 0x0.
>>>>
>>>> Is there a specific use case you have for this in U-Boot? It's been a
>>>> while, but it's also been cases of newly introduced symbols in Kconfig
>>>> files with incorrect dependencies, where the infinite loop in kconfig
>>>> happened, CI failed and we caught the problem.
>>>
>>> For a specific example, one can trigger this with CMD_PSTORE_MEM_ADDR like this (tested on today's master):
>>>   make qemu_arm64_defconfig
>>>   make menuconfig
>>>    # activate CONFIG_CMD_PSTORE=y but forget to fill CMD_PSTORE_MEM_ADDR
>>>   make < /dev/null # This emulates a Yocto/scripted/CI build and loops infinitely
>>>
>>> But, my more general use case is the Yocto dev trying to change the U-Boot (or any kconfig based software) config and accidentally trigger this.
>>> In Yocto, what they will see is a do_configure task taking a very long time but they won't see the looping logs the detect the problem (This is caught later by filing the RAM and the build process is killed by OOM, or the disk run out of space filed multi-GB log files).
>>>
>>> I'm sadly aware that defining a default value for this kind of config (fixed addresses) is not really sensible but the build time infinite loop is not good either.
>>
>> Exactly, but to me it's worse to cover up the build issue and introduce
>> a runtime issue instead. Something builds but then crashes at run time
>> is worse to me than builds fail / get stuck. Using your example,
>> CMD_PSTORE_MEM_ADDR depends on CMD_PSTORE and so if someone enables
>> CMD_PSTORE in a config fragment but doesn't define the address, the
>> build should not complete. Using 0x0 means that oops, now it fails to
>> work and might not be obvious why either.
> 
> I agree with Tom. Yes, we've also hit those yocto failures with a
> multi-GB do_configure log file, and that's pretty annoying when one just
> starts a build cooking over night and then the next day one's laptop has
> a filled disk. So clearly there is/was a problem to solve.
> 
> However, blindly adding a "default default" of 0 is not the right thing
> to do. It seems much better that, when there's no configured default and
> kconfig gets EOF when asking the user for input (as in the 'make
> oldconfig < /dev/null' case), kconfig should exit with an error
> explicitly saying "no value given for CONFIG_XYZ which also doesn't have
> a default value". That stops the build, and leaves precisely an
> indication what the problem is. 

For my own curiosity I tried to find where the infinite loop is, and I
think it's the while(1) in conf_string(). Not tested at all, but perhaps
something like

diff --git a/scripts/kconfig/conf.c b/scripts/kconfig/conf.c
index 965bb40c50e5..604b55124318 100644
--- a/scripts/kconfig/conf.c
+++ b/scripts/kconfig/conf.c
@@ -355,8 +355,7 @@ static int conf_string(struct menu *menu)
                printf("%*s%s ", indent - 1, "", menu->prompt->text);
                printf("(%s) ", sym->name);
                def = sym_get_string_value(sym);
-               if (def)
-                       printf("[%s] ", def);
+               printf("[%s] ", def ?: "no default");
                if (!conf_askvalue(sym, def))
                        return 0;
                switch (line[0]) {
@@ -376,6 +375,11 @@ static int conf_string(struct menu *menu)
                }
                if (def && sym_set_string_value(sym, def))
                        return 0;
+               if (!def) {
+                       fprintf(stderr, "A value for %s must be
provided\n", sym->name);
+                       if (feof(stdin))
+                               exit(1);
+               }
        }
 }


would be an improvement. There really is no point in looping forever
when there's nothing more to get from stdin.

Rasmus
Rasmus Villemoes May 13, 2024, 9:45 a.m. UTC | #8
On 13/05/2024 11.03, Yoann Congal wrote:
> 
> 
> Le 13/05/2024 à 10:28, Rasmus Villemoes a écrit :
>> On 07/05/2024 19.23, Tom Rini wrote:
>>> On Tue, May 07, 2024 at 09:17:46AM +0200, Yoann Congal wrote:

>>>> But, my more general use case is the Yocto dev trying to change the U-Boot (or any kconfig based software) config and accidentally trigger this.
>>>> In Yocto, what they will see is a do_configure task taking a very long time but they won't see the looping logs the detect the problem (This is caught later by filing the RAM and the build process is killed by OOM, or the disk run out of space filed multi-GB log files).
>>>>
>>>> I'm sadly aware that defining a default value for this kind of config (fixed addresses) is not really sensible but the build time infinite loop is not good either.
>>>
>>> Exactly, but to me it's worse to cover up the build issue and introduce
>>> a runtime issue instead. Something builds but then crashes at run time
>>> is worse to me than builds fail / get stuck. Using your example,
>>> CMD_PSTORE_MEM_ADDR depends on CMD_PSTORE and so if someone enables
>>> CMD_PSTORE in a config fragment but doesn't define the address, the
>>> build should not complete. Using 0x0 means that oops, now it fails to
>>> work and might not be obvious why either.
>>
>> I agree with Tom. Yes, we've also hit those yocto failures with a
>> multi-GB do_configure log file, and that's pretty annoying when one just
>> starts a build cooking over night and then the next day one's laptop has
>> a filled disk. So clearly there is/was a problem to solve.
>>
>> However, blindly adding a "default default" of 0 is not the right thing
>> to do. It seems much better that, when there's no configured default and
>> kconfig gets EOF when asking the user for input (as in the 'make
>> oldconfig < /dev/null' case), kconfig should exit with an error
>> explicitly saying "no value given for CONFIG_XYZ which also doesn't have
>> a default value". That stops the build, and leaves precisely an
>> indication what the problem is. Then the developer must either provide a
>> value for CONFIG_XYZ in the fragments, or send a patch to add a
>> reasonable default when that is at all possible (not so for e.g. PSTORE).
> 
> 
> Agreed, that was the spirit of my first attempt at a fix[0] which was then made broader[1] and finally rejected from "-next" because it triggered a build failure[2].
> 
> I still think my first approach (exit with an error instead of starting the infinite loop) was the right one (at least for U-Boot). The problem is that this approach was rejected by the Linux kconfig maintainer (Masahiro Yamada CC'd). I'm willing to clean it up for U-boot but this would diverge from linux kconfig... Is that acceptable?
> 
> [0]: https://lore.kernel.org/all/20231031111647.111093-1-yoann.congal@smile.fr/
> [1]: https://lore.kernel.org/all/20231104222715.3967791-1-yoann.congal@smile.fr/
> [2]: https://lore.kernel.org/all/20231107210004.GA2065849@dev-arch.thelio-3990X/

Thanks for the pointers to the previous discussions. I see that we
arrived at more or less the same patch in at least one of your versions.

Masahiro, can you reconsider that "default default". Even if "most
int/hex items" in linux usually have a default, one could still hit
cases where they don't (usually some arch/soc/board specific physical
address, e.g. for an early console, but naturally a bootloader is more
prone to have those kinds of kconfig options). And if it really is not a
problem for linux itself, the "default default" is a no-op for linux,
but does cause unwanted behaviour for downstream users of kconfig.

In
https://lore.kernel.org/lkml/CAK7LNAS8a=8n9r7kQrLTPpKwqXG1d1sd0WjJ8PQhOXHXxnSyNQ@mail.gmail.com/,
which was a reply to a patch almost identical to the one I arrived at,
you said

  It is strange (and consistent) to bail out only for particular types.

[assume you meant inconsistent], but AFAICT, bool/tristate values always
have (and always have had) a default, so this isn't really inconsistent
as the bail out is only applicable to int/hex types (and maybe string?
dunno if they default default to "").

Rasmus
diff mbox series

Patch

diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c
index 703b9b899ee..b4c51de3f20 100644
--- a/scripts/kconfig/symbol.c
+++ b/scripts/kconfig/symbol.c
@@ -343,7 +343,11 @@  void sym_calc_value(struct symbol *sym)
 
 	switch (sym->type) {
 	case S_INT:
+		newval.val = "0";
+		break;
 	case S_HEX:
+		newval.val = "0x0";
+		break;
 	case S_STRING:
 		newval = symbol_empty.curr;
 		break;
@@ -753,15 +757,17 @@  const char *sym_get_string_default(struct symbol *sym)
 		case yes: return "y";
 		}
 	case S_INT:
+		if (!str[0])
+			str = "0";
+		break;
 	case S_HEX:
-		return str;
-	case S_STRING:
-		return str;
-	case S_OTHER:
-	case S_UNKNOWN:
+		if (!str[0])
+			str = "0x0";
+		break;
+	default:
 		break;
 	}
-	return "";
+	return str;
 }
 
 const char *sym_get_string_value(struct symbol *sym)