diff mbox series

[V2] spi: Update speed/mode on change

Message ID 20210226142124.56305-1-marex@denx.de
State New
Delegated to: Tom Rini
Headers show
Series [V2] spi: Update speed/mode on change | expand

Commit Message

Marek Vasut Feb. 26, 2021, 2:21 p.m. UTC
The spi_get_bus_and_cs() may be called on the same bus and chipselect
with different frequency or mode. This is valid usecase, but the code
fails to notify the controller of such a configuration change. Call
spi_set_speed_mode() in case bus frequency or bus mode changed to let
the controller update the configuration.

The problem can easily be triggered using the sspi command:
=> sspi 0:0@1000
=> sspi 0:0@2000
Without this patch, both transfers happen at 1000 Hz. With this patch,
the later transfer happens correctly at 2000 Hz.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Jagan Teki <jagan@amarulasolutions.com>
Cc: Patrick Delaunay <patrick.delaunay@st.com>
---
V2: - Use bus_data->speed for the check
    - Release the bus on error
---
 drivers/spi/spi-uclass.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Marek Vasut March 25, 2021, 5:41 p.m. UTC | #1
On 2/26/21 3:21 PM, Marek Vasut wrote:
> The spi_get_bus_and_cs() may be called on the same bus and chipselect
> with different frequency or mode. This is valid usecase, but the code
> fails to notify the controller of such a configuration change. Call
> spi_set_speed_mode() in case bus frequency or bus mode changed to let
> the controller update the configuration.
> 
> The problem can easily be triggered using the sspi command:
> => sspi 0:0@1000
> => sspi 0:0@2000
> Without this patch, both transfers happen at 1000 Hz. With this patch,
> the later transfer happens correctly at 2000 Hz.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Jagan Teki <jagan@amarulasolutions.com>
> Cc: Patrick Delaunay <patrick.delaunay@st.com>

This bugfix is still missing from upstream.
Tom Rini March 28, 2021, 9:30 p.m. UTC | #2
On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:

> The spi_get_bus_and_cs() may be called on the same bus and chipselect
> with different frequency or mode. This is valid usecase, but the code
> fails to notify the controller of such a configuration change. Call
> spi_set_speed_mode() in case bus frequency or bus mode changed to let
> the controller update the configuration.
> 
> The problem can easily be triggered using the sspi command:
> => sspi 0:0@1000
> => sspi 0:0@2000
> Without this patch, both transfers happen at 1000 Hz. With this patch,
> the later transfer happens correctly at 2000 Hz.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Jagan Teki <jagan@amarulasolutions.com>
> Cc: Patrick Delaunay <patrick.delaunay@st.com>

So, very reliably I can make:
https://source.denx.de/u-boot/u-boot/-/jobs/245517
happen locally as well building with clang.  It's not obvious to me why
the test now fails however.
Marek Vasut March 29, 2021, 1:32 a.m. UTC | #3
On 3/28/21 11:30 PM, Tom Rini wrote:
> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> 
>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>> with different frequency or mode. This is valid usecase, but the code
>> fails to notify the controller of such a configuration change. Call
>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>> the controller update the configuration.
>>
>> The problem can easily be triggered using the sspi command:
>> => sspi 0:0@1000
>> => sspi 0:0@2000
>> Without this patch, both transfers happen at 1000 Hz. With this patch,
>> the later transfer happens correctly at 2000 Hz.
>>
>> Signed-off-by: Marek Vasut <marex@denx.de>
>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
> 
> So, very reliably I can make:
> https://source.denx.de/u-boot/u-boot/-/jobs/245517
> happen locally as well building with clang.  It's not obvious to me why
> the test now fails however.

Can you please be more specific / clear ? I have no idea what those 300 
lines of cryptic output mean, nor what are you trying to say by the 
above, sorry.
Tom Rini March 29, 2021, 1:59 a.m. UTC | #4
On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> On 3/28/21 11:30 PM, Tom Rini wrote:
> > On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> > 
> > > The spi_get_bus_and_cs() may be called on the same bus and chipselect
> > > with different frequency or mode. This is valid usecase, but the code
> > > fails to notify the controller of such a configuration change. Call
> > > spi_set_speed_mode() in case bus frequency or bus mode changed to let
> > > the controller update the configuration.
> > > 
> > > The problem can easily be triggered using the sspi command:
> > > => sspi 0:0@1000
> > > => sspi 0:0@2000
> > > Without this patch, both transfers happen at 1000 Hz. With this patch,
> > > the later transfer happens correctly at 2000 Hz.
> > > 
> > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > Cc: Jagan Teki <jagan@amarulasolutions.com>
> > > Cc: Patrick Delaunay <patrick.delaunay@st.com>
> > 
> > So, very reliably I can make:
> > https://source.denx.de/u-boot/u-boot/-/jobs/245517
> > happen locally as well building with clang.  It's not obvious to me why
> > the test now fails however.
> 
> Can you please be more specific / clear ? I have no idea what those 300
> lines of cryptic output mean, nor what are you trying to say by the above,
> sorry.

If you build with clang, for sandbox, and run the tests, U-Boot crashes
in the unit tests that you start with "ut dm".
Marek Vasut March 29, 2021, 2:09 a.m. UTC | #5
On 3/29/21 3:59 AM, Tom Rini wrote:
> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
>> On 3/28/21 11:30 PM, Tom Rini wrote:
>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
>>>
>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>>>> with different frequency or mode. This is valid usecase, but the code
>>>> fails to notify the controller of such a configuration change. Call
>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>>>> the controller update the configuration.
>>>>
>>>> The problem can easily be triggered using the sspi command:
>>>> => sspi 0:0@1000
>>>> => sspi 0:0@2000
>>>> Without this patch, both transfers happen at 1000 Hz. With this patch,
>>>> the later transfer happens correctly at 2000 Hz.
>>>>
>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
>>>
>>> So, very reliably I can make:
>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
>>> happen locally as well building with clang.  It's not obvious to me why
>>> the test now fails however.
>>
>> Can you please be more specific / clear ? I have no idea what those 300
>> lines of cryptic output mean, nor what are you trying to say by the above,
>> sorry.
> 
> If you build with clang, for sandbox, and run the tests, U-Boot crashes
> in the unit tests that you start with "ut dm".

And that is related to this patch somehow ? How ?
Marek Vasut March 29, 2021, 2:40 a.m. UTC | #6
On 3/29/21 4:09 AM, Marek Vasut wrote:
> On 3/29/21 3:59 AM, Tom Rini wrote:
>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
>>> On 3/28/21 11:30 PM, Tom Rini wrote:
>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
>>>>
>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>>>>> with different frequency or mode. This is valid usecase, but the code
>>>>> fails to notify the controller of such a configuration change. Call
>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>>>>> the controller update the configuration.
>>>>>
>>>>> The problem can easily be triggered using the sspi command:
>>>>> => sspi 0:0@1000
>>>>> => sspi 0:0@2000
>>>>> Without this patch, both transfers happen at 1000 Hz. With this patch,
>>>>> the later transfer happens correctly at 2000 Hz.
>>>>>
>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
>>>>
>>>> So, very reliably I can make:
>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
>>>> happen locally as well building with clang.  It's not obvious to me why
>>>> the test now fails however.
>>>
>>> Can you please be more specific / clear ? I have no idea what those 300
>>> lines of cryptic output mean, nor what are you trying to say by the 
>>> above,
>>> sorry.
>>
>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
>> in the unit tests that you start with "ut dm".
> 
> And that is related to this patch somehow ? How ?

... after further discussion off-list to get a better test case 
description ...

It seems the problem is in sound_beep() and it is unrelated to this 
patch, as the same problem happens with / without this patch being 
applied, on:

a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")

$ clang --version
Debian clang version 11.0.1-2
...
$ make CC=clang HOSTCC=clang sandbox_defconfig
$ make CC=clang HOSTCC=clang
$ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
=> ut dm
...
Program received signal SIGSEGV, Segmentation fault.
dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at common/dlmalloc.c:1623
1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate 
forward */
(gdb) bt
#0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at 
common/dlmalloc.c:1623
#1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized 
out>, frequency_hz=100)
     at drivers/sound/sound-uclass.c:118
#2  0x000000000055e083 in dm_test_sound (uts=0x652400 
<global_dm_test_state>) at test/dm/sound.c:28
#3  0x0000000000514c4d in dm_do_test (uts=0x652400 <global_dm_test_state>,
     test=test@entry=0x621520 <_u_boot_list_2_dm_test_2_dm_test_sound>, 
of_live=<optimized out>) at test/dm/test-main.c:107
#4  0x00000000005147a4 in dm_test_main (test_name=0x0) at 
test/dm/test-main.c:189
#5  0x00000000004420c8 in cmd_call (cmdtp=0x61ebf8 
<_u_boot_list_2_cmd_2_ut>, flag=<optimized out>, argc=2, argv=0x15bd63d0,
     repeatable=0x7fffffffda34) at common/command.c:580
#6  cmd_process (flag=<optimized out>, flag@entry=0, argc=<optimized 
out>, argv=0x15bd63d0, repeatable=0x63b97c <flag_repeat>,
     ticks=ticks@entry=0x0) at common/command.c:635
#7  0x000000000042c574 in run_pipe_real (pi=0x15bd6300) at 
common/cli_hush.c:1676
#8  run_list_real (pi=<optimized out>, pi@entry=0x15bd6300) at 
common/cli_hush.c:1873
#9  0x000000000042b415 in run_list (pi=0x15bd6300) at common/cli_hush.c:2022
#10 parse_stream_outer (inp=inp@entry=0x7fffffffdbe8, flag=<optimized 
out>, flag@entry=2) at common/cli_hush.c:3206
#11 0x000000000042b57a in parse_file_outer () at common/cli_hush.c:3289
#12 0x0000000000441517 in cli_loop () at common/cli.c:230
#13 0x000000000042a0ad in main_loop () at common/main.c:66
#14 0x000000000042d57e in run_main_loop () at common/board_r.c:588
#15 0x000000000042d32e in initcall_run_list (init_sequence=<optimized 
out>) at /u-boot/include/initcall.h:46
#16 board_init_r (new_gd=<optimized out>, dest_addr=dest_addr@entry=0) 
at common/board_r.c:830
#17 0x0000000000403a85 in main (argc=3, argv=0x7fffffffdfa8) at 
arch/sandbox/cpu/start.c:498
Marek Vasut March 31, 2021, 7:18 p.m. UTC | #7
On 3/29/21 4:40 AM, Marek Vasut wrote:
> On 3/29/21 4:09 AM, Marek Vasut wrote:
>> On 3/29/21 3:59 AM, Tom Rini wrote:
>>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
>>>> On 3/28/21 11:30 PM, Tom Rini wrote:
>>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
>>>>>
>>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>>>>>> with different frequency or mode. This is valid usecase, but the code
>>>>>> fails to notify the controller of such a configuration change. Call
>>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>>>>>> the controller update the configuration.
>>>>>>
>>>>>> The problem can easily be triggered using the sspi command:
>>>>>> => sspi 0:0@1000
>>>>>> => sspi 0:0@2000
>>>>>> Without this patch, both transfers happen at 1000 Hz. With this 
>>>>>> patch,
>>>>>> the later transfer happens correctly at 2000 Hz.
>>>>>>
>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
>>>>>
>>>>> So, very reliably I can make:
>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
>>>>> happen locally as well building with clang.  It's not obvious to me 
>>>>> why
>>>>> the test now fails however.
>>>>
>>>> Can you please be more specific / clear ? I have no idea what those 300
>>>> lines of cryptic output mean, nor what are you trying to say by the 
>>>> above,
>>>> sorry.
>>>
>>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
>>> in the unit tests that you start with "ut dm".
>>
>> And that is related to this patch somehow ? How ?
> 
> ... after further discussion off-list to get a better test case 
> description ...
> 
> It seems the problem is in sound_beep() and it is unrelated to this 
> patch, as the same problem happens with / without this patch being 
> applied, on:
> 
> a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> 
> $ clang --version
> Debian clang version 11.0.1-2
> ...
> $ make CC=clang HOSTCC=clang sandbox_defconfig
> $ make CC=clang HOSTCC=clang
> $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> => ut dm
> ...
> Program received signal SIGSEGV, Segmentation fault.
> dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at 
> common/dlmalloc.c:1623
> 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate 
> forward */
> (gdb) bt
> #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at 
> common/dlmalloc.c:1623
> #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized 
> out>, frequency_hz=100)
>     at drivers/sound/sound-uclass.c:118

Tom, any further comments ? It seems your claim that this bugfix is 
causing problems with clang is not correct, so I think it should still 
be applied to v2021.04 , unless you can provide more details about the 
clang problem ?
Tom Rini March 31, 2021, 7:26 p.m. UTC | #8
On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> On 3/29/21 4:40 AM, Marek Vasut wrote:
> > On 3/29/21 4:09 AM, Marek Vasut wrote:
> > > On 3/29/21 3:59 AM, Tom Rini wrote:
> > > > On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> > > > > On 3/28/21 11:30 PM, Tom Rini wrote:
> > > > > > On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> > > > > > 
> > > > > > > The spi_get_bus_and_cs() may be called on the same bus and chipselect
> > > > > > > with different frequency or mode. This is valid usecase, but the code
> > > > > > > fails to notify the controller of such a configuration change. Call
> > > > > > > spi_set_speed_mode() in case bus frequency or bus mode changed to let
> > > > > > > the controller update the configuration.
> > > > > > > 
> > > > > > > The problem can easily be triggered using the sspi command:
> > > > > > > => sspi 0:0@1000
> > > > > > > => sspi 0:0@2000
> > > > > > > Without this patch, both transfers happen at 1000
> > > > > > > Hz. With this patch,
> > > > > > > the later transfer happens correctly at 2000 Hz.
> > > > > > > 
> > > > > > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > > > > > Cc: Jagan Teki <jagan@amarulasolutions.com>
> > > > > > > Cc: Patrick Delaunay <patrick.delaunay@st.com>
> > > > > > 
> > > > > > So, very reliably I can make:
> > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/245517
> > > > > > happen locally as well building with clang.  It's not
> > > > > > obvious to me why
> > > > > > the test now fails however.
> > > > > 
> > > > > Can you please be more specific / clear ? I have no idea what those 300
> > > > > lines of cryptic output mean, nor what are you trying to say
> > > > > by the above,
> > > > > sorry.
> > > > 
> > > > If you build with clang, for sandbox, and run the tests, U-Boot crashes
> > > > in the unit tests that you start with "ut dm".
> > > 
> > > And that is related to this patch somehow ? How ?
> > 
> > ... after further discussion off-list to get a better test case
> > description ...
> > 
> > It seems the problem is in sound_beep() and it is unrelated to this
> > patch, as the same problem happens with / without this patch being
> > applied, on:
> > 
> > a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> > 
> > $ clang --version
> > Debian clang version 11.0.1-2
> > ...
> > $ make CC=clang HOSTCC=clang sandbox_defconfig
> > $ make CC=clang HOSTCC=clang
> > $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> > => ut dm
> > ...
> > Program received signal SIGSEGV, Segmentation fault.
> > dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > common/dlmalloc.c:1623
> > 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> > forward */
> > (gdb) bt
> > #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > common/dlmalloc.c:1623
> > #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> > out>, frequency_hz=100)
> >     at drivers/sound/sound-uclass.c:118
> 
> Tom, any further comments ? It seems your claim that this bugfix is causing
> problems with clang is not correct, so I think it should still be applied to
> v2021.04 , unless you can provide more details about the clang problem ?

There's at least a few funny things going on.  Yes, apparently outside
of CI building sandbox with clang and running the tests manually crashes
in sound, before it gets to the SPI tests.  Except when it crashes on
the SPI tests instead, which I did get to happen once.  So there's some
sort of use-after-free or similar bug around here somewhere, similar to
how we can't use the -fstack-protector patch as it exposes other real
problems that need to be fixed first.

So, I'll see if with a new tag and a few other changes having been made
we once again get to the point where your patch doesn't trigger this
issue.  But since it's also not a regression fix, if no one can figure
out what to do about fixing what clang shows us, then it can wait for
v2021.07.
Tom Rini March 31, 2021, 7:40 p.m. UTC | #9
On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
> On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> > On 3/29/21 4:40 AM, Marek Vasut wrote:
> > > On 3/29/21 4:09 AM, Marek Vasut wrote:
> > > > On 3/29/21 3:59 AM, Tom Rini wrote:
> > > > > On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> > > > > > On 3/28/21 11:30 PM, Tom Rini wrote:
> > > > > > > On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> > > > > > > 
> > > > > > > > The spi_get_bus_and_cs() may be called on the same bus and chipselect
> > > > > > > > with different frequency or mode. This is valid usecase, but the code
> > > > > > > > fails to notify the controller of such a configuration change. Call
> > > > > > > > spi_set_speed_mode() in case bus frequency or bus mode changed to let
> > > > > > > > the controller update the configuration.
> > > > > > > > 
> > > > > > > > The problem can easily be triggered using the sspi command:
> > > > > > > > => sspi 0:0@1000
> > > > > > > > => sspi 0:0@2000
> > > > > > > > Without this patch, both transfers happen at 1000
> > > > > > > > Hz. With this patch,
> > > > > > > > the later transfer happens correctly at 2000 Hz.
> > > > > > > > 
> > > > > > > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > > > > > > Cc: Jagan Teki <jagan@amarulasolutions.com>
> > > > > > > > Cc: Patrick Delaunay <patrick.delaunay@st.com>
> > > > > > > 
> > > > > > > So, very reliably I can make:
> > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/245517
> > > > > > > happen locally as well building with clang.  It's not
> > > > > > > obvious to me why
> > > > > > > the test now fails however.
> > > > > > 
> > > > > > Can you please be more specific / clear ? I have no idea what those 300
> > > > > > lines of cryptic output mean, nor what are you trying to say
> > > > > > by the above,
> > > > > > sorry.
> > > > > 
> > > > > If you build with clang, for sandbox, and run the tests, U-Boot crashes
> > > > > in the unit tests that you start with "ut dm".
> > > > 
> > > > And that is related to this patch somehow ? How ?
> > > 
> > > ... after further discussion off-list to get a better test case
> > > description ...
> > > 
> > > It seems the problem is in sound_beep() and it is unrelated to this
> > > patch, as the same problem happens with / without this patch being
> > > applied, on:
> > > 
> > > a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> > > 
> > > $ clang --version
> > > Debian clang version 11.0.1-2
> > > ...
> > > $ make CC=clang HOSTCC=clang sandbox_defconfig
> > > $ make CC=clang HOSTCC=clang
> > > $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> > > => ut dm
> > > ...
> > > Program received signal SIGSEGV, Segmentation fault.
> > > dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > common/dlmalloc.c:1623
> > > 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> > > forward */
> > > (gdb) bt
> > > #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > common/dlmalloc.c:1623
> > > #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> > > out>, frequency_hz=100)
> > >     at drivers/sound/sound-uclass.c:118
> > 
> > Tom, any further comments ? It seems your claim that this bugfix is causing
> > problems with clang is not correct, so I think it should still be applied to
> > v2021.04 , unless you can provide more details about the clang problem ?
> 
> There's at least a few funny things going on.  Yes, apparently outside
> of CI building sandbox with clang and running the tests manually crashes
> in sound, before it gets to the SPI tests.  Except when it crashes on
> the SPI tests instead, which I did get to happen once.  So there's some
> sort of use-after-free or similar bug around here somewhere, similar to
> how we can't use the -fstack-protector patch as it exposes other real
> problems that need to be fixed first.
> 
> So, I'll see if with a new tag and a few other changes having been made
> we once again get to the point where your patch doesn't trigger this
> issue.  But since it's also not a regression fix, if no one can figure
> out what to do about fixing what clang shows us, then it can wait for
> v2021.07.

Nope, still gets things to blow up:
https://source.denx.de/u-boot/u-boot/-/jobs/246848

and yes, it would be nice if when the reason a pytest fails is that
U-Boot crashed, it would say something more clear than "OSError: [Errno
5] Input/output error" and you have to know that means U-Boot died.
Marek Vasut March 31, 2021, 8:05 p.m. UTC | #10
On 3/31/21 9:40 PM, Tom Rini wrote:
> On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
>> On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
>>> On 3/29/21 4:40 AM, Marek Vasut wrote:
>>>> On 3/29/21 4:09 AM, Marek Vasut wrote:
>>>>> On 3/29/21 3:59 AM, Tom Rini wrote:
>>>>>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
>>>>>>> On 3/28/21 11:30 PM, Tom Rini wrote:
>>>>>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
>>>>>>>>
>>>>>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>>>>>>>>> with different frequency or mode. This is valid usecase, but the code
>>>>>>>>> fails to notify the controller of such a configuration change. Call
>>>>>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>>>>>>>>> the controller update the configuration.
>>>>>>>>>
>>>>>>>>> The problem can easily be triggered using the sspi command:
>>>>>>>>> => sspi 0:0@1000
>>>>>>>>> => sspi 0:0@2000
>>>>>>>>> Without this patch, both transfers happen at 1000
>>>>>>>>> Hz. With this patch,
>>>>>>>>> the later transfer happens correctly at 2000 Hz.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>>>>>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
>>>>>>>>
>>>>>>>> So, very reliably I can make:
>>>>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
>>>>>>>> happen locally as well building with clang.  It's not
>>>>>>>> obvious to me why
>>>>>>>> the test now fails however.
>>>>>>>
>>>>>>> Can you please be more specific / clear ? I have no idea what those 300
>>>>>>> lines of cryptic output mean, nor what are you trying to say
>>>>>>> by the above,
>>>>>>> sorry.
>>>>>>
>>>>>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
>>>>>> in the unit tests that you start with "ut dm".
>>>>>
>>>>> And that is related to this patch somehow ? How ?
>>>>
>>>> ... after further discussion off-list to get a better test case
>>>> description ...
>>>>
>>>> It seems the problem is in sound_beep() and it is unrelated to this
>>>> patch, as the same problem happens with / without this patch being
>>>> applied, on:
>>>>
>>>> a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
>>>>
>>>> $ clang --version
>>>> Debian clang version 11.0.1-2
>>>> ...
>>>> $ make CC=clang HOSTCC=clang sandbox_defconfig
>>>> $ make CC=clang HOSTCC=clang
>>>> $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
>>>> => ut dm
>>>> ...
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>> common/dlmalloc.c:1623
>>>> 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
>>>> forward */
>>>> (gdb) bt
>>>> #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>> common/dlmalloc.c:1623
>>>> #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
>>>> out>, frequency_hz=100)
>>>>      at drivers/sound/sound-uclass.c:118
>>>
>>> Tom, any further comments ? It seems your claim that this bugfix is causing
>>> problems with clang is not correct, so I think it should still be applied to
>>> v2021.04 , unless you can provide more details about the clang problem ?
>>
>> There's at least a few funny things going on.  Yes, apparently outside
>> of CI building sandbox with clang and running the tests manually crashes
>> in sound, before it gets to the SPI tests.  Except when it crashes on
>> the SPI tests instead, which I did get to happen once.  So there's some
>> sort of use-after-free or similar bug around here somewhere, similar to
>> how we can't use the -fstack-protector patch as it exposes other real
>> problems that need to be fixed first.
>>
>> So, I'll see if with a new tag and a few other changes having been made
>> we once again get to the point where your patch doesn't trigger this
>> issue.  But since it's also not a regression fix, if no one can figure
>> out what to do about fixing what clang shows us, then it can wait for
>> v2021.07.
> 
> Nope, still gets things to blow up:
> https://source.denx.de/u-boot/u-boot/-/jobs/246848
> 
> and yes, it would be nice if when the reason a pytest fails is that
> U-Boot crashed, it would say something more clear than "OSError: [Errno
> 5] Input/output error" and you have to know that means U-Boot died.

Can you please provide a detailed test case how to reproduce this ?
So far I don't have one, the only test description I got is "install 
random version of clang, run ut dm and it fails", but that kind of 
imprecise test description is really not useful. Please provide more 
specific instructions.
Tom Rini March 31, 2021, 9:45 p.m. UTC | #11
On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
> On 3/31/21 9:40 PM, Tom Rini wrote:
> > On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
> > > On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> > > > On 3/29/21 4:40 AM, Marek Vasut wrote:
> > > > > On 3/29/21 4:09 AM, Marek Vasut wrote:
> > > > > > On 3/29/21 3:59 AM, Tom Rini wrote:
> > > > > > > On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> > > > > > > > On 3/28/21 11:30 PM, Tom Rini wrote:
> > > > > > > > > On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> > > > > > > > > 
> > > > > > > > > > The spi_get_bus_and_cs() may be called on the same bus and chipselect
> > > > > > > > > > with different frequency or mode. This is valid usecase, but the code
> > > > > > > > > > fails to notify the controller of such a configuration change. Call
> > > > > > > > > > spi_set_speed_mode() in case bus frequency or bus mode changed to let
> > > > > > > > > > the controller update the configuration.
> > > > > > > > > > 
> > > > > > > > > > The problem can easily be triggered using the sspi command:
> > > > > > > > > > => sspi 0:0@1000
> > > > > > > > > > => sspi 0:0@2000
> > > > > > > > > > Without this patch, both transfers happen at 1000
> > > > > > > > > > Hz. With this patch,
> > > > > > > > > > the later transfer happens correctly at 2000 Hz.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > > > > > > > > Cc: Jagan Teki <jagan@amarulasolutions.com>
> > > > > > > > > > Cc: Patrick Delaunay <patrick.delaunay@st.com>
> > > > > > > > > 
> > > > > > > > > So, very reliably I can make:
> > > > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/245517
> > > > > > > > > happen locally as well building with clang.  It's not
> > > > > > > > > obvious to me why
> > > > > > > > > the test now fails however.
> > > > > > > > 
> > > > > > > > Can you please be more specific / clear ? I have no idea what those 300
> > > > > > > > lines of cryptic output mean, nor what are you trying to say
> > > > > > > > by the above,
> > > > > > > > sorry.
> > > > > > > 
> > > > > > > If you build with clang, for sandbox, and run the tests, U-Boot crashes
> > > > > > > in the unit tests that you start with "ut dm".
> > > > > > 
> > > > > > And that is related to this patch somehow ? How ?
> > > > > 
> > > > > ... after further discussion off-list to get a better test case
> > > > > description ...
> > > > > 
> > > > > It seems the problem is in sound_beep() and it is unrelated to this
> > > > > patch, as the same problem happens with / without this patch being
> > > > > applied, on:
> > > > > 
> > > > > a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> > > > > 
> > > > > $ clang --version
> > > > > Debian clang version 11.0.1-2
> > > > > ...
> > > > > $ make CC=clang HOSTCC=clang sandbox_defconfig
> > > > > $ make CC=clang HOSTCC=clang
> > > > > $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> > > > > => ut dm
> > > > > ...
> > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > common/dlmalloc.c:1623
> > > > > 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> > > > > forward */
> > > > > (gdb) bt
> > > > > #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > common/dlmalloc.c:1623
> > > > > #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> > > > > out>, frequency_hz=100)
> > > > >      at drivers/sound/sound-uclass.c:118
> > > > 
> > > > Tom, any further comments ? It seems your claim that this bugfix is causing
> > > > problems with clang is not correct, so I think it should still be applied to
> > > > v2021.04 , unless you can provide more details about the clang problem ?
> > > 
> > > There's at least a few funny things going on.  Yes, apparently outside
> > > of CI building sandbox with clang and running the tests manually crashes
> > > in sound, before it gets to the SPI tests.  Except when it crashes on
> > > the SPI tests instead, which I did get to happen once.  So there's some
> > > sort of use-after-free or similar bug around here somewhere, similar to
> > > how we can't use the -fstack-protector patch as it exposes other real
> > > problems that need to be fixed first.
> > > 
> > > So, I'll see if with a new tag and a few other changes having been made
> > > we once again get to the point where your patch doesn't trigger this
> > > issue.  But since it's also not a regression fix, if no one can figure
> > > out what to do about fixing what clang shows us, then it can wait for
> > > v2021.07.
> > 
> > Nope, still gets things to blow up:
> > https://source.denx.de/u-boot/u-boot/-/jobs/246848
> > 
> > and yes, it would be nice if when the reason a pytest fails is that
> > U-Boot crashed, it would say something more clear than "OSError: [Errno
> > 5] Input/output error" and you have to know that means U-Boot died.
> 
> Can you please provide a detailed test case how to reproduce this ?
> So far I don't have one, the only test description I got is "install random
> version of clang, run ut dm and it fails", but that kind of imprecise test
> description is really not useful. Please provide more specific instructions.

Yes, follow what CI does.  It's done in a container, so you can have the
exact same runtime and the tests are in .azure-pipelines.yml and
.gitlab-ci.yml or if you look at
https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
commands in there.
Simon Glass March 31, 2021, 10:19 p.m. UTC | #12
+Heinrich Schuchardt

On Thu, 1 Apr 2021 at 10:45, Tom Rini <trini@konsulko.com> wrote:
>
> On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
> > On 3/31/21 9:40 PM, Tom Rini wrote:
> > > On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
> > > > On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> > > > > On 3/29/21 4:40 AM, Marek Vasut wrote:
> > > > > > On 3/29/21 4:09 AM, Marek Vasut wrote:
> > > > > > > On 3/29/21 3:59 AM, Tom Rini wrote:
> > > > > > > > On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> > > > > > > > > On 3/28/21 11:30 PM, Tom Rini wrote:
> > > > > > > > > > On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> > > > > > > > > >
> > > > > > > > > > > The spi_get_bus_and_cs() may be called on the same bus and chipselect
> > > > > > > > > > > with different frequency or mode. This is valid usecase, but the code
> > > > > > > > > > > fails to notify the controller of such a configuration change. Call
> > > > > > > > > > > spi_set_speed_mode() in case bus frequency or bus mode changed to let
> > > > > > > > > > > the controller update the configuration.
> > > > > > > > > > >
> > > > > > > > > > > The problem can easily be triggered using the sspi command:
> > > > > > > > > > > => sspi 0:0@1000
> > > > > > > > > > > => sspi 0:0@2000
> > > > > > > > > > > Without this patch, both transfers happen at 1000
> > > > > > > > > > > Hz. With this patch,
> > > > > > > > > > > the later transfer happens correctly at 2000 Hz.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > > > > > > > > > Cc: Jagan Teki <jagan@amarulasolutions.com>
> > > > > > > > > > > Cc: Patrick Delaunay <patrick.delaunay@st.com>
> > > > > > > > > >
> > > > > > > > > > So, very reliably I can make:
> > > > > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/245517
> > > > > > > > > > happen locally as well building with clang.  It's not
> > > > > > > > > > obvious to me why
> > > > > > > > > > the test now fails however.
> > > > > > > > >
> > > > > > > > > Can you please be more specific / clear ? I have no idea what those 300
> > > > > > > > > lines of cryptic output mean, nor what are you trying to say
> > > > > > > > > by the above,
> > > > > > > > > sorry.
> > > > > > > >
> > > > > > > > If you build with clang, for sandbox, and run the tests, U-Boot crashes
> > > > > > > > in the unit tests that you start with "ut dm".
> > > > > > >
> > > > > > > And that is related to this patch somehow ? How ?
> > > > > >
> > > > > > ... after further discussion off-list to get a better test case
> > > > > > description ...
> > > > > >
> > > > > > It seems the problem is in sound_beep() and it is unrelated to this
> > > > > > patch, as the same problem happens with / without this patch being
> > > > > > applied, on:
> > > > > >
> > > > > > a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> > > > > >
> > > > > > $ clang --version
> > > > > > Debian clang version 11.0.1-2
> > > > > > ...
> > > > > > $ make CC=clang HOSTCC=clang sandbox_defconfig
> > > > > > $ make CC=clang HOSTCC=clang
> > > > > > $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> > > > > > => ut dm
> > > > > > ...
> > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > > common/dlmalloc.c:1623
> > > > > > 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> > > > > > forward */
> > > > > > (gdb) bt
> > > > > > #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > > common/dlmalloc.c:1623
> > > > > > #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> > > > > > out>, frequency_hz=100)
> > > > > >      at drivers/sound/sound-uclass.c:118
> > > > >
> > > > > Tom, any further comments ? It seems your claim that this bugfix is causing
> > > > > problems with clang is not correct, so I think it should still be applied to
> > > > > v2021.04 , unless you can provide more details about the clang problem ?
> > > >
> > > > There's at least a few funny things going on.  Yes, apparently outside
> > > > of CI building sandbox with clang and running the tests manually crashes
> > > > in sound, before it gets to the SPI tests.  Except when it crashes on
> > > > the SPI tests instead, which I did get to happen once.  So there's some
> > > > sort of use-after-free or similar bug around here somewhere, similar to
> > > > how we can't use the -fstack-protector patch as it exposes other real
> > > > problems that need to be fixed first.
> > > >
> > > > So, I'll see if with a new tag and a few other changes having been made
> > > > we once again get to the point where your patch doesn't trigger this
> > > > issue.  But since it's also not a regression fix, if no one can figure
> > > > out what to do about fixing what clang shows us, then it can wait for
> > > > v2021.07.
> > >
> > > Nope, still gets things to blow up:
> > > https://source.denx.de/u-boot/u-boot/-/jobs/246848
> > >
> > > and yes, it would be nice if when the reason a pytest fails is that
> > > U-Boot crashed, it would say something more clear than "OSError: [Errno
> > > 5] Input/output error" and you have to know that means U-Boot died.
> >
> > Can you please provide a detailed test case how to reproduce this ?
> > So far I don't have one, the only test description I got is "install random
> > version of clang, run ut dm and it fails", but that kind of imprecise test
> > description is really not useful. Please provide more specific instructions.
>
> Yes, follow what CI does.  It's done in a container, so you can have the
> exact same runtime and the tests are in .azure-pipelines.yml and
> .gitlab-ci.yml or if you look at
> https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
> commands in there.

This does not seem good. I wonder if this patch might have introduced
some instability?

b308d9fd18f sandbox: Avoid using malloc() for system state

I have not seen this issue but did see a strange problem on sandbox
when I tried to malloc() more space that was available. I haven't got
back to it yet.

Regards,
Simon
Marek Vasut March 31, 2021, 10:39 p.m. UTC | #13
On 3/31/21 11:45 PM, Tom Rini wrote:
> On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
>> On 3/31/21 9:40 PM, Tom Rini wrote:
>>> On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
>>>> On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
>>>>> On 3/29/21 4:40 AM, Marek Vasut wrote:
>>>>>> On 3/29/21 4:09 AM, Marek Vasut wrote:
>>>>>>> On 3/29/21 3:59 AM, Tom Rini wrote:
>>>>>>>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
>>>>>>>>> On 3/28/21 11:30 PM, Tom Rini wrote:
>>>>>>>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
>>>>>>>>>>
>>>>>>>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>>>>>>>>>>> with different frequency or mode. This is valid usecase, but the code
>>>>>>>>>>> fails to notify the controller of such a configuration change. Call
>>>>>>>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>>>>>>>>>>> the controller update the configuration.
>>>>>>>>>>>
>>>>>>>>>>> The problem can easily be triggered using the sspi command:
>>>>>>>>>>> => sspi 0:0@1000
>>>>>>>>>>> => sspi 0:0@2000
>>>>>>>>>>> Without this patch, both transfers happen at 1000
>>>>>>>>>>> Hz. With this patch,
>>>>>>>>>>> the later transfer happens correctly at 2000 Hz.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>>>>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>>>>>>>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
>>>>>>>>>>
>>>>>>>>>> So, very reliably I can make:
>>>>>>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
>>>>>>>>>> happen locally as well building with clang.  It's not
>>>>>>>>>> obvious to me why
>>>>>>>>>> the test now fails however.
>>>>>>>>>
>>>>>>>>> Can you please be more specific / clear ? I have no idea what those 300
>>>>>>>>> lines of cryptic output mean, nor what are you trying to say
>>>>>>>>> by the above,
>>>>>>>>> sorry.
>>>>>>>>
>>>>>>>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
>>>>>>>> in the unit tests that you start with "ut dm".
>>>>>>>
>>>>>>> And that is related to this patch somehow ? How ?
>>>>>>
>>>>>> ... after further discussion off-list to get a better test case
>>>>>> description ...
>>>>>>
>>>>>> It seems the problem is in sound_beep() and it is unrelated to this
>>>>>> patch, as the same problem happens with / without this patch being
>>>>>> applied, on:
>>>>>>
>>>>>> a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
>>>>>>
>>>>>> $ clang --version
>>>>>> Debian clang version 11.0.1-2
>>>>>> ...
>>>>>> $ make CC=clang HOSTCC=clang sandbox_defconfig
>>>>>> $ make CC=clang HOSTCC=clang
>>>>>> $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
>>>>>> => ut dm
>>>>>> ...
>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>> dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>>>> common/dlmalloc.c:1623
>>>>>> 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
>>>>>> forward */
>>>>>> (gdb) bt
>>>>>> #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>>>> common/dlmalloc.c:1623
>>>>>> #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
>>>>>> out>, frequency_hz=100)
>>>>>>       at drivers/sound/sound-uclass.c:118
>>>>>
>>>>> Tom, any further comments ? It seems your claim that this bugfix is causing
>>>>> problems with clang is not correct, so I think it should still be applied to
>>>>> v2021.04 , unless you can provide more details about the clang problem ?
>>>>
>>>> There's at least a few funny things going on.  Yes, apparently outside
>>>> of CI building sandbox with clang and running the tests manually crashes
>>>> in sound, before it gets to the SPI tests.  Except when it crashes on
>>>> the SPI tests instead, which I did get to happen once.  So there's some
>>>> sort of use-after-free or similar bug around here somewhere, similar to
>>>> how we can't use the -fstack-protector patch as it exposes other real
>>>> problems that need to be fixed first.
>>>>
>>>> So, I'll see if with a new tag and a few other changes having been made
>>>> we once again get to the point where your patch doesn't trigger this
>>>> issue.  But since it's also not a regression fix, if no one can figure
>>>> out what to do about fixing what clang shows us, then it can wait for
>>>> v2021.07.
>>>
>>> Nope, still gets things to blow up:
>>> https://source.denx.de/u-boot/u-boot/-/jobs/246848
>>>
>>> and yes, it would be nice if when the reason a pytest fails is that
>>> U-Boot crashed, it would say something more clear than "OSError: [Errno
>>> 5] Input/output error" and you have to know that means U-Boot died.
>>
>> Can you please provide a detailed test case how to reproduce this ?
>> So far I don't have one, the only test description I got is "install random
>> version of clang, run ut dm and it fails", but that kind of imprecise test
>> description is really not useful. Please provide more specific instructions.
> 
> Yes, follow what CI does.  It's done in a container, so you can have the
> exact same runtime and the tests are in .azure-pipelines.yml and
> .gitlab-ci.yml or if you look at
> https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
> commands in there.

Can you please provide me the exact commands to run to reproduce this 
problem ? I just spent a lot of time trying to find this one single 
command in a wall of text, "ut dm spi_find", which I think is the one 
failing for you ? It seems to work for me with and without the patch:

$ ./u-boot -d arch/sandbox/dts/test.dtb
sandbox: starting...


U-Boot 2021.04-rc5-00005-g583b6858733 (Apr 01 2021 - 00:34:09 +0200)

Model: sandbox
DRAM:  128 MiB
WDT:   Started with servicing (60s timeout)
MMC:   mmc2: 2 (SD), mmc1: 1 (SD), mmc0: 0 (SD)
In:    serial
Out:   vidconsole
Err:   vidconsole
Model: sandbox
SCSI:
Net:   eth0: eth@10002000, eth5: eth@10003000, eth3: sbe5, eth6: 
eth@10004000
Hit any key to stop autoboot:  0

Without patch
=> ut dm spi_find
Test: dm_test_spi_find: spi.c
test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum, 
cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0 
(0), got 0xfffffffb (-5)
Test: dm_test_spi_find: spi.c (flat tree)
test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum, 
cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0 
(0), got 0xfffffffb (-5)
Failures: 2

With patch
=> ut dm spi_find
Test: dm_test_spi_find: spi.c
test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum, 
cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0 
(0), got 0xfffffffb (-5)
Test: dm_test_spi_find: spi.c (flat tree)
test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum, 
cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0 
(0), got 0xfffffffb (-5)
Failures: 2
Simon Glass March 31, 2021, 10:50 p.m. UTC | #14
Hi Marek,

On Thu, 1 Apr 2021 at 11:39, Marek Vasut <marex@denx.de> wrote:
>
> On 3/31/21 11:45 PM, Tom Rini wrote:
> > On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
> >> On 3/31/21 9:40 PM, Tom Rini wrote:
> >>> On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
> >>>> On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> >>>>> On 3/29/21 4:40 AM, Marek Vasut wrote:
> >>>>>> On 3/29/21 4:09 AM, Marek Vasut wrote:
> >>>>>>> On 3/29/21 3:59 AM, Tom Rini wrote:
> >>>>>>>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> >>>>>>>>> On 3/28/21 11:30 PM, Tom Rini wrote:
> >>>>>>>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> >>>>>>>>>>
> >>>>>>>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
> >>>>>>>>>>> with different frequency or mode. This is valid usecase, but the code
> >>>>>>>>>>> fails to notify the controller of such a configuration change. Call
> >>>>>>>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
> >>>>>>>>>>> the controller update the configuration.
> >>>>>>>>>>>
> >>>>>>>>>>> The problem can easily be triggered using the sspi command:
> >>>>>>>>>>> => sspi 0:0@1000
> >>>>>>>>>>> => sspi 0:0@2000
> >>>>>>>>>>> Without this patch, both transfers happen at 1000
> >>>>>>>>>>> Hz. With this patch,
> >>>>>>>>>>> the later transfer happens correctly at 2000 Hz.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
> >>>>>>>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
> >>>>>>>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
> >>>>>>>>>>
> >>>>>>>>>> So, very reliably I can make:
> >>>>>>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
> >>>>>>>>>> happen locally as well building with clang.  It's not
> >>>>>>>>>> obvious to me why
> >>>>>>>>>> the test now fails however.
> >>>>>>>>>
> >>>>>>>>> Can you please be more specific / clear ? I have no idea what those 300
> >>>>>>>>> lines of cryptic output mean, nor what are you trying to say
> >>>>>>>>> by the above,
> >>>>>>>>> sorry.
> >>>>>>>>
> >>>>>>>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
> >>>>>>>> in the unit tests that you start with "ut dm".
> >>>>>>>
> >>>>>>> And that is related to this patch somehow ? How ?
> >>>>>>
> >>>>>> ... after further discussion off-list to get a better test case
> >>>>>> description ...
> >>>>>>
> >>>>>> It seems the problem is in sound_beep() and it is unrelated to this
> >>>>>> patch, as the same problem happens with / without this patch being
> >>>>>> applied, on:
> >>>>>>
> >>>>>> a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> >>>>>>
> >>>>>> $ clang --version
> >>>>>> Debian clang version 11.0.1-2
> >>>>>> ...
> >>>>>> $ make CC=clang HOSTCC=clang sandbox_defconfig
> >>>>>> $ make CC=clang HOSTCC=clang
> >>>>>> $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> >>>>>> => ut dm
> >>>>>> ...
> >>>>>> Program received signal SIGSEGV, Segmentation fault.
> >>>>>> dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> >>>>>> common/dlmalloc.c:1623
> >>>>>> 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> >>>>>> forward */
> >>>>>> (gdb) bt
> >>>>>> #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> >>>>>> common/dlmalloc.c:1623
> >>>>>> #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> >>>>>> out>, frequency_hz=100)
> >>>>>>       at drivers/sound/sound-uclass.c:118
> >>>>>
> >>>>> Tom, any further comments ? It seems your claim that this bugfix is causing
> >>>>> problems with clang is not correct, so I think it should still be applied to
> >>>>> v2021.04 , unless you can provide more details about the clang problem ?
> >>>>
> >>>> There's at least a few funny things going on.  Yes, apparently outside
> >>>> of CI building sandbox with clang and running the tests manually crashes
> >>>> in sound, before it gets to the SPI tests.  Except when it crashes on
> >>>> the SPI tests instead, which I did get to happen once.  So there's some
> >>>> sort of use-after-free or similar bug around here somewhere, similar to
> >>>> how we can't use the -fstack-protector patch as it exposes other real
> >>>> problems that need to be fixed first.
> >>>>
> >>>> So, I'll see if with a new tag and a few other changes having been made
> >>>> we once again get to the point where your patch doesn't trigger this
> >>>> issue.  But since it's also not a regression fix, if no one can figure
> >>>> out what to do about fixing what clang shows us, then it can wait for
> >>>> v2021.07.
> >>>
> >>> Nope, still gets things to blow up:
> >>> https://source.denx.de/u-boot/u-boot/-/jobs/246848
> >>>
> >>> and yes, it would be nice if when the reason a pytest fails is that
> >>> U-Boot crashed, it would say something more clear than "OSError: [Errno
> >>> 5] Input/output error" and you have to know that means U-Boot died.
> >>
> >> Can you please provide a detailed test case how to reproduce this ?
> >> So far I don't have one, the only test description I got is "install random
> >> version of clang, run ut dm and it fails", but that kind of imprecise test
> >> description is really not useful. Please provide more specific instructions.
> >
> > Yes, follow what CI does.  It's done in a container, so you can have the
> > exact same runtime and the tests are in .azure-pipelines.yml and
> > .gitlab-ci.yml or if you look at
> > https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
> > commands in there.
>
> Can you please provide me the exact commands to run to reproduce this
> problem ? I just spent a lot of time trying to find this one single
> command in a wall of text, "ut dm spi_find", which I think is the one
> failing for you ? It seems to work for me with and without the patch:
>
> $ ./u-boot -d arch/sandbox/dts/test.dtb
> sandbox: starting...
>
>
> U-Boot 2021.04-rc5-00005-g583b6858733 (Apr 01 2021 - 00:34:09 +0200)
>
> Model: sandbox
> DRAM:  128 MiB
> WDT:   Started with servicing (60s timeout)
> MMC:   mmc2: 2 (SD), mmc1: 1 (SD), mmc0: 0 (SD)
> In:    serial
> Out:   vidconsole
> Err:   vidconsole
> Model: sandbox
> SCSI:
> Net:   eth0: eth@10002000, eth5: eth@10003000, eth3: sbe5, eth6:
> eth@10004000
> Hit any key to stop autoboot:  0
>
> Without patch
> => ut dm spi_find
> Test: dm_test_spi_find: spi.c
> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> (0), got 0xfffffffb (-5)
> Test: dm_test_spi_find: spi.c (flat tree)
> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> (0), got 0xfffffffb (-5)
> Failures: 2
>
> With patch
> => ut dm spi_find
> Test: dm_test_spi_find: spi.c
> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> (0), got 0xfffffffb (-5)
> Test: dm_test_spi_find: spi.c (flat tree)
> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> (0), got 0xfffffffb (-5)
> Failures: 2

The Python test sets up a SPI-flash file (spi.bin) that is needed by the test.

I use this alias:

# Run a pytest on sandbox
# $1: Name of test to run (optional, else run all)
function pyt {
   test/py/test.py -B sandbox --build-dir /tmp/b/sandbox ${1:+"-k $1"}
}

then 'pyt spi_find'

Once you've done that once you can do what you had before:

./u-boot -T

Regards,
Simon
Marek Vasut March 31, 2021, 11:19 p.m. UTC | #15
On 4/1/21 12:50 AM, Simon Glass wrote:
> Hi Marek,
> 
> On Thu, 1 Apr 2021 at 11:39, Marek Vasut <marex@denx.de> wrote:
>>
>> On 3/31/21 11:45 PM, Tom Rini wrote:
>>> On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
>>>> On 3/31/21 9:40 PM, Tom Rini wrote:
>>>>> On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
>>>>>> On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
>>>>>>> On 3/29/21 4:40 AM, Marek Vasut wrote:
>>>>>>>> On 3/29/21 4:09 AM, Marek Vasut wrote:
>>>>>>>>> On 3/29/21 3:59 AM, Tom Rini wrote:
>>>>>>>>>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
>>>>>>>>>>> On 3/28/21 11:30 PM, Tom Rini wrote:
>>>>>>>>>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>>>>>>>>>>>>> with different frequency or mode. This is valid usecase, but the code
>>>>>>>>>>>>> fails to notify the controller of such a configuration change. Call
>>>>>>>>>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>>>>>>>>>>>>> the controller update the configuration.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem can easily be triggered using the sspi command:
>>>>>>>>>>>>> => sspi 0:0@1000
>>>>>>>>>>>>> => sspi 0:0@2000
>>>>>>>>>>>>> Without this patch, both transfers happen at 1000
>>>>>>>>>>>>> Hz. With this patch,
>>>>>>>>>>>>> the later transfer happens correctly at 2000 Hz.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>>>>>>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>>>>>>>>>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
>>>>>>>>>>>>
>>>>>>>>>>>> So, very reliably I can make:
>>>>>>>>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
>>>>>>>>>>>> happen locally as well building with clang.  It's not
>>>>>>>>>>>> obvious to me why
>>>>>>>>>>>> the test now fails however.
>>>>>>>>>>>
>>>>>>>>>>> Can you please be more specific / clear ? I have no idea what those 300
>>>>>>>>>>> lines of cryptic output mean, nor what are you trying to say
>>>>>>>>>>> by the above,
>>>>>>>>>>> sorry.
>>>>>>>>>>
>>>>>>>>>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
>>>>>>>>>> in the unit tests that you start with "ut dm".
>>>>>>>>>
>>>>>>>>> And that is related to this patch somehow ? How ?
>>>>>>>>
>>>>>>>> ... after further discussion off-list to get a better test case
>>>>>>>> description ...
>>>>>>>>
>>>>>>>> It seems the problem is in sound_beep() and it is unrelated to this
>>>>>>>> patch, as the same problem happens with / without this patch being
>>>>>>>> applied, on:
>>>>>>>>
>>>>>>>> a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
>>>>>>>>
>>>>>>>> $ clang --version
>>>>>>>> Debian clang version 11.0.1-2
>>>>>>>> ...
>>>>>>>> $ make CC=clang HOSTCC=clang sandbox_defconfig
>>>>>>>> $ make CC=clang HOSTCC=clang
>>>>>>>> $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
>>>>>>>> => ut dm
>>>>>>>> ...
>>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>> dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>>>>>> common/dlmalloc.c:1623
>>>>>>>> 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
>>>>>>>> forward */
>>>>>>>> (gdb) bt
>>>>>>>> #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>>>>>> common/dlmalloc.c:1623
>>>>>>>> #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
>>>>>>>> out>, frequency_hz=100)
>>>>>>>>        at drivers/sound/sound-uclass.c:118
>>>>>>>
>>>>>>> Tom, any further comments ? It seems your claim that this bugfix is causing
>>>>>>> problems with clang is not correct, so I think it should still be applied to
>>>>>>> v2021.04 , unless you can provide more details about the clang problem ?
>>>>>>
>>>>>> There's at least a few funny things going on.  Yes, apparently outside
>>>>>> of CI building sandbox with clang and running the tests manually crashes
>>>>>> in sound, before it gets to the SPI tests.  Except when it crashes on
>>>>>> the SPI tests instead, which I did get to happen once.  So there's some
>>>>>> sort of use-after-free or similar bug around here somewhere, similar to
>>>>>> how we can't use the -fstack-protector patch as it exposes other real
>>>>>> problems that need to be fixed first.
>>>>>>
>>>>>> So, I'll see if with a new tag and a few other changes having been made
>>>>>> we once again get to the point where your patch doesn't trigger this
>>>>>> issue.  But since it's also not a regression fix, if no one can figure
>>>>>> out what to do about fixing what clang shows us, then it can wait for
>>>>>> v2021.07.
>>>>>
>>>>> Nope, still gets things to blow up:
>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/246848
>>>>>
>>>>> and yes, it would be nice if when the reason a pytest fails is that
>>>>> U-Boot crashed, it would say something more clear than "OSError: [Errno
>>>>> 5] Input/output error" and you have to know that means U-Boot died.
>>>>
>>>> Can you please provide a detailed test case how to reproduce this ?
>>>> So far I don't have one, the only test description I got is "install random
>>>> version of clang, run ut dm and it fails", but that kind of imprecise test
>>>> description is really not useful. Please provide more specific instructions.
>>>
>>> Yes, follow what CI does.  It's done in a container, so you can have the
>>> exact same runtime and the tests are in .azure-pipelines.yml and
>>> .gitlab-ci.yml or if you look at
>>> https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
>>> commands in there.
>>
>> Can you please provide me the exact commands to run to reproduce this
>> problem ? I just spent a lot of time trying to find this one single
>> command in a wall of text, "ut dm spi_find", which I think is the one
>> failing for you ? It seems to work for me with and without the patch:
>>
>> $ ./u-boot -d arch/sandbox/dts/test.dtb
>> sandbox: starting...
>>
>>
>> U-Boot 2021.04-rc5-00005-g583b6858733 (Apr 01 2021 - 00:34:09 +0200)
>>
>> Model: sandbox
>> DRAM:  128 MiB
>> WDT:   Started with servicing (60s timeout)
>> MMC:   mmc2: 2 (SD), mmc1: 1 (SD), mmc0: 0 (SD)
>> In:    serial
>> Out:   vidconsole
>> Err:   vidconsole
>> Model: sandbox
>> SCSI:
>> Net:   eth0: eth@10002000, eth5: eth@10003000, eth3: sbe5, eth6:
>> eth@10004000
>> Hit any key to stop autoboot:  0
>>
>> Without patch
>> => ut dm spi_find
>> Test: dm_test_spi_find: spi.c
>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>> (0), got 0xfffffffb (-5)
>> Test: dm_test_spi_find: spi.c (flat tree)
>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>> (0), got 0xfffffffb (-5)
>> Failures: 2
>>
>> With patch
>> => ut dm spi_find
>> Test: dm_test_spi_find: spi.c
>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>> (0), got 0xfffffffb (-5)
>> Test: dm_test_spi_find: spi.c (flat tree)
>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>> (0), got 0xfffffffb (-5)
>> Failures: 2
> 
> The Python test sets up a SPI-flash file (spi.bin) that is needed by the test.
> 
> I use this alias:
> 
> # Run a pytest on sandbox
> # $1: Name of test to run (optional, else run all)
> function pyt {
>     test/py/test.py -B sandbox --build-dir /tmp/b/sandbox ${1:+"-k $1"}
> }
> 
> then 'pyt spi_find'
> 
> Once you've done that once you can do what you had before:
> 
> ./u-boot -T

Do you also have a minimal test which does not depend on pytest at all ?
A testcase for a bug should be minimal and as simple as possible, it 
shouldn't depend on all kinds of variables which are only adding 
complexity and confusion.
Simon Glass March 31, 2021, 11:41 p.m. UTC | #16
Hi Marek,

On Thu, 1 Apr 2021 at 12:19, Marek Vasut <marex@denx.de> wrote:
>
> On 4/1/21 12:50 AM, Simon Glass wrote:
> > Hi Marek,
> >
> > On Thu, 1 Apr 2021 at 11:39, Marek Vasut <marex@denx.de> wrote:
> >>
> >> On 3/31/21 11:45 PM, Tom Rini wrote:
> >>> On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
> >>>> On 3/31/21 9:40 PM, Tom Rini wrote:
> >>>>> On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
> >>>>>> On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> >>>>>>> On 3/29/21 4:40 AM, Marek Vasut wrote:
> >>>>>>>> On 3/29/21 4:09 AM, Marek Vasut wrote:
> >>>>>>>>> On 3/29/21 3:59 AM, Tom Rini wrote:
> >>>>>>>>>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> >>>>>>>>>>> On 3/28/21 11:30 PM, Tom Rini wrote:
> >>>>>>>>>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
> >>>>>>>>>>>>> with different frequency or mode. This is valid usecase, but the code
> >>>>>>>>>>>>> fails to notify the controller of such a configuration change. Call
> >>>>>>>>>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
> >>>>>>>>>>>>> the controller update the configuration.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The problem can easily be triggered using the sspi command:
> >>>>>>>>>>>>> => sspi 0:0@1000
> >>>>>>>>>>>>> => sspi 0:0@2000
> >>>>>>>>>>>>> Without this patch, both transfers happen at 1000
> >>>>>>>>>>>>> Hz. With this patch,
> >>>>>>>>>>>>> the later transfer happens correctly at 2000 Hz.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
> >>>>>>>>>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
> >>>>>>>>>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>> So, very reliably I can make:
> >>>>>>>>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
> >>>>>>>>>>>> happen locally as well building with clang.  It's not
> >>>>>>>>>>>> obvious to me why
> >>>>>>>>>>>> the test now fails however.
> >>>>>>>>>>>
> >>>>>>>>>>> Can you please be more specific / clear ? I have no idea what those 300
> >>>>>>>>>>> lines of cryptic output mean, nor what are you trying to say
> >>>>>>>>>>> by the above,
> >>>>>>>>>>> sorry.
> >>>>>>>>>>
> >>>>>>>>>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
> >>>>>>>>>> in the unit tests that you start with "ut dm".
> >>>>>>>>>
> >>>>>>>>> And that is related to this patch somehow ? How ?
> >>>>>>>>
> >>>>>>>> ... after further discussion off-list to get a better test case
> >>>>>>>> description ...
> >>>>>>>>
> >>>>>>>> It seems the problem is in sound_beep() and it is unrelated to this
> >>>>>>>> patch, as the same problem happens with / without this patch being
> >>>>>>>> applied, on:
> >>>>>>>>
> >>>>>>>> a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> >>>>>>>>
> >>>>>>>> $ clang --version
> >>>>>>>> Debian clang version 11.0.1-2
> >>>>>>>> ...
> >>>>>>>> $ make CC=clang HOSTCC=clang sandbox_defconfig
> >>>>>>>> $ make CC=clang HOSTCC=clang
> >>>>>>>> $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> >>>>>>>> => ut dm
> >>>>>>>> ...
> >>>>>>>> Program received signal SIGSEGV, Segmentation fault.
> >>>>>>>> dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> >>>>>>>> common/dlmalloc.c:1623
> >>>>>>>> 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> >>>>>>>> forward */
> >>>>>>>> (gdb) bt
> >>>>>>>> #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> >>>>>>>> common/dlmalloc.c:1623
> >>>>>>>> #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> >>>>>>>> out>, frequency_hz=100)
> >>>>>>>>        at drivers/sound/sound-uclass.c:118
> >>>>>>>
> >>>>>>> Tom, any further comments ? It seems your claim that this bugfix is causing
> >>>>>>> problems with clang is not correct, so I think it should still be applied to
> >>>>>>> v2021.04 , unless you can provide more details about the clang problem ?
> >>>>>>
> >>>>>> There's at least a few funny things going on.  Yes, apparently outside
> >>>>>> of CI building sandbox with clang and running the tests manually crashes
> >>>>>> in sound, before it gets to the SPI tests.  Except when it crashes on
> >>>>>> the SPI tests instead, which I did get to happen once.  So there's some
> >>>>>> sort of use-after-free or similar bug around here somewhere, similar to
> >>>>>> how we can't use the -fstack-protector patch as it exposes other real
> >>>>>> problems that need to be fixed first.
> >>>>>>
> >>>>>> So, I'll see if with a new tag and a few other changes having been made
> >>>>>> we once again get to the point where your patch doesn't trigger this
> >>>>>> issue.  But since it's also not a regression fix, if no one can figure
> >>>>>> out what to do about fixing what clang shows us, then it can wait for
> >>>>>> v2021.07.
> >>>>>
> >>>>> Nope, still gets things to blow up:
> >>>>> https://source.denx.de/u-boot/u-boot/-/jobs/246848
> >>>>>
> >>>>> and yes, it would be nice if when the reason a pytest fails is that
> >>>>> U-Boot crashed, it would say something more clear than "OSError: [Errno
> >>>>> 5] Input/output error" and you have to know that means U-Boot died.
> >>>>
> >>>> Can you please provide a detailed test case how to reproduce this ?
> >>>> So far I don't have one, the only test description I got is "install random
> >>>> version of clang, run ut dm and it fails", but that kind of imprecise test
> >>>> description is really not useful. Please provide more specific instructions.
> >>>
> >>> Yes, follow what CI does.  It's done in a container, so you can have the
> >>> exact same runtime and the tests are in .azure-pipelines.yml and
> >>> .gitlab-ci.yml or if you look at
> >>> https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
> >>> commands in there.
> >>
> >> Can you please provide me the exact commands to run to reproduce this
> >> problem ? I just spent a lot of time trying to find this one single
> >> command in a wall of text, "ut dm spi_find", which I think is the one
> >> failing for you ? It seems to work for me with and without the patch:
> >>
> >> $ ./u-boot -d arch/sandbox/dts/test.dtb
> >> sandbox: starting...
> >>
> >>
> >> U-Boot 2021.04-rc5-00005-g583b6858733 (Apr 01 2021 - 00:34:09 +0200)
> >>
> >> Model: sandbox
> >> DRAM:  128 MiB
> >> WDT:   Started with servicing (60s timeout)
> >> MMC:   mmc2: 2 (SD), mmc1: 1 (SD), mmc0: 0 (SD)
> >> In:    serial
> >> Out:   vidconsole
> >> Err:   vidconsole
> >> Model: sandbox
> >> SCSI:
> >> Net:   eth0: eth@10002000, eth5: eth@10003000, eth3: sbe5, eth6:
> >> eth@10004000
> >> Hit any key to stop autoboot:  0
> >>
> >> Without patch
> >> => ut dm spi_find
> >> Test: dm_test_spi_find: spi.c
> >> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> >> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> >> (0), got 0xfffffffb (-5)
> >> Test: dm_test_spi_find: spi.c (flat tree)
> >> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> >> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> >> (0), got 0xfffffffb (-5)
> >> Failures: 2
> >>
> >> With patch
> >> => ut dm spi_find
> >> Test: dm_test_spi_find: spi.c
> >> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> >> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> >> (0), got 0xfffffffb (-5)
> >> Test: dm_test_spi_find: spi.c (flat tree)
> >> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> >> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> >> (0), got 0xfffffffb (-5)
> >> Failures: 2
> >
> > The Python test sets up a SPI-flash file (spi.bin) that is needed by the test.
> >
> > I use this alias:
> >
> > # Run a pytest on sandbox
> > # $1: Name of test to run (optional, else run all)
> > function pyt {
> >     test/py/test.py -B sandbox --build-dir /tmp/b/sandbox ${1:+"-k $1"}
> > }
> >
> > then 'pyt spi_find'
> >
> > Once you've done that once you can do what you had before:
> >
> > ./u-boot -T
>
> Do you also have a minimal test which does not depend on pytest at all ?
> A testcase for a bug should be minimal and as simple as possible, it
> shouldn't depend on all kinds of variables which are only adding
> complexity and confusion.

There is quite a bit more documentation languishing in the -next branch, e.g.:

https://github.com/u-boot/u-boot/blob/next/doc/develop/tests_sandbox.rst

The C SPI tests could perhaps be updated to automatically create the
spi.bin file, but several of them share it, so it would need to be a
shared function.

I certainly only used pytest when needed, hence the substantial effort
I made to document how to run sandbox tests directly, but I should
point out that pytest is the overall framework that we use.

Regards,
Simon
Marek Vasut March 31, 2021, 11:47 p.m. UTC | #17
On 4/1/21 1:41 AM, Simon Glass wrote:
> Hi Marek,
> 
> On Thu, 1 Apr 2021 at 12:19, Marek Vasut <marex@denx.de> wrote:
>>
>> On 4/1/21 12:50 AM, Simon Glass wrote:
>>> Hi Marek,
>>>
>>> On Thu, 1 Apr 2021 at 11:39, Marek Vasut <marex@denx.de> wrote:
>>>>
>>>> On 3/31/21 11:45 PM, Tom Rini wrote:
>>>>> On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
>>>>>> On 3/31/21 9:40 PM, Tom Rini wrote:
>>>>>>> On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
>>>>>>>> On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
>>>>>>>>> On 3/29/21 4:40 AM, Marek Vasut wrote:
>>>>>>>>>> On 3/29/21 4:09 AM, Marek Vasut wrote:
>>>>>>>>>>> On 3/29/21 3:59 AM, Tom Rini wrote:
>>>>>>>>>>>> On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
>>>>>>>>>>>>> On 3/28/21 11:30 PM, Tom Rini wrote:
>>>>>>>>>>>>>> On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The spi_get_bus_and_cs() may be called on the same bus and chipselect
>>>>>>>>>>>>>>> with different frequency or mode. This is valid usecase, but the code
>>>>>>>>>>>>>>> fails to notify the controller of such a configuration change. Call
>>>>>>>>>>>>>>> spi_set_speed_mode() in case bus frequency or bus mode changed to let
>>>>>>>>>>>>>>> the controller update the configuration.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem can easily be triggered using the sspi command:
>>>>>>>>>>>>>>> => sspi 0:0@1000
>>>>>>>>>>>>>>> => sspi 0:0@2000
>>>>>>>>>>>>>>> Without this patch, both transfers happen at 1000
>>>>>>>>>>>>>>> Hz. With this patch,
>>>>>>>>>>>>>>> the later transfer happens correctly at 2000 Hz.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>>>>>>>>>>>> Cc: Jagan Teki <jagan@amarulasolutions.com>
>>>>>>>>>>>>>>> Cc: Patrick Delaunay <patrick.delaunay@st.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, very reliably I can make:
>>>>>>>>>>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/245517
>>>>>>>>>>>>>> happen locally as well building with clang.  It's not
>>>>>>>>>>>>>> obvious to me why
>>>>>>>>>>>>>> the test now fails however.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please be more specific / clear ? I have no idea what those 300
>>>>>>>>>>>>> lines of cryptic output mean, nor what are you trying to say
>>>>>>>>>>>>> by the above,
>>>>>>>>>>>>> sorry.
>>>>>>>>>>>>
>>>>>>>>>>>> If you build with clang, for sandbox, and run the tests, U-Boot crashes
>>>>>>>>>>>> in the unit tests that you start with "ut dm".
>>>>>>>>>>>
>>>>>>>>>>> And that is related to this patch somehow ? How ?
>>>>>>>>>>
>>>>>>>>>> ... after further discussion off-list to get a better test case
>>>>>>>>>> description ...
>>>>>>>>>>
>>>>>>>>>> It seems the problem is in sound_beep() and it is unrelated to this
>>>>>>>>>> patch, as the same problem happens with / without this patch being
>>>>>>>>>> applied, on:
>>>>>>>>>>
>>>>>>>>>> a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
>>>>>>>>>>
>>>>>>>>>> $ clang --version
>>>>>>>>>> Debian clang version 11.0.1-2
>>>>>>>>>> ...
>>>>>>>>>> $ make CC=clang HOSTCC=clang sandbox_defconfig
>>>>>>>>>> $ make CC=clang HOSTCC=clang
>>>>>>>>>> $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
>>>>>>>>>> => ut dm
>>>>>>>>>> ...
>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>> dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>>>>>>>> common/dlmalloc.c:1623
>>>>>>>>>> 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
>>>>>>>>>> forward */
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
>>>>>>>>>> common/dlmalloc.c:1623
>>>>>>>>>> #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
>>>>>>>>>> out>, frequency_hz=100)
>>>>>>>>>>         at drivers/sound/sound-uclass.c:118
>>>>>>>>>
>>>>>>>>> Tom, any further comments ? It seems your claim that this bugfix is causing
>>>>>>>>> problems with clang is not correct, so I think it should still be applied to
>>>>>>>>> v2021.04 , unless you can provide more details about the clang problem ?
>>>>>>>>
>>>>>>>> There's at least a few funny things going on.  Yes, apparently outside
>>>>>>>> of CI building sandbox with clang and running the tests manually crashes
>>>>>>>> in sound, before it gets to the SPI tests.  Except when it crashes on
>>>>>>>> the SPI tests instead, which I did get to happen once.  So there's some
>>>>>>>> sort of use-after-free or similar bug around here somewhere, similar to
>>>>>>>> how we can't use the -fstack-protector patch as it exposes other real
>>>>>>>> problems that need to be fixed first.
>>>>>>>>
>>>>>>>> So, I'll see if with a new tag and a few other changes having been made
>>>>>>>> we once again get to the point where your patch doesn't trigger this
>>>>>>>> issue.  But since it's also not a regression fix, if no one can figure
>>>>>>>> out what to do about fixing what clang shows us, then it can wait for
>>>>>>>> v2021.07.
>>>>>>>
>>>>>>> Nope, still gets things to blow up:
>>>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/246848
>>>>>>>
>>>>>>> and yes, it would be nice if when the reason a pytest fails is that
>>>>>>> U-Boot crashed, it would say something more clear than "OSError: [Errno
>>>>>>> 5] Input/output error" and you have to know that means U-Boot died.
>>>>>>
>>>>>> Can you please provide a detailed test case how to reproduce this ?
>>>>>> So far I don't have one, the only test description I got is "install random
>>>>>> version of clang, run ut dm and it fails", but that kind of imprecise test
>>>>>> description is really not useful. Please provide more specific instructions.
>>>>>
>>>>> Yes, follow what CI does.  It's done in a container, so you can have the
>>>>> exact same runtime and the tests are in .azure-pipelines.yml and
>>>>> .gitlab-ci.yml or if you look at
>>>>> https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
>>>>> commands in there.
>>>>
>>>> Can you please provide me the exact commands to run to reproduce this
>>>> problem ? I just spent a lot of time trying to find this one single
>>>> command in a wall of text, "ut dm spi_find", which I think is the one
>>>> failing for you ? It seems to work for me with and without the patch:
>>>>
>>>> $ ./u-boot -d arch/sandbox/dts/test.dtb
>>>> sandbox: starting...
>>>>
>>>>
>>>> U-Boot 2021.04-rc5-00005-g583b6858733 (Apr 01 2021 - 00:34:09 +0200)
>>>>
>>>> Model: sandbox
>>>> DRAM:  128 MiB
>>>> WDT:   Started with servicing (60s timeout)
>>>> MMC:   mmc2: 2 (SD), mmc1: 1 (SD), mmc0: 0 (SD)
>>>> In:    serial
>>>> Out:   vidconsole
>>>> Err:   vidconsole
>>>> Model: sandbox
>>>> SCSI:
>>>> Net:   eth0: eth@10002000, eth5: eth@10003000, eth3: sbe5, eth6:
>>>> eth@10004000
>>>> Hit any key to stop autoboot:  0
>>>>
>>>> Without patch
>>>> => ut dm spi_find
>>>> Test: dm_test_spi_find: spi.c
>>>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>>>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>>>> (0), got 0xfffffffb (-5)
>>>> Test: dm_test_spi_find: spi.c (flat tree)
>>>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>>>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>>>> (0), got 0xfffffffb (-5)
>>>> Failures: 2
>>>>
>>>> With patch
>>>> => ut dm spi_find
>>>> Test: dm_test_spi_find: spi.c
>>>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>>>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>>>> (0), got 0xfffffffb (-5)
>>>> Test: dm_test_spi_find: spi.c (flat tree)
>>>> test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
>>>> cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
>>>> (0), got 0xfffffffb (-5)
>>>> Failures: 2
>>>
>>> The Python test sets up a SPI-flash file (spi.bin) that is needed by the test.
>>>
>>> I use this alias:
>>>
>>> # Run a pytest on sandbox
>>> # $1: Name of test to run (optional, else run all)
>>> function pyt {
>>>      test/py/test.py -B sandbox --build-dir /tmp/b/sandbox ${1:+"-k $1"}
>>> }
>>>
>>> then 'pyt spi_find'
>>>
>>> Once you've done that once you can do what you had before:
>>>
>>> ./u-boot -T
>>
>> Do you also have a minimal test which does not depend on pytest at all ?
>> A testcase for a bug should be minimal and as simple as possible, it
>> shouldn't depend on all kinds of variables which are only adding
>> complexity and confusion.
> 
> There is quite a bit more documentation languishing in the -next branch, e.g.:
> 
> https://github.com/u-boot/u-boot/blob/next/doc/develop/tests_sandbox.rst
> 
> The C SPI tests could perhaps be updated to automatically create the
> spi.bin file, but several of them share it, so it would need to be a
> shared function.
> 
> I certainly only used pytest when needed, hence the substantial effort
> I made to document how to run sandbox tests directly, but I should
> point out that pytest is the overall framework that we use.

Surely for the sake of trivial test , spi.bin file can be generated 
manually using e.g. dd ?
Tom Rini March 31, 2021, 11:54 p.m. UTC | #18
On Thu, Apr 01, 2021 at 01:47:09AM +0200, Marek Vasut wrote:
> On 4/1/21 1:41 AM, Simon Glass wrote:
> > Hi Marek,
> > 
> > On Thu, 1 Apr 2021 at 12:19, Marek Vasut <marex@denx.de> wrote:
> > > 
> > > On 4/1/21 12:50 AM, Simon Glass wrote:
> > > > Hi Marek,
> > > > 
> > > > On Thu, 1 Apr 2021 at 11:39, Marek Vasut <marex@denx.de> wrote:
> > > > > 
> > > > > On 3/31/21 11:45 PM, Tom Rini wrote:
> > > > > > On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
> > > > > > > On 3/31/21 9:40 PM, Tom Rini wrote:
> > > > > > > > On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
> > > > > > > > > On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> > > > > > > > > > On 3/29/21 4:40 AM, Marek Vasut wrote:
> > > > > > > > > > > On 3/29/21 4:09 AM, Marek Vasut wrote:
> > > > > > > > > > > > On 3/29/21 3:59 AM, Tom Rini wrote:
> > > > > > > > > > > > > On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> > > > > > > > > > > > > > On 3/28/21 11:30 PM, Tom Rini wrote:
> > > > > > > > > > > > > > > On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > The spi_get_bus_and_cs() may be called on the same bus and chipselect
> > > > > > > > > > > > > > > > with different frequency or mode. This is valid usecase, but the code
> > > > > > > > > > > > > > > > fails to notify the controller of such a configuration change. Call
> > > > > > > > > > > > > > > > spi_set_speed_mode() in case bus frequency or bus mode changed to let
> > > > > > > > > > > > > > > > the controller update the configuration.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > The problem can easily be triggered using the sspi command:
> > > > > > > > > > > > > > > > => sspi 0:0@1000
> > > > > > > > > > > > > > > > => sspi 0:0@2000
> > > > > > > > > > > > > > > > Without this patch, both transfers happen at 1000
> > > > > > > > > > > > > > > > Hz. With this patch,
> > > > > > > > > > > > > > > > the later transfer happens correctly at 2000 Hz.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > > > > > > > > > > > > > > Cc: Jagan Teki <jagan@amarulasolutions.com>
> > > > > > > > > > > > > > > > Cc: Patrick Delaunay <patrick.delaunay@st.com>
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > So, very reliably I can make:
> > > > > > > > > > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/245517
> > > > > > > > > > > > > > > happen locally as well building with clang.  It's not
> > > > > > > > > > > > > > > obvious to me why
> > > > > > > > > > > > > > > the test now fails however.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Can you please be more specific / clear ? I have no idea what those 300
> > > > > > > > > > > > > > lines of cryptic output mean, nor what are you trying to say
> > > > > > > > > > > > > > by the above,
> > > > > > > > > > > > > > sorry.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > If you build with clang, for sandbox, and run the tests, U-Boot crashes
> > > > > > > > > > > > > in the unit tests that you start with "ut dm".
> > > > > > > > > > > > 
> > > > > > > > > > > > And that is related to this patch somehow ? How ?
> > > > > > > > > > > 
> > > > > > > > > > > ... after further discussion off-list to get a better test case
> > > > > > > > > > > description ...
> > > > > > > > > > > 
> > > > > > > > > > > It seems the problem is in sound_beep() and it is unrelated to this
> > > > > > > > > > > patch, as the same problem happens with / without this patch being
> > > > > > > > > > > applied, on:
> > > > > > > > > > > 
> > > > > > > > > > > a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> > > > > > > > > > > 
> > > > > > > > > > > $ clang --version
> > > > > > > > > > > Debian clang version 11.0.1-2
> > > > > > > > > > > ...
> > > > > > > > > > > $ make CC=clang HOSTCC=clang sandbox_defconfig
> > > > > > > > > > > $ make CC=clang HOSTCC=clang
> > > > > > > > > > > $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> > > > > > > > > > > => ut dm
> > > > > > > > > > > ...
> > > > > > > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > > > > > > dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > > > > > > > common/dlmalloc.c:1623
> > > > > > > > > > > 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> > > > > > > > > > > forward */
> > > > > > > > > > > (gdb) bt
> > > > > > > > > > > #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > > > > > > > common/dlmalloc.c:1623
> > > > > > > > > > > #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> > > > > > > > > > > out>, frequency_hz=100)
> > > > > > > > > > >         at drivers/sound/sound-uclass.c:118
> > > > > > > > > > 
> > > > > > > > > > Tom, any further comments ? It seems your claim that this bugfix is causing
> > > > > > > > > > problems with clang is not correct, so I think it should still be applied to
> > > > > > > > > > v2021.04 , unless you can provide more details about the clang problem ?
> > > > > > > > > 
> > > > > > > > > There's at least a few funny things going on.  Yes, apparently outside
> > > > > > > > > of CI building sandbox with clang and running the tests manually crashes
> > > > > > > > > in sound, before it gets to the SPI tests.  Except when it crashes on
> > > > > > > > > the SPI tests instead, which I did get to happen once.  So there's some
> > > > > > > > > sort of use-after-free or similar bug around here somewhere, similar to
> > > > > > > > > how we can't use the -fstack-protector patch as it exposes other real
> > > > > > > > > problems that need to be fixed first.
> > > > > > > > > 
> > > > > > > > > So, I'll see if with a new tag and a few other changes having been made
> > > > > > > > > we once again get to the point where your patch doesn't trigger this
> > > > > > > > > issue.  But since it's also not a regression fix, if no one can figure
> > > > > > > > > out what to do about fixing what clang shows us, then it can wait for
> > > > > > > > > v2021.07.
> > > > > > > > 
> > > > > > > > Nope, still gets things to blow up:
> > > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/246848
> > > > > > > > 
> > > > > > > > and yes, it would be nice if when the reason a pytest fails is that
> > > > > > > > U-Boot crashed, it would say something more clear than "OSError: [Errno
> > > > > > > > 5] Input/output error" and you have to know that means U-Boot died.
> > > > > > > 
> > > > > > > Can you please provide a detailed test case how to reproduce this ?
> > > > > > > So far I don't have one, the only test description I got is "install random
> > > > > > > version of clang, run ut dm and it fails", but that kind of imprecise test
> > > > > > > description is really not useful. Please provide more specific instructions.
> > > > > > 
> > > > > > Yes, follow what CI does.  It's done in a container, so you can have the
> > > > > > exact same runtime and the tests are in .azure-pipelines.yml and
> > > > > > .gitlab-ci.yml or if you look at
> > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
> > > > > > commands in there.
> > > > > 
> > > > > Can you please provide me the exact commands to run to reproduce this
> > > > > problem ? I just spent a lot of time trying to find this one single
> > > > > command in a wall of text, "ut dm spi_find", which I think is the one
> > > > > failing for you ? It seems to work for me with and without the patch:
> > > > > 
> > > > > $ ./u-boot -d arch/sandbox/dts/test.dtb
> > > > > sandbox: starting...
> > > > > 
> > > > > 
> > > > > U-Boot 2021.04-rc5-00005-g583b6858733 (Apr 01 2021 - 00:34:09 +0200)
> > > > > 
> > > > > Model: sandbox
> > > > > DRAM:  128 MiB
> > > > > WDT:   Started with servicing (60s timeout)
> > > > > MMC:   mmc2: 2 (SD), mmc1: 1 (SD), mmc0: 0 (SD)
> > > > > In:    serial
> > > > > Out:   vidconsole
> > > > > Err:   vidconsole
> > > > > Model: sandbox
> > > > > SCSI:
> > > > > Net:   eth0: eth@10002000, eth5: eth@10003000, eth3: sbe5, eth6:
> > > > > eth@10004000
> > > > > Hit any key to stop autoboot:  0
> > > > > 
> > > > > Without patch
> > > > > => ut dm spi_find
> > > > > Test: dm_test_spi_find: spi.c
> > > > > test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> > > > > cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> > > > > (0), got 0xfffffffb (-5)
> > > > > Test: dm_test_spi_find: spi.c (flat tree)
> > > > > test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> > > > > cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> > > > > (0), got 0xfffffffb (-5)
> > > > > Failures: 2
> > > > > 
> > > > > With patch
> > > > > => ut dm spi_find
> > > > > Test: dm_test_spi_find: spi.c
> > > > > test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> > > > > cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> > > > > (0), got 0xfffffffb (-5)
> > > > > Test: dm_test_spi_find: spi.c (flat tree)
> > > > > test/dm/spi.c:71, dm_test_spi_find(): 0 == spi_get_bus_and_cs(busnum,
> > > > > cs, speed, mode, "jedec_spi_nor", "name", &bus, &slave): Expected 0x0
> > > > > (0), got 0xfffffffb (-5)
> > > > > Failures: 2
> > > > 
> > > > The Python test sets up a SPI-flash file (spi.bin) that is needed by the test.
> > > > 
> > > > I use this alias:
> > > > 
> > > > # Run a pytest on sandbox
> > > > # $1: Name of test to run (optional, else run all)
> > > > function pyt {
> > > >      test/py/test.py -B sandbox --build-dir /tmp/b/sandbox ${1:+"-k $1"}
> > > > }
> > > > 
> > > > then 'pyt spi_find'
> > > > 
> > > > Once you've done that once you can do what you had before:
> > > > 
> > > > ./u-boot -T
> > > 
> > > Do you also have a minimal test which does not depend on pytest at all ?
> > > A testcase for a bug should be minimal and as simple as possible, it
> > > shouldn't depend on all kinds of variables which are only adding
> > > complexity and confusion.
> > 
> > There is quite a bit more documentation languishing in the -next branch, e.g.:
> > 
> > https://github.com/u-boot/u-boot/blob/next/doc/develop/tests_sandbox.rst
> > 
> > The C SPI tests could perhaps be updated to automatically create the
> > spi.bin file, but several of them share it, so it would need to be a
> > shared function.
> > 
> > I certainly only used pytest when needed, hence the substantial effort
> > I made to document how to run sandbox tests directly, but I should
> > point out that pytest is the overall framework that we use.
> 
> Surely for the sake of trivial test , spi.bin file can be generated manually
> using e.g. dd ?

But part of the problem, quite literally, is that the existing tests for
SPI functionality are failing.  Given that it seems "this change
directly is wrong" is unlikely but "this change exposes some more
general problem within any of {sandbox,unit tests,unit test
framework,spi}" is likely, there is not yet a more minimal test for the
problem because one has not been found.  The easy reproducer is "run the
tests like CI does".  It further seems likely to me that it's a problem
in either sandbox or unit test framework as you and I both made that
sound test fail sometime.  And note that doc/develop/py_testing.rst
tells how to hook pytest and GDB up.
Tom Rini April 1, 2021, 2:18 p.m. UTC | #19
On Thu, Apr 01, 2021 at 11:19:27AM +1300, Simon Glass wrote:
> +Heinrich Schuchardt
> 
> On Thu, 1 Apr 2021 at 10:45, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Wed, Mar 31, 2021 at 10:05:28PM +0200, Marek Vasut wrote:
> > > On 3/31/21 9:40 PM, Tom Rini wrote:
> > > > On Wed, Mar 31, 2021 at 03:26:27PM -0400, Tom Rini wrote:
> > > > > On Wed, Mar 31, 2021 at 09:18:28PM +0200, Marek Vasut wrote:
> > > > > > On 3/29/21 4:40 AM, Marek Vasut wrote:
> > > > > > > On 3/29/21 4:09 AM, Marek Vasut wrote:
> > > > > > > > On 3/29/21 3:59 AM, Tom Rini wrote:
> > > > > > > > > On Mon, Mar 29, 2021 at 03:32:51AM +0200, Marek Vasut wrote:
> > > > > > > > > > On 3/28/21 11:30 PM, Tom Rini wrote:
> > > > > > > > > > > On Fri, Feb 26, 2021 at 03:21:24PM +0100, Marek Vasut wrote:
> > > > > > > > > > >
> > > > > > > > > > > > The spi_get_bus_and_cs() may be called on the same bus and chipselect
> > > > > > > > > > > > with different frequency or mode. This is valid usecase, but the code
> > > > > > > > > > > > fails to notify the controller of such a configuration change. Call
> > > > > > > > > > > > spi_set_speed_mode() in case bus frequency or bus mode changed to let
> > > > > > > > > > > > the controller update the configuration.
> > > > > > > > > > > >
> > > > > > > > > > > > The problem can easily be triggered using the sspi command:
> > > > > > > > > > > > => sspi 0:0@1000
> > > > > > > > > > > > => sspi 0:0@2000
> > > > > > > > > > > > Without this patch, both transfers happen at 1000
> > > > > > > > > > > > Hz. With this patch,
> > > > > > > > > > > > the later transfer happens correctly at 2000 Hz.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > > > > > > > > > > Cc: Jagan Teki <jagan@amarulasolutions.com>
> > > > > > > > > > > > Cc: Patrick Delaunay <patrick.delaunay@st.com>
> > > > > > > > > > >
> > > > > > > > > > > So, very reliably I can make:
> > > > > > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/245517
> > > > > > > > > > > happen locally as well building with clang.  It's not
> > > > > > > > > > > obvious to me why
> > > > > > > > > > > the test now fails however.
> > > > > > > > > >
> > > > > > > > > > Can you please be more specific / clear ? I have no idea what those 300
> > > > > > > > > > lines of cryptic output mean, nor what are you trying to say
> > > > > > > > > > by the above,
> > > > > > > > > > sorry.
> > > > > > > > >
> > > > > > > > > If you build with clang, for sandbox, and run the tests, U-Boot crashes
> > > > > > > > > in the unit tests that you start with "ut dm".
> > > > > > > >
> > > > > > > > And that is related to this patch somehow ? How ?
> > > > > > >
> > > > > > > ... after further discussion off-list to get a better test case
> > > > > > > description ...
> > > > > > >
> > > > > > > It seems the problem is in sound_beep() and it is unrelated to this
> > > > > > > patch, as the same problem happens with / without this patch being
> > > > > > > applied, on:
> > > > > > >
> > > > > > > a7d3ac61482 ("Merge branch '2021-03-28-assorted-bugfixes'")
> > > > > > >
> > > > > > > $ clang --version
> > > > > > > Debian clang version 11.0.1-2
> > > > > > > ...
> > > > > > > $ make CC=clang HOSTCC=clang sandbox_defconfig
> > > > > > > $ make CC=clang HOSTCC=clang
> > > > > > > $ gdb --args ./u-boot -d arch/sandbox/dts/test.dtb
> > > > > > > => ut dm
> > > > > > > ...
> > > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > > dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > > > common/dlmalloc.c:1623
> > > > > > > 1623      if (!(inuse_bit_at_offset(next, nextsz)))   /* consolidate
> > > > > > > forward */
> > > > > > > (gdb) bt
> > > > > > > #0  dlfree (mem=<optimized out>, mem@entry=0x15c19c50) at
> > > > > > > common/dlmalloc.c:1623
> > > > > > > #1  0x00000000004759e3 in sound_beep (dev=0x15c05580, msecs=<optimized
> > > > > > > out>, frequency_hz=100)
> > > > > > >      at drivers/sound/sound-uclass.c:118
> > > > > >
> > > > > > Tom, any further comments ? It seems your claim that this bugfix is causing
> > > > > > problems with clang is not correct, so I think it should still be applied to
> > > > > > v2021.04 , unless you can provide more details about the clang problem ?
> > > > >
> > > > > There's at least a few funny things going on.  Yes, apparently outside
> > > > > of CI building sandbox with clang and running the tests manually crashes
> > > > > in sound, before it gets to the SPI tests.  Except when it crashes on
> > > > > the SPI tests instead, which I did get to happen once.  So there's some
> > > > > sort of use-after-free or similar bug around here somewhere, similar to
> > > > > how we can't use the -fstack-protector patch as it exposes other real
> > > > > problems that need to be fixed first.
> > > > >
> > > > > So, I'll see if with a new tag and a few other changes having been made
> > > > > we once again get to the point where your patch doesn't trigger this
> > > > > issue.  But since it's also not a regression fix, if no one can figure
> > > > > out what to do about fixing what clang shows us, then it can wait for
> > > > > v2021.07.
> > > >
> > > > Nope, still gets things to blow up:
> > > > https://source.denx.de/u-boot/u-boot/-/jobs/246848
> > > >
> > > > and yes, it would be nice if when the reason a pytest fails is that
> > > > U-Boot crashed, it would say something more clear than "OSError: [Errno
> > > > 5] Input/output error" and you have to know that means U-Boot died.
> > >
> > > Can you please provide a detailed test case how to reproduce this ?
> > > So far I don't have one, the only test description I got is "install random
> > > version of clang, run ut dm and it fails", but that kind of imprecise test
> > > description is really not useful. Please provide more specific instructions.
> >
> > Yes, follow what CI does.  It's done in a container, so you can have the
> > exact same runtime and the tests are in .azure-pipelines.yml and
> > .gitlab-ci.yml or if you look at
> > https://source.denx.de/u-boot/u-boot/-/jobs/246848/raw you can see the
> > commands in there.
> 
> This does not seem good. I wonder if this patch might have introduced
> some instability?
> 
> b308d9fd18f sandbox: Avoid using malloc() for system state

The problem is normally seen on master, where this patch is not yet.
But on next it fails the same way.
diff mbox series

Patch

diff --git a/drivers/spi/spi-uclass.c b/drivers/spi/spi-uclass.c
index 7155d4aebd6..5617f6645ee 100644
--- a/drivers/spi/spi-uclass.c
+++ b/drivers/spi/spi-uclass.c
@@ -405,12 +405,22 @@  int spi_get_bus_and_cs(int busnum, int cs, int speed, int mode,
 			goto err;
 	}
 
+	/* In case bus frequency or mode changed, update it. */
+	if ((speed && bus_data->speed && bus_data->speed != speed) ||
+	    (plat->mode != mode)) {
+		ret = spi_set_speed_mode(bus, speed, mode);
+		if (ret)
+			goto err_speed_mode;
+	}
+
 	*busp = bus;
 	*devp = slave;
 	log_debug("%s: bus=%p, slave=%p\n", __func__, bus, *devp);
 
 	return 0;
 
+err_speed_mode:
+	spi_release_bus(slave);
 err:
 	log_debug("%s: Error path, created=%d, device '%s'\n", __func__,
 		  created, dev->name);