diff mbox series

cmd: usb: Prevent reset in usb tree/info command

Message ID ZH39SVA1vbZR3+Gk@xdrudis.tinet.cat
State Superseded
Delegated to: Marek Vasut
Headers show
Series cmd: usb: Prevent reset in usb tree/info command | expand

Commit Message

Xavier Drudis Ferran June 5, 2023, 3:20 p.m. UTC
Add a check to avoid dommed (by null pointer dereference) recursive
call, not only for UCLASS_BLK.

When booting my Rock Pi 4B+ with a USB mass storage stick plugged
into one of the USB 2 ports (EHCI), when it is plugged before power
on, when issuing a

usb tree

or

usb info

command I get a "Synchronous Error" and a reset just after printing the
mass storage device in the usb tree or usb info. It might depend on the
contents of the USB stick too, I'm not sure.

It seems like I have two devices as children of the mass storage
device.  When there's only a UCLASS_BLK it works fine, but when there's
a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as
first parameter and fails.

Likewise for usb_show_info().

Not sure if this should be a patch, an RFC or a bug report.  There may
be a better way to solve this, I haven't researched commit
201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow
properly, or thought about cases where udev is not null but the
recursive call might need preventing too. Feel free to think it over
before merging (or after). But this at least fixes a reset at an
innocent looking usb tree or usb info command. Maybe we can improve it
later?

Cc: Simon Glass <sjg@chromium.org>
Cc: Lukasz Majewski <lukma@denx.de>
Cc: Marek Vasut <marex@denx.de>


Signed-off-by: Xavier Drudis Ferran <xdrudis@tinet.cat>
---
 cmd/usb.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Marek Vasut June 7, 2023, 10:05 p.m. UTC | #1
On 6/5/23 17:20, Xavier Drudis Ferran wrote:
> Add a check to avoid dommed (by null pointer dereference) recursive
> call, not only for UCLASS_BLK.
> 
> When booting my Rock Pi 4B+ with a USB mass storage stick plugged
> into one of the USB 2 ports (EHCI), when it is plugged before power
> on, when issuing a
> 
> usb tree
> 
> or
> 
> usb info

I cannot reproduce the problem. Do you perform any other interaction 
with the USB stack, like e.g. 'usb start' or 'usb reset' before issuing 
the aforementioned commands ?

What kind of USB stick is used here, please share model, VID, PID.

> command I get a "Synchronous Error" and a reset just after printing the
> mass storage device in the usb tree or usb info. It might depend on the
> contents of the USB stick too, I'm not sure.
> 
> It seems like I have two devices as children of the mass storage
> device.  When there's only a UCLASS_BLK it works fine, but when there's
> a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as
> first parameter and fails.
> 
> Likewise for usb_show_info().
> 
> Not sure if this should be a patch, an RFC or a bug report.  There may
> be a better way to solve this, I haven't researched commit
> 201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow
> properly, or thought about cases where udev is not null but the
> recursive call might need preventing too. Feel free to think it over
> before merging (or after).

It is unclear what the real issue is, so until that is sorted out, no 
merging will occur, sorry.

> But this at least fixes a reset at an
> innocent looking usb tree or usb info command. Maybe we can improve it
> later?

NAK, please let's not add ad-hoc poorly understood changes into core code.
Xavier Drudis Ferran June 8, 2023, 7:39 a.m. UTC | #2
El Thu, Jun 08, 2023 at 12:05:18AM +0200, Marek Vasut deia:
> On 6/5/23 17:20, Xavier Drudis Ferran wrote:
> > Add a check to avoid dommed (by null pointer dereference) recursive
> > call, not only for UCLASS_BLK.
> > 
> > When booting my Rock Pi 4B+ with a USB mass storage stick plugged
> > into one of the USB 2 ports (EHCI), when it is plugged before power
> > on, when issuing a
> > 
> > usb tree
> > 
> > or
> > 
> > usb info
> 
> I cannot reproduce the problem. Do you perform any other interaction with
> the USB stack, like e.g. 'usb start' or 'usb reset' before issuing the
> aforementioned commands ?
>

No. Well, in some tests yes and some no, but I got the error in all cases.

Btw, I was testing on the next branch. I had those two commits on top
https://patchwork.ozlabs.org/project/uboot/patch/202013db5a47ecbac4a53c360ed1ca91ca663996.1685974993.git.xdrudis@tinet.cat/
https://patchwork.ozlabs.org/project/uboot/patch/464111fca83008503022e8ada5305e69ffd1afbd.1685974993.git.xdrudis@tinet.cat/

And minor configuration changes (but I had bootstage active, might or might not be related)

U-Boot was in a microSD card. 


> What kind of USB stick is used here, please share model, VID, PID.
>

It may only happen with ext4 partitions or something...

Or it might have to do with other media not being bootable (they used
to be for some old and customized version of U-boot, but not the
current one)

This is the lsusb -v output of the same USB stick in another computer

Bus 007 Device 008: ID 0718:070a Imation Corp. TF10
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x0718 Imation Corp.
  idProduct          0x070a 
  bcdDevice            1.00
  iManufacturer           1 TDK LoR
  iProduct                2 TF10
  iSerial                 3 07032B6B1D0ACB96
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              200mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  bNumConfigurations      1
can't get debug descriptor: Resource temporarily unavailable
Device Status:     0x0000
  (Bus Powered)

Partitions:

Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: TDK LoR TF10 (scsi)
Disk /dev/sda: 32,0GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0,00B  32,0GB  32,0GB  ext4


The content in the ext4 partition is just data, 3 files,
lost+found and another directory

Let me get to my logs... (I added a printf here uclass_id=22 is UCLASS_BLK
uclass_id=25 is UCLASS_BOOTDEV )

U-Boot TPL 2023.07-rc2-00089-gab17b3d648-dirty (Jun 05 2023 - 10:42:52)
lpddr4_set_rate: change freq to 400MHz 0, 1
Channel 0: LPDDR4, 400MHz
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
Channel 1: LPDDR4, 400MHz
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
256B stride
lpddr4_set_rate: change freq to 800MHz 1, 0
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2023.07-rc2-00089-gab17b3d648-dirty (Jun 05 2023 - 10:42:52 +0200)
Trying to boot from MMC1
NOTICE:  BL31: v2.1(release):v2.1-728-ged01e0c4-dirty
NOTICE:  BL31: Built : 18:29:11, Mar 22 2022


U-Boot 2023.07-rc2-00089-gab17b3d648-dirty (Jun 05 2023 - 10:42:52 +0200)

SoC: Rockchip rk3399
Reset cause: POR
Model: Radxa ROCK Pi 4B
DRAM:  4 GiB (effective 3.9 GiB)
PMIC:  RK808 
Core:  284 devices, 29 uclasses, devicetree: separate
MMC:   mmc@fe310000: 2, mmc@fe320000: 1, mmc@fe330000: 0
Loading Environment from MMC... *** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Radxa ROCK Pi 4B
Net:   eth0: ethernet@fe300000
Hit any key to stop autoboot:  2  1  0 
rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
Bus usb@fe380000: USB EHCI 1.00
Bus usb@fe3c0000: USB EHCI 1.00
Bus usb@fe800000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
Bus usb@fe900000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@fe380000 for devices... 1 USB Device(s) found
scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found
scanning bus usb@fe800000 for devices... 1 USB Device(s) found
scanning bus usb@fe900000 for devices... 1 USB Device(s) found
rockchip_pcie pcie@f8000000: failed to find ep-gpios property
ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
Could not initialize PHY ethernet@fe300000
rockchip_pcie pcie@f8000000: failed to find ep-gpios property
ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
Could not initialize PHY ethernet@fe300000
=> usb tree
USB device tree:
  1  Hub (480 Mb/s, 0mA)
     u-boot EHCI Host Controller 
   
  1  Hub (480 Mb/s, 0mA)
  |  u-boot EHCI Host Controller 
  |
                       uclass_id=64
  |+-2  Mass Storage (480 Mb/s, 200mA)
       TDK LoR TF10 07032B6B1D0ACB96
     
                       uclass_id=22
                       uclass_id=25
     "Synchronous Abort" handler, esr 0x96000010, far 0x948
elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
x0 : 0000000000000005 x1 : 0000000000000000
x2 : 0000000000000020 x3 : 00000000ff1a0000
x4 : 00000000ff1a0000 x5 : 0000000000000000
x6 : 00000000f1f1c000 x7 : 0000000000000001
x8 : 0000000000000001 x9 : 00000000ffffffd0
x10: 0000000000000006 x11: 000000000001869f
x12: 0000000000000200 x13: 0000000000000000
x14: 00000000ffffffff x15: 00000000f1f1bbc9
x16: 000000007e4f2113 x17: 000000009a11f13e
x18: 00000000f1f31d80 x19: 0000000000000000
x20: 00000000f1f1c110 x21: 0000000000000004
x22: 00000000f1f1c114 x23: 00000000f1f65398
x24: 00000000f1f653b8 x25: 0000000000000000
x26: 0000000000000000 x27: 0000000000000000
x28: 0000000000000000 x29: 00000000f1f1c000

Code: aa0003f5 900003c0 9123b800 940191f5 (f944a660) 
Resetting CPU ...

resetting ...


> > command I get a "Synchronous Error" and a reset just after printing the
> > mass storage device in the usb tree or usb info. It might depend on the
> > contents of the USB stick too, I'm not sure.
> > 
> > It seems like I have two devices as children of the mass storage
> > device.  When there's only a UCLASS_BLK it works fine, but when there's
> > a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as
> > first parameter and fails.
> > 
> > Likewise for usb_show_info().
> > 
> > Not sure if this should be a patch, an RFC or a bug report.  There may
> > be a better way to solve this, I haven't researched commit
> > 201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow
> > properly, or thought about cases where udev is not null but the
> > recursive call might need preventing too. Feel free to think it over
> > before merging (or after).
> 
> It is unclear what the real issue is, so until that is sorted out, no
> merging will occur, sorry.
>

That's ok.
I'm guessing the issue may be some mismatched assumption on what is
under USB devices between the bootdev code and the cmd/usb.c code.


> > But this at least fixes a reset at an
> > innocent looking usb tree or usb info command. Maybe we can improve it
> > later?
> 
> NAK, please let's not add ad-hoc poorly understood changes into core code.

Ok, sorry.

I'll reply back if/when I get a clearer theory.

Thanks.
Marek Vasut June 9, 2023, 1:20 a.m. UTC | #3
On 6/8/23 09:39, Xavier Drudis Ferran wrote:
> El Thu, Jun 08, 2023 at 12:05:18AM +0200, Marek Vasut deia:
>> On 6/5/23 17:20, Xavier Drudis Ferran wrote:
>>> Add a check to avoid dommed (by null pointer dereference) recursive
>>> call, not only for UCLASS_BLK.
>>>
>>> When booting my Rock Pi 4B+ with a USB mass storage stick plugged
>>> into one of the USB 2 ports (EHCI), when it is plugged before power
>>> on, when issuing a
>>>
>>> usb tree
>>>
>>> or
>>>
>>> usb info
>>
>> I cannot reproduce the problem. Do you perform any other interaction with
>> the USB stack, like e.g. 'usb start' or 'usb reset' before issuing the
>> aforementioned commands ?
>>
> 
> No. Well, in some tests yes and some no, but I got the error in all cases.

This is doubtful. It is mandatory to run 'usb start' or 'usb reset' 
before you would get any meaningful result out of 'usb info'. Without 
either, you would get 'USB is stopped.' message. Could it be there are 
some extra scripts in your environment that manipulate the USB ?

> Btw, I was testing on the next branch. I had those two commits on top
> https://patchwork.ozlabs.org/project/uboot/patch/202013db5a47ecbac4a53c360ed1ca91ca663996.1685974993.git.xdrudis@tinet.cat/
> https://patchwork.ozlabs.org/project/uboot/patch/464111fca83008503022e8ada5305e69ffd1afbd.1685974993.git.xdrudis@tinet.cat/
> 
> And minor configuration changes (but I had bootstage active, might or might not be related)
> 
> U-Boot was in a microSD card.

Can you test with stock U-Boot ?

Can you test with another USB stick, i.e. is the issue specific to this 
USB stick ?

Is the issue specific to this partition layout of this USB stick, i.e. 
if you clone (dd if=... of=...) the content of the USB stick to another 
USB stick, does the error still occur.

[...]

> Model: Radxa ROCK Pi 4B
> Net:   eth0: ethernet@fe300000
> Hit any key to stop autoboot:  2  1  0
> rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
> Bus usb@fe380000: USB EHCI 1.00
> Bus usb@fe3c0000: USB EHCI 1.00
> Bus usb@fe800000: Register 2000140 NbrPorts 2
> Starting the controller
> USB XHCI 1.10
> Bus usb@fe900000: Register 2000140 NbrPorts 2
> Starting the controller
> USB XHCI 1.10
> scanning bus usb@fe380000 for devices... 1 USB Device(s) found
> scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found
> scanning bus usb@fe800000 for devices... 1 USB Device(s) found
> scanning bus usb@fe900000 for devices... 1 USB Device(s) found
> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> Could not initialize PHY ethernet@fe300000
> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> Could not initialize PHY ethernet@fe300000

Is this some $preboot script which initializes your hardware ?

=> printenv preboot

> => usb tree
> USB device tree:
>    1  Hub (480 Mb/s, 0mA)
>       u-boot EHCI Host Controller
>     
>    1  Hub (480 Mb/s, 0mA)
>    |  u-boot EHCI Host Controller
>    |
>                         uclass_id=64
>    |+-2  Mass Storage (480 Mb/s, 200mA)
>         TDK LoR TF10 07032B6B1D0ACB96
>       
>                         uclass_id=22
>                         uclass_id=25
>       "Synchronous Abort" handler, esr 0x96000010, far 0x948
> elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4

Take the u-boot (unstripped elf) which matches this binary, and run 
aarch64-...objdump -lSD on it, then search for the $lr value, see 
doc/develop/crash_dumps.rst for details. That should tell you where 
exactly the crash occurred. Where did it occur ?

> x0 : 0000000000000005 x1 : 0000000000000000
> x2 : 0000000000000020 x3 : 00000000ff1a0000
> x4 : 00000000ff1a0000 x5 : 0000000000000000
> x6 : 00000000f1f1c000 x7 : 0000000000000001
> x8 : 0000000000000001 x9 : 00000000ffffffd0
> x10: 0000000000000006 x11: 000000000001869f
> x12: 0000000000000200 x13: 0000000000000000
> x14: 00000000ffffffff x15: 00000000f1f1bbc9
> x16: 000000007e4f2113 x17: 000000009a11f13e
> x18: 00000000f1f31d80 x19: 0000000000000000
> x20: 00000000f1f1c110 x21: 0000000000000004
> x22: 00000000f1f1c114 x23: 00000000f1f65398
> x24: 00000000f1f653b8 x25: 0000000000000000
> x26: 0000000000000000 x27: 0000000000000000
> x28: 0000000000000000 x29: 00000000f1f1c000
> 
> Code: aa0003f5 900003c0 9123b800 940191f5 (f944a660)
> Resetting CPU ...
> 
> resetting ...
> 
> 
>>> command I get a "Synchronous Error" and a reset just after printing the
>>> mass storage device in the usb tree or usb info. It might depend on the
>>> contents of the USB stick too, I'm not sure.
>>>
>>> It seems like I have two devices as children of the mass storage
>>> device.  When there's only a UCLASS_BLK it works fine, but when there's
>>> a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as
>>> first parameter and fails.
>>>
>>> Likewise for usb_show_info().
>>>
>>> Not sure if this should be a patch, an RFC or a bug report.  There may
>>> be a better way to solve this, I haven't researched commit
>>> 201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow
>>> properly, or thought about cases where udev is not null but the
>>> recursive call might need preventing too. Feel free to think it over
>>> before merging (or after).
>>
>> It is unclear what the real issue is, so until that is sorted out, no
>> merging will occur, sorry.
>>
> 
> That's ok.
> I'm guessing the issue may be some mismatched assumption on what is
> under USB devices between the bootdev code and the cmd/usb.c code.
> 
> 
>>> But this at least fixes a reset at an
>>> innocent looking usb tree or usb info command. Maybe we can improve it
>>> later?
>>
>> NAK, please let's not add ad-hoc poorly understood changes into core code.
> 
> Ok, sorry.
> 
> I'll reply back if/when I get a clearer theory.

The NAK is really only to prevent this from accidentally going on.

Please see above, maybe it could be narrowed down ?
Xavier Drudis Ferran June 9, 2023, 6:52 p.m. UTC | #4
Sorry, I replied to Marek only but meant to reply to all.

El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia:

> > No. Well, in some tests yes and some no, but I got the error in all cases.
> 
> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before
> you would get any meaningful result out of 'usb info'. Without either, you
> would get 'USB is stopped.' message. Could it be there are some extra
> scripts in your environment that manipulate the USB ?
>

I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c

But maybe I don't get that called and it's really something silly in
my setup as you say later... Maybe it doesn't get called unless it
finds nothing else useful to boot.

> 
> Can you test with stock U-Boot ?
>

I don't know. I'll see if I have time ...
I'd rather read the code to understand what's the condition for finding bootdevices...

> Can you test with another USB stick, i.e. is the issue specific to this USB
> stick ?
>

I could test this, yes.

> Is the issue specific to this partition layout of this USB stick, i.e. if
> you clone (dd if=... of=...) the content of the USB stick to another USB
> stick, does the error still occur.
>

I'll try to partition and flash a new USB.

> [...]
> 
> > Model: Radxa ROCK Pi 4B
> > Net:   eth0: ethernet@fe300000
> > Hit any key to stop autoboot:  2  1  0
> > rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
> > Bus usb@fe380000: USB EHCI 1.00
> > Bus usb@fe3c0000: USB EHCI 1.00
> > Bus usb@fe800000: Register 2000140 NbrPorts 2
> > Starting the controller
> > USB XHCI 1.10
> > Bus usb@fe900000: Register 2000140 NbrPorts 2
> > Starting the controller
> > USB XHCI 1.10
> > scanning bus usb@fe380000 for devices... 1 USB Device(s) found
> > scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found
> > scanning bus usb@fe800000 for devices... 1 USB Device(s) found
> > scanning bus usb@fe900000 for devices... 1 USB Device(s) found
> > rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> > Could not initialize PHY ethernet@fe300000
> > rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> > Could not initialize PHY ethernet@fe300000
> 
> Is this some $preboot script which initializes your hardware ?
>

Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck

Anyway, one should be allowed to stop the boot, call usb start and usb tree
and don't get a reset, shouldn't one?

> => printenv preboot
>

I'll send this later when I repeat the test. I'd like to find a
minimal test case or something...

> > => usb tree
> > USB device tree:
> >    1  Hub (480 Mb/s, 0mA)
> >       u-boot EHCI Host Controller
> >    1  Hub (480 Mb/s, 0mA)
> >    |  u-boot EHCI Host Controller
> >    |
> >                         uclass_id=64
> >    |+-2  Mass Storage (480 Mb/s, 200mA)
> >         TDK LoR TF10 07032B6B1D0ACB96
> >                         uclass_id=22
> >                         uclass_id=25
> >       "Synchronous Abort" handler, esr 0x96000010, far 0x948
> > elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
> > elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
> 
> Take the u-boot (unstripped elf) which matches this binary, and run
> aarch64-...objdump -lSD on it, then search for the $lr value, see
> doc/develop/crash_dumps.rst for details. That should tell you where exactly
> the crash occurred. Where did it occur ?
>

I didn't do it exactly so, but from u-boot.map I gathered that it was
in cmd/usb.c and the fact that my patch fixed it implies the problem
is the functions usb_show_tree_graph() or usb_show_info() get called
recursively with null as a first parameter.

Now I don't have that u-boot.map anymore and would have to repeat the
experiment, to find out exactly as you say, so I won't do it right
now. But thanks, understood.

The reason usb_show_tree_graph() gets called with a null usb_device *
is that the code in cmd/usb.c for usb info and usb tree assumes
everything a UCLASS_MASS_STORAGE device can have as children are
devices with some usb_device in their dev_get_parent_priv().  It
carves out exceptions to this general rule for UCLASS_USB_EMUL and
UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is
UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as
usb_device, but the bootdev code did not put any usb_device there,
it's null. So the first access causes a null pointer dereference.

I would have to wrap my mind around more code to start understanding
if it's better to give that UCLASS_BOOTDEV some usb_device as parent
priv data, or it is better to give USB devices that can be enumerated
for listing (usb tree or usb info) some RECURSIBLE flag that indicates
their priv parent data is reliably a usb_device.

So checking that the alledged usb_device at least isn't null as in my
patch is possibly a partial solution. I'm sure if it's null we
shouldn't call, but if it points to something other than a usb_device we
shouldn't either, and it doesn't address why it is null (well, because
it's not really a USB internal node, not even a proper leaf, so it
shouldn't be recursed anyway).

In usb_show_info() it is similar, usb_display_desc() gets called with a
null udev because that's what came in. Recursion is avoided for
UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV.

A different solution could be to expand the exception to
UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know
everything you can find so you can list all the exceptions, instead of
checking that you have something expected that you can recurse on. Not
completely sure, just smells so to me. At least checking for null is
more general.

Maybe the solution is to fix common/usb_storage.c when it calls
bootdev_setup_sibling_blk(), to ensure there's some useful usb_device
there as parent priv. Or something needs fixing in
drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really
shouldn't call recursively for usb info or tree anyway.

For now I was trying to understand when that UCLASS_BOOTDEV is added
and why, and whether it should be removed at some time. This should
hint me at what is the minimum scenario to reproduce the issue.

> 
> The NAK is really only to prevent this from accidentally going on.
>

The NAK it's OK. I like people to want to understand stuff. Don't worry.

> Please see above, maybe it could be narrowed down ?

I'll see if I can test better and send more useful reports.
Not sure when.

I'm not sure I'll have the time to learn all I need. I just hoped someone
else had it in mind...
Marek Vasut June 11, 2023, 12:29 p.m. UTC | #5
On 6/9/23 20:52, Xavier Drudis Ferran wrote:
> Sorry, I replied to Marek only but meant to reply to all.
> 
> El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia:
> 
>>> No. Well, in some tests yes and some no, but I got the error in all cases.
>>
>> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before
>> you would get any meaningful result out of 'usb info'. Without either, you
>> would get 'USB is stopped.' message. Could it be there are some extra
>> scripts in your environment that manipulate the USB ?
>>
> 
> I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c
> 
> But maybe I don't get that called and it's really something silly in
> my setup as you say later... Maybe it doesn't get called unless it
> finds nothing else useful to boot.
> 
>>
>> Can you test with stock U-Boot ?
>>
> 
> I don't know. I'll see if I have time ...
> I'd rather read the code to understand what's the condition for finding bootdevices...
> 
>> Can you test with another USB stick, i.e. is the issue specific to this USB
>> stick ?
>>
> 
> I could test this, yes.
> 
>> Is the issue specific to this partition layout of this USB stick, i.e. if
>> you clone (dd if=... of=...) the content of the USB stick to another USB
>> stick, does the error still occur.
>>
> 
> I'll try to partition and flash a new USB.
> 
>> [...]
>>
>>> Model: Radxa ROCK Pi 4B
>>> Net:   eth0: ethernet@fe300000
>>> Hit any key to stop autoboot:  2  1  0
>>> rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
>>> Bus usb@fe380000: USB EHCI 1.00
>>> Bus usb@fe3c0000: USB EHCI 1.00
>>> Bus usb@fe800000: Register 2000140 NbrPorts 2
>>> Starting the controller
>>> USB XHCI 1.10
>>> Bus usb@fe900000: Register 2000140 NbrPorts 2
>>> Starting the controller
>>> USB XHCI 1.10
>>> scanning bus usb@fe380000 for devices... 1 USB Device(s) found
>>> scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found
>>> scanning bus usb@fe800000 for devices... 1 USB Device(s) found
>>> scanning bus usb@fe900000 for devices... 1 USB Device(s) found
>>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
>>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
>>> Could not initialize PHY ethernet@fe300000
>>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
>>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
>>> Could not initialize PHY ethernet@fe300000
>>
>> Is this some $preboot script which initializes your hardware ?
>>
> 
> Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck
> 
> Anyway, one should be allowed to stop the boot, call usb start and usb tree
> and don't get a reset, shouldn't one?
> 
>> => printenv preboot
>>
> 
> I'll send this later when I repeat the test. I'd like to find a
> minimal test case or something...

Thank you

>>> => usb tree
>>> USB device tree:
>>>     1  Hub (480 Mb/s, 0mA)
>>>        u-boot EHCI Host Controller
>>>     1  Hub (480 Mb/s, 0mA)
>>>     |  u-boot EHCI Host Controller
>>>     |
>>>                          uclass_id=64
>>>     |+-2  Mass Storage (480 Mb/s, 200mA)
>>>          TDK LoR TF10 07032B6B1D0ACB96
>>>                          uclass_id=22
>>>                          uclass_id=25
>>>        "Synchronous Abort" handler, esr 0x96000010, far 0x948
>>> elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
>>> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
>>
>> Take the u-boot (unstripped elf) which matches this binary, and run
>> aarch64-...objdump -lSD on it, then search for the $lr value, see
>> doc/develop/crash_dumps.rst for details. That should tell you where exactly
>> the crash occurred. Where did it occur ?
>>
> 
> I didn't do it exactly so, but from u-boot.map I gathered that it was
> in cmd/usb.c and the fact that my patch fixed it implies the problem
> is the functions usb_show_tree_graph() or usb_show_info() get called
> recursively with null as a first parameter.
> 
> Now I don't have that u-boot.map anymore and would have to repeat the
> experiment, to find out exactly as you say, so I won't do it right
> now. But thanks, understood.
> 
> The reason usb_show_tree_graph() gets called with a null usb_device *
> is that the code in cmd/usb.c for usb info and usb tree assumes
> everything a UCLASS_MASS_STORAGE device can have as children are
> devices with some usb_device in their dev_get_parent_priv().  It
> carves out exceptions to this general rule for UCLASS_USB_EMUL and
> UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is
> UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as
> usb_device, but the bootdev code did not put any usb_device there,
> it's null. So the first access causes a null pointer dereference.
> 
> I would have to wrap my mind around more code to start understanding
> if it's better to give that UCLASS_BOOTDEV some usb_device as parent
> priv data, or it is better to give USB devices that can be enumerated
> for listing (usb tree or usb info) some RECURSIBLE flag that indicates
> their priv parent data is reliably a usb_device.
> 
> So checking that the alledged usb_device at least isn't null as in my
> patch is possibly a partial solution. I'm sure if it's null we
> shouldn't call, but if it points to something other than a usb_device we
> shouldn't either, and it doesn't address why it is null (well, because
> it's not really a USB internal node, not even a proper leaf, so it
> shouldn't be recursed anyway).
> 
> In usb_show_info() it is similar, usb_display_desc() gets called with a
> null udev because that's what came in. Recursion is avoided for
> UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV.
> 
> A different solution could be to expand the exception to
> UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know
> everything you can find so you can list all the exceptions, instead of
> checking that you have something expected that you can recurse on. Not
> completely sure, just smells so to me. At least checking for null is
> more general.
> 
> Maybe the solution is to fix common/usb_storage.c when it calls
> bootdev_setup_sibling_blk(), to ensure there's some useful usb_device
> there as parent priv. Or something needs fixing in
> drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really
> shouldn't call recursively for usb info or tree anyway.
> 
> For now I was trying to understand when that UCLASS_BOOTDEV is added
> and why, and whether it should be removed at some time. This should
> hint me at what is the minimum scenario to reproduce the issue.
> 
>>
>> The NAK is really only to prevent this from accidentally going on.
>>
> 
> The NAK it's OK. I like people to want to understand stuff. Don't worry.

s@going on@going in@ ... typo.

>> Please see above, maybe it could be narrowed down ?
> 
> I'll see if I can test better and send more useful reports.
> Not sure when.
> 
> I'm not sure I'll have the time to learn all I need. I just hoped someone
> else had it in mind...

I reproducer would be a good starting point, i.e. something like "I run 
these ... commands and I get this crash" . But please make sure you use 
current codebase and no preboot or similar scripting .
Simon Glass June 12, 2023, 9:17 p.m. UTC | #6
Hi,

On Sun, 11 Jun 2023 at 13:29, Marek Vasut <marex@denx.de> wrote:
>
> On 6/9/23 20:52, Xavier Drudis Ferran wrote:
> > Sorry, I replied to Marek only but meant to reply to all.
> >
> > El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia:
> >
> >>> No. Well, in some tests yes and some no, but I got the error in all cases.
> >>
> >> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before
> >> you would get any meaningful result out of 'usb info'. Without either, you
> >> would get 'USB is stopped.' message. Could it be there are some extra
> >> scripts in your environment that manipulate the USB ?
> >>
> >
> > I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c
> >
> > But maybe I don't get that called and it's really something silly in
> > my setup as you say later... Maybe it doesn't get called unless it
> > finds nothing else useful to boot.
> >
> >>
> >> Can you test with stock U-Boot ?
> >>
> >
> > I don't know. I'll see if I have time ...
> > I'd rather read the code to understand what's the condition for finding bootdevices...
> >
> >> Can you test with another USB stick, i.e. is the issue specific to this USB
> >> stick ?
> >>
> >
> > I could test this, yes.
> >
> >> Is the issue specific to this partition layout of this USB stick, i.e. if
> >> you clone (dd if=... of=...) the content of the USB stick to another USB
> >> stick, does the error still occur.
> >>
> >
> > I'll try to partition and flash a new USB.
> >
> >> [...]
> >>
> >>> Model: Radxa ROCK Pi 4B
> >>> Net:   eth0: ethernet@fe300000
> >>> Hit any key to stop autoboot:  2     1     0
> >>> rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
> >>> Bus usb@fe380000: USB EHCI 1.00
> >>> Bus usb@fe3c0000: USB EHCI 1.00
> >>> Bus usb@fe800000: Register 2000140 NbrPorts 2
> >>> Starting the controller
> >>> USB XHCI 1.10
> >>> Bus usb@fe900000: Register 2000140 NbrPorts 2
> >>> Starting the controller
> >>> USB XHCI 1.10
> >>> scanning bus usb@fe380000 for devices... 1 USB Device(s) found
> >>> scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found
> >>> scanning bus usb@fe800000 for devices... 1 USB Device(s) found
> >>> scanning bus usb@fe900000 for devices... 1 USB Device(s) found
> >>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> >>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> >>> Could not initialize PHY ethernet@fe300000
> >>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> >>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> >>> Could not initialize PHY ethernet@fe300000
> >>
> >> Is this some $preboot script which initializes your hardware ?
> >>
> >
> > Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck
> >
> > Anyway, one should be allowed to stop the boot, call usb start and usb tree
> > and don't get a reset, shouldn't one?
> >
> >> => printenv preboot
> >>
> >
> > I'll send this later when I repeat the test. I'd like to find a
> > minimal test case or something...
>
> Thank you
>
> >>> => usb tree
> >>> USB device tree:
> >>>     1  Hub (480 Mb/s, 0mA)
> >>>        u-boot EHCI Host Controller
> >>>     1  Hub (480 Mb/s, 0mA)
> >>>     |  u-boot EHCI Host Controller
> >>>     |
> >>>                          uclass_id=64
> >>>     | +-2  Mass Storage (480 Mb/s, 200mA)
> >>>          TDK LoR TF10 07032B6B1D0ACB96
> >>>                          uclass_id=22
> >>>                          uclass_id=25
> >>>        "Synchronous Abort" handler, esr 0x96000010, far 0x948
> >>> elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
> >>> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4
> >>
> >> Take the u-boot (unstripped elf) which matches this binary, and run
> >> aarch64-...objdump -lSD on it, then search for the $lr value, see
> >> doc/develop/crash_dumps.rst for details. That should tell you where exactly
> >> the crash occurred. Where did it occur ?
> >>
> >
> > I didn't do it exactly so, but from u-boot.map I gathered that it was
> > in cmd/usb.c and the fact that my patch fixed it implies the problem
> > is the functions usb_show_tree_graph() or usb_show_info() get called
> > recursively with null as a first parameter.
> >
> > Now I don't have that u-boot.map anymore and would have to repeat the
> > experiment, to find out exactly as you say, so I won't do it right
> > now. But thanks, understood.
> >
> > The reason usb_show_tree_graph() gets called with a null usb_device *
> > is that the code in cmd/usb.c for usb info and usb tree assumes
> > everything a UCLASS_MASS_STORAGE device can have as children are
> > devices with some usb_device in their dev_get_parent_priv().  It
> > carves out exceptions to this general rule for UCLASS_USB_EMUL and
> > UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is
> > UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as
> > usb_device, but the bootdev code did not put any usb_device there,
> > it's null. So the first access causes a null pointer dereference.
> >
> > I would have to wrap my mind around more code to start understanding
> > if it's better to give that UCLASS_BOOTDEV some usb_device as parent
> > priv data, or it is better to give USB devices that can be enumerated
> > for listing (usb tree or usb info) some RECURSIBLE flag that indicates
> > their priv parent data is reliably a usb_device.
> >
> > So checking that the alledged usb_device at least isn't null as in my
> > patch is possibly a partial solution. I'm sure if it's null we
> > shouldn't call, but if it points to something other than a usb_device we
> > shouldn't either, and it doesn't address why it is null (well, because
> > it's not really a USB internal node, not even a proper leaf, so it
> > shouldn't be recursed anyway).
> >
> > In usb_show_info() it is similar, usb_display_desc() gets called with a
> > null udev because that's what came in. Recursion is avoided for
> > UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV.
> >
> > A different solution could be to expand the exception to
> > UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know
> > everything you can find so you can list all the exceptions, instead of
> > checking that you have something expected that you can recurse on. Not
> > completely sure, just smells so to me. At least checking for null is
> > more general.
> >
> > Maybe the solution is to fix common/usb_storage.c when it calls
> > bootdev_setup_sibling_blk(), to ensure there's some useful usb_device
> > there as parent priv. Or something needs fixing in
> > drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really
> > shouldn't call recursively for usb info or tree anyway.
> >
> > For now I was trying to understand when that UCLASS_BOOTDEV is added
> > and why, and whether it should be removed at some time. This should
> > hint me at what is the minimum scenario to reproduce the issue.
> >
> >>
> >> The NAK is really only to prevent this from accidentally going on.
> >>
> >
> > The NAK it's OK. I like people to want to understand stuff. Don't worry.
>
> s@going on@going in@ ... typo.
>
> >> Please see above, maybe it could be narrowed down ?
> >
> > I'll see if I can test better and send more useful reports.
> > Not sure when.
> >
> > I'm not sure I'll have the time to learn all I need. I just hoped someone
> > else had it in mind...
>
> I reproducer would be a good starting point, i.e. something like "I run
> these ... commands and I get this crash" . But please make sure you use
> current codebase and no preboot or similar scripting .

I'm not sure what is going on here. Which version are you testing? Do
you have these two commits?

8c29b73278d6 bootstd: usb: Avoid initing USB twice
9fea3a799dde usb: Tidy up the usb_start flag

Regards,
Simon
Xavier Drudis Ferran June 13, 2023, 6:42 a.m. UTC | #7
El Mon, Jun 12, 2023 at 10:17:38PM +0100, Simon Glass deia:
> 
> I'm not sure what is going on here. Which version are you testing? Do
> you have these two commits?
> 
> 8c29b73278d6 bootstd: usb: Avoid initing USB twice
> 9fea3a799dde usb: Tidy up the usb_start flag
> 
> Regards,
> Simon

Yes, I have both. I'm testing the next branch.

I'll send shortly a reply to Marek Vasut with a cleaner test I did
yesterday. Doesn't look hard to reproduce for me...

I'm still trying to understand when the UCLASS_BOOTDEV device should
be added under an usb mass storage device, and when (if ever) removed.
Xavier Drudis Ferran June 13, 2023, 6:52 a.m. UTC | #8
Ok. New test.

This uses yesterday morning's   next   branch.
commit 5b589e139620214f
Merge: cc5a940923 32d2461e04
Merge branch 'next_net/phy_connect_dev'

USB2 does not work for rk3399 in next (fixes are in master, thanks),
but USB3 is enough.

I compiled for rock-pi-4-rk3399_defconfig

flashed to a new microSD card as per doc/board/rockchip/rockchip.rst :

   dd if=u-boot-rockchip.bin of=/dev/sda seek=64
   sync

Put this microSD card in a Rock Pi 4 B+
Put a new USB stick in the USB3 port (center blue port closer to board).

(the microSD card and USB stick come from factory, I guess they were
partitioned with a single FAT partition)

(make sure emmc and spi are blank)

Connected only serial console and power. 

Got this:

U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47)
lpddr4_set_rate: change freq to 400MHz 0, 1
Channel 0: LPDDR4, 400MHz
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
Channel 1: LPDDR4, 400MHz
BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
256B stride
lpddr4_set_rate: change freq to 800MHz 1, 0
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200)
Trying to boot from MMC1
NOTICE:  BL31: v2.1(release):v2.1-728-ged01e0c4-dirty
NOTICE:  BL31: Built : 18:29:11, Mar 22 2022


U-Boot 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200)

SoC: Rockchip rk3399
Reset cause: POR
Model: Radxa ROCK Pi 4B
DRAM:  4 GiB (effective 3.9 GiB)
PMIC:  RK808 
Core:  283 devices, 29 uclasses, devicetree: separate
MMC:   mmc@fe310000: 2, mmc@fe320000: 1, mmc@fe330000: 0
Loading Environment from MMC... *** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Radxa ROCK Pi 4B
Net:   eth0: ethernet@fe300000
Hit any key to stop autoboot:  2  1  0 
rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
Bus usb@fe380000: ehci_generic usb@fe380000: Failed to get clocks (ret=-19)
Port not available.
Bus usb@fe3c0000: ehci_generic usb@fe3c0000: Failed to get clocks (ret=-19)
Port not available.
Bus usb@fe800000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
Bus usb@fe900000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@fe800000 for devices... 1 USB Device(s) found
scanning bus usb@fe900000 for devices... cannot reset port 1!?
2 USB Device(s) found
rockchip_pcie pcie@f8000000: failed to find ep-gpios property
ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
Could not initialize PHY ethernet@fe300000
rockchip_pcie pcie@f8000000: failed to find ep-gpios property
ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
Could not initialize PHY ethernet@fe300000
=> printenv preboot
## Error: "preboot" not defined
=> printenv
arch=arm
baudrate=1500000
board=evb_rk3399
board_name=evb_rk3399
boot_targets=mmc1 mmc0 nvme scsi usb pxe dhcp spi

bootcmd=bootflow scan

bootdelay=2
cpu=armv8
cpuid#=[something]
eth1addr=[:so:me:th:in:g]
ethact=ethernet@fe300000
ethaddr=[:so:me:th:in:g]
fdt_addr_r=0x01f00000
fdtcontroladdr=f1ef9170
fdtfile=rockchip/rk3399-rock-pi-4b.dtb
fdtoverlay_addr_r=0x02000000
kernel_addr_r=0x02080000
kernel_comp_addr_r=0x08000000
kernel_comp_size=0x2000000
loadaddr=0x800800
partitions=uuid_disk=${uuid_gpt_disk};name=loader1,start=32K,size=4000K,uuid=${uuid_gpt_loader1};name=loader2,start=8MB,size=4MB,uuid=${uuid_gpt_loader2};name=trust,size=4M,uuid=${uuid_gpt_atf};name=boot,size=112M,bootable,uuid=${uuid_gpt_boot};name=rootfs,size=-,uuid=[something];
pxefile_addr_r=0x00600000
ramdisk_addr_r=0x06000000
script_offset_f=0xffe000
script_size_f=0x2000
scriptaddr=0x00500000
serial#=[something]
soc=rk3399
stderr=serial,vidconsole
stdin=serial,usbkbd
stdout=serial,vidconsole
vendor=rockchip

Environment size: 1041/32764 bytes
=> usb info
1: Hub,  USB Revision 3.0
 - U-Boot XHCI Host Controller 
 - Class: Hub
 - PacketSize: 512  Configurations: 1
 - Vendor: 0x0000  Product 0x0000 Version 1.0
   Configuration: 1
   - Interfaces: 1 Self Powered 0mA
     Interface: 0
     - Alternate Setting 0, Endpoints: 1
     - Class Hub
     - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms

1: Hub,  USB Revision 3.0
 - U-Boot XHCI Host Controller 
 - Class: Hub
 - PacketSize: 512  Configurations: 1
 - Vendor: 0x0000  Product 0x0000 Version 1.0
   Configuration: 1
   - Interfaces: 1 Self Powered 0mA
     Interface: 0
     - Alternate Setting 0, Endpoints: 1
     - Class Hub
     - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms

2: Mass Storage,  USB Revision 3.20
 -  USB  SanDisk 3.2Gen1 05017d2e4d7b4ea0c5822c90c51e0b7
 - Class: (from Interface) Mass Storage
 - PacketSize: 512  Configurations: 1
 - Vendor: 0x0781  Product 0x5591 Version 1.0
   Configuration: 1
   - Interfaces: 1 Bus Powered 224mA
     Interface: 0
     - Alternate Setting 0, Endpoints: 2
     - Class Mass Storage, Transp. SCSI, Bulk only
     - Endpoint 1 In Bulk MaxPacket 1024
     - Endpoint 2 Out Bulk MaxPacket 1024

"Synchronous Abort" handler, esr 0x96000010, far 0x101
elr: 000000000021c398 lr : 000000000021ca70 (reloc)
elr: 00000000f3f32398 lr : 00000000f3f32a70
x0 : 0000000000000000 x1 : 00000000000010d1
x2 : 00000000f1f404b8 x3 : 00000000f1f41998
x4 : 00000000ff1a0000 x5 : 0000000000000034
x6 : 000000000000000a x7 : 0000000000000002
x8 : 0000000000000000 x9 : 0000000000000400
x10: 0000000000000006 x11: 000000000001869f
x12: 0000000000000200 x13: 0000000000000000
x14: 00000000ffffffff x15: 00000000f1ef81c3
x16: 0000000000000000 x17: 0000000000000000
x18: 00000000f1f0dd90 x19: 0000000000000000
x20: 00000000f1ef8848 x21: 0000000000000002
x22: 00000000f1ef8848 x23: 0000000000000002
x24: 00000000f1ef8844 x25: 0000000000000000
x26: 0000000000000000 x27: 0000000000000000
x28: 0000000000000000 x29: 00000000f1ef86b0

Code: f0000460 a8c27bfd 91099000 1401e518 (39440401) 
Resetting CPU ...

resetting ...



bootcmd=bootflow scan is because DISTRO_DEFAULTS is not
in configs/rock-pi-4-rk3399_defconfig

I'd say the changes to Kconfig are in ef5e3891f57 and 2d653f686b6.

In any case this causes

bootflow scan 

to be called. This tries to access the media in boot_targets until it reaches
usb ( do_bootflow_scan() in cmd/bootflow.c tries mmc1, mmc0 and nvme unsuccessfully).

I think this adds a UCLASS_BOOTDEV device under a usb mass storage device
as sibling of a UCLASS_BLK device, and this makes usb info recurse
with a null usb_device pointer and a reset at this dereference.

But I'm still reading the code to understand quite exactly how and
which are the paths. I don't understand how much of this is intendeded
and what should be prevented. Thsi is why my patch just fixed the last
consequence, tthe null pointer dereference.

I'll keep looking when I can.
Simon Glass June 13, 2023, 2:58 p.m. UTC | #9
Hi Xavier,

On Tue, 13 Jun 2023 at 07:52, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote:
>
> Ok. New test.
>
> This uses yesterday morning's   next   branch.
> commit 5b589e139620214f
> Merge: cc5a940923 32d2461e04
> Merge branch 'next_net/phy_connect_dev'
>
> USB2 does not work for rk3399 in next (fixes are in master, thanks),
> but USB3 is enough.
>
> I compiled for rock-pi-4-rk3399_defconfig
>
> flashed to a new microSD card as per doc/board/rockchip/rockchip.rst :
>
>    dd if=u-boot-rockchip.bin of=/dev/sda seek=64
>    sync
>
> Put this microSD card in a Rock Pi 4 B+
> Put a new USB stick in the USB3 port (center blue port closer to board).
>
> (the microSD card and USB stick come from factory, I guess they were
> partitioned with a single FAT partition)
>
> (make sure emmc and spi are blank)
>
> Connected only serial console and power.
>
> Got this:
>
> U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47)
> lpddr4_set_rate: change freq to 400MHz 0, 1
> Channel 0: LPDDR4, 400MHz
> BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
> Channel 1: LPDDR4, 400MHz
> BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB
> 256B stride
> lpddr4_set_rate: change freq to 800MHz 1, 0
> Trying to boot from BOOTROM
> Returning to boot ROM...
>
> U-Boot SPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200)
> Trying to boot from MMC1
> NOTICE:  BL31: v2.1(release):v2.1-728-ged01e0c4-dirty
> NOTICE:  BL31: Built : 18:29:11, Mar 22 2022
>
>
> U-Boot 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200)
>
> SoC: Rockchip rk3399
> Reset cause: POR
> Model: Radxa ROCK Pi 4B
> DRAM:  4 GiB (effective 3.9 GiB)
> PMIC:  RK808
> Core:  283 devices, 29 uclasses, devicetree: separate
> MMC:   mmc@fe310000: 2, mmc@fe320000: 1, mmc@fe330000: 0
> Loading Environment from MMC... *** Warning - bad CRC, using default environment
>
> In:    serial
> Out:   serial
> Err:   serial
> Model: Radxa ROCK Pi 4B
> Net:   eth0: ethernet@fe300000
> Hit any key to stop autoboot:  2     1     0
> rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
> Bus usb@fe380000: ehci_generic usb@fe380000: Failed to get clocks (ret=-19)
> Port not available.
> Bus usb@fe3c0000: ehci_generic usb@fe3c0000: Failed to get clocks (ret=-19)
> Port not available.
> Bus usb@fe800000: Register 2000140 NbrPorts 2
> Starting the controller
> USB XHCI 1.10
> Bus usb@fe900000: Register 2000140 NbrPorts 2
> Starting the controller
> USB XHCI 1.10
> scanning bus usb@fe800000 for devices... 1 USB Device(s) found
> scanning bus usb@fe900000 for devices... cannot reset port 1!?
> 2 USB Device(s) found
> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> Could not initialize PHY ethernet@fe300000
> rockchip_pcie pcie@f8000000: failed to find ep-gpios property
> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> Could not initialize PHY ethernet@fe300000
> => printenv preboot
> ## Error: "preboot" not defined
> => printenv
> arch=arm
> baudrate=1500000
> board=evb_rk3399
> board_name=evb_rk3399
> boot_targets=mmc1 mmc0 nvme scsi usb pxe dhcp spi
>
> bootcmd=bootflow scan
>
> bootdelay=2
> cpu=armv8
> cpuid#=[something]
> eth1addr=[:so:me:th:in:g]
> ethact=ethernet@fe300000
> ethaddr=[:so:me:th:in:g]
> fdt_addr_r=0x01f00000
> fdtcontroladdr=f1ef9170
> fdtfile=rockchip/rk3399-rock-pi-4b.dtb
> fdtoverlay_addr_r=0x02000000
> kernel_addr_r=0x02080000
> kernel_comp_addr_r=0x08000000
> kernel_comp_size=0x2000000
> loadaddr=0x800800
> partitions=uuid_disk=${uuid_gpt_disk};name=loader1,start=32K,size=4000K,uuid=${uuid_gpt_loader1};name=loader2,start=8MB,size=4MB,uuid=${uuid_gpt_loader2};name=trust,size=4M,uuid=${uuid_gpt_atf};name=boot,size=112M,bootable,uuid=${uuid_gpt_boot};name=rootfs,size=-,uuid=[something];
> pxefile_addr_r=0x00600000
> ramdisk_addr_r=0x06000000
> script_offset_f=0xffe000
> script_size_f=0x2000
> scriptaddr=0x00500000
> serial#=[something]
> soc=rk3399
> stderr=serial,vidconsole
> stdin=serial,usbkbd
> stdout=serial,vidconsole
> vendor=rockchip
>
> Environment size: 1041/32764 bytes
> => usb info
> 1: Hub,  USB Revision 3.0
>  - U-Boot XHCI Host Controller
>  - Class: Hub
>  - PacketSize: 512  Configurations: 1
>  - Vendor: 0x0000  Product 0x0000 Version 1.0
>    Configuration: 1
>    - Interfaces: 1 Self Powered 0mA
>      Interface: 0
>      - Alternate Setting 0, Endpoints: 1
>      - Class Hub
>      - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms
>
> 1: Hub,  USB Revision 3.0
>  - U-Boot XHCI Host Controller
>  - Class: Hub
>  - PacketSize: 512  Configurations: 1
>  - Vendor: 0x0000  Product 0x0000 Version 1.0
>    Configuration: 1
>    - Interfaces: 1 Self Powered 0mA
>      Interface: 0
>      - Alternate Setting 0, Endpoints: 1
>      - Class Hub
>      - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms
>
> 2: Mass Storage,  USB Revision 3.20
>  -  USB  SanDisk 3.2Gen1 05017d2e4d7b4ea0c5822c90c51e0b7
>  - Class: (from Interface) Mass Storage
>  - PacketSize: 512  Configurations: 1
>  - Vendor: 0x0781  Product 0x5591 Version 1.0
>    Configuration: 1
>    - Interfaces: 1 Bus Powered 224mA
>      Interface: 0
>      - Alternate Setting 0, Endpoints: 2
>      - Class Mass Storage, Transp. SCSI, Bulk only
>      - Endpoint 1 In Bulk MaxPacket 1024
>      - Endpoint 2 Out Bulk MaxPacket 1024
>
> "Synchronous Abort" handler, esr 0x96000010, far 0x101
> elr: 000000000021c398 lr : 000000000021ca70 (reloc)
> elr: 00000000f3f32398 lr : 00000000f3f32a70
> x0 : 0000000000000000 x1 : 00000000000010d1
> x2 : 00000000f1f404b8 x3 : 00000000f1f41998
> x4 : 00000000ff1a0000 x5 : 0000000000000034
> x6 : 000000000000000a x7 : 0000000000000002
> x8 : 0000000000000000 x9 : 0000000000000400
> x10: 0000000000000006 x11: 000000000001869f
> x12: 0000000000000200 x13: 0000000000000000
> x14: 00000000ffffffff x15: 00000000f1ef81c3
> x16: 0000000000000000 x17: 0000000000000000
> x18: 00000000f1f0dd90 x19: 0000000000000000
> x20: 00000000f1ef8848 x21: 0000000000000002
> x22: 00000000f1ef8848 x23: 0000000000000002
> x24: 00000000f1ef8844 x25: 0000000000000000
> x26: 0000000000000000 x27: 0000000000000000
> x28: 0000000000000000 x29: 00000000f1ef86b0
>
> Code: f0000460 a8c27bfd 91099000 1401e518 (39440401)
> Resetting CPU ...
>
> resetting ...
>
>
>
> bootcmd=bootflow scan is because DISTRO_DEFAULTS is not
> in configs/rock-pi-4-rk3399_defconfig
>
> I'd say the changes to Kconfig are in ef5e3891f57 and 2d653f686b6.
>
> In any case this causes
>
> bootflow scan
>
> to be called. This tries to access the media in boot_targets until it reaches
> usb ( do_bootflow_scan() in cmd/bootflow.c tries mmc1, mmc0 and nvme unsuccessfully).
>
> I think this adds a UCLASS_BOOTDEV device under a usb mass storage device
> as sibling of a UCLASS_BLK device, and this makes usb info recurse
> with a null usb_device pointer and a reset at this dereference.

Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices.
That is better than checking for the NULL pointer.

>
> But I'm still reading the code to understand quite exactly how and
> which are the paths. I don't understand how much of this is intendeded
> and what should be prevented. Thsi is why my patch just fixed the last
> consequence, tthe null pointer dereference.
>
> I'll keep looking when I can.

Regards,
Simon
Xavier Drudis Ferran June 13, 2023, 4:04 p.m. UTC | #10
El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia:
> 
> Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices.

That's a possibility, yes. It should work until someone adds another
device there for some other purpose.

> That is better than checking for the NULL pointer.
> 

Why? What's wrong about checking for null ?
Or maybe checking for both not null and  not UCLASS_BOOTDEV ?

Not that I care too much, just to understand your reasoning.

Or can we check for recursible devices somehow ?
Maybe flag them or something ?

Why is better to state all devices are recursible except some UCLASSes
(meaning they can have stuff that needs listed in usb info - or
likewise usb tree -) instead of stating that some closed set are
recursible ?
Simon Glass June 13, 2023, 8:12 p.m. UTC | #11
Hi Xavier,

On Tue, 13 Jun 2023 at 17:04, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote:
>
> El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia:
> >
> > Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices.
>
> That's a possibility, yes. It should work until someone adds another
> device there for some other purpose.
>
> > That is better than checking for the NULL pointer.
> >
>
> Why? What's wrong about checking for null ?
> Or maybe checking for both not null and  not UCLASS_BOOTDEV ?
>
> Not that I care too much, just to understand your reasoning.

Well, devices may have something attached there, so it may be
non-NULL, but point to a different struct from the expected one.

>
> Or can we check for recursible devices somehow ?
> Maybe flag them or something ?
>
> Why is better to state all devices are recursible except some UCLASSes
> (meaning they can have stuff that needs listed in usb info - or
> likewise usb tree -) instead of stating that some closed set are
> recursible ?
>

For USB we can have lots of different types of devices as children, so
it would be a pain to enumerate them all. At least that's how I see
it.

Regards,
Simon
Xavier Drudis Ferran June 14, 2023, 8:40 a.m. UTC | #12
Thanks for explaining, Simon Glass.

Can someone please stop me if I'm splitting hairs or bikeshedding or
something ? I guess we agree both checking for null or UCLASS_BOOTDEV
would work right now but we are looking for what's more future-proof.

El Tue, Jun 13, 2023 at 09:12:30PM +0100, Simon Glass deia:
> Hi Xavier,
> 
> On Tue, 13 Jun 2023 at 17:04, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote:
> >
> > El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia:
> > >
> > > Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices.
> >
> > That's a possibility, yes. It should work until someone adds another
> > device there for some other purpose.
> >
> > > That is better than checking for the NULL pointer.
> > >
> >
> > Why? What's wrong about checking for null ?
> > Or maybe checking for both not null and  not UCLASS_BOOTDEV ?
> >
> > Not that I care too much, just to understand your reasoning.
> 
> Well, devices may have something attached there, so it may be
> non-NULL, but point to a different struct from the expected one.
>

Yes. That is possible. How likely ?

That "there" is dev_get_parent_priv(). If I understand the driver
model, those devices shouldn't put things there themselves, it should
be their parent who puts stuff there for them. And the parent should
be an usb_device->dev (because of the recursive code). So what other
struct would such a parent put in a child parent_priv_ ? Yes, whatever
it wants, but I mean, isn't it likely to assume the child would be
"adopted" with null as parent_priv_ or a "natural child" with a usb_device
in parent_priv_ ? 

I did a very rough search

grep -rIE 'UCLASS_(BLK|BOOTDEV|USB_EMUL)' . |grep -F .id |cut -f1 -d:  |xargs -n 1 grep -l per_child_auto 
./drivers/usb/emul/usb-emul-uclass.c

Which seems to suggest only UCLASS_USB_EMUL would have parent_priv_,
and that would be an usb_device, which the current code does not want
to recurse anyway. (dev_set_parent_priv() is almost never called).

It is also possible that one day a device that is not UCLASS_BLK,
UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage
device (just imagine a future system similar to bootstd for firmware
updates, trust material, etc.). Is it likely to have a struct in
parent_priv_ that is not a usb_device ? 

So which is more likely to survive future changes ?

- checking for parent_priv_ not null and not UCLASS_USB_EMUL

- checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK
  (my patch, overcautious ?)

- checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL)
  (Simon Glass' idea)
  
- checking for not  UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL
           and parent_priv_ not null
	   
> >
> > Or can we check for recursible devices somehow ?
> > Maybe flag them or something ?
> >
> > Why is better to state all devices are recursible except some UCLASSes
> > (meaning they can have stuff that needs listed in usb info - or
> > likewise usb tree -) instead of stating that some closed set are
> > recursible ?
> >
> 
> For USB we can have lots of different types of devices as children, so
> it would be a pain to enumerate them all. At least that's how I see
> it.
>

We can have lots of devices as children, but those do get listed
before the recursive call.  How many of those can have children that
we have to list too, i.e.  how many of those do we want to recurse
into ?

I can only think of usb hubs (maybe some node for multifunction
devices too ???).  Maybe I'm not understanding how U-Boot works with
USB devices... but it looks like it would be enough to look for
UCLASS_USB_HUB, then recursive call, else no recursion. I mean instead of

cmd/usb.c : usb_show_tree_graph() :

	if ((device_get_uclass_id(  child  ) != UCLASS_USB_EMUL) &&
      	    (device_get_uclass_id(  child  ) != UCLASS_BOOTDEV) &&
	    (device_get_uclass_id(  child  ) != UCLASS_BLK)) {
			usb_show_tree_graph(udev, pre);
			pre[index] = 0;
		}

we could simply have

	if (device_get_uclass_id(  dev->dev  ) == UCLASS_USB_HUB) {
			usb_show_tree_graph(udev, pre);
			pre[index] = 0;
		}

(or put the condition directly before the for loop)

Or can we have an UCLASS_USB_EMUL, UCLASS_BLK or UCLASS_BOOTDEV as
direct child of an UCLASS_USB_HUB ???

Am I opening any cans of worms ?
Marek Vasut June 20, 2023, 12:50 a.m. UTC | #13
On 6/13/23 08:52, Xavier Drudis Ferran wrote:
> Ok. New test.
> 
> This uses yesterday morning's   next   branch.
> commit 5b589e139620214f
> Merge: cc5a940923 32d2461e04
> Merge branch 'next_net/phy_connect_dev'
> 
> USB2 does not work for rk3399 in next (fixes are in master, thanks),
> but USB3 is enough.
> 
> I compiled for rock-pi-4-rk3399_defconfig
> 
> flashed to a new microSD card as per doc/board/rockchip/rockchip.rst :
> 
>     dd if=u-boot-rockchip.bin of=/dev/sda seek=64
>     sync
> 
> Put this microSD card in a Rock Pi 4 B+
> Put a new USB stick in the USB3 port (center blue port closer to board).
> 
> (the microSD card and USB stick come from factory, I guess they were
> partitioned with a single FAT partition)
> 
> (make sure emmc and spi are blank)
> 
> Connected only serial console and power.
> 
> Got this:
> 
> U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47)

Next is already at rc4 , what's this rc2 ?
Xavier Drudis Ferran June 20, 2023, 7:03 a.m. UTC | #14
El Tue, Jun 20, 2023 at 02:50:48AM +0200, Marek Vasut deia:
> On 6/13/23 08:52, Xavier Drudis Ferran wrote:
> > 
> > U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47)
> 
> Next is already at rc4 , what's this rc2 ?

It's a test I did last week (June 12th). What do you think has changed
in next between rc2 and rc4 so that this shouldn't happen now ?

My last patch (sent yesterday) was tested with the next branch as of
yesterday morning. But you can test it yourself with any version you like.
Do I have to send the logs of each test I do to the list every time ?

I thought Simon Glass reply last week to the mail you quoted already
showed agreement that the issue exists, and the adding of the
UCLASS_BOOTDEV device is per design, so cmd/usb.c needs fixing to deal
with this change.
Marek Vasut June 20, 2023, 9:13 a.m. UTC | #15
On 6/20/23 09:03, Xavier Drudis Ferran wrote:
> El Tue, Jun 20, 2023 at 02:50:48AM +0200, Marek Vasut deia:
>> On 6/13/23 08:52, Xavier Drudis Ferran wrote:
>>>
>>> U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47)
>>
>> Next is already at rc4 , what's this rc2 ?
> 
> It's a test I did last week (June 12th). What do you think has changed
> in next between rc2 and rc4 so that this shouldn't happen now ?

The beginning of the email states:

"
Ok. New test.

This uses yesterday morning's   next   branch.
commit 5b589e139620214f
"

So, that is not the case ?

Also, commit 497967f1ee does not exist in upstream U-Boot, was this some 
patched U-Boot tree ?

> My last patch (sent yesterday) was tested with the next branch as of
> yesterday morning. But you can test it yourself with any version you like.
> Do I have to send the logs of each test I do to the list every time ?

Ideally the claims you make about what you tested and logs from the test 
should match, that would be a good starting point.

> I thought Simon Glass reply last week to the mail you quoted already
> showed agreement that the issue exists, and the adding of the
> UCLASS_BOOTDEV device is per design, so cmd/usb.c needs fixing to deal
> with this change.

I am not disputing that. What bothers me is the still missing consistent 
test case, esp. if this is a fix which should be added this late in release.
Simon Glass June 20, 2023, 10:03 a.m. UTC | #16
Hi Xavier,

On Wed, 14 Jun 2023 at 09:40, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote:
>
>
> Thanks for explaining, Simon Glass.
>
> Can someone please stop me if I'm splitting hairs or bikeshedding or
> something ? I guess we agree both checking for null or UCLASS_BOOTDEV
> would work right now but we are looking for what's more future-proof.
>
> El Tue, Jun 13, 2023 at 09:12:30PM +0100, Simon Glass deia:
> > Hi Xavier,
> >
> > On Tue, 13 Jun 2023 at 17:04, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote:
> > >
> > > El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia:
> > > >
> > > > Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices.
> > >
> > > That's a possibility, yes. It should work until someone adds another
> > > device there for some other purpose.
> > >
> > > > That is better than checking for the NULL pointer.
> > > >
> > >
> > > Why? What's wrong about checking for null ?
> > > Or maybe checking for both not null and  not UCLASS_BOOTDEV ?
> > >
> > > Not that I care too much, just to understand your reasoning.
> >
> > Well, devices may have something attached there, so it may be
> > non-NULL, but point to a different struct from the expected one.
> >
>
> Yes. That is possible. How likely ?
>
> That "there" is dev_get_parent_priv(). If I understand the driver
> model, those devices shouldn't put things there themselves, it should
> be their parent who puts stuff there for them. And the parent should
> be an usb_device->dev (because of the recursive code). So what other
> struct would such a parent put in a child parent_priv_ ? Yes, whatever
> it wants, but I mean, isn't it likely to assume the child would be
> "adopted" with null as parent_priv_ or a "natural child" with a usb_device
> in parent_priv_ ?
>
> I did a very rough search
>
> grep -rIE 'UCLASS_(BLK|BOOTDEV|USB_EMUL)' . |grep -F .id |cut -f1 -d:  |xargs -n 1 grep -l per_child_auto
> ./drivers/usb/emul/usb-emul-uclass.c
>
> Which seems to suggest only UCLASS_USB_EMUL would have parent_priv_,
> and that would be an usb_device, which the current code does not want
> to recurse anyway. (dev_set_parent_priv() is almost never called).
>
> It is also possible that one day a device that is not UCLASS_BLK,
> UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage
> device (just imagine a future system similar to bootstd for firmware
> updates, trust material, etc.). Is it likely to have a struct in
> parent_priv_ that is not a usb_device ?
>
> So which is more likely to survive future changes ?
>
> - checking for parent_priv_ not null and not UCLASS_USB_EMUL
>
> - checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK
>   (my patch, overcautious ?)
>
> - checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL)
>   (Simon Glass' idea)
>
> - checking for not  UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL
>            and parent_priv_ not null

Really the parent_priv thing is a separate issue, a side effect of
something having a UCLASS_USB parent.

The key point here is that we cannot iterate down into a bootdev
device looking for USB devices. So we should use that as the check,
since it is the most direct check.

>
> > >
> > > Or can we check for recursible devices somehow ?
> > > Maybe flag them or something ?
> > >
> > > Why is better to state all devices are recursible except some UCLASSes
> > > (meaning they can have stuff that needs listed in usb info - or
> > > likewise usb tree -) instead of stating that some closed set are
> > > recursible ?
> > >
> >
> > For USB we can have lots of different types of devices as children, so
> > it would be a pain to enumerate them all. At least that's how I see
> > it.
> >
>
> We can have lots of devices as children, but those do get listed
> before the recursive call.  How many of those can have children that
> we have to list too, i.e.  how many of those do we want to recurse
> into ?
>
> I can only think of usb hubs (maybe some node for multifunction
> devices too ???).  Maybe I'm not understanding how U-Boot works with
> USB devices... but it looks like it would be enough to look for
> UCLASS_USB_HUB, then recursive call, else no recursion. I mean instead of
>
> cmd/usb.c : usb_show_tree_graph() :
>
>         if ((device_get_uclass_id(  child  ) != UCLASS_USB_EMUL) &&
>             (device_get_uclass_id(  child  ) != UCLASS_BOOTDEV) &&
>             (device_get_uclass_id(  child  ) != UCLASS_BLK)) {
>                         usb_show_tree_graph(udev, pre);
>                         pre[index] = 0;
>                 }
>
> we could simply have
>
>         if (device_get_uclass_id(  dev->dev  ) == UCLASS_USB_HUB) {
>                         usb_show_tree_graph(udev, pre);
>                         pre[index] = 0;
>                 }
>
> (or put the condition directly before the for loop)
>
> Or can we have an UCLASS_USB_EMUL, UCLASS_BLK or UCLASS_BOOTDEV as
> direct child of an UCLASS_USB_HUB ???
>
> Am I opening any cans of worms ?

From my memory, I think you can check for a USB hub instead, but I'm
not completely sure.

I suggest adding a test for the command (see test/dm/acpi.c for an
example) since a test is the best way to ensure this doesn't happen
again.

Regards,
Simon
Xavier Drudis Ferran June 20, 2023, 11:20 a.m. UTC | #17
El Tue, Jun 20, 2023 at 11:03:57AM +0100, Simon Glass deia:
> Hi Xavier,
>

Hi Simon,

> >
> > It is also possible that one day a device that is not UCLASS_BLK,
> > UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage
> > device (just imagine a future system similar to bootstd for firmware
> > updates, trust material, etc.). Is it likely to have a struct in
> > parent_priv_ that is not a usb_device ?
> >
> > So which is more likely to survive future changes ?
> >
> > - checking for parent_priv_ not null and not UCLASS_USB_EMUL
> >
> > - checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK
> >   (my patch, overcautious ?)
> >
> > - checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL)
> >   (Simon Glass' idea)
> >
> > - checking for not  UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL
> >            and parent_priv_ not null
> 
> Really the parent_priv thing is a separate issue, a side effect of
> something having a UCLASS_USB parent.
>

I don't think it's a separate issue. If parent_priv is present it
could be a usb_device (most likely) or not, but if it's null there's
no way the recursive call can succeed.

> The key point here is that we cannot iterate down into a bootdev
> device looking for USB devices. So we should use that as the check,
> since it is the most direct check.
>

But things keep appearing that have a UCLASS_USB* parent and no
parent_priv.
in 2017 Suneel Garapati already fixed the issue of UCLASS_BLK
being a child of a device. Now it's UCLASS_BOOTDEV, and tomorrow
may be something else.

The most direct check will miss future cases as the devices tend to
become more abstract instead of mapping one to one to physical stuff.

> 
> >From my memory, I think you can check for a USB hub instead, but I'm
> not completely sure.
>

On second thoughts I didn't find it so easy. There's the root hub,
UCLASS_USB, I think and a UCLASS_USB_EMUL may also be a hub, so I
don't know anymore how more elegant that could be, so I left it be.

> I suggest adding a test for the command (see test/dm/acpi.c for an
> example) since a test is the best way to ensure this doesn't happen
> again.
>

Makes sense. But I don't have any more time for that, sorry.

I think we'll have to leave it at this unless someone else has the time.

Bye.
Simon Glass June 20, 2023, 2:36 p.m. UTC | #18
On Tue, 20 Jun 2023 at 12:20, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote:
>
> El Tue, Jun 20, 2023 at 11:03:57AM +0100, Simon Glass deia:
> > Hi Xavier,
> >
>
> Hi Simon,
>
> > >
> > > It is also possible that one day a device that is not UCLASS_BLK,
> > > UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage
> > > device (just imagine a future system similar to bootstd for firmware
> > > updates, trust material, etc.). Is it likely to have a struct in
> > > parent_priv_ that is not a usb_device ?
> > >
> > > So which is more likely to survive future changes ?
> > >
> > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL
> > >
> > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK
> > >   (my patch, overcautious ?)
> > >
> > > - checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL)
> > >   (Simon Glass' idea)
> > >
> > > - checking for not  UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL
> > >            and parent_priv_ not null
> >
> > Really the parent_priv thing is a separate issue, a side effect of
> > something having a UCLASS_USB parent.
> >
>
> I don't think it's a separate issue. If parent_priv is present it
> could be a usb_device (most likely) or not, but if it's null there's
> no way the recursive call can succeed.
>
> > The key point here is that we cannot iterate down into a bootdev
> > device looking for USB devices. So we should use that as the check,
> > since it is the most direct check.
> >
>
> But things keep appearing that have a UCLASS_USB* parent and no
> parent_priv.
> in 2017 Suneel Garapati already fixed the issue of UCLASS_BLK
> being a child of a device. Now it's UCLASS_BOOTDEV, and tomorrow
> may be something else.
>
> The most direct check will miss future cases as the devices tend to
> become more abstract instead of mapping one to one to physical stuff.
>
> >
> > >From my memory, I think you can check for a USB hub instead, but I'm
> > not completely sure.
> >
>
> On second thoughts I didn't find it so easy. There's the root hub,
> UCLASS_USB, I think and a UCLASS_USB_EMUL may also be a hub, so I
> don't know anymore how more elegant that could be, so I left it be.
>
> > I suggest adding a test for the command (see test/dm/acpi.c for an
> > example) since a test is the best way to ensure this doesn't happen
> > again.
> >
>
> Makes sense. But I don't have any more time for that, sorry.
>
> I think we'll have to leave it at this unless someone else has the time.

Reviewed-by: Simon Glass <sjg@chromium.org>
diff mbox series

Patch

diff --git a/cmd/usb.c b/cmd/usb.c
index 73addb04c4..7e6065aa51 100644
--- a/cmd/usb.c
+++ b/cmd/usb.c
@@ -421,7 +421,8 @@  static void usb_show_tree_graph(struct usb_device *dev, char *pre)
 		 * Ignore emulators and block child devices, we only want
 		 * real devices
 		 */
-		if ((device_get_uclass_id(child) != UCLASS_USB_EMUL) &&
+		if (udev &&
+		    (device_get_uclass_id(child) != UCLASS_USB_EMUL) &&
 		    (device_get_uclass_id(child) != UCLASS_BLK)) {
 			usb_show_tree_graph(udev, pre);
 			pre[index] = 0;
@@ -607,7 +608,8 @@  static void usb_show_info(struct usb_device *udev)
 		    (device_get_uclass_id(child) != UCLASS_USB_EMUL) &&
 		    (device_get_uclass_id(child) != UCLASS_BLK)) {
 			udev = dev_get_parent_priv(child);
-			usb_show_info(udev);
+			if (udev)
+				usb_show_info(udev);
 		}
 	}
 }