[0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

Message ID 20210114150902.11515-1-bmeng.cn@gmail.com

Bin Meng Jan. 14, 2021, 3:08 p.m. UTC
From: Bin Meng <bin.meng@windriver.com>

The m25p80 model uses s->needed_bytes to indicate how many follow-up
bytes are expected to be received after it receives a command. For
example, depending on the address mode, either 3-byte address or
4-byte address is needed.

For fast read family commands, some dummy cycles are required after
sending the address bytes, and the dummy cycles need to be counted
in s->needed_bytes. This is where the mess began.

As the variable name (needed_bytes) indicates, the unit is bytes, not
bits or cycles. However, for some reason the model has been using the
number of dummy cycles for s->needed_bytes. The right approach is to
convert the number of dummy cycles to bytes based on the SPI protocol.
For example, 6 dummy cycles for the Fast Read Quad I/O (EBh) command
should be converted to 3 bytes per the formula (6 * 4 / 8).
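
To make the arithmetic concrete, here is a minimal sketch of the
conversion. The helper name dummy_cycles_to_bytes() and its signature
are hypothetical and not taken from m25p80.c; it only assumes that each
dummy cycle clocks one bit per data line.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of m25p80.c): convert a dummy cycle
 * count into whole bytes for a phase clocked on `lines` data lines
 * (1 for single, 2 for dual, 4 for quad).
 *
 * Returns false when the cycles do not add up to a whole number of
 * bytes, so the caller can log the unsupported configuration.
 */
static bool dummy_cycles_to_bytes(unsigned cycles, unsigned lines,
                                  unsigned *bytes)
{
    unsigned bits = cycles * lines; /* one bit per line per cycle */

    if (lines == 0 || bits % 8 != 0) {
        return false;
    }
    *bytes = bits / 8;
    return true;
}
```

With this sketch, 6 cycles on 4 lines give 3 bytes, while 7 cycles on
4 lines (28 bits) are rejected instead of being silently truncated.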

Things get complicated when interacting with different SPI or QSPI
flash controllers. There are two major cases:

- Dummy bytes prepared by drivers and written to the controller fifo.
  In this case, the driver calculates the correct number of dummy
  bytes and writes them into the tx fifo. Fixing the m25p80 model will
  fix flashes working with such controllers.
- Dummy bytes not prepared by drivers. Drivers just tell the hardware
  the dummy cycle configuration via some registers, and the hardware
  automatically generates the dummy cycles for us. Fixing the m25p80
  model is not enough, and we will need to fix the SPI/QSPI models for
  such controllers.
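
The two cases can be sketched as follows. This is only an illustration
under stated assumptions: FakeFlash, ctrl_flush_tx_fifo() and
ctrl_generate_dummy() are hypothetical names, not QEMU APIs, and the
flash side is reduced to a byte counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the flash model: counts received bytes. */
typedef struct {
    unsigned bytes_seen;
} FakeFlash;

static void fake_flash_transfer(FakeFlash *f, uint8_t byte)
{
    (void)byte;
    f->bytes_seen++;
}

/*
 * Case 1: the driver already queued the dummy bytes in the tx fifo.
 * The controller model forwards every fifo byte unchanged.
 */
static void ctrl_flush_tx_fifo(FakeFlash *f, const uint8_t *fifo,
                               unsigned len)
{
    for (unsigned i = 0; i < len; i++) {
        fake_flash_transfer(f, fifo[i]);
    }
}

/*
 * Case 2: the driver only programmed a dummy cycle count in a
 * register. The controller model has to turn cycles into bytes
 * itself before talking to the flash model.
 */
static void ctrl_generate_dummy(FakeFlash *f, unsigned dummy_cycles,
                                unsigned lines)
{
    unsigned nbytes = dummy_cycles * lines / 8;

    for (unsigned i = 0; i < nbytes; i++) {
        fake_flash_transfer(f, 0);
    }
}
```

In both cases the flash model ends up seeing the same number of dummy
bytes; the difference is only in who does the conversion.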

This series fixes the mess in the m25p80 from the flash side first,
followed by fixes to 3 known SPI controller models that fall into
the 2nd case above.

Please note, I have no way to verify patches 7/8/9 because:

* There is no public datasheet available for the SoC / SPI controller
* There are no QEMU docs or other details that tell people how to boot
  either U-Boot or the Linux kernel to verify the functionality

These 3 patches are very likely to be wrong. Hence I would like to ask
the original authors who wrote these SPI controller models to help
with testing, or to completely rewrite these 3 patches to fix things.
Thanks!

Patch 6 is unvalidated with QEMU, mainly because there is no doc to
tell people how to boot anything to test. But I have some confidence
based on my reading of the ZynqMP manual, as well as some experimental
testing on a real ZCU102 board.

Other flash patches can be tested with the SiFive SPI series:
http://patchwork.ozlabs.org/project/qemu-devel/list/?series=222391

Cherry-pick patches 16 and 17 from the series above, and switch to a
different flash model to test with the following command:

$ qemu-system-riscv64 -nographic -M sifive_u -m 2G -smp 5 -kernel u-boot

I've picked two for testing:

QEMU flash: "sst25vf032b"

  U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)

  CPU:   rv64imafdcsu
  Model: SiFive HiFive Unleashed A00
  DRAM:  2 GiB
  MMC:
  Loading Environment from SPIFlash... SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB
  *** Warning - bad CRC, using default environment

  In:    serial@10010000
  Out:   serial@10010000
  Err:   serial@10010000
  Net:   failed to get gemgxl_reset reset

  Warning: ethernet@10090000 MAC addresses don't match:
  Address in DT is                52:54:00:12:34:56
  Address in environment is       70:b3:d5:92:f0:01
  eth0: ethernet@10090000
  Hit any key to stop autoboot:  0
  => sf probe
  SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB
  => sf test 1ff000 1000
  SPI flash test:
  0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
  1 check: 10 ticks, 400 KiB/s 3.200 Mbps
  2 write: 170 ticks, 23 KiB/s 0.184 Mbps
  3 read: 9 ticks, 444 KiB/s 3.552 Mbps
  Test passed
  0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
  1 check: 10 ticks, 400 KiB/s 3.200 Mbps
  2 write: 170 ticks, 23 KiB/s 0.184 Mbps
  3 read: 9 ticks, 444 KiB/s 3.552 Mbps

QEMU flash: "mx66u51235f"

  U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)

  CPU:   rv64imafdcsu
  Model: SiFive HiFive Unleashed A00
  DRAM:  2 GiB
  MMC:
  Loading Environment from SPIFlash... SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
  *** Warning - bad CRC, using default environment

  In:    serial@10010000
  Out:   serial@10010000
  Err:   serial@10010000
  Net:   failed to get gemgxl_reset reset

  Warning: ethernet@10090000 MAC addresses don't match:
  Address in DT is                52:54:00:12:34:56
  Address in environment is       70:b3:d5:92:f0:01
  eth0: ethernet@10090000
  Hit any key to stop autoboot:  0
  => sf probe
  SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
  => sf test 0 8000
  SPI flash test:
  0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
  1 check: 80 ticks, 400 KiB/s 3.200 Mbps
  2 write: 83 ticks, 385 KiB/s 3.080 Mbps
  3 read: 79 ticks, 405 KiB/s 3.240 Mbps
  Test passed
  0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
  1 check: 80 ticks, 400 KiB/s 3.200 Mbps
  2 write: 83 ticks, 385 KiB/s 3.080 Mbps
  3 read: 79 ticks, 405 KiB/s 3.240 Mbps

I am sure there will be bugs, and I have not tested all flashes affected.
But I want to send out this series for an early discussion and comments.
I will continue my testing.


Bin Meng (9):
  hw/block: m25p80: Fix the number of dummy bytes needed for Windbond
    flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for
    Numonyx/Micron flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for Macronix
    flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for Spansion
    flashes
  hw/block: m25p80: Support fast read for SST flashes
  hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
  Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4
    command"
  Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
  hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic

 include/hw/ssi/aspeed_smc.h |   3 -
 hw/block/m25p80.c           | 153 ++++++++++++++++++++++++++++--------
 hw/ssi/aspeed_smc.c         | 116 +--------------------------
 hw/ssi/npcm7xx_fiu.c        |   8 +-
 hw/ssi/xilinx_spips.c       |  29 ++++++-
 5 files changed, 153 insertions(+), 156 deletions(-)

Comments

Cédric Le Goater Jan. 14, 2021, 3:59 p.m. UTC | #1
On 1/14/21 4:08 PM, Bin Meng wrote:
> From: Bin Meng <bin.meng@windriver.com>
> 
> The m25p80 model uses s->needed_bytes to indicate how many follow-up
> bytes are expected to be received after it receives a command. For
> example, depending on the address mode, either 3-byte address or
> 4-byte address is needed.
> 
> For fast read family commands, some dummy cycles are required after
> sending the address bytes, and the dummy cycles need to be counted
> in s->needed_bytes. This is where the mess began.
> 
> As the variable name (needed_bytes) indicates, the unit is in byte.
> It is not in bit, or cycle. However for some reason the model has
> been using the number of dummy cycles for s->needed_bytes. The right
> approach is to convert the number of dummy cycles to bytes based on
> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> 
> Things get complicated when interacting with different SPI or QSPI
> flash controllers. There are major two cases:
> 
> - Dummy bytes prepared by drivers, and wrote to the controller fifo.
>   For such case, driver will calculate the correct number of dummy
>   bytes and write them into the tx fifo. Fixing the m25p80 model will
>   fix flashes working with such controllers.
> - Dummy bytes not prepared by drivers. Drivers just tell the hardware
>   the dummy cycle configuration via some registers, and hardware will
>   automatically generate dummy cycles for us. Fixing the m25p80 model
>   is not enough, and we will need to fix the SPI/QSPI models for such
>   controllers.
> 
> This series fixes the mess in the m25p80 from the flash side first,
> followed by fixes to 3 known SPI controller models that fall into
> the 2nd case above.
> 
> Please note, I have no way to verify patch 7/8/9 because:
> 
> * There is no public datasheet available for the SoC / SPI controller
> * There is no QEMU docs, or details that tell people how to boot either
>   U-Boot or Linux kernel to verify the functionality

The Linux drivers are available in mainline, but these branches are more
up to date since not everything is merged:

  https://github.com/openbmc/linux

U-Boot:

  https://github.com/openbmc/u-boot/tree/v2016.07-aspeed-openbmc (ast2400/ast2500)
  https://github.com/openbmc/u-boot/tree/v2019.04-aspeed-openbmc (ast2600)

A quick intro:

  https://www.qemu.org/docs/master/system/arm/aspeed.html

> 
> These 3 patches are very likely to be wrong. Hence I would like to ask
> help from the original author who wrote these SPI controller models
> to help testing, or completely rewrite these 3 patches to fix things.
> Thanks!

A quick test shows that all Aspeed machines are broken with this patchset.

Please try these command lines:

  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=palmetto/artifact/deploy/images/palmetto/flash-palmetto
  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=romulus/artifact/deploy/images/romulus/flash-romulus
  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=witherspoon/artifact/deploy/images/witherspoon/obmc-phosphor-image-witherspoon.ubi.mtd

  qemu-system-arm -M witherspoon-bmc -nic user -drive file=obmc-phosphor-image-witherspoon.ubi.mtd,format=raw,if=mtd -nographic
  qemu-system-arm -M romulus-bmc -nic user -drive file=flash-romulus,format=raw,if=mtd -nographic
  qemu-system-arm -M palmetto-bmc -nic user -drive file=flash-palmetto,format=raw,if=mtd -nographic

The Aspeed SMC model has traces to help you in the task.

Thanks,

C. 
 
no-reply@patchew.org Jan. 14, 2021, 4:12 p.m. UTC | #2
Patchew URL: https://patchew.org/QEMU/20210114150902.11515-1-bmeng.cn@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210114150902.11515-1-bmeng.cn@gmail.com
Subject: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20210114013147.92962-1-jiaxun.yang@flygoat.com -> patchew/20210114013147.92962-1-jiaxun.yang@flygoat.com
 * [new tag]         patchew/20210114150902.11515-1-bmeng.cn@gmail.com -> patchew/20210114150902.11515-1-bmeng.cn@gmail.com
Switched to a new branch 'test'
b87aded hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic
4518be2 Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
6a4067a Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command"
e5ea744 hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
3294942 hw/block: m25p80: Support fast read for SST flashes
50a7f9f hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes
cf6f8e1 hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes
3925fcf hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes
5344168 hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes

=== OUTPUT BEGIN ===
1/9 Checking commit 5344168de433 (hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes)
2/9 Checking commit 3925fcf79dbc (hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes)
3/9 Checking commit cf6f8e145faa (hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes)
4/9 Checking commit 50a7f9fb909b (hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes)
5/9 Checking commit 3294942ca3a1 (hw/block: m25p80: Support fast read for SST flashes)
6/9 Checking commit e5ea74473d87 (hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling)
ERROR: line over 90 characters
#63: FILE: hw/ssi/xilinx_spips.c:510:
+                    uint8_t spi_mode = ARRAY_FIELD_EX32(s->regs, GQSPI_GF_SNAPSHOT, SPI_MODE);

ERROR: line over 90 characters
#71: FILE: hw/ssi/xilinx_spips.c:518:
+                        qemu_log_mask(LOG_GUEST_ERROR, "Unknown SPI MODE: 0x%x ", spi_mode);

total: 2 errors, 0 warnings, 41 lines checked

Patch 6/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/9 Checking commit 6a4067a6a9fc (Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command")
8/9 Checking commit 4518be22e1c9 (Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles")
9/9 Checking commit b87aded6dc2a (hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210114150902.11515-1-bmeng.cn@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Francisco Iglesias Jan. 14, 2021, 6:13 p.m. UTC | #3
Hi Bin,

On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> From: Bin Meng <bin.meng@windriver.com>
> 
> The m25p80 model uses s->needed_bytes to indicate how many follow-up
> bytes are expected to be received after it receives a command. For
> example, depending on the address mode, either 3-byte address or
> 4-byte address is needed.
> 
> For fast read family commands, some dummy cycles are required after
> sending the address bytes, and the dummy cycles need to be counted
> in s->needed_bytes. This is where the mess began.
> 
> As the variable name (needed_bytes) indicates, the unit is in byte.
> It is not in bit, or cycle. However for some reason the model has
> been using the number of dummy cycles for s->needed_bytes. The right
> approach is to convert the number of dummy cycles to bytes based on
> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).

While not being the original implementor, I must assume that the above solution
was considered but not chosen by the developers due to its inaccuracy (it
wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
meaning that if the controller is wrongly programmed to generate 7 the error
wouldn't be caught and the controller will still be considered "correct"). Now
that we have this detail in the implementation I'm in favor of keeping it, also
because the detail is already in use for catching exactly the above error.

> 
> Things get complicated when interacting with different SPI or QSPI
> flash controllers. There are major two cases:
> 
> - Dummy bytes prepared by drivers, and wrote to the controller fifo.
>   For such case, driver will calculate the correct number of dummy
>   bytes and write them into the tx fifo. Fixing the m25p80 model will
>   fix flashes working with such controllers.

The above can be fixed while still keeping the detailed dummy cycle implementation
inside m25p80. Perhaps one of the following could be looked into: configuring
the amount, letting the SPI controller fetch the amount from m25p80, or inheriting
some functionality handling this in the SPI controller. Or a mixture of the above.

> - Dummy bytes not prepared by drivers. Drivers just tell the hardware
>   the dummy cycle configuration via some registers, and hardware will
>   automatically generate dummy cycles for us. Fixing the m25p80 model
>   is not enough, and we will need to fix the SPI/QSPI models for such
>   controllers.
> 
> This series fixes the mess in the m25p80 from the flash side first,

Considering the problems solved by the solution in tree I find m25p80 pretty
clean, at least I don't see any clearly better way for accurately modeling the
dummy clock cycles. Counting bits instead of bytes would for example still
force the controllers to mark which bits to count (when transmitting one dummy
byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
cycles since it takes two cycles to transfer 8 bits).
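
As a small illustration of the parenthetical above, the relation between
dummy bytes in a txfifo and the clock cycles they produce can be
sketched like this. The helper name is hypothetical, and it assumes
1, 2 or 4 data lines so that the division is exact.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of clock cycles needed to shift
 * `nbytes` dummy bytes out on `lines` data lines (1, 2 or 4).
 * Each cycle moves one bit per line.
 */
static unsigned dummy_bytes_to_cycles(unsigned nbytes, unsigned lines)
{
    return nbytes * 8 / lines;
}
```

One dummy byte therefore costs 8 cycles on a single line, 4 on two
lines and 2 on four lines, which is why a controller would still need
to tell the flash how many lines a given dummy byte was sent on.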

Best regards,
Francisco Iglesias


> followed by fixes to 3 known SPI controller models that fall into
> the 2nd case above.
> 
Bin Meng Jan. 15, 2021, 2:07 a.m. UTC | #4
Hi Francisco,

On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > From: Bin Meng <bin.meng@windriver.com>
> >
> > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > bytes are expected to be received after it receives a command. For
> > example, depending on the address mode, either 3-byte address or
> > 4-byte address is needed.
> >
> > For fast read family commands, some dummy cycles are required after
> > sending the address bytes, and the dummy cycles need to be counted
> > in s->needed_bytes. This is where the mess began.
> >
> > As the variable name (needed_bytes) indicates, the unit is in byte.
> > It is not in bit, or cycle. However for some reason the model has
> > been using the number of dummy cycles for s->needed_bytes. The right
> > approach is to convert the number of dummy cycles to bytes based on
> > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
>
> While not being the original implementor I must assume that above solution was
> considered but not chosen by the developers due to it is inaccuracy (it
> wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> meaning that if the controller is wrongly programmed to generate 7 the error
> wouldn't be caught and the controller will still be considered "correct"). Now
> that we have this detail in the implementation I'm in favor of keeping it, this
> also because the detail is already in use for catching exactly above error.
>

I found no clue in the commit message that my proposed solution here
was ever considered; otherwise all SPI controller models supporting
software generation would have been found to be seriously broken a
long time ago!

You are right that we require the total number of dummy bits to be a
multiple of 8. That's why I added the unimplemented log message in
this series (patches 2/3/4) to warn users if this expectation is not
met. However, this will not cause any issue when running U-Boot or
Linux, because both spi-nor drivers make the same assumption as we do
here.

See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data();
both contain logic to calculate the dummy bytes needed for the fast
read command:

    /* convert the dummy cycles to the number of bytes */
    op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;

Note that the default dummy cycle configuration for all flashes I have
looked into as of today meets the multiple of 8 assumption. On some
flashes the dummy cycle number is configurable, and if it's been
configured to be an odd value, it would not work on U-Boot/Linux in
the first place.
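
To illustrate why an odd value cannot work with that arithmetic, here
is a rough sketch. DummyPlan and plan_dummy() are hypothetical names
made up for this example; only the division itself mirrors the driver
line quoted above, and buswidth is assumed to be 1, 2 or 4.

```c
#include <assert.h>

/* Hypothetical summary of what the driver arithmetic yields. */
typedef struct {
    unsigned nbytes;         /* dummy bytes the driver would send */
    unsigned cycles_covered; /* clock cycles those bytes span */
} DummyPlan;

static DummyPlan plan_dummy(unsigned read_dummy, unsigned buswidth)
{
    DummyPlan p;

    /* Same integer division as in the quoted driver line. */
    p.nbytes = (read_dummy * buswidth) / 8;
    /* Cycles actually clocked when those bytes go out on the bus. */
    p.cycles_covered = p.nbytes * 8 / buswidth;
    return p;
}
```

For 6 cycles on a quad bus the plan is 3 bytes covering all 6 cycles.
For 7 cycles it is still 3 bytes, covering only 6, so the flash would
be one dummy cycle short.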

> >
> > Things get complicated when interacting with different SPI or QSPI
> > flash controllers. There are major two cases:
> >
> > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> >   For such case, driver will calculate the correct number of dummy
> >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> >   fix flashes working with such controllers.
>
> Above can be fixed while still keeping the detailed dummy cycle implementation
> inside m25p80. Perhaps one of the following could be looked into: configurating
> the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> some functionality handling this in the SPI controller. Or a mixture of above.

Please send patches that explain in detail how this is going to
work. I am open to all possible solutions.

>
> > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> >   the dummy cycle configuration via some registers, and hardware will
> >   automatically generate dummy cycles for us. Fixing the m25p80 model
> >   is not enough, and we will need to fix the SPI/QSPI models for such
> >   controllers.
> >
> > This series fixes the mess in the m25p80 from the flash side first,
>
> Considering the problems solved by the solution in tree I find m25p80 pretty
> clean, at least I don't see any clearly better way for accurately modeling the
> dummy clock cycles. Counting bits instead of bytes would for example still
> force the controllers to mark which bits to count (when transmitting one dummy
> byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> cycles since it takes two cycles to transfer 8 bits).
>

SPI is a bit-based protocol, not a byte-based one. If you insist on
bit modeling for the dummy cycles then you should also suggest we
change all cycles (including command/addr/dummy/data phases) to be
modeled with bits. That way we can accurately emulate everything, for
example one potential problem like transferring 9 bits in the data
phase.

However, modeling everything in bits is super inefficient. My view is
that we should avoid trying to support uncommon use cases (like a
dummy bit count that is not a multiple of 8) in QEMU.

Regards,
Bin
Francisco Iglesias Jan. 15, 2021, 12:26 p.m. UTC | #5
Hi Bin,

On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> Hi Francisco,
> 
> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > From: Bin Meng <bin.meng@windriver.com>
> > >
> > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > bytes are expected to be received after it receives a command. For
> > > example, depending on the address mode, either 3-byte address or
> > > 4-byte address is needed.
> > >
> > > For fast read family commands, some dummy cycles are required after
> > > sending the address bytes, and the dummy cycles need to be counted
> > > in s->needed_bytes. This is where the mess began.
> > >
> > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > It is not in bit, or cycle. However for some reason the model has
> > > been using the number of dummy cycles for s->needed_bytes. The right
> > > approach is to convert the number of dummy cycles to bytes based on
> > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> >
> > While not being the original implementor I must assume that above solution was
> > considered but not chosen by the developers due to its inaccuracy (it
> > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > meaning that if the controller is wrongly programmed to generate 7 the error
> > wouldn't be caught and the controller will still be considered "correct"). Now
> > that we have this detail in the implementation I'm in favor of keeping it, this
> > also because the detail is already in use for catching exactly above error.
> >
> 
> I found no clue from the commit message that my proposed solution here
> was ever considered, otherwise all SPI controller models supporting
> software generation should have been found out seriously broken long
> time ago!


The controllers you are referring to might lack support for commands requiring
dummy clock cycles but I really hope they work with the other commands? If so I
don't think it is fair to call them 'seriously broken' (and else we should
probably let the maintainers know about it). Most likely the lack of support
for the commands is because no request has been made for them. Also there is
one controller that has support.


> 
> The issue you pointed out that we require the total number of dummy
> bits should be multiple of 8 is true, that's why I added the
> unimplemented log message in this series (patch 2/3/4) to warn users
> if this expectation is not met. However this will not cause any issue
> when running U-Boot or Linux, because both spi-nor drivers expect the
> same assumption as we do here.
> 
> See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> there is a logic to calculate the dummy bytes needed for fast read
> command:
> 
>     /* convert the dummy cycles to the number of bytes */
>     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> 
> Note the default dummy cycles configuration for all flashes I have
> looked into as of today, meets the multiple of 8 assumption. On some
> flashes the dummy cycle number is configurable, and if it's been
> configured to be an odd value, it would not work on U-Boot/Linux in
> the first place.
> 
> > >
> > > Things get complicated when interacting with different SPI or QSPI
> > > flash controllers. There are major two cases:
> > >
> > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > >   For such case, driver will calculate the correct number of dummy
> > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > >   fix flashes working with such controllers.
> >
> > Above can be fixed while still keeping the detailed dummy cycle implementation
> > inside m25p80. Perhaps one of the following could be looked into: configuring
> > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > some functionality handling this in the SPI controller. Or a mixture of above.
> 
> Please send patches to explain this in detail how this is going to
> work. I am open to all possible solutions.

In that case I suggest that you instead try a device property,
'model_dummy_bytes', that selects whether m25p80 converts the accurate dummy
clock cycle count to dummy bytes. Below is an example of how to modify the
decode_fast_read_cmd function (the other commands requiring dummy clock cycles
can follow a similar pattern). This way the fifo mode will be able to work the
way you desire while the current functionality stays intact. Suddenly
removing functionality (features) will take users by surprise.


static void decode_fast_read_cmd(Flash *s)
{
    uint8_t dummy_clk_cycles = 0;
    uint8_t extra_bytes;

    s->needed_bytes = get_addr_length(s);

    /* Obtain the number of dummy clock cycles needed */
    switch (get_man(s)) {
    case MAN_WINBOND:
        dummy_clk_cycles += 8;
        break;
    case MAN_NUMONYX:
        dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
        break;
    case MAN_MACRONIX:
        if (extract32(s->volatile_cfg, 6, 2) == 1) {
            dummy_clk_cycles += 6;
        } else {
            dummy_clk_cycles += 8;
        }
        break;
    case MAN_SPANSION:
        dummy_clk_cycles += extract32(s->spansion_cr2v,
                                    SPANSION_DUMMY_CLK_POS,
                                    SPANSION_DUMMY_CLK_LEN
                                    );
        break;
    default:
        break;
    }

    if (s->model_dummy_bytes) {
        int lines = 1;

        /*
         * Expect dummy bytes from the controller so convert the dummy
         * clock cycles to dummy_bytes.
         */
        extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
    } else {
        /* Model individual dummy clock cycles as byte writes */
        extra_bytes = dummy_clk_cycles;
    }

    s->needed_bytes += extra_bytes;
    s->pos = 0;
    s->len = 0;
    s->state = STATE_COLLECTING_DATA;
}
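
Note that convert_to_dummy_bytes() used above does not exist in the tree. A
minimal sketch of what it could look like follows, assuming the number of data
lines is known when the command is decoded. Rounding a partial byte up is an
assumption made for this sketch only; the guest drivers quoted earlier in this
thread use a plain division instead.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the tree): convert a number of dummy clock
 * cycles into the number of dummy bytes a controller would shift out on
 * 'lines' data lines. Each clock cycle carries 'lines' bits.
 * Rounding a partial byte up is an assumption of this sketch.
 */
static uint8_t convert_to_dummy_bytes(uint8_t dummy_clk_cycles, int lines)
{
    unsigned int bits = (unsigned int)dummy_clk_cycles * (unsigned int)lines;

    return (uint8_t)((bits + 7) / 8);
}
```

With this, 8 dummy clock cycles on one line become 1 dummy byte, and 6 dummy
clock cycles on four lines become 3 dummy bytes.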

Best regards,
Francisco Iglesias

> 
> >
> > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > >   the dummy cycle configuration via some registers, and hardware will
> > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > >   controllers.
> > >
> > > This series fixes the mess in the m25p80 from the flash side first,
> >
> > Considering the problems solved by the solution in tree I find m25p80 pretty
> > clean, at least I don't see any clearly better way for accurately modeling the
> > dummy clock cycles. Counting bits instead of bytes would for example still
> > force the controllers to mark which bits to count (when transmitting one dummy
> > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > cycles since it takes two cycles to transfer 8 bits).
> >
> 
> SPI is a bit based protocol, not bytes. If you insist on bit modeling
> with the dummy cycles then you should also suggest we change all
> cycles (including command/addr/dummy/data phases) to be modeled with
> bits. That way we can accurately emulate everything, for example one
> potential problem like transferring 9 bit in the data phase.
> 
> However modeling everything with bit is super inefficient. My view is
> that we should avoid trying to support uncommon use cases (like not
> multiple of 8 for dummy bits) in QEMU.
> 
> Regards,
> Bin
Bin Meng Jan. 15, 2021, 1:54 p.m. UTC | #6
Hi Havard,

On Fri, Jan 15, 2021 at 11:29 AM Havard Skinnemoen
<hskinnemoen@google.com> wrote:
>
> Hi Bin,
>
> On Thu, Jan 14, 2021 at 6:08 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > Hi Francisco,
> >
> > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > From: Bin Meng <bin.meng@windriver.com>
> > > >
> > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > bytes are expected to be received after it receives a command. For
> > > > example, depending on the address mode, either 3-byte address or
> > > > 4-byte address is needed.
> > > >
> > > > For fast read family commands, some dummy cycles are required after
> > > > sending the address bytes, and the dummy cycles need to be counted
> > > > in s->needed_bytes. This is where the mess began.
> > > >
> > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > It is not in bit, or cycle. However for some reason the model has
> > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > approach is to convert the number of dummy cycles to bytes based on
> > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > >
> > > While not being the original implementor I must assume that above solution was
> > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > also because the detail is already in use for catching exactly above error.
> > >
> >
> > I found no clue from the commit message that my proposed solution here
> > was ever considered, otherwise all SPI controller models supporting
> > software generation should have been found out seriously broken long
> > time ago!
> >
> > The issue you pointed out that we require the total number of dummy
> > bits should be multiple of 8 is true, that's why I added the
> > unimplemented log message in this series (patch 2/3/4) to warn users
> > if this expectation is not met. However this will not cause any issue
> > when running U-Boot or Linux, because both spi-nor drivers expect the
> > same assumption as we do here.
> >
> > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > there is a logic to calculate the dummy bytes needed for fast read
> > command:
> >
> >     /* convert the dummy cycles to the number of bytes */
> >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> >
> > Note the default dummy cycles configuration for all flashes I have
> > looked into as of today, meets the multiple of 8 assumption. On some
> > flashes the dummy cycle number is configurable, and if it's been
> > configured to be an odd value, it would not work on U-Boot/Linux in
> > the first place.
> >
> > > >
> > > > Things get complicated when interacting with different SPI or QSPI
> > > > flash controllers. There are major two cases:
> > > >
> > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > >   For such case, driver will calculate the correct number of dummy
> > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > >   fix flashes working with such controllers.
> > >
> > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > some functionality handling this in the SPI controller. Or a mixture of above.
> >
> > Please send patches to explain this in detail how this is going to
> > work. I am open to all possible solutions.
> >
> > >
> > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > > >   the dummy cycle configuration via some registers, and hardware will
> > > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > > >   controllers.
> > > >
> > > > This series fixes the mess in the m25p80 from the flash side first,
> > >
> > > Considering the problems solved by the solution in tree I find m25p80 pretty
> > > clean, at least I don't see any clearly better way for accurately modeling the
> > > dummy clock cycles. Counting bits instead of bytes would for example still
> > > force the controllers to mark which bits to count (when transmitting one dummy
> > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > > cycles since it takes two cycles to transfer 8 bits).
> > >
> >
> > SPI is a bit based protocol, not bytes. If you insist on bit modeling
> > with the dummy cycles then you should also suggest we change all
> > cycles (including command/addr/dummy/data phases) to be modeled with
> > bits. That way we can accurately emulate everything, for example one
> > potential problem like transferring 9 bit in the data phase.
>
> I agree with this. There's really nothing special about dummy cycles.
> Making them special makes it super painful to implement SPI controller
> emulation because you have to anticipate when ssi_transfer changes
> semantics from byte-at-a-time to bit-at-a-time. I doubt all the SPI
> controllers in the tree gets it right all the time.
>

Yep, and it's not just painful: for a case 1 SPI controller it's
impossible to snoop the data to tell where the dummy cycles begin.

> > However modeling everything with bit is super inefficient. My view is
> > that we should avoid trying to support uncommon use cases (like not
> > multiple of 8 for dummy bits) in QEMU.
>
> Perhaps ssi_transfer could take an additional bits parameter? That
> should make it possible to transfer any number of bits up to 32, while
> keeping the common case simple on both sides. And it would work for
> any SPI transfer, not just dummy cycles.

This sounds like a good tradeoff from the emulator perspective. But I
am not sure we should do this to solve the dummy cycle mess given all
the default dummy cycle configurations so far match the multiple of 8
assumption.
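
For reference, the kind of interface being discussed might look roughly like
the sketch below. This is a toy model only: the ToyFlash type and
toy_transfer_bits() are made up for illustration and are not the QEMU SSI API.
It shows a transfer hook that is told how many bits are valid, so a peripheral
can count dummy clock cycles that do not add up to whole bytes.

```c
#include <stdint.h>

/* Toy peripheral state, invented for this sketch (not the QEMU SSI API). */
typedef struct ToyFlash {
    unsigned int lines;             /* data lines in use: 1, 2 or 4 */
    unsigned int dummy_cycles_left; /* dummy clock cycles still expected */
} ToyFlash;

/*
 * Hypothetical transfer hook that takes a bit count (1..32) next to the
 * value. Returns the number of dummy clock cycles consumed.
 */
static unsigned int toy_transfer_bits(ToyFlash *f, uint32_t val,
                                      unsigned int nbits)
{
    /* Each clock cycle moves 'lines' bits; round partial cycles up. */
    unsigned int cycles = (nbits + f->lines - 1) / f->lines;

    (void)val; /* the content of dummy bits is ignored */

    if (cycles > f->dummy_cycles_left) {
        cycles = f->dummy_cycles_left;
    }
    f->dummy_cycles_left -= cycles;
    return cycles;
}
```

In this toy model a 7-bit transfer on one line consumes exactly 7 dummy
cycles, and a 24-bit transfer on four lines consumes 6.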

Regards,
Bin
Bin Meng Jan. 15, 2021, 2:38 p.m. UTC | #7
Hi Francisco,

On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > From: Bin Meng <bin.meng@windriver.com>
> > > >
> > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > bytes are expected to be received after it receives a command. For
> > > > example, depending on the address mode, either 3-byte address or
> > > > 4-byte address is needed.
> > > >
> > > > For fast read family commands, some dummy cycles are required after
> > > > sending the address bytes, and the dummy cycles need to be counted
> > > > in s->needed_bytes. This is where the mess began.
> > > >
> > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > It is not in bit, or cycle. However for some reason the model has
> > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > approach is to convert the number of dummy cycles to bytes based on
> > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > >
> > > While not being the original implementor I must assume that above solution was
> > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > also because the detail is already in use for catching exactly above error.
> > >
> >
> > I found no clue from the commit message that my proposed solution here
> > was ever considered, otherwise all SPI controller models supporting
> > software generation should have been found out seriously broken long
> > time ago!
>
>
> The controllers you are referring to might lack support for commands requiring
> dummy clock cycles but I really hope they work with the other commands? If so I

I am not sure why you view dummy clock cycles as something special
that needs some special support from the SPI controller. For the case
1 controller, it's nothing special from the controller perspective,
just like sending out a command, or address bytes, or data. The
controller just shifts data bit by bit from its tx fifo and that's it.
In the Xilinx GQSPI controller case, the dummy cycles can either be
sent as regular data in the tx fifo (the case 1 controller), or
automatically generated by the hardware (the case 2 controller).

> don't think it is fair to call them 'seriously broken' (and else we should
> probably let the maintainers know about it). Most likely the lack of support

I called it "seriously broken" because the current implementation only
considers one type of SPI controller while completely ignoring the
other type.

> for the commands is because no request has been made for them. Also there is
> one controller that has support.

It's definitely not "no request". Nearly all SPI flashes support the
Fast Read (0Bh) command today, and 0Bh requires dummy cycles. This is
"seriously broken" for those case 1 type controllers because they
cannot read anything from the m25p80 model at all, unless the guest
software being tested only uses the Read (03h) command, which is not
affected. But I can't find any software that uses Read instead of Fast
Read.
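
To put numbers on it, here is a small sketch (the function names are made up
for illustration, this is not code from the model) that compares how many
bytes the flash waits for after the opcode under the two accounting schemes.
The example values assume Fast Read (0Bh) with a 3-byte address and 8 dummy
cycles on a single line.

```c
/* Current model: every dummy clock cycle is counted as one needed byte. */
static unsigned int needed_bytes_cycle_model(unsigned int addr_bytes,
                                             unsigned int dummy_cycles)
{
    return addr_bytes + dummy_cycles;
}

/* Byte model: dummy cycles are converted to bytes for the given bus width. */
static unsigned int needed_bytes_byte_model(unsigned int addr_bytes,
                                            unsigned int dummy_cycles,
                                            unsigned int lines)
{
    return addr_bytes + (dummy_cycles * lines) / 8;
}
```

For the 0Bh example the first function gives 11 and the second gives 4. A
driver for a case 1 controller writes 4 bytes (3 address bytes plus 1 dummy
byte) to the tx fifo, so under the current accounting the flash is still
waiting for 7 more bytes when the driver starts clocking in data.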

> > The issue you pointed out that we require the total number of dummy
> > bits should be multiple of 8 is true, that's why I added the
> > unimplemented log message in this series (patch 2/3/4) to warn users
> > if this expectation is not met. However this will not cause any issue
> > when running U-Boot or Linux, because both spi-nor drivers expect the
> > same assumption as we do here.
> >
> > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > there is a logic to calculate the dummy bytes needed for fast read
> > command:
> >
> >     /* convert the dummy cycles to the number of bytes */
> >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> >
> > Note the default dummy cycles configuration for all flashes I have
> > looked into as of today, meets the multiple of 8 assumption. On some
> > flashes the dummy cycle number is configurable, and if it's been
> > configured to be an odd value, it would not work on U-Boot/Linux in
> > the first place.
> >
> > > >
> > > > Things get complicated when interacting with different SPI or QSPI
> > > > flash controllers. There are major two cases:
> > > >
> > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > >   For such case, driver will calculate the correct number of dummy
> > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > >   fix flashes working with such controllers.
> > >
> > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > some functionality handling this in the SPI controller. Or a mixture of above.
> >
> > Please send patches to explain this in detail how this is going to
> > work. I am open to all possible solutions.
>
> In that case I suggest that you instead try with a device property
> 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> count to dummy bytes inside m25p80. Below is an example on how to modify the

No, this is wrong in my view. This is not like DMA vs. PIO handling.

> decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> can follow a similar pattern). This way the fifo mode will be able to work the
> way you desire while also keeping the current functionality intact. Suddenly
> removing functionality (features) will take users by surprise.

I don't think we are removing any features. This is a fix to make the
model usable by any SPI controller.

As I pointed out, both U-Boot and Linux assume that the number of dummy
bits is a multiple of 8, which holds for the default configuration of
all flashes I have looked into so far. Can you please comment on what use
case you want to support? I requested U-Boot/Linux kernel testing in
the previous SST thread [1] against Xilinx GQSPI but there was no
response.

[1] http://patchwork.ozlabs.org/project/qemu-devel/patch/1606704602-59435-1-git-send-email-bmeng.cn@gmail.com/

>
> static void decode_fast_read_cmd(Flash *s)
> {
>     uint8_t dummy_clk_cycles = 0;
>     uint8_t extra_bytes;
>
>     s->needed_bytes = get_addr_length(s);
>
>     /* Obtain the number of dummy clock cycles needed */
>     switch (get_man(s)) {
>     case MAN_WINBOND:
>         dummy_clk_cycles += 8;
>         break;
>     case MAN_NUMONYX:
>         dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
>         break;
>     case MAN_MACRONIX:
>         if (extract32(s->volatile_cfg, 6, 2) == 1) {
>             dummy_clk_cycles += 6;
>         } else {
>             dummy_clk_cycles += 8;
>         }
>         break;
>     case MAN_SPANSION:
>         dummy_clk_cycles += extract32(s->spansion_cr2v,
>                                     SPANSION_DUMMY_CLK_POS,
>                                     SPANSION_DUMMY_CLK_LEN
>                                     );
>         break;
>     default:
>         break;
>     }
>
>     if (s->model_dummy_bytes) {
>         int lines = 1;
>
>         /*
>          * Expect dummy bytes from the controller so convert the dummy
>          * clock cycles to dummy_bytes.
>          */
>         extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
>     } else {
>         /* Model individual dummy clock cycles as byte writes */
>         extra_bytes = dummy_clk_cycles;
>     }
>
>     s->needed_bytes += extra_bytes;
>     s->pos = 0;
>     s->len = 0;
>     s->state = STATE_COLLECTING_DATA;
> }
>
> Best regards,
> Francisco Iglesias
>
> >
> > >
> > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > > >   the dummy cycle configuration via some registers, and hardware will
> > > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > > >   controllers.
> > > >
> > > > This series fixes the mess in the m25p80 from the flash side first,
> > >
> > > Considering the problems solved by the solution in tree I find m25p80 pretty
> > > clean, at least I don't see any clearly better way for accurately modeling the
> > > dummy clock cycles. Counting bits instead of bytes would for example still
> > > force the controllers to mark which bits to count (when transmitting one dummy
> > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > > cycles since it takes two cycles to transfer 8 bits).
> > >
> >
> > SPI is a bit based protocol, not bytes. If you insist on bit modeling
> > with the dummy cycles then you should also suggest we change all
> > cycles (including command/addr/dummy/data phases) to be modeled with
> > bits. That way we can accurately emulate everything, for example one
> > potential problem like transferring 9 bit in the data phase.
> >
> > However modeling everything with bit is super inefficient. My view is
> > that we should avoid trying to support uncommon use cases (like not
> > multiple of 8 for dummy bits) in QEMU.

Regards,
Bin
Francisco Iglesias Jan. 18, 2021, 10:05 a.m. UTC | #8
Hi Bin,

On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> Hi Francisco,
> 
> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > >
> > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > bytes are expected to be received after it receives a command. For
> > > > > example, depending on the address mode, either 3-byte address or
> > > > > 4-byte address is needed.
> > > > >
> > > > > For fast read family commands, some dummy cycles are required after
> > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > in s->needed_bytes. This is where the mess began.
> > > > >
> > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > >
> > > > While not being the original implementor I must assume that above solution was
> > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > also because the detail is already in use for catching exactly above error.
> > > >
> > >
> > > I found no clue from the commit message that my proposed solution here
> > > was ever considered, otherwise all SPI controller models supporting
> > > software generation should have been found out seriously broken long
> > > time ago!
> >
> >
> > The controllers you are referring to might lack support for commands requiring
> > dummy clock cycles but I really hope they work with the other commands? If so I
> 
> I am not sure why you view dummy clock cycles as something special
> that needs some special support from the SPI controller. For the case
> 1 controller, it's nothing special from the controller perspective,
> just like sending out a command, or address bytes, or data. The
> controller just shifts data bit by bit from its tx fifo and that's it.
> In the Xilinx GQSPI controller case, the dummy cycles can either be
> sent via a regular data (the case 1 controller) in the tx fifo, or
> automatically generated (case 2 controller) by the hardware.

Ok, I'll try to explain my viewpoint a little differently. For that we also
need to keep in mind that QEMU models HW, and any binary that runs on a HW
board supported in QEMU should ideally run on that board inside QEMU as well
(this can be a bare metal application just as well as a modified U-Boot/Linux
using SPI commands whose number of dummy clock cycles is not a multiple of 8).

Once functionality has been introduced into QEMU it is not easy to know which
intentional or unintentional features provided by that functionality are being
used by users. One of the (perhaps not well known) features that I know is in
use, and that is provided by the accurate dummy clock cycle modeling inside
m25p80, is the ability to test drivers accurately regarding the dummy clock
cycles (even when using commands whose number of dummy clock cycles is not a
multiple of 8), but there might be others as well. So removing this
functionality will break the above use case, since those tests will no longer
be reliable. Furthermore, since users tend to be creative, it is not possible
to know whether other use cases will be affected. This means that if [1]
needs to be followed, the safe path is to add functionality instead of
removing it. Luckily that is also easier in this case, see below.


> 
> > don't think it is fair to call them 'seriously broken' (and else we should
> > probably let the maintainers know about it). Most likely the lack of support
> 
> I called it "seriously broken" because current implementation only
> considered one type of SPI controllers while completely ignoring the
> other type.

If we change view and see this from the perspective of m25p80, it models the
commands a certain way and provides an API that the SPI controllers need to
implement for interacting with it. It is true that the SPI controllers
referred to above do not support the portion of that API that corresponds
to commands with dummy clock cycles, but I don't think it is true that this is
broken, since there is also one SPI controller that has a working implementation
of m25p80's full API, including when transferring through a tx fifo (use case 1).
But as mentioned above, a minor extension and improvement to m25p80's API that
allows toggling the accuracy from dummy clock cycles to dummy bytes would still
honor [1], while at the same time making full support for the API possible in
the SPI controllers that currently lack it (please reread the proposal in my
previous reply that attempts to do this). I myself see this as a win/win
situation, also because no controller should need modifications.


> 
> > for the commands is because no request has been made for them. Also there is
> > one controller that has support.
> 
> Definitely it's not "no request". Nearly all SPI flashes support the
> Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> "seriously broken" for those case 1 type controllers because they
> cannot read anything from the m25p80 model at all. Unless the guest
> software being tested only uses Read (03h) command which is not
> affected. But I can't find a software that uses Read instead of Fast
> Read.
> 
> > > The issue you pointed out that we require the total number of dummy
> > > bits should be multiple of 8 is true, that's why I added the
> > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > if this expectation is not met. However this will not cause any issue
> > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > same assumption as we do here.
> > >
> > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > there is a logic to calculate the dummy bytes needed for fast read
> > > command:
> > >
> > >     /* convert the dummy cycles to the number of bytes */
> > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > >
> > > Note the default dummy cycles configuration for all flashes I have
> > > looked into as of today, meets the multiple of 8 assumption. On some
> > > flashes the dummy cycle number is configurable, and if it's been
> > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > the first place.
> > >
> > > > >
> > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > flash controllers. There are major two cases:
> > > > >
> > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > >   For such case, driver will calculate the correct number of dummy
> > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > >   fix flashes working with such controllers.
> > > >
> > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > >
> > > Please send patches to explain this in detail how this is going to
> > > work. I am open to all possible solutions.
> >
> > In that case I suggest that you instead try with a device property
> > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > count to dummy bytes inside m25p80. Below is an example on how to modify the
> 
> No this is wrong in my view. This is not like a DMA vs. PIO handling.
> 
> > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > can follow a similar pattern). This way the fifo mode will be able to work the
> > way you desire while also keeping the current functionality intact. Suddenly
> > removing functionality (features) will take users by surprise.
> 
> I don't think we are removing any features. This is a fix to make the
> model to be used by any SPI controllers.
> 
> As I pointed out, both U-Boot and Linux have the multiple of 8
> assumption for the dummy bit, which is the default configuration for
> all flashes I have looked into so far. Can you please comment what use
> case you want to support? I requested a U-Boot/Linux kernel testing in
> the previous SST thread [1] against Xilinx GQSPI but there was no
> response.

Instructions on how to boot U-Boot/Linux can be found in [2]. To build the
various software components I followed the official documentation in [3].

Best regards,
Francisco

[1] qemu/docs/system/deprecated.rst
[2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
[3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux


> 
> [1] http://patchwork.ozlabs.org/project/qemu-devel/patch/1606704602-59435-1-git-send-email-bmeng.cn@gmail.com/
> 
> >
> > static void decode_fast_read_cmd(Flash *s)
> > {
> >     uint8_t dummy_clk_cycles = 0;
> >     uint8_t extra_bytes;
> >
> >     s->needed_bytes = get_addr_length(s);
> >
> >     /* Obtain the number of dummy clock cycles needed */
> >     switch (get_man(s)) {
> >     case MAN_WINBOND:
> >         dummy_clk_cycles += 8;
> >         break;
> >     case MAN_NUMONYX:
> >         dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
> >         break;
> >     case MAN_MACRONIX:
> >         if (extract32(s->volatile_cfg, 6, 2) == 1) {
> >             dummy_clk_cycles += 6;
> >         } else {
> >             dummy_clk_cycles += 8;
> >         }
> >         break;
> >     case MAN_SPANSION:
> >         dummy_clk_cycles += extract32(s->spansion_cr2v,
> >                                     SPANSION_DUMMY_CLK_POS,
> >                                     SPANSION_DUMMY_CLK_LEN
> >                                     );
> >         break;
> >     default:
> >         break;
> >     }
> >
> >     if (s->model_dummy_bytes) {
> >         int lines = 1;
> >
> >         /*
> >          * Expect dummy bytes from the controller so convert the dummy
> >          * clock cycles to dummy_bytes.
> >          */
> > >         extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
> >     } else {
> >         /* Model individual dummy clock cycles as byte writes */
> >         extra_bytes = dummy_clk_cycles;
> >     }
> >
> >     s->needed_bytes += extra_bytes;
> >     s->pos = 0;
> >     s->len = 0;
> >     s->state = STATE_COLLECTING_DATA;
> > }
> >
> > Best regards,
> > Francisco Iglesias
> >
> > >
> > > >
> > > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > > > >   the dummy cycle configuration via some registers, and hardware will
> > > > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > > > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > > > >   controllers.
> > > > >
> > > > > This series fixes the mess in the m25p80 from the flash side first,
> > > >
> > > > Considering the problems solved by the solution in tree I find m25p80 pretty
> > > > clean, at least I don't see any clearly better way for accurately modeling the
> > > > dummy clock cycles. Counting bits instead of bytes would for example still
> > > > force the controllers to mark which bits to count (when transmitting one dummy
> > > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > > > cycles since it takes two cycles to transfer 8 bits).
> > > >
> > >
> > > SPI is a bit based protocol, not bytes. If you insist on bit modeling
> > > with the dummy cycles then you should also suggest we change all
> > > cycles (including command/addr/dummy/data phases) to be modeled with
> > > bits. That way we can accurately emulate everything, for example one
> > > potential problem like transferring 9 bit in the data phase.
> > >
> > > However modeling everything with bit is super inefficient. My view is
> > > that we should avoid trying to support uncommon use cases (like not
> > > multiple of 8 for dummy bits) in QEMU.
> 
> Regards,
> Bin
Bin Meng Jan. 18, 2021, 12:32 p.m. UTC | #9
Hi Francisco,

On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > >
> > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > bytes are expected to be received after it receives a command. For
> > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > 4-byte address is needed.
> > > > > >
> > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > in s->needed_bytes. This is where the mess began.
> > > > > >
> > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > >
> > > > > While not being the original implementor I must assume that above solution was
> > > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > also because the detail is already in use for catching exactly above error.
> > > > >
> > > >
> > > > I found no clue from the commit message that my proposed solution here
> > > > was ever considered, otherwise all SPI controller models supporting
> > > > software generation should have been found out seriously broken long
> > > > time ago!
> > >
> > >
> > > The controllers you are referring to might lack support for commands requiring
> > > dummy clock cycles but I really hope they work with the other commands? If so I
> >
> > I am not sure why you view dummy clock cycles as something special
> > that needs some special support from the SPI controller. For the case
> > 1 controller, it's nothing special from the controller perspective,
> > just like sending out a command, or address bytes, or data. The
> > controller just shifts data bit by bit from its tx fifo and that's it.
> > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > sent via a regular data (the case 1 controller) in the tx fifo, or
> > automatically generated (case 2 controller) by the hardware.
>
> Ok, I'll try to explain my view point a little differently. For that we also
> need to keep in mind that QEMU models HW, and any binary that runs on a HW
> board supported in QEMU should ideally run on that board inside QEMU aswell
> (this can be a bare metal application equaly well as a modified u-boot/Linux
> using SPI commands with a non multiple of 8 number of dummy clock cycles).
>
> Once functionality has been introduced into QEMU it is not easy to know which
> intentional or untentional features provided by the functionality are being
> used by users. One of the (perhaps not well known) features I'm aware of that
> is in use and is provided by the accurate dummy clock cycle modeling inside
> m25p80 is the be ability to test drivers accurately regarding the dummy clock
> cycles (even when using commands with a non-multiple of 8 number of dummy clock
> cycles), but there might be others aswell. So by removing this functionality
> above use case will brake, this since those test will not be reliable.
> Furthermore, since users tend to be creative it is not possible to know if
> there are other use cases that will be affected. This means that in case [1]
> needs to be followed the safe path is to add functionality instead of removing.
> Luckily it also easier in this case, see below.

I understand there might be users other than U-Boot/Linux that use a
number of dummy bits that is not a multiple of 8. If your concern is
about model behavior changes, sure, I can update
qemu/docs/system/deprecated.rst to mention that some flashes in the
m25p80 model now implement dummy cycles as bytes.
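
For reference, the conversion being discussed can be sketched as below.
This is only an illustration under stated assumptions:
dummy_cycles_to_bytes() is a hypothetical helper name, not a function in
the m25p80 model, and returning -1 is just one possible way to flag a
dummy bit count that is not a multiple of 8.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of hw/block/m25p80.c): convert a number
 * of dummy clock cycles to whole bytes for a given bus width (1, 2 or 4
 * lines). Returns -1 when the total number of dummy bits is not a
 * multiple of 8, i.e. when it cannot be expressed as whole bytes.
 */
static int dummy_cycles_to_bytes(unsigned int cycles, unsigned int lines)
{
    unsigned int bits;

    assert(lines == 1 || lines == 2 || lines == 4);

    /* Each clock cycle moves one bit per line. */
    bits = cycles * lines;
    if (bits % 8 != 0) {
        return -1;
    }
    return (int)(bits / 8);
}
```

With this sketch, 6 dummy cycles on 4 lines (Fast Read Quad I/O, EBh) give
3 bytes, while 7 dummy cycles on a single line cannot be expressed in
bytes and would be reported.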

> >
> > > don't think it is fair to call them 'seriously broken' (and else we should
> > > probably let the maintainers know about it). Most likely the lack of support
> >
> > I called it "seriously broken" because current implementation only
> > considered one type of SPI controllers while completely ignoring the
> > other type.
>
> If we change view and see this from the perspective of m25p80, it models the
> commands a certain way and provides an API that the SPI controllers need to
> implement for interacting with it. It is true that there are SPI controllers
> referred to above that do not support the portion of that API that corresponds
> to commands with dummy clock cycles, but I don't think it is true that this is
> broken since there is also one SPI controller that has a working implementation
> of m25p80's full API also when transfering through a tx fifo (use case 1). But
> as mentioned above, by doing a minor extension and improvement to m25p80's API
> and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> will still be honored as in the same time making it possible to have full
> support for the API in the SPI controllers that currently do not (please reread
> the proposal in my previous reply that attempts to do this). I myself see this
> as win/win situation, also because no controller should need modifications.
>

I am afraid your proposal does not work. The proposed device property
'model_dummy_bytes', which selects whether the accurate dummy clock
cycle count is converted to dummy bytes inside m25p80, is hard to
justify as a property of the flash itself, because the behavior is
tightly coupled to how the SPI controller works.

Please take a look at the Xilinx GQSPI controller, which supports both
use cases: the dummy cycles can either be transferred via the tx fifo
or generated by the controller automatically. Please read the example
given in:

    Table 24-22, an example of Generic FIFO Contents for Quad I/O Read Command (EBh)

in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf

If you set the m25p80 device property 'model_dummy_bytes' to true when
working with the Xilinx GQSPI controller, guest software is restricted
to using the tx fifo to transfer the dummy cycles, and this is wrong.
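
As a rough illustration of the point above, here is a toy sketch. It is
not QEMU code: ToyFlash, toy_ctrl_send_fifo() and
toy_ctrl_generate_dummy() are hypothetical names made up for this
example. It assumes the flash counts dummy bytes, in which case a
controller model that supports both modes can feed the flash from either
path without the flash knowing which one the guest picked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy flash: only tracks how many dummy bytes it still expects. */
typedef struct ToyFlash {
    unsigned int needed_bytes;
} ToyFlash;

/* One byte-wide transfer towards the flash; the dummy value is ignored. */
static void toy_flash_transfer8(ToyFlash *f, uint8_t val)
{
    (void)val;
    if (f->needed_bytes > 0) {
        f->needed_bytes--;
    }
}

/*
 * Case 1: the guest driver placed the dummy bytes in the tx fifo and the
 * controller simply shifts them out. Returns the dummy bytes still expected.
 */
static unsigned int toy_ctrl_send_fifo(ToyFlash *f, const uint8_t *fifo,
                                       size_t len)
{
    for (size_t i = 0; i < len; i++) {
        toy_flash_transfer8(f, fifo[i]);
    }
    return f->needed_bytes;
}

/*
 * Case 2: the guest driver programmed a dummy cycle count and the controller
 * generates the clocks itself. The controller model converts cycles to bytes
 * before talking to the flash. Returns the dummy bytes still expected, or -1
 * if the cycles do not add up to whole bytes.
 */
static int toy_ctrl_generate_dummy(ToyFlash *f, unsigned int cycles,
                                   unsigned int lines)
{
    unsigned int bits = cycles * lines;

    assert(lines == 1 || lines == 2 || lines == 4);
    if (bits % 8 != 0) {
        return -1;
    }
    for (unsigned int i = 0; i < bits / 8; i++) {
        toy_flash_transfer8(f, 0);
    }
    return (int)f->needed_bytes;
}
```

In this sketch the choice between the two paths lives entirely in the
controller, which is why a switch on the flash side does not describe
the hardware.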

>
> >
> > > for the commands is because no request has been made for them. Also there is
> > > one controller that has support.
> >
> > Definitely it's not "no request". Nearly all SPI flashes support the
> > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > "seriously broken" for those case 1 type controllers because they
> > cannot read anything from the m25p80 model at all. Unless the guest
> > software being tested only uses Read (03h) command which is not
> > affected. But I can't find a software that uses Read instead of Fast
> > Read.
> >
> > > > The issue you pointed out that we require the total number of dummy
> > > > bits should be multiple of 8 is true, that's why I added the
> > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > if this expectation is not met. However this will not cause any issue
> > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > same assumption as we do here.
> > > >
> > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > command:
> > > >
> > > >     /* convert the dummy cycles to the number of bytes */
> > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > >
> > > > Note the default dummy cycles configuration for all flashes I have
> > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > flashes the dummy cycle number is configurable, and if it's been
> > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > the first place.
> > > >
> > > > > >
> > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > flash controllers. There are major two cases:
> > > > > >
> > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > >   fix flashes working with such controllers.
> > > > >
> > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > >
> > > > Please send patches to explain this in detail how this is going to
> > > > work. I am open to all possible solutions.
> > >
> > > In that case I suggest that you instead try with a device property
> > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> >
> > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> >
> > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > way you desire while also keeping the current functionality intact. Suddenly
> > > removing functionality (features) will take users by surprise.
> >
> > I don't think we are removing any features. This is a fix to make the
> > model to be used by any SPI controllers.
> >
> > As I pointed out, both U-Boot and Linux have the multiple of 8
> > assumption for the dummy bit, which is the default configuration for
> > all flashes I have looked into so far. Can you please comment what use
> > case you want to support? I requested a U-Boot/Linux kernel testing in
> > the previous SST thread [1] against Xilinx GQSPI but there was no
> > response.
>
> In [2] instructions on how to boot u-boot/Linux is found. For building the
> various software components I followed the official doc in [3].

I see the following QEMU commands are used to test booting U-Boot/Linux:

$ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G \
  -serial stdio -display none \
  -device loader,file=u-boot.elf -kernel bl31.elf \
  -device loader,addr=0x40000000,file=Image \
  -device loader,addr=0x2000000,file=system.dtb

Where does system.dtb get built from?

[3] mentions that the Xilinx QEMU is used, and its example command for
launching U-Boot is different from your command above.

See https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841606/QEMU+-+Zynq+UltraScale+MPSoC#QEMU-ZynqUltraScale+MPSoC-RunningaZynqUltraScale+U-bootImageOnXilinx'sARMQEMU

$ ./aarch64-softmmu/qemu-system-aarch64 -M arm-generic-fdt \
  -serial mon:stdio -serial /dev/null -display none \
  -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4 \ # Un-reset the A53
  -device loader,file=./pre-built/linux/images/bl31.elf,cpu-num=0 \ # ARM Trusted Firmware
  -device loader,file=./pre-built/linux/images/u-boot.elf \ # The u-boot executable
  -hw-dtb ./pre-built/linux/images/zynqmp-qemu-arm.dtb # HW Device Tree that QEMU uses to generate the model

It uses a machine called "arm-generic-fdt", but there is no such
machine in mainline QEMU.

>
> Best regards,
> Francisco
>
> [1] qemu/docs/system/deprecated.rst
> [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> [3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux
>

Regards,
Bin
Francisco Iglesias Jan. 19, 2021, 1:01 p.m. UTC | #10
Hi Bin,

On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> Hi Francisco,
> 
> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > >
> > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > 4-byte address is needed.
> > > > > > >
> > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > >
> > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > >
> > > > > > While not being the original implementor I must assume that above solution was
> > > > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > also because the detail is already in use for catching exactly above error.
> > > > > >
> > > > >
> > > > > I found no clue from the commit message that my proposed solution here
> > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > software generation should have been found out seriously broken long
> > > > > time ago!
> > > >
> > > >
> > > > The controllers you are referring to might lack support for commands requiring
> > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > >
> > > I am not sure why you view dummy clock cycles as something special
> > > that needs some special support from the SPI controller. For the case
> > > 1 controller, it's nothing special from the controller perspective,
> > > just like sending out a command, or address bytes, or data. The
> > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > automatically generated (case 2 controller) by the hardware.
> >
> > Ok, I'll try to explain my view point a little differently. For that we also
> > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > board supported in QEMU should ideally run on that board inside QEMU aswell
> > (this can be a bare metal application equaly well as a modified u-boot/Linux
> > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> >
> > Once functionality has been introduced into QEMU it is not easy to know which
> > intentional or untentional features provided by the functionality are being
> > used by users. One of the (perhaps not well known) features I'm aware of that
> > is in use and is provided by the accurate dummy clock cycle modeling inside
> > m25p80 is the be ability to test drivers accurately regarding the dummy clock
> > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > cycles), but there might be others aswell. So by removing this functionality
> > above use case will brake, this since those test will not be reliable.
> > Furthermore, since users tend to be creative it is not possible to know if
> > there are other use cases that will be affected. This means that in case [1]
> > needs to be followed the safe path is to add functionality instead of removing.
> > Luckily it also easier in this case, see below.
> 
> I understand there might be users other than U-Boot/Linux that use an
> odd number of dummy bits (not multiple of 8). If your concern was
> about model behavior changes, sure I can update
> qemu/docs/system/deprecated.rst to mention that some flashes in the
> m25p80 model now implement dummy cycles as bytes.

Yes, something like that. My concern is that since this functionality has been
in tree for a while, users have found known or unknown features that were
introduced by it. By removing the functionality (and the known/unknown
features) we risk breaking our users' use cases (currently I'm aware of one
feature/use case but it is not unlikely that there are more). [1] states that
"In general features are intended to be supported indefinitely once introduced
into QEMU"; to me that makes a lot of sense because the opposite would mean
that we are not reliable. So if [1] is to be honored, it looks safer to add
functionality instead of removing it (and risking the removal of use
cases/features). Luckily I still believe that in this case it will be easier
to go forward (even though I also agree with what you are saying below about
what I proposed).


> 
> > >
> > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > probably let the maintainers know about it). Most likely the lack of support
> > >
> > > I called it "seriously broken" because current implementation only
> > > considered one type of SPI controllers while completely ignoring the
> > > other type.
> >
> > If we change view and see this from the perspective of m25p80, it models the
> > commands a certain way and provides an API that the SPI controllers need to
> > implement for interacting with it. It is true that there are SPI controllers
> > referred to above that do not support the portion of that API that corresponds
> > to commands with dummy clock cycles, but I don't think it is true that this is
> > broken since there is also one SPI controller that has a working implementation
> > of m25p80's full API also when transfering through a tx fifo (use case 1). But
> > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > will still be honored as in the same time making it possible to have full
> > support for the API in the SPI controllers that currently do not (please reread
> > the proposal in my previous reply that attempts to do this). I myself see this
> > as win/win situation, also because no controller should need modifications.
> >
> 
> I am afraid your proposal does not work. Your proposed new device
> property 'model_dummy_bytes' to select to convert the accurate dummy
> clock cycle count to dummy bytes inside m25p80, is hard to justify as
> a property to the flash itself, as the behavior is tightly coupled to
> how the SPI controller works.

I agree with the above. I decided, though, that instead of posting sample code
here I'll post an RFC with a hopefully improved proposal. I'll cc you. About
the below, Xilinx ZynqMP GQSPI should not need any modification in a first
step.

> 
> Please take a look at the Xilinx GQSPI controller, which supports both
> use cases, that the dummy cycles can be transferred via tx fifo, or
> generated by the controller automatically. Please read the example
> given in:
> 
>     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> Command (EBh)
> 
> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> 
> If you choose to set the m25p80 device property 'model_dummy_bytes' to
> true when working with the Xilinx GQSPI controller, you are bound to
> only allow guest software to use tx fifo to transfer the dummy cycles,
> and this is wrong.
> 
> >
> > >
> > > > for the commands is because no request has been made for them. Also there is
> > > > one controller that has support.
> > >
> > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > "seriously broken" for those case 1 type controllers because they
> > > cannot read anything from the m25p80 model at all. Unless the guest
> > > software being tested only uses Read (03h) command which is not
> > > affected. But I can't find a software that uses Read instead of Fast
> > > Read.
> > >
> > > > > The issue you pointed out that we require the total number of dummy
> > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > if this expectation is not met. However this will not cause any issue
> > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > same assumption as we do here.
> > > > >
> > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > command:
> > > > >
> > > > >     /* convert the dummy cycles to the number of bytes */
> > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > >
> > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > the first place.
> > > > >
> > > > > > >
> > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > flash controllers. There are major two cases:
> > > > > > >
> > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > >   fix flashes working with such controllers.
> > > > > >
> > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > >
> > > > > Please send patches to explain this in detail how this is going to
> > > > > work. I am open to all possible solutions.
> > > >
> > > > In that case I suggest that you instead try with a device property
> > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > >
> > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > >
> > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > removing functionality (features) will take users by surprise.
> > >
> > > I don't think we are removing any features. This is a fix to make the
> > > model to be used by any SPI controllers.
> > >
> > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > assumption for the dummy bit, which is the default configuration for
> > > all flashes I have looked into so far. Can you please comment what use
> > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > response.
> >
> > In [2] instructions on how to boot u-boot/Linux is found. For building the
> > various software components I followed the official doc in [3].
> 
> I see the following QEMU commands are used to test booting U-Boot/Linux:
> 
> $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> -serial stdio -display none -device loader,file=u-boot.elf -kernel
> bl31.elf -device loader,addr=0x40000000,file=Image -device
> loader,addr=0x2000000,file=system.dtb
> 
> I am not sure where the system.dtb gets built from?

The instructions to look into are the ones in [2]. 'system.dtb' is the kernel
dtb for zcu102 ([2] has been fixed). I created [2] purely for you, so
respectfully I will ask you to try a little first before asking for further
guidance.

Best regards,
Francisco Iglesias

[1] qemu/docs/system/deprecated.rst
[2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md


> 
> In [3], it mentions the Xilinx QEMU is used. And a different QEMU
> command is used as the example to launch U-Boot which is different
> from your command above.
> 
> See https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841606/QEMU+-+Zynq+UltraScale+MPSoC#QEMU-ZynqUltraScale+MPSoC-RunningaZynqUltraScale+U-bootImageOnXilinx'sARMQEMU
> 
> $ ./aarch64-softmmu/qemu-system-aarch64 -M arm-generic-fdt -serial
> mon:stdio -serial /dev/null -display none \
>   -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4 \ # Un-reset the A53
>   -device loader,file=./pre-built/linux/images/bl31.elf,cpu-num=0 \ #
> ARM Trusted Firmware
>   -device loader,file=./pre-built/linux/images/u-boot.elf\ # The
> u-boot exectuable
>   -hw-dtb ./pre-built/linux/images/zynqmp-qemu-arm.dtb # HW Device
> Tree that QEMU uses to generate the model
> 
> It is using a machine called "arm-generic-fdt", but in the mainline
> QEMU there is no such machine called "arm-generic-fdt".
> 
> >
> > Best regards,
> > Francisco
> >
> > [1] qemu/docs/system/deprecated.rst
> > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> > [3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux
> >
> 
> Regards,
> Bin
Bin Meng Jan. 20, 2021, 2:20 p.m. UTC | #11
Hi Francisco,

On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > >
> > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > 4-byte address is needed.
> > > > > > > >
> > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > >
> > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > >
> > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > >
> > > > > >
> > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > software generation should have been found out seriously broken long
> > > > > > time ago!
> > > > >
> > > > >
> > > > > The controllers you are referring to might lack support for commands requiring
> > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > >
> > > > I am not sure why you view dummy clock cycles as something special
> > > > that needs some special support from the SPI controller. For the case
> > > > 1 controller, it's nothing special from the controller perspective,
> > > > just like sending out a command, or address bytes, or data. The
> > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > automatically generated (case 2 controller) by the hardware.
> > >
> > > Ok, I'll try to explain my view point a little differently. For that we also
> > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > board supported in QEMU should ideally run on that board inside QEMU aswell
> > > (this can be a bare metal application equaly well as a modified u-boot/Linux
> > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > >
> > > Once functionality has been introduced into QEMU it is not easy to know which
> > > intentional or untentional features provided by the functionality are being
> > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > m25p80 is the be ability to test drivers accurately regarding the dummy clock
> > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > cycles), but there might be others aswell. So by removing this functionality
> > > above use case will brake, this since those test will not be reliable.
> > > Furthermore, since users tend to be creative it is not possible to know if
> > > there are other use cases that will be affected. This means that in case [1]
> > > needs to be followed the safe path is to add functionality instead of removing.
> > > Luckily it also easier in this case, see below.
> >
> > I understand there might be users other than U-Boot/Linux that use an
> > odd number of dummy bits (not multiple of 8). If your concern was
> > about model behavior changes, sure I can update
> > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > m25p80 model now implement dummy cycles as bytes.
>
> Yes, something like that. My concern is that since this functionality has been
> in tree for while, users have found known or unknown features that got
> introduced by it. By removing the functionality (and the known/uknown features)
> we are riscing to brake our user's use cases (currently I'm aware of one
> feature/use case but it is not unlikely that there are more). [1] states that
> "In general features are intended to be supported indefinitely once introduced
> into QEMU", to me that makes very much sense because the opposite would mean
> that we were not reliable. So in case [1] needs to be honored it looks to be
> safer to add functionality instead of removing (and riscing the removal of use
> cases/features). Luckily I still believe in this case that it will be easier to
> go forward (even if I also agree on what you are saying below about what I
> proposed).
>

Even if the implementation is buggy, do we need to keep the buggy
implementation forever? I think that is exactly why
qemu/docs/system/deprecated.rst exists: to deprecate such
features.

> >
> > > >
> > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > probably let the maintainers know about it). Most likely the lack of support
> > > >
> > > > I called it "seriously broken" because current implementation only
> > > > considered one type of SPI controllers while completely ignoring the
> > > > other type.
> > >
> > > If we change view and see this from the perspective of m25p80, it models the
> > > commands a certain way and provides an API that the SPI controllers need to
> > > implement for interacting with it. It is true that there are SPI controllers
> > > referred to above that do not support the portion of that API that corresponds
> > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > broken since there is also one SPI controller that has a working implementation
> > > of m25p80's full API also when transfering through a tx fifo (use case 1). But
> > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > will still be honored as in the same time making it possible to have full
> > > support for the API in the SPI controllers that currently do not (please reread
> > > the proposal in my previous reply that attempts to do this). I myself see this
> > > as win/win situation, also because no controller should need modifications.
> > >
> >
> > I am afraid your proposal does not work. Your proposed new device
> > property 'model_dummy_bytes' to select to convert the accurate dummy
> > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > a property to the flash itself, as the behavior is tightly coupled to
> > how the SPI controller works.
>
> I agree on above. I decided though that instead of posting sample code in here
> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> Xilinx ZynqMP GQSPI should not need any modication in a first step.
>

Wait, (see below)

> >
> > Please take a look at the Xilinx GQSPI controller, which supports both
> > use cases, that the dummy cycles can be transferred via tx fifo, or
> > generated by the controller automatically. Please read the example
> > given in:
> >
> >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > Command (EBh)
> >
> > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> >
> > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > true when working with the Xilinx GQSPI controller, you are bound to
> > only allow guest software to use tx fifo to transfer the dummy cycles,
> > and this is wrong.
> >

You missed this part. I looked at your RFC and, as I mentioned above,
your proposal cannot support a complicated controller like the Xilinx
GQSPI. Please read the example in table 24-22. With your RFC, you
force the guest software's GQSPI driver to use only hardware dummy cycle
generation, which is wrong.
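
To illustrate the two ways a dummy phase can reach the flash, here is a
minimal sketch. It is not QEMU code: the enum and all function names are
hypothetical, and only the conversion formula is taken from the spi-nor
drivers quoted in this thread. It shows that, for Fast Read Quad I/O (EBh)
with 6 dummy cycles, 3 dummy bytes written to the tx fifo and 6 cycles
generated by hardware look the same on the bus.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: how a controller produces the dummy phase. */
typedef enum {
    DUMMY_VIA_TX_FIFO,  /* case 1: driver writes dummy bytes into the tx fifo */
    DUMMY_VIA_HARDWARE  /* case 2: driver programs a cycle count, hardware clocks it */
} dummy_source;

/* Clock cycles needed to shift one byte over `buswidth` data lines. */
static uint32_t cycles_per_byte(uint32_t buswidth)
{
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4);
    return 8 / buswidth;
}

/*
 * Dummy clock cycles the flash observes on the bus. `amount` is a byte
 * count for DUMMY_VIA_TX_FIFO and a cycle count for DUMMY_VIA_HARDWARE.
 */
static uint32_t dummy_cycles_on_bus(dummy_source src, uint32_t amount,
                                    uint32_t buswidth)
{
    if (src == DUMMY_VIA_TX_FIFO) {
        return amount * cycles_per_byte(buswidth);
    }
    return amount;
}

/* Same conversion as the spi-nor drivers: cycles to bytes. */
static uint32_t dummy_cycles_to_bytes(uint32_t cycles, uint32_t buswidth)
{
    return (cycles * buswidth) / 8;
}
```

For EBh, dummy_cycles_to_bytes(6, 4) gives 3, and both
dummy_cycles_on_bus(DUMMY_VIA_TX_FIFO, 3, 4) and
dummy_cycles_on_bus(DUMMY_VIA_HARDWARE, 6, 4) give 6.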

> > >
> > > >
> > > > > for the commands is because no request has been made for them. Also there is
> > > > > one controller that has support.
> > > >
> > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > "seriously broken" for those case 1 type controllers because they
> > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > software being tested only uses Read (03h) command which is not
> > > > affected. But I can't find a software that uses Read instead of Fast
> > > > Read.
> > > >
> > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > same assumption as we do here.
> > > > > >
> > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > command:
> > > > > >
> > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > >
> > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > the first place.
> > > > > >
> > > > > > > >
> > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > flash controllers. There are major two cases:
> > > > > > > >
> > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > >   fix flashes working with such controllers.
> > > > > > >
> > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > >
> > > > > > Please send patches to explain this in detail how this is going to
> > > > > > work. I am open to all possible solutions.
> > > > >
> > > > > In that case I suggest that you instead try with a device property
> > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > >
> > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > >
> > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > removing functionality (features) will take users by surprise.
> > > >
> > > > I don't think we are removing any features. This is a fix to make the
> > > > model to be used by any SPI controllers.
> > > >
> > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > assumption for the dummy bit, which is the default configuration for
> > > > all flashes I have looked into so far. Can you please comment what use
> > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > response.
> > >
> > > In [2] instructions on how to boot u-boot/Linux is found. For building the
> > > various software components I followed the official doc in [3].
> >
> > I see the following QEMU commands are used to test booting U-Boot/Linux:
> >
> > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > loader,addr=0x2000000,file=system.dtb
> >
> > I am not sure where the system.dtb gets built from?
>
> It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> will ask you to try a little first before asking for further guidance.
>

I tried, but without success. I removed the "-device loader" parts that
load the kernel image and the device tree, and focused only on booting
U-Boot.

The ATF bl31.elf was built from
https://github.com/ARM-software/arm-trusted-firmware, by following
build instructions at
https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
U-Boot was built from the upstream U-Boot.

$ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
bl31.elf
ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
Found: v0.0
ERROR:   Error initializing runtime service sip_svc

I also tried the Xilinx fork of ATF from
https://github.com/Xilinx/arm-trusted-firmware, by following build
instructions at
https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF

$ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
bl31.elf
ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
NOTICE:  BL31: v2.2(release):xilinx-v2020.2
NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
Found: v0.0
ERROR:   Error initializing runtime service sip_svc

Then I tried building U-Boot from the Xilinx fork at
https://github.com/Xilinx/u-boot-xlnx/, still without success.

> Best regards,
> Francisco Iglesias
>
> [1] qemu/docs/system/deprecated.rst
> [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
>
>

Regards,
Bin
Francisco Iglesias Jan. 21, 2021, 8:50 a.m. UTC | #12
Dear Bin,

On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> Hi Francisco,
> 
> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > >
> > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > 4-byte address is needed.
> > > > > > > > >
> > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > >
> > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > >
> > > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > > >
> > > > > > >
> > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > software generation should have been found out seriously broken long
> > > > > > > time ago!
> > > > > >
> > > > > >
> > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > >
> > > > > I am not sure why you view dummy clock cycles as something special
> > > > > that needs some special support from the SPI controller. For the case
> > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > just like sending out a command, or address bytes, or data. The
> > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > automatically generated (case 2 controller) by the hardware.
> > > >
> > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > board supported in QEMU should ideally run on that board inside QEMU aswell
> > > > (this can be a bare metal application equaly well as a modified u-boot/Linux
> > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > >
> > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > intentional or untentional features provided by the functionality are being
> > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > m25p80 is the be ability to test drivers accurately regarding the dummy clock
> > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > cycles), but there might be others aswell. So by removing this functionality
> > > > above use case will brake, this since those test will not be reliable.
> > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > there are other use cases that will be affected. This means that in case [1]
> > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > Luckily it also easier in this case, see below.
> > >
> > > I understand there might be users other than U-Boot/Linux that use an
> > > odd number of dummy bits (not multiple of 8). If your concern was
> > > about model behavior changes, sure I can update
> > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > m25p80 model now implement dummy cycles as bytes.
> >
> > Yes, something like that. My concern is that since this functionality has been
> > in tree for while, users have found known or unknown features that got
> > introduced by it. By removing the functionality (and the known/uknown features)
> > we are riscing to brake our user's use cases (currently I'm aware of one
> > feature/use case but it is not unlikely that there are more). [1] states that
> > "In general features are intended to be supported indefinitely once introduced
> > into QEMU", to me that makes very much sense because the opposite would mean
> > that we were not reliable. So in case [1] needs to be honored it looks to be
> > safer to add functionality instead of removing (and riscing the removal of use
> > cases/features). Luckily I still believe in this case that it will be easier to
> > go forward (even if I also agree on what you are saying below about what I
> > proposed).
> >
> 
> Even if the implementation is buggy and we need to keep the buggy
> implementation forever? I think that's why
> qemu/docs/system/deprecated.rst was created for deprecating such
> feature.

With the RFC I posted, all commands in m25p80 work with both the case 1
controller (using a txfifo) and the case 2 controller (no txfifo, as in GQSPI).
Because of this, with all due respect, I have to disagree that this is buggy.
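
The accuracy concern raised earlier in this thread (a controller wrongly
programmed to generate 7 dummy cycles instead of 6) can be illustrated with
a minimal sketch. It is not the m25p80 code: all function names are
hypothetical, and the truncating conversion in the byte-based check is an
assumption about how cycles would be turned into whole bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cycle-accurate check: every dummy clock is counted. */
static bool cycle_model_accepts(uint32_t expected_cycles,
                                uint32_t received_cycles)
{
    return received_cycles == expected_cycles;
}

/* Assumed conversion to whole bytes; integer division truncates. */
static uint32_t cycles_to_whole_bytes(uint32_t cycles, uint32_t buswidth)
{
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4);
    return (cycles * buswidth) / 8;
}

/* Hypothetical byte-based check: only whole dummy bytes are compared. */
static bool byte_model_accepts(uint32_t expected_cycles,
                               uint32_t received_cycles,
                               uint32_t buswidth)
{
    return cycles_to_whole_bytes(received_cycles, buswidth) ==
           cycles_to_whole_bytes(expected_cycles, buswidth);
}
```

With 4 data lines, 6 and 7 cycles both convert to 3 bytes, so
byte_model_accepts(6, 7, 4) is true while cycle_model_accepts(6, 7) is
false. Under these assumptions only the cycle-based check notices the
misprogrammed controller.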

> 
> > >
> > > > >
> > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > >
> > > > > I called it "seriously broken" because current implementation only
> > > > > considered one type of SPI controllers while completely ignoring the
> > > > > other type.
> > > >
> > > > If we change view and see this from the perspective of m25p80, it models the
> > > > commands a certain way and provides an API that the SPI controllers need to
> > > > implement for interacting with it. It is true that there are SPI controllers
> > > > referred to above that do not support the portion of that API that corresponds
> > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > broken since there is also one SPI controller that has a working implementation
> > > > of m25p80's full API also when transfering through a tx fifo (use case 1). But
> > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > > will still be honored as in the same time making it possible to have full
> > > > support for the API in the SPI controllers that currently do not (please reread
> > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > as win/win situation, also because no controller should need modifications.
> > > >
> > >
> > > I am afraid your proposal does not work. Your proposed new device
> > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > a property to the flash itself, as the behavior is tightly coupled to
> > > how the SPI controller works.
> >
> > I agree on above. I decided though that instead of posting sample code in here
> > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > Xilinx ZynqMP GQSPI should not need any modication in a first step.
> >
> 
> Wait, (see below)
> 
> > >
> > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > generated by the controller automatically. Please read the example
> > > given in:
> > >
> > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > Command (EBh)
> > >
> > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > >
> > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > true when working with the Xilinx GQSPI controller, you are bound to
> > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > and this is wrong.
> > >
> 
> You missed this part. I looked at your RFC, and as I mentioned above
> your proposal cannot support the complicated controller like Xilinx
> GQSPI. Please read the example of table 24-22. With your RFC, you
> mandate guest software's GQSPI driver to only use hardware dummy cycle
> generation, which is wrong.
> 

First, thank you very much for looking into the RFC series; it is very much
appreciated. Second, regarding the above: the GQSPI model in QEMU performs
transfers from two locations in the file. One location does the transfer
referred to above, and the other does the transfer through the txfifo. The
location that does the transfer referred to above will not need any
modifications (and will thus work as well as it does currently).

Now that the above has been cleared up, and since I know you are heavily loaded
with other higher-priority tasks, let's wait for the maintainers to also have a
look at the RFC (understandably this can take some time, since they are also
heavily loaded).

Best regards,
Francisco Iglesias


> > > >
> > > > >
> > > > > > for the commands is because no request has been made for them. Also there is
> > > > > > one controller that has support.
> > > > >
> > > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > > "seriously broken" for those case 1 type controllers because they
> > > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > > software being tested only uses Read (03h) command which is not
> > > > > affected. But I can't find a software that uses Read instead of Fast
> > > > > Read.
> > > > >
> > > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > > same assumption as we do here.
> > > > > > >
> > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > > command:
> > > > > > >
> > > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > > >
> > > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > > the first place.
> > > > > > >
> > > > > > > > >
> > > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > > flash controllers. There are major two cases:
> > > > > > > > >
> > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > > >   fix flashes working with such controllers.
> > > > > > > >
> > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > > >
> > > > > > > Please send patches to explain this in detail how this is going to
> > > > > > > work. I am open to all possible solutions.
> > > > > >
> > > > > > In that case I suggest that you instead try with a device property
> > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > > >
> > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > > >
> > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > > removing functionality (features) will take users by surprise.
> > > > >
> > > > > I don't think we are removing any features. This is a fix to make the
> > > > > model to be used by any SPI controllers.
> > > > >
> > > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > > assumption for the dummy bit, which is the default configuration for
> > > > > all flashes I have looked into so far. Can you please comment what use
> > > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > > response.
> > > >
> > > > In [2] instructions on how to boot u-boot/Linux is found. For building the
> > > > various software components I followed the official doc in [3].
> > >
> > > I see the following QEMU commands are used to test booting U-Boot/Linux:
> > >
> > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > > loader,addr=0x2000000,file=system.dtb
> > >
> > > I am not sure where the system.dtb gets built from?
> >
> > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> > will ask you to try a little first before asking for further guidance.
> >
> 
> I tried, but no success. I removed the "-device loader" part for
> loading kernel image and the device tree, and only focused on booting
> U-Boot.
> 
> The ATF bl31.elf was built from
> https://github.com/ARM-software/arm-trusted-firmware, by following
> build instructions at
> https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
> U-Boot was built from the upstream U-Boot.
> 
> $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> bl31.elf
> ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
> NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
> ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> Found: v0.0
> ERROR:   Error initializing runtime service sip_svc
> 
> I also tried the Xilinx fork of ATF from
> https://github.com/Xilinx/arm-trusted-firmware, by following build
> instructions at
> https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF
> 
> $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> bl31.elf
> ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> NOTICE:  BL31: v2.2(release):xilinx-v2020.2
> NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
> ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> Found: v0.0
> ERROR:   Error initializing runtime service sip_svc
> 
> Then I tried to build a U-Boot from the Xilinx fork at
> https://github.com/Xilinx/u-boot-xlnx/, still no success.
> 
> > Best regards,
> > Francisco Iglesias
> >
> > [1] qemu/docs/system/deprecated.rst
> > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> >
> >
> 
> Regards,
> Bin
Bin Meng Jan. 21, 2021, 8:59 a.m. UTC | #13
Hi Francisco,

On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Dear Bin,
>
> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > >
> > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > 4-byte address is needed.
> > > > > > > > > >
> > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > >
> > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > >
> > > > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > time ago!
> > > > > > >
> > > > > > >
> > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > >
> > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > that needs some special support from the SPI controller. For the case
> > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > automatically generated (case 2 controller) by the hardware.
> > > > >
> > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > (this can be a bare metal application just as well as a modified u-boot/Linux
> > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > >
> > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > Luckily it is also easier in this case, see below.
> > > >
> > > > I understand there might be users other than U-Boot/Linux that use an
> > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > about model behavior changes, sure I can update
> > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > m25p80 model now implement dummy cycles as bytes.
> > >
> > > Yes, something like that. My concern is that since this functionality has been
> > > in tree for a while, users have found known or unknown features that got
> > > introduced by it. By removing the functionality (and the known/unknown features)
> > > we risk breaking our users' use cases (currently I'm aware of one
> > > feature/use case but it is not unlikely that there are more). [1] states that
> > > "In general features are intended to be supported indefinitely once introduced
> > > into QEMU", to me that makes very much sense because the opposite would mean
> > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > safer to add functionality instead of removing (and risking the removal of use
> > > cases/features). Luckily I still believe in this case that it will be easier to
> > > go forward (even if I also agree on what you are saying below about what I
> > > proposed).
> > >
> >
> > Even if the implementation is buggy and we need to keep the buggy
> > implementation forever? I think that's why
> > qemu/docs/system/deprecated.rst was created for deprecating such
> > feature.
>
> With the RFC I posted all commands in m25p80 are working for both the case 1
> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> Because of this, I, with all respect, will have to disagree that this is buggy.

Well, the existing m25p80 implementation, which models those flashes
with dummy cycle accuracy, prevents all SPI controllers that use a tx
fifo from working with those flashes. Hence it is buggy.
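
For reference, the cycle-to-byte conversion at the heart of this disagreement
can be sketched as below. This is only an illustration, not the m25p80 code:
the helper name dummy_cycles_to_bytes() is hypothetical, and the formula is the
one used by the spi-nor drivers quoted in this thread, (cycles * buswidth) / 8.
A tx-fifo controller can only shift whole bytes, so a dummy cycle count that
does not add up to whole bytes is reported as an error here.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of QEMU: convert a dummy cycle count into
 * the number of dummy bytes a tx-fifo based controller would shift out.
 *
 * cycles   - number of dummy clock cycles required by the flash command
 * buswidth - number of data lines in use during the dummy phase (1, 2, 4...)
 *
 * Returns the byte count, or -1 when the dummy bits are not a multiple of 8
 * and therefore cannot be expressed as whole bytes.
 */
static int dummy_cycles_to_bytes(unsigned int cycles, unsigned int buswidth)
{
    unsigned int bits;

    assert(buswidth > 0);

    /* Each clock cycle transfers one bit per data line. */
    bits = cycles * buswidth;
    if (bits % 8 != 0) {
        return -1;
    }
    return (int)(bits / 8);
}
```

With this helper, 6 dummy cycles on 4 lines (the EBh example) give 3 bytes,
while 7 cycles on a single line give -1, which is the case a byte-based model
cannot represent.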

>
> >
> > > >
> > > > > >
> > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > >
> > > > > > I called it "seriously broken" because current implementation only
> > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > other type.
> > > > >
> > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > referred to above that do not support the portion of that API that corresponds
> > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > broken since there is also one SPI controller that has a working implementation
> > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > > > will still be honored while at the same time making it possible to have full
> > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > as win/win situation, also because no controller should need modifications.
> > > > >
> > > >
> > > > I am afraid your proposal does not work. Your proposed new device
> > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > how the SPI controller works.
> > >
> > > I agree on above. I decided though that instead of posting sample code in here
> > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > >
> >
> > Wait, (see below)
> >
> > > >
> > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > generated by the controller automatically. Please read the example
> > > > given in:
> > > >
> > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > Command (EBh)
> > > >
> > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > >
> > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > and this is wrong.
> > > >
> >
> > You missed this part. I looked at your RFC, and as I mentioned above
> > your proposal cannot support the complicated controller like Xilinx
> > GQSPI. Please read the example of table 24-22. With your RFC, you
> > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > generation, which is wrong.
> >
>
> First, thank you very much for looking into the RFC series, very much
> appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> locations in the file, in 1 location the transfer referred to above is done, in
> another location the transfer through the txfifo is done. The location where
> transfer referred to above is done will not need any modifications (and will
> thus work equally well as it does currently).

Please explain this a little bit. How does your RFC series handle
cases as described in table 24-22, where the 6 dummy cycles are split
into 2 transfers, with one transfer using tx fifo, and the other one
using hardware dummy cycle generation?

>
> Now that the above has been cleared up, and since I know you are heavily loaded with
> other higher prio tasks, let's wait for the maintainers to also have a look into
> the RFC (understandably this can take some time since they are also
> heavily loaded).

Yes, maintainers are pretty much silent on this topic.

However, may I ask you to provide more details on my questions below about
booting U-Boot/Linux with QEMU?

You can post patches to add documentation for zynqmp in
docs/system/arm, or once I get working instructions, I could do that
too. Much appreciated.

>
> Best regards,
> Francisco Iglesias
>
>
> > > > >
> > > > > >
> > > > > > > for the commands is because no request has been made for them. Also there is
> > > > > > > one controller that has support.
> > > > > >
> > > > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > > > "seriously broken" for those case 1 type controllers because they
> > > > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > > > software being tested only uses Read (03h) command which is not
> > > > > > affected. But I can't find a software that uses Read instead of Fast
> > > > > > Read.
> > > > > >
> > > > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > > > same assumption as we do here.
> > > > > > > >
> > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > > > command:
> > > > > > > >
> > > > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > > > >
> > > > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > > > the first place.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > > > flash controllers. There are major two cases:
> > > > > > > > > >
> > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > > > >   fix flashes working with such controllers.
> > > > > > > > >
> > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > > > >
> > > > > > > > Please send patches to explain this in detail how this is going to
> > > > > > > > work. I am open to all possible solutions.
> > > > > > >
> > > > > > > In that case I suggest that you instead try with a device property
> > > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > > > >
> > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > > > >
> > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > > > removing functionality (features) will take users by surprise.
> > > > > >
> > > > > > I don't think we are removing any features. This is a fix to make the
> > > > > > model to be used by any SPI controllers.
> > > > > >
> > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > > > assumption for the dummy bit, which is the default configuration for
> > > > > > all flashes I have looked into so far. Can you please comment what use
> > > > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > > > response.
> > > > >
> > > > > Instructions on how to boot u-boot/Linux can be found in [2]. For building the
> > > > > various software components I followed the official doc in [3].
> > > >
> > > > I see the following QEMU commands are used to test booting U-Boot/Linux:
> > > >
> > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > > > loader,addr=0x2000000,file=system.dtb
> > > >
> > > > I am not sure where the system.dtb gets built from?
> > >
> > > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> > > will ask you to try a little first before asking for further guidance.
> > >
> >
> > I tried, but no success. I removed the "-device loader" part for
> > loading kernel image and the device tree, and only focused on booting
> > U-Boot.
> >
> > The ATF bl31.elf was built from
> > https://github.com/ARM-software/arm-trusted-firmware, by following
> > build instructions at
> > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
> > U-Boot was built from the upstream U-Boot.
> >
> > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > bl31.elf
> > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
> > NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
> > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > Found: v0.0
> > ERROR:   Error initializing runtime service sip_svc
> >
> > I also tried the Xilinx fork of ATF from
> > https://github.com/Xilinx/arm-trusted-firmware, by following build
> > instructions at
> > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF
> >
> > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > bl31.elf
> > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > NOTICE:  BL31: v2.2(release):xilinx-v2020.2
> > NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
> > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > Found: v0.0
> > ERROR:   Error initializing runtime service sip_svc
> >
> > Then I tried to build a U-Boot from the Xilinx fork at
> > https://github.com/Xilinx/u-boot-xlnx/, still no success.
> >
> > > Best regards,
> > > Francisco Iglesias
> > >
> > > [1] qemu/docs/system/deprecated.rst
> > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> > >

Regards,
Bin
Francisco Iglesias Jan. 21, 2021, 10:01 a.m. UTC | #14
Dear Bin,

On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> Hi Francisco,
> 
> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Dear Bin,
> >
> > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > >
> > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > >
> > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > >
> > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > >
> > > > > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > > time ago!
> > > > > > > >
> > > > > > > >
> > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > >
> > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > >
> > > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > (this can be a bare metal application just as well as a modified u-boot/Linux
> > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > > >
> > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > > Luckily it is also easier in this case, see below.
> > > > >
> > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > about model behavior changes, sure I can update
> > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > m25p80 model now implement dummy cycles as bytes.
> > > >
> > > > Yes, something like that. My concern is that since this functionality has been
> > > > in tree for a while, users have found known or unknown features that got
> > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > we risk breaking our users' use cases (currently I'm aware of one
> > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > "In general features are intended to be supported indefinitely once introduced
> > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > safer to add functionality instead of removing (and risking the removal of use
> > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > go forward (even if I also agree on what you are saying below about what I
> > > > proposed).
> > > >
> > >
> > > Even if the implementation is buggy and we need to keep the buggy
> > > implementation forever? I think that's why
> > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > feature.
> >
> > With the RFC I posted all commands in m25p80 are working for both the case 1
> > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > Because of this, I, with all respect, will have to disagree that this is buggy.
> 
> Well, the existing m25p80 implementation, which models those flashes
> with dummy cycle accuracy, prevents all SPI controllers that use a tx
> fifo from working with those flashes. Hence it is buggy.
> 
> >
> > >
> > > > >
> > > > > > >
> > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > >
> > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > other type.
> > > > > >
> > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > > > > will still be honored while at the same time making it possible to have full
> > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > as win/win situation, also because no controller should need modifications.
> > > > > >
> > > > >
> > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > how the SPI controller works.
> > > >
> > > > I agree on above. I decided though that instead of posting sample code in here
> > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > >
> > >
> > > Wait, (see below)
> > >
> > > > >
> > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > generated by the controller automatically. Please read the example
> > > > > given in:
> > > > >
> > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > Command (EBh)
> > > > >
> > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > >
> > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > and this is wrong.
> > > > >
> > >
> > > You missed this part. I looked at your RFC, and as I mentioned above
> > > your proposal cannot support the complicated controller like Xilinx
> > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > generation, which is wrong.
> > >
> >
> > First, thank you very much for looking into the RFC series, very much
> > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > locations in the file, in 1 location the transfer referred to above is done, in
> > another location the transfer through the txfifo is done. The location where
> > transfer referred to above is done will not need any modifications (and will
> > thus work equally well as it does currently).
> 
> Please explain this a little bit. How does your RFC series handle
> cases as described in table 24-22, where the 6 dummy cycles are split
> into 2 transfers, with one transfer using tx fifo, and the other one
> using hardware dummy cycle generation?


The transfer above is already handled in the model, and since that code will
not change, it will still work afterwards.

About the request below: sure, I'll provide some documentation once I have
some time.

Best regards,
Francisco Iglesias


> 
> >
> > Now that the above has been cleared up, and since I know you are heavily loaded with
> > other higher prio tasks, let's wait for the maintainers to also have a look into
> > the RFC (understandably this can take some time, since they are also
> > heavily loaded).
> 
> Yes, maintainers are pretty much silent on this topic.
> 
> However may I ask you to provide more details on my questions below on
> booting U-Boot/Linux with the QEMU?
> 
> You can post patches to add documentation for zynqmp in
> docs/system/arm, or once I get working instructions, I could do that
> too. Much appreciated.
> 
> >
> > Best regards,
> > Francisco Iglesias
> >
> >
> > > > > >
> > > > > > >
> > > > > > > > for the commands is because no request has been made for them. Also there is
> > > > > > > > one controller that has support.
> > > > > > >
> > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > > > > "seriously broken" for those case 1 type controllers because they
> > > > > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > > > > software being tested only uses Read (03h) command which is not
> > > > > > > affected. But I can't find a software that uses Read instead of Fast
> > > > > > > Read.
> > > > > > >
> > > > > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > > > > same assumption as we do here.
> > > > > > > > >
> > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > > > > command:
> > > > > > > > >
> > > > > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > > > > >
> > > > > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > > > > the first place.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > > > > flash controllers. There are major two cases:
> > > > > > > > > > >
> > > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > > > > >   fix flashes working with such controllers.
> > > > > > > > > >
> > > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > > > > >
> > > > > > > > > Please send patches to explain this in detail how this is going to
> > > > > > > > > work. I am open to all possible solutions.
> > > > > > > >
> > > > > > > > In that case I suggest that you instead try with a device property
> > > > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > > > > >
> > > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > > > > >
> > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > > > > removing functionality (features) will take users by surprise.
> > > > > > >
> > > > > > > I don't think we are removing any features. This is a fix to make the
> > > > > > > model to be used by any SPI controllers.
> > > > > > >
> > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > > > > assumption for the dummy bit, which is the default configuration for
> > > > > > > all flashes I have looked into so far. Can you please comment what use
> > > > > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > > > > response.
> > > > > >
> > > > > > Instructions on how to boot U-Boot/Linux can be found in [2]. For building the
> > > > > > various software components I followed the official doc in [3].
> > > > >
> > > > > I see the following QEMU commands are used to test booting U-Boot/Linux:
> > > > >
> > > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > > > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > > > > loader,addr=0x2000000,file=system.dtb
> > > > >
> > > > > I am not sure where the system.dtb gets built from?
> > > >
> > > > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> > > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> > > > will ask you to try a little first before asking for further guidance.
> > > >
> > >
> > > I tried, but no success. I removed the "-device loader" part for
> > > loading kernel image and the device tree, and only focused on booting
> > > U-Boot.
> > >
> > > The ATF bl31.elf was built from
> > > https://github.com/ARM-software/arm-trusted-firmware, by following
> > > build instructions at
> > > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
> > > U-Boot was built from the upstream U-Boot.
> > >
> > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf
> > > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > > NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
> > > NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
> > > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > > Found: v0.0
> > > ERROR:   Error initializing runtime service sip_svc
> > >
> > > I also tried the Xilinx fork of ATF from
> > > https://github.com/Xilinx/arm-trusted-firmware, by following build
> > > instructions at
> > > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF
> > >
> > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf
> > > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > > NOTICE:  BL31: v2.2(release):xilinx-v2020.2
> > > NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
> > > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > > Found: v0.0
> > > ERROR:   Error initializing runtime service sip_svc
> > >
> > > Then I tried to build a U-Boot from the Xilinx fork at
> > > https://github.com/Xilinx/u-boot-xlnx/, still no success.
> > >
> > > > Best regards,
> > > > Francisco Iglesias
> > > >
> > > > [1] qemu/docs/system/deprecated.rst
> > > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> > > >
> 
> Regards,
> Bin
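
For reference, the cycle-to-byte conversion quoted in the thread above (the
"(6 * 4 / 8)" formula and the op.dummy.nbytes line) can be sketched as a small
standalone helper. This is only an illustration under stated assumptions, not
code from the series, from QEMU, or from the spi-nor drivers: the function name
dummy_cycles_to_bytes and its signature are hypothetical, and the bus width is
assumed to be the number of data lines (1, 2 or 4).

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of QEMU, U-Boot or Linux): convert a number
 * of dummy clock cycles into whole bytes for a given bus width (number of
 * data lines, assumed to be 1, 2 or 4).
 *
 * Returns false when the total number of dummy bits is not a multiple of 8,
 * which is the case that cannot be expressed as dummy bytes.
 */
static bool dummy_cycles_to_bytes(unsigned cycles, unsigned buswidth,
                                  unsigned *nbytes)
{
    unsigned bits = cycles * buswidth;   /* bits clocked during dummy phase */

    if (bits % 8 != 0) {
        return false;                    /* not representable in bytes */
    }
    *nbytes = bits / 8;
    return true;
}
```

With these assumptions, 6 dummy cycles on a quad bus give 3 bytes, 8 dummy
cycles on a single line give 1 byte, and 7 dummy cycles on a quad bus are
rejected because 28 bits is not a whole number of bytes.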
Francisco Iglesias Jan. 21, 2021, 2:18 p.m. UTC | #15
Hi Bin,

On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> Hi Francisco,
> 
> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Dear Bin,
> >
> > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > >
> > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > >
> > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > >
> > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > >
> > > > > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > > time ago!
> > > > > > > >
> > > > > > > >
> > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > >
> > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > >
> > > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > > >
> > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > Luckily it is also easier in this case, see below.
> > > > >
> > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > about model behavior changes, sure I can update
> > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > m25p80 model now implement dummy cycles as bytes.
> > > >
> > > > Yes, something like that. My concern is that since this functionality has been
> > > > in tree for a while, users have found known or unknown features that got
> > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > we risk breaking our users' use cases (currently I'm aware of one
> > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > "In general features are intended to be supported indefinitely once introduced
> > > > into QEMU", to me that makes a lot of sense because the opposite would mean
> > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > safer to add functionality instead of removing (and risking the removal of use
> > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > go forward (even if I also agree on what you are saying below about what I
> > > > proposed).
> > > >
> > >
> > > Even if the implementation is buggy and we need to keep the buggy
> > > implementation forever? I think that's why
> > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > feature.
> >
> > With the RFC I posted all commands in m25p80 are working for both the case 1
> > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > Because of this, I, with all respect, will have to disagree that this is buggy.
> 
> Well, the existing m25p80 implementation that uses dummy cycle
> accuracy for those flashes prevents all SPI controllers that use tx
> fifo to work with those flashes. Hence it is buggy.
> 
> >
> > >
> > > > >
> > > > > > >
> > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > >
> > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > other type.
> > > > > >
> > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > > > > will still be honored as in the same time making it possible to have full
> > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > as win/win situation, also because no controller should need modifications.
> > > > > >
> > > > >
> > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > how the SPI controller works.
> > > >
> > > > I agree on above. I decided though that instead of posting sample code in here
> > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > >
> > >
> > > Wait, (see below)
> > >
> > > > >
> > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > generated by the controller automatically. Please read the example
> > > > > given in:
> > > > >
> > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > Command (EBh)
> > > > >
> > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > >
> > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > and this is wrong.
> > > > >
> > >
> > > You missed this part. I looked at your RFC, and as I mentioned above
> > > your proposal cannot support the complicated controller like Xilinx
> > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > generation, which is wrong.
> > >
> >
> > First, thank you very much for looking into the RFC series, very much
> > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > locations in the file, in 1 location the transfer referred to above is done, in
> > another location the transfer through the txfifo is done. The location where
> > transfer referred to above is done will not need any modifications (and will
> > thus work equally well as it does currently).
> 
> Please explain this a little bit. How does your RFC series handle
> cases as described in table 24-22, where the 6 dummy cycles are split
> into 2 transfers, with one transfer using tx fifo, and the other one
> using hardware dummy cycle generation?

Sorry, I misunderstood. You are right, that won't work.

Best regards,
Francisco Iglesias

> 
> >
> > Now that the above has been cleared up, and since I know you are heavily loaded with
> > other higher prio tasks, let's wait for the maintainers to also have a look into
> > the RFC (understandably this can take some time, since they are also
> > heavily loaded).
> 
> Yes, maintainers are pretty much silent on this topic.
> 
> However may I ask you to provide more details on my questions below on
> booting U-Boot/Linux with the QEMU?
> 
> You can post patches to add documentation for zynqmp in
> docs/system/arm, or once I get working instructions, I could do that
> too. Much appreciated.
> 
> >
> > Best regards,
> > Francisco Iglesias
> >
> >
> > > > > >
> > > > > > >
> > > > > > > > for the commands is because no request has been made for them. Also there is
> > > > > > > > one controller that has support.
> > > > > > >
> > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > > > > "seriously broken" for those case 1 type controllers because they
> > > > > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > > > > software being tested only uses Read (03h) command which is not
> > > > > > > affected. But I can't find a software that uses Read instead of Fast
> > > > > > > Read.
> > > > > > >
> > > > > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > > > > same assumption as we do here.
> > > > > > > > >
> > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > > > > command:
> > > > > > > > >
> > > > > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > > > > >
> > > > > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > > > > the first place.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > > > > flash controllers. There are major two cases:
> > > > > > > > > > >
> > > > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > > > > >   fix flashes working with such controllers.
> > > > > > > > > >
> > > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > > > > >
> > > > > > > > > Please send patches to explain this in detail how this is going to
> > > > > > > > > work. I am open to all possible solutions.
> > > > > > > >
> > > > > > > > In that case I suggest that you instead try with a device property
> > > > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > > > > >
> > > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > > > > >
> > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > > > > removing functionality (features) will take users by surprise.
> > > > > > >
> > > > > > > I don't think we are removing any features. This is a fix to make the
> > > > > > > model to be used by any SPI controllers.
> > > > > > >
> > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > > > > assumption for the dummy bit, which is the default configuration for
> > > > > > > all flashes I have looked into so far. Can you please comment what use
> > > > > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > > > > response.
> > > > > >
> > > > > > Instructions on how to boot U-Boot/Linux can be found in [2]. For building the
> > > > > > various software components I followed the official doc in [3].
> > > > >
> > > > > I see the following QEMU commands are used to test booting U-Boot/Linux:
> > > > >
> > > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > > > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > > > > loader,addr=0x2000000,file=system.dtb
> > > > >
> > > > > I am not sure where the system.dtb gets built from?
> > > >
> > > > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> > > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> > > > will ask you to try a little first before asking for further guidance.
> > > >
> > >
> > > I tried, but no success. I removed the "-device loader" part for
> > > loading kernel image and the device tree, and only focused on booting
> > > U-Boot.
> > >
> > > The ATF bl31.elf was built from
> > > https://github.com/ARM-software/arm-trusted-firmware, by following
> > > build instructions at
> > > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
> > > U-Boot was built from the upstream U-Boot.
> > >
> > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf
> > > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > > NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
> > > NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
> > > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > > Found: v0.0
> > > ERROR:   Error initializing runtime service sip_svc
> > >
> > > I also tried the Xilinx fork of ATF from
> > > https://github.com/Xilinx/arm-trusted-firmware, by following build
> > > instructions at
> > > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF
> > >
> > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf
> > > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > > NOTICE:  BL31: v2.2(release):xilinx-v2020.2
> > > NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
> > > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > > Found: v0.0
> > > ERROR:   Error initializing runtime service sip_svc
> > >
> > > Then I tried to build a U-Boot from the Xilinx fork at
> > > https://github.com/Xilinx/u-boot-xlnx/, still no success.
> > >
> > > > Best regards,
> > > > Francisco Iglesias
> > > >
> > > > [1] qemu/docs/system/deprecated.rst
> > > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> > > >
> 
> Regards,
> Bin
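
The split-transfer case discussed above (the 6 dummy cycles of EBh delivered
partly through the tx fifo and partly by hardware dummy cycle generation) can
be illustrated with a small cycle-based bookkeeping sketch. Everything here is
hypothetical and is not taken from the m25p80 model, the GQSPI model, or the
RFC: the DummyPhase type and the dummy_* functions are made-up names, the bus
width is assumed to be 1, 2 or 4 data lines, and the 2 + 4 split in the note
after the code is an assumed example rather than the contents of table 24-22.

```c
#include <stdbool.h>

/* Hypothetical flash-side bookkeeping, counted in clock cycles. */
typedef struct {
    unsigned dummy_cycles_left;   /* cycles still expected before data phase */
} DummyPhase;

/*
 * A byte shifted out of the tx fifo occupies 8 / buswidth clock cycles
 * (buswidth is assumed to be 1, 2 or 4). Returns false on overrun, i.e.
 * when the byte would supply more cycles than the flash still expects.
 */
static bool dummy_consume_fifo_byte(DummyPhase *p, unsigned buswidth)
{
    unsigned cycles = 8 / buswidth;

    if (cycles > p->dummy_cycles_left) {
        return false;
    }
    p->dummy_cycles_left -= cycles;
    return true;
}

/* Cycles generated by the controller hardware are accounted for directly. */
static bool dummy_consume_hw_cycles(DummyPhase *p, unsigned cycles)
{
    if (cycles > p->dummy_cycles_left) {
        return false;
    }
    p->dummy_cycles_left -= cycles;
    return true;
}

/* The dummy phase is over once no cycles remain. */
static bool dummy_phase_done(const DummyPhase *p)
{
    return p->dummy_cycles_left == 0;
}
```

Under these assumptions, a phase of 6 cycles is completed by one quad-mode
fifo byte (2 cycles) followed by 4 hardware-generated cycles, while a request
for 7 hardware-generated cycles against the same 6-cycle phase is reported as
an overrun.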
Bin Meng Feb. 8, 2021, 2:41 p.m. UTC | #16
On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Dear Bin,
> > >
> > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > Hi Francisco,
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Bin,
> > > > > > > > > > >
> > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > >
> > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > >
> > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > >
> > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > >
> > > > > > > > > > > While not being the original implementor, I must assume that the above solution
> > > > > > > > > > > was considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > wouldn't be caught and the controller would still be considered "correct"). Now
> > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also
> > > > > > > > > > > because the detail is already in use for catching exactly the above error.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I found no indication in the commit message that my proposed solution here
> > > > > > > > > > was ever considered; otherwise all SPI controller models supporting
> > > > > > > > > > software generation would have been found to be seriously broken a long
> > > > > > > > > > time ago!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > >
> > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > >
> > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > (this can be a bare metal application just as well as a modified u-boot/Linux
> > > > > > > using SPI commands with a non-multiple-of-8 number of dummy clock cycles).
> > > > > > >
> > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > there are other use cases that will be affected. This means that if [1]
> > > > > > > needs to be followed, the safe path is to add functionality instead of removing it.
> > > > > > > Luckily it is also easier in this case, see below.
> > > > > >
> > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > about model behavior changes, sure I can update
> > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > >
> > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > in tree for a while, users have found known or unknown features that got
> > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > we risk breaking our users' use cases (currently I'm aware of one
> > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > into QEMU"; to me that makes a lot of sense because the opposite would mean
> > > > > that we were not reliable. So if [1] needs to be honored it looks to be
> > > > > safer to add functionality instead of removing it (and risking the removal of use
> > > > > cases/features). Luckily I still believe that in this case it will be easier to
> > > > > go forward (even if I also agree with what you are saying below about what I
> > > > > proposed).
> > > > >
> > > >
> > > > Even if the implementation is buggy, do we need to keep the buggy
> > > > implementation forever? I think that's what
> > > > qemu/docs/system/deprecated.rst was created for: deprecating such
> > > > features.
> > >
> > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > Because of this, I, with all respect, will have to disagree that this is buggy.
> >
> > Well, the existing m25p80 implementation that uses dummy cycle
> > accuracy for those flashes prevents all SPI controllers that use a tx
> > fifo from working with those flashes. Hence it is buggy.
> >
> > >
> > > >
> > > > > >
> > > > > > > >
> > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > >
> > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > other type.
> > > > > > >
> > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > as mentioned above, by making a minor extension and improvement to m25p80's API
> > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > > >
> > > > > >
> > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > how the SPI controller works.
> > > > >
> > > > > I agree with the above. I decided though that instead of posting sample code in here
> > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About the below,
> > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > >
> > > >
> > > > Wait, (see below)
> > > >
> > > > > >
> > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > generated by the controller automatically. Please read the example
> > > > > > given in:
> > > > > >
> > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > Command (EBh)
> > > > > >
> > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > >
> > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > and this is wrong.
> > > > > >
> > > >
> > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > your proposal cannot support a complicated controller like the Xilinx
> > > > GQSPI. Please read the example in table 24-22. With your RFC, you
> > > > mandate that the guest software's GQSPI driver only use hardware dummy cycle
> > > > generation, which is wrong.
> > > >
> > >
> > > First, thank you very much for looking into the RFC series, very much
> > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > locations in the file, in 1 location the transfer referred to above is done, in
> > > another location the transfer through the txfifo is done. The location where
> > > transfer referred to above is done will not need any modifications (and will
> > > thus work equally well as it does currently).
> >
> > Please explain this a little bit. How does your RFC series handle
> > cases as described in table 24-22, where the 6 dummy cycles are split
> > into 2 transfers, with one transfer using tx fifo, and the other one
> > using hardware dummy cycle generation?
>
> Sorry, I misunderstood. You are right, that won't work.

+Edgar E. Iglesias

So it looks like the only way, so far, to implement dummy cycles correctly
so that they work with all SPI controller models is what I proposed in this
patch series.
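
For readers following the thread, the cycle-to-byte conversion from the cover
letter can be sketched as below. This is only an illustration under stated
assumptions: the helper name dummy_cycles_to_bytes() and its signature are
hypothetical and are not taken from the m25p80 model or from this series. It
also shows the case raised earlier in the thread, where a cycle count that
does not add up to whole bytes cannot be expressed in bytes at all.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the m25p80 model): convert a number of
 * dummy clock cycles into bytes, given how many data lines carry bits on
 * each cycle (1 for single SPI, 2 for dual, 4 for quad, 8 for octal).
 *
 * Returns false when the cycles do not add up to a whole number of bytes,
 * in which case *bytes is left untouched.
 */
static bool dummy_cycles_to_bytes(unsigned cycles, unsigned lines,
                                  unsigned *bytes)
{
    assert(lines == 1 || lines == 2 || lines == 4 || lines == 8);

    unsigned bits = cycles * lines;   /* bits shifted during the dummy phase */

    if (bits % 8 != 0) {
        return false;                 /* e.g. 7 cycles on a single line */
    }
    *bytes = bits / 8;
    return true;
}
```

With this sketch, 6 cycles on 4 lines give 3 bytes, matching the (6 * 4 / 8)
formula for Fast Read Quad I/O (EBh), while 7 cycles on a single line are
rejected because they do not form a whole byte.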

Maintainers are quite silent, so I would like to hear your thoughts.

@Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell could you
please share your thoughts, since you are the ones who reviewed the
existing dummy cycle implementation (based on the commit history)?

Regards,
Bin
Edgar E. Iglesias Feb. 8, 2021, 3:30 p.m. UTC | #17
On Mon, Feb 8, 2021 at 3:42 PM Bin Meng <bmeng.cn@gmail.com> wrote:

>
> +Edgar E. Iglesias
>
> So it looks by far the only way to implement dummy cycles correctly to
> work with all SPI controller models is what I proposed here in this
> patch series.
>
> Maintainers are quite silent, so I would like to hear your thoughts.
>
> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> please share your thoughts since you are the one who reviewed the
> existing dummy implementation (based on commits history)
>
>
Francisco really knows this stuff better than me....
I would tend to agree that it's unfortunate to model things in cycles. If
we could abstract things at a higher level without breaking existing
use-cases, that would be nice.
Francisco, is it impossible to bring the abstraction level up to bytes and
keep existing use-cases?
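
To make the question concrete, here is a rough sketch of the bookkeeping
involved when dummy cycles reach the flash partly as tx fifo bytes and partly
as hardware-generated cycles, as in the split transfer discussed above. It is
only an illustration: struct dummy_tracker and both functions are hypothetical
names, not code from m25p80 or from either proposal, and the 2 + 4 split used
in the note below is an assumed example, not the actual contents of table
24-22.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state, not the real m25p80 Flash structure. */
struct dummy_tracker {
    unsigned remaining_cycles;  /* dummy cycles the flash still expects */
    unsigned lines;             /* data lines in use: 1, 2 or 4 */
};

/*
 * A byte shifted out of a tx fifo occupies 8 / lines clock cycles.
 * Returns false if that would overshoot the expected dummy phase.
 */
static bool consume_fifo_byte(struct dummy_tracker *t)
{
    assert(t->lines == 1 || t->lines == 2 || t->lines == 4);

    unsigned cycles = 8 / t->lines;

    if (cycles > t->remaining_cycles) {
        return false;
    }
    t->remaining_cycles -= cycles;
    return true;
}

/* Cycles generated by the controller itself are counted one by one. */
static bool consume_generated_cycles(struct dummy_tracker *t, unsigned cycles)
{
    if (cycles > t->remaining_cycles) {
        return false;
    }
    t->remaining_cycles -= cycles;
    return true;
}
```

Starting from 6 expected cycles on 4 lines, one fifo byte accounts for 2 of
them and 4 generated cycles complete the dummy phase. On a single line a fifo
byte always costs 8 cycles, so a 7-cycle dummy phase can never be satisfied by
fifo bytes alone in this sketch.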

We have a bunch of test-cases. We'll publish some of them in source code;
others we can't publish since they use proprietary SW we're not allowed to
publish at all, but we can run tests and Ack if things work.

Best regards,
Edgar
Francisco Iglesias Feb. 9, 2021, 9:35 a.m. UTC | #18
Hello Edgar,

On [2021 Feb 08] Mon 16:30:00, Edgar E. Iglesias wrote:
>    On Mon, Feb 8, 2021 at 3:42 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> 
>      On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>      <frasse.iglesias@gmail.com> wrote:
>      >
>      > Hi Bin,
>      >
>      > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>      > > Hi Francisco,
>      > >
>      > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>      > > <frasse.iglesias@gmail.com> wrote:
>      > > >
>      > > > Dear Bin,
>      > > >
>      > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>      > > > > Hi Francisco,
>      > > > >
>      > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>      > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > >
>      > > > > > Hi Bin,
>      > > > > >
>      > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>      > > > > > > Hi Francisco,
>      > > > > > >
>      > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>      > > > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > > > >
>      > > > > > > > Hi Bin,
>      > > > > > > >
>      > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>      > > > > > > > > Hi Francisco,
>      > > > > > > > >
>      > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>      > > > > > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > > > > > >
>      > > > > > > > > > Hi Bin,
>      > > > > > > > > >
>      > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>      > > > > > > > > > > Hi Francisco,
>      > > > > > > > > > >
>      > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>      > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > > > > > > > >
>      > > > > > > > > > > > Hi Bin,
>      > > > > > > > > > > >
>      > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>      > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
>      > > > > > > > > > > > >
>      > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to
>      indicate how many follow-up
>      > > > > > > > > > > > > bytes are expected to be received after it
>      receives a command. For
>      > > > > > > > > > > > > example, depending on the address mode, either
>      3-byte address or
>      > > > > > > > > > > > > 4-byte address is needed.
>      > > > > > > > > > > > >
>      > > > > > > > > > > > > For fast read family commands, some dummy cycles
>      are required after
>      > > > > > > > > > > > > sending the address bytes, and the dummy cycles
>      need to be counted
>      > > > > > > > > > > > > in s->needed_bytes. This is where the mess
>      began.
>      > > > > > > > > > > > >
>      > > > > > > > > > > > > As the variable name (needed_bytes) indicates,
>      the unit is in byte.
>      > > > > > > > > > > > > It is not in bit, or cycle. However for some
>      reason the model has
>      > > > > > > > > > > > > been using the number of dummy cycles for
>      s->needed_bytes. The right
>      > > > > > > > > > > > > approach is to convert the number of dummy
>      cycles to bytes based on
>      > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles
>      for the Fast Read Quad
>      > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the
>      formula (6 * 4 / 8).
>      > > > > > > > > > > >
>      > > > > > > > > > > > While not being the original implementor I must
>      assume that above solution was
>      > > > > > > > > > > > considered but not chosen by the developers due to
>      it is inaccuracy (it
>      > > > > > > > > > > > wouldn't be possible to model exacly 6 dummy
>      cycles, only a multiple of 8,
>      > > > > > > > > > > > meaning that if the controller is wrongly
>      programmed to generate 7 the error
>      > > > > > > > > > > > wouldn't be caught and the controller will still
>      be considered "correct"). Now
>      > > > > > > > > > > > that we have this detail in the implementation I'm
>      in favor of keeping it, this
>      > > > > > > > > > > > also because the detail is already in use for
>      catching exactly above error.
>      > > > > > > > > > > >
>      > > > > > > > > > >
>      > > > > > > > > > > I found no clue from the commit message that my
>      proposed solution here
>      > > > > > > > > > > was ever considered, otherwise all SPI controller
>      models supporting
>      > > > > > > > > > > software generation should have been found out
>      seriously broken long
>      > > > > > > > > > > time ago!
>      > > > > > > > > >
>      > > > > > > > > >
>      > > > > > > > > > The controllers you are referring to might lack
>      support for commands requiring
>      > > > > > > > > > dummy clock cycles but I really hope they work with
>      the other commands? If so I
>      > > > > > > > >
>      > > > > > > > > I am not sure why you view dummy clock cycles as
>      something special
>      > > > > > > > > that needs some special support from the SPI controller.
>      For the case
>      > > > > > > > > 1 controller, it's nothing special from the controller
>      perspective,
>      > > > > > > > > just like sending out a command, or address bytes, or
>      data. The
>      > > > > > > > > controller just shifts data bit by bit from its tx fifo
>      and that's it.
>      > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles
>      can either be
>      > > > > > > > > sent via a regular data (the case 1 controller) in the
>      tx fifo, or
>      > > > > > > > > automatically generated (case 2 controller) by the
>      hardware.
>      > > > > > > >
>      > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
>      > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
>      > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
>      > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
>      > > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles).
>      > > > > > > >
>      > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
>      > > > > > > > intentional or unintentional features provided by the functionality are being
>      > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
>      > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
>      > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
>      > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
>      > > > > > > > cycles), but there might be others as well. So by removing this functionality
>      > > > > > > > the above use case will break, since those tests will no longer be reliable.
>      > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
>      > > > > > > > there are other use cases that will be affected. This means that in case [1]
>      > > > > > > > needs to be followed the safe path is to add functionality instead of removing it.
>      > > > > > > > Luckily it is also easier in this case, see below.
>      > > > > > >
>      > > > > > > I understand there might be users other than U-Boot/Linux that use an
>      > > > > > > odd number of dummy bits (not multiple of 8). If your concern was
>      > > > > > > about model behavior changes, sure I can update
>      > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
>      > > > > > > m25p80 model now implement dummy cycles as bytes.
>      > > > > >
>      > > > > > Yes, something like that. My concern is that since this functionality has been
>      > > > > > in tree for a while, users have found known or unknown features that got
>      > > > > > introduced by it. By removing the functionality (and the known/unknown features)
>      > > > > > we are risking breaking our users' use cases (currently I'm aware of one
>      > > > > > feature/use case but it is not unlikely that there are more). [1] states that
>      > > > > > "In general features are intended to be supported indefinitely once introduced
>      > > > > > into QEMU", to me that makes very much sense because the opposite would mean
>      > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
>      > > > > > safer to add functionality instead of removing it (and risking the removal of use
>      > > > > > cases/features). Luckily I still believe in this case that it will be easier to
>      > > > > > go forward (even if I also agree on what you are saying below about what I
>      > > > > > proposed).
>      > > > > >
>      > > > >
>      > > > > Even if the implementation is buggy and we need to keep the buggy
>      > > > > implementation forever? I think that's why
>      > > > > qemu/docs/system/deprecated.rst was created for deprecating such
>      > > > > feature.
>      > > >
>      > > > With the RFC I posted all commands in m25p80 are working for both the case 1
>      > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
>      > > > Because of this, I, with all respect, will have to disagree that this is buggy.
>      > >
>      > > Well, the existing m25p80 implementation that uses dummy cycle
>      > > accuracy for those flashes prevents all SPI controllers that use tx
>      > > fifo to work with those flashes. Hence it is buggy.
>      > >
>      > > >
>      > > > >
>      > > > > > >
>      > > > > > > > >
>      > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
>      > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
>      > > > > > > > >
>      > > > > > > > > I called it "seriously broken" because current implementation only
>      > > > > > > > > considered one type of SPI controllers while completely ignoring the
>      > > > > > > > > other type.
>      > > > > > > >
>      > > > > > > > If we change view and see this from the perspective of m25p80, it models the
>      > > > > > > > commands a certain way and provides an API that the SPI controllers need to
>      > > > > > > > implement for interacting with it. It is true that there are SPI controllers
>      > > > > > > > referred to above that do not support the portion of that API that corresponds
>      > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
>      > > > > > > > broken since there is also one SPI controller that has a working implementation
>      > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
>      > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
>      > > > > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes, [1]
>      > > > > > > > will still be honored while at the same time making it possible to have full
>      > > > > > > > support for the API in the SPI controllers that currently do not (please reread
>      > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
>      > > > > > > > as a win/win situation, also because no controller should need modifications.
>      > > > > > > >
>      > > > > > >
>      > > > > > > I am afraid your proposal does not work. Your proposed new device
>      > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
>      > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
>      > > > > > > a property to the flash itself, as the behavior is tightly coupled to
>      > > > > > > how the SPI controller works.
>      > > > > >
>      > > > > > I agree on above. I decided though that instead of posting sample code in here
>      > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
>      > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
>      > > > > >
>      > > > >
>      > > > > Wait, (see below)
>      > > > >
>      > > > > > >
>      > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
>      > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
>      > > > > > > generated by the controller automatically. Please read the example
>      > > > > > > given in:
>      > > > > > >
>      > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
>      > > > > > > Command (EBh)
>      > > > > > >
>      > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>      > > > > > >
>      > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
>      > > > > > > true when working with the Xilinx GQSPI controller, you are bound to
>      > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
>      > > > > > > and this is wrong.
>      > > > > > >
>      > > > >
>      > > > > You missed this part. I looked at your RFC, and as I mentioned above
>      > > > > your proposal cannot support the complicated controller like Xilinx
>      > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
>      > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
>      > > > > generation, which is wrong.
>      > > > >
>      > > >
>      > > > First, thank you very much for looking into the RFC series, very much
>      > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
>      > > > locations in the file, in 1 location the transfer referred to above is done, in
>      > > > another location the transfer through the txfifo is done. The location where
>      > > > transfer referred to above is done will not need any modifications (and will
>      > > > thus work equally well as it does currently).
>      > >
>      > > Please explain this a little bit. How does your RFC series handle
>      > > cases as described in table 24-22, where the 6 dummy cycles are split
>      > > into 2 transfers, with one transfer using tx fifo, and the other one
>      > > using hardware dummy cycle generation?
>      >
>      > Sorry, I misunderstood. You are right, that won't work.
> 
>      +Edgar E. Iglesias
> 
>      So it looks like, so far, the only way to implement dummy cycles correctly
>      for all SPI controller models is what I proposed here in this
>      patch series.
> 
>      Maintainers are quite silent, so I would like to hear your thoughts.
> 
>      @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>      please share your thoughts, since you are the ones who reviewed the
>      existing dummy implementation (based on commit history)?
> 
>    Francisco really knows this stuff better than me....
>    I would tend to agree that it's unfortunate to model things in cycles; if
>    we could abstract things at a higher level, without breaking existing
>    use-cases, that would be nice.
>    Francisco, is it impossible to bring up the abstraction level to bytes and
>    keep existing use-cases?

Great question. To be honest, I lean towards thinking that it shouldn't be
impossible (but I haven't been able to try anything yet).

Best regards,
Francisco Iglesias


>    We have a bunch of test-cases. We'll publish some of them in source code;
>    others we can't publish since they use proprietary SW that we're not allowed
>    to publish at all, but we can run tests and Ack if things work.
>    Best regards,
>    Edgar
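
For reference, the byte-level abstraction asked about above comes down to the
conversion quoted from the cover letter: 6 dummy cycles for Fast Read Quad I/O
(EBh) on 4 data lines is (6 * 4 / 8) = 3 bytes. The sketch below shows that
arithmetic and where it stops working, namely when the resulting bit count is
not a multiple of 8. The helper name and signature are hypothetical and are not
taken from the m25p80 model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): convert a dummy cycle count into
 * whole bytes for a given number of data lines. Each clock cycle moves
 * `lines` bits, so the dummy phase is cycles * lines bits long.
 * Returns false when that bit count is not a multiple of 8, i.e. when the
 * dummy phase cannot be expressed in whole bytes; *bytes is left untouched.
 */
static bool dummy_cycles_to_bytes(uint32_t cycles, uint32_t lines,
                                  uint32_t *bytes)
{
    assert(lines == 1 || lines == 2 || lines == 4 || lines == 8);

    uint32_t bits = cycles * lines;
    if (bits % 8 != 0) {
        return false;
    }
    *bytes = bits / 8;
    return true;
}
```

With 6 cycles on 4 lines the helper reports 3 bytes. With 7 cycles on 4 lines
(28 bits) it reports failure, which is the kind of case the cycle-accurate
model can distinguish and a byte-only model cannot.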
Bin Meng April 23, 2021, 6:45 a.m. UTC | #19
On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Dear Bin,
> > > >
> > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > Hi Francisco,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > >
> > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > > >
> > > > > > > > > > > > > While not being the original implementor I must assume that the above solution was
> > > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > > > > > also because the detail is already in use for catching exactly the above error.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > > > > time ago!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > > >
> > > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > > >
> > > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles).
> > > > > > > >
> > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > > > needs to be followed the safe path is to add functionality instead of removing it.
> > > > > > > > > Luckily it is also easier in this case, see below.
> > > > > > >
> > > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > > about model behavior changes, sure I can update
> > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > > >
> > > > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > > > in tree for a while, users have found known or unknown features that got
> > > > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > > > we are risking breaking our users' use cases (currently I'm aware of one
> > > > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > > > safer to add functionality instead of removing it (and risking the removal of use
> > > > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > > > proposed).
> > > > > >
> > > > >
> > > > > Even if the implementation is buggy and we need to keep the buggy
> > > > > implementation forever? I think that's why
> > > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > > feature.
> > > >
> > > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > > Because of this, I, with all respect, will have to disagree that this is buggy.
> > >
> > > Well, the existing m25p80 implementation that uses dummy cycle
> > > accuracy for those flashes prevents all SPI controllers that use tx
> > > fifo to work with those flashes. Hence it is buggy.
> > >
> > > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > > >
> > > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > > other type.
> > > > > > > >
> > > > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > > > >
> > > > > > >
> > > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > > how the SPI controller works.
> > > > > >
> > > > > > > I agree on above. I decided though that instead of posting sample code in here
> > > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > > >
> > > > >
> > > > > Wait, (see below)
> > > > >
> > > > > > >
> > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > > generated by the controller automatically. Please read the example
> > > > > > > given in:
> > > > > > >
> > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > > Command (EBh)
> > > > > > >
> > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > > >
> > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > > and this is wrong.
> > > > > > >
> > > > >
> > > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > > your proposal cannot support the complicated controller like Xilinx
> > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > > > generation, which is wrong.
> > > > >
> > > >
> > > > First, thank you very much for looking into the RFC series, very much
> > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > > locations in the file, in 1 location the transfer referred to above is done, in
> > > > another location the transfer through the txfifo is done. The location where
> > > > transfer referred to above is done will not need any modifications (and will
> > > > thus work equally well as it does currently).
> > >
> > > Please explain this a little bit. How does your RFC series handle
> > > cases as described in table 24-22, where the 6 dummy cycles are split
> > > into 2 transfers, with one transfer using tx fifo, and the other one
> > > using hardware dummy cycle generation?
> >
> > > Sorry, I misunderstood. You are right, that won't work.
>
> +Edgar E. Iglesias
>
> So it looks like, so far, the only way to implement dummy cycles correctly
> for all SPI controller models is what I proposed here in this
> patch series.
>
> Maintainers are quite silent, so I would like to hear your thoughts.
>
> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> please share your thoughts, since you are the ones who reviewed the
> existing dummy implementation (based on commit history)?

Hello maintainers,

We apparently missed the 6.0 window to address this mess in the m25p80
model. Please provide your input on this before I start working on
v2.

Regards,
Bin
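
For readers following the table 24-22 argument quoted above, here is a minimal
sketch of why a split dummy phase is awkward to account for: one part arrives
as bytes through the tx fifo and the other as hardware-generated clock cycles,
so the two parts must be brought to a common unit (bits here) before they can
be summed. All type and function names are hypothetical, and the split used in
the usage note is an assumption chosen only so that it totals 6 cycles; it is
not read from the TRM table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: where one part of a dummy phase comes from. */
enum dummy_source {
    DUMMY_FROM_TX_FIFO,       /* guest driver pushed dummy bytes */
    DUMMY_FROM_HW_GENERATOR   /* controller generated dummy cycles */
};

/* Hypothetical: one part of a dummy phase. */
struct dummy_part {
    enum dummy_source source;
    uint32_t amount;  /* bytes for tx fifo parts, clock cycles for HW parts */
};

/*
 * Sum the dummy phase in bits. A tx fifo byte is always 8 bits, while a
 * generated clock cycle moves `lines` bits, so the unit of `amount`
 * depends on the source.
 */
static uint32_t dummy_total_bits(const struct dummy_part *parts, unsigned n,
                                 uint32_t lines)
{
    uint32_t bits = 0;

    assert(lines == 1 || lines == 2 || lines == 4 || lines == 8);
    for (unsigned i = 0; i < n; i++) {
        if (parts[i].source == DUMMY_FROM_TX_FIFO) {
            bits += parts[i].amount * 8;
        } else {
            bits += parts[i].amount * lines;
        }
    }
    return bits;
}
```

Assuming one tx fifo byte (2 cycles on 4 lines) followed by 4 generated cycles,
the total is 8 + 16 = 24 bits, which matches the 3 bytes from the cover
letter's (6 * 4 / 8) example.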
Alistair Francis April 27, 2021, 5:56 a.m. UTC | #20
On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Dear Bin,
> > > > >
> > > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > > Hi Francisco,
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Bin,
> > > > > > > > > > >
> > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > > Hi Francisco,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > > > >
> > > > > > > > > > > > > > While not being the original implementor I must assume that the above solution was
> > > > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > > > > > > also because the detail is already in use for catching exactly the above error.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > > > > > time ago!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > > > >
> > > > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > > > >
> > > > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles).
> > > > > > > > >
> > > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > > > > needs to be followed the safe path is to add functionality instead of removing it.
> > > > > > > > > > Luckily it is also easier in this case, see below.
> > > > > > > >
> > > > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > > > about model behavior changes, sure I can update
> > > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > > > >
> > > > > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > > > > in tree for a while, users have found known or unknown features that got
> > > > > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > > > > we are risking breaking our users' use cases (currently I'm aware of one
> > > > > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > > > > safer to add functionality instead of removing it (and risking the removal of use
> > > > > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > > > > proposed).
> > > > > > >
> > > > > >
> > > > > > Even if the implementation is buggy and we need to keep the buggy
> > > > > > implementation forever? I think that's why
> > > > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > > > feature.
> > > > >
> > > > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > > > Because of this, I, with all respect, will have to disagree that this is buggy.
> > > >
> > > > Well, the existing m25p80 implementation that uses dummy cycle
> > > > accuracy for those flashes prevents all SPI controllers that use tx
> > > > fifo from working with those flashes. Hence it is buggy.
> > > >
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > > > >
> > > > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > > > other type.
> > > > > > > > >
> > > > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > > > as win/win situation, also because no controller should need modifications.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > > > how the SPI controller works.
> > > > > > >
> > > > > > > I agree on above. I decided though that instead of posting sample code in here
> > > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > > > >
> > > > > >
> > > > > > Wait, (see below)
> > > > > >
> > > > > > > >
> > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > > > generated by the controller automatically. Please read the example
> > > > > > > > given in:
> > > > > > > >
> > > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > > > Command (EBh)
> > > > > > > >
> > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > > > >
> > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > > > and this is wrong.
> > > > > > > >
> > > > > >
> > > > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > > > your proposal cannot support the complicated controller like Xilinx
> > > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > > > > generation, which is wrong.
> > > > > >
> > > > >
> > > > > First, thank you very much for looking into the RFC series, very much
> > > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > > > locations in the file, in 1 location the transfer referred to above is done, in
> > > > > another location the transfer through the txfifo is done. The location where
> > > > > transfer referred to above is done will not need any modifications (and will
> > > > > thus work equally well as it does currently).
> > > >
> > > > Please explain this a little bit. How does your RFC series handle
> > > > cases as described in table 24-22, where the 6 dummy cycles are split
> > > > into 2 transfers, with one transfer using tx fifo, and the other one
> > > > using hardware dummy cycle generation?
> > >
> > > Sorry, I misunderstood. You are right, that won't work.
> >
> > +Edgar E. Iglesias
> >
> > So it looks by far the only way to implement dummy cycles correctly to
> > work with all SPI controller models is what I proposed here in this
> > patch series.
> >
> > Maintainers are quite silent, so I would like to hear your thoughts.
> >
> > @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> > please share your thoughts since you are the ones who reviewed the
> > existing dummy implementation (based on commit history)

I agree with Edgar, in that Francisco and Bin know this better than me
and that modelling things in cycles is a pain.

As Bin points out it seems like currently we should be modelling bytes
(from the variable name) so it makes sense to keep it in bytes. I
would be in favour of this series in that case. Do we know what use
cases this will break? I know it's hard to answer but I don't think
there are too many SSI users in QEMU so it might not be too hard to
test most of the possible use cases.

Alistair

>
> Hello maintainers,
>
> We apparently missed the 6.0 window to address this mess of the m25p80
> model. Please provide your inputs on this before I start working on
> the v2.
>
> Regards,
> Bin
>
Francisco Iglesias April 27, 2021, 8:54 a.m. UTC | #21
On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> > >
> > > On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Dear Bin,
> > > > > >
> > > > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > > > Hi Francisco,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > >
> > > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > > > Hi Francisco,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > While not being the original implementor I must assume that the above solution was
> > > > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also
> > > > > > > > > > > > > > because the detail is already in use for catching exactly the above error.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > > > > > software generation should have been found to be seriously broken a long
> > > > > > > > > > > > > time ago!
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > > > > >
> > > > > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > > > > >
> > > > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles).
> > > > > > > > > >
> > > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > > > > needs to be followed the safe path is to add functionality instead of removing it.
> > > > > > > > > > Luckily it is also easier in this case, see below.
> > > > > > > > >
> > > > > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > > > > about model behavior changes, sure I can update
> > > > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > > > > >
> > > > > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > > > > in tree for a while, users have found known or unknown features that got
> > > > > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > > > > we risk breaking our users' use cases (currently I'm aware of one
> > > > > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > > > > safer to add functionality instead of removing it (and risking the removal of use
> > > > > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > > > > proposed).
> > > > > > > >
> > > > > > >
> > > > > > > Even if the implementation is buggy and we need to keep the buggy
> > > > > > > implementation forever? I think that's why
> > > > > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > > > > feature.
> > > > > >
> > > > > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > > > > Because of this, I, with all respect, will have to disagree that this is buggy.
> > > > >
> > > > > Well, the existing m25p80 implementation that uses dummy cycle
> > > > > accuracy for those flashes prevents all SPI controllers that use tx
> > > > > fifo from working with those flashes. Hence it is buggy.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > > > > >
> > > > > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > > > > other type.
> > > > > > > > > >
> > > > > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > > > > as win/win situation, also because no controller should need modifications.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > > > > how the SPI controller works.
> > > > > > > >
> > > > > > > > I agree on above. I decided though that instead of posting sample code in here
> > > > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > > > > >
> > > > > > >
> > > > > > > Wait, (see below)
> > > > > > >
> > > > > > > > >
> > > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > > > > generated by the controller automatically. Please read the example
> > > > > > > > > given in:
> > > > > > > > >
> > > > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > > > > Command (EBh)
> > > > > > > > >
> > > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > > > > >
> > > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > > > > and this is wrong.
> > > > > > > > >
> > > > > > >
> > > > > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > > > > your proposal cannot support the complicated controller like Xilinx
> > > > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > > > > > generation, which is wrong.
> > > > > > >
> > > > > >
> > > > > > First, thank you very much for looking into the RFC series, very much
> > > > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > > > > locations in the file, in 1 location the transfer referred to above is done, in
> > > > > > another location the transfer through the txfifo is done. The location where
> > > > > > transfer referred to above is done will not need any modifications (and will
> > > > > > thus work equally well as it does currently).
> > > > >
> > > > > Please explain this a little bit. How does your RFC series handle
> > > > > cases as described in table 24-22, where the 6 dummy cycles are split
> > > > > into 2 transfers, with one transfer using tx fifo, and the other one
> > > > > using hardware dummy cycle generation?
> > > >
> > > > Sorry, I misunderstood. You are right, that won't work.
> > >
> > > +Edgar E. Iglesias
> > >
> > > So it looks by far the only way to implement dummy cycles correctly to
> > > work with all SPI controller models is what I proposed here in this
> > > patch series.
> > >
> > > Maintainers are quite silent, so I would like to hear your thoughts.
> > >
> > > @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> > > please share your thoughts since you are the ones who reviewed the
> > > existing dummy implementation (based on commit history)
> 
> I agree with Edgar, in that Francisco and Bin know this better than me
> and that modelling things in cycles is a pain.

Hi Alistair,

> 
> As Bin points out it seems like currently we should be modelling bytes
> (from the variable name) so it makes sense to keep it in bytes. I
> would be in favour of this series in that case. Do we know what use
> cases this will break? I know it's hard to answer but I don't think
> there are too many SSI users in QEMU so it might not be too hard to
> test most of the possible use cases.

The use case I'm aware of is regression testing of drivers. For example, if
a driver uses 10 dummy clock cycles with its commands and a patch
accidentally changes it to use 11, QEMU currently catches the problem; that
won't be possible with this series. It's difficult to say for sure, but
there may well be other use cases too.
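
To illustrate why byte granularity can hide such a regression, here is a
minimal C sketch. It is not code from this series: the helper names are
hypothetical, and it assumes the cycle-to-byte conversion is a plain integer
division of (cycles * data lines) by 8, as in the cover letter's
(6 * 4 / 8) example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from m25p80.c): whole bytes covered by `cycles`
 * dummy clock cycles when `lines` data lines (1, 2 or 4) are active.
 * Integer division drops any partial byte.
 */
static unsigned dummy_cycles_to_bytes(unsigned cycles, unsigned lines)
{
    return cycles * lines / 8;
}

/*
 * Hypothetical check: can a model that only counts dummy bytes tell
 * `expected` cycles apart from `actual` cycles?
 */
static bool byte_model_detects_change(unsigned expected, unsigned actual,
                                      unsigned lines)
{
    return dummy_cycles_to_bytes(expected, lines) !=
           dummy_cycles_to_bytes(actual, lines);
}
```

Under these assumptions, 10 and 11 dummy cycles on four lines both map to
5 bytes, so byte_model_detects_change(10, 11, 4) is false, while 10 versus
12 cycles (5 versus 6 bytes) would still be noticed.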

More importantly IMO, the current use cases can be kept while still adding
support for commands with dummy clock cycles to the QEMU SPI controllers
that currently lack it.

(If I recall correctly this series might also have another issue regarding
the GQSPI SPI mode configuration: depending on the mode it is possible to
transmit 8 dummy clock cycles as 1 data byte, 2 data bytes or 4 data bytes,
so I think some form of calculation might be needed inside m25p80).
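
The kind of calculation hinted at above could look roughly like the sketch
below. It is an illustration under assumptions rather than m25p80 code: the
enum and function names are made up, and it assumes each clock cycle
transfers one bit per active data line.

```c
/* Hypothetical: number of data lines used during the dummy phase. */
enum spi_lines {
    SPI_LINES_SINGLE = 1,
    SPI_LINES_DUAL   = 2,
    SPI_LINES_QUAD   = 4,
};

/*
 * Hypothetical helper: clock cycles represented by `bytes` dummy bytes
 * when they are shifted out on `lines` data lines.
 */
static unsigned dummy_bytes_to_cycles(unsigned bytes, enum spi_lines lines)
{
    return bytes * 8 / (unsigned)lines;
}
```

With this mapping, 1 byte on one line, 2 bytes on two lines and 4 bytes on
four lines all correspond to the same 8 dummy clock cycles, so the flash
model would need to know the line count to interpret a byte count.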

Best regards,
Francisco


> 
> Alistair
> 
> >
> > Hello maintainers,
> >
> > We apparently missed the 6.0 window to address this mess of the m25p80
> > model. Please provide your inputs on this before I start working on
> > the v2.
> >
> > Regards,
> > Bin
> >
Cédric Le Goater April 27, 2021, 2:32 p.m. UTC | #22
Hello,

On 4/27/21 10:54 AM, Francisco Iglesias wrote:
> On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
>> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>
>>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>
>>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>
>>>>> Hi Bin,
>>>>>
>>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>>>>>> Hi Francisco,
>>>>>>
>>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>
>>>>>>> Dear Bin,
>>>>>>>
>>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>>>>>>>> Hi Francisco,
>>>>>>>>
>>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Bin,
>>>>>>>>>
>>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>>>>>>>>>> Hi Francisco,
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>
>>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up
>>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For
>>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte address or
>>>>>>>>>>>>>>>> 4-byte address is needed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after
>>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted
>>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is in byte.
>>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the model has
>>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The right
>>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes based on
>>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
>>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> While not being the original implementor I must assume that the above solution was
>>>>>>>>>>>>>>> considered but not chosen by the developers due to its inaccuracy (it
>>>>>>>>>>>>>>> wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
>>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to generate 7 the error
>>>>>>>>>>>>>>> wouldn't be caught and the controller will still be considered "correct"). Now
>>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of keeping it, also
>>>>>>>>>>>>>>> because the detail is already in use for catching exactly the above error.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I found no clue from the commit message that my proposed solution here
>>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models supporting
>>>>>>>>>>>>>> software generation should have been found to be seriously broken a long
>>>>>>>>>>>>>> time ago!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The controllers you are referring to might lack support for commands requiring
>>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other commands? If so I
>>>>>>>>>>>>
>>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special
>>>>>>>>>>>> that needs some special support from the SPI controller. For the case
>>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective,
>>>>>>>>>>>> just like sending out a command, or address bytes, or data. The
>>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it.
>>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be
>>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, or
>>>>>>>>>>>> automatically generated (case 2 controller) by the hardware.
>>>>>>>>>>>
>>>>>>>>>>> Ok, I'll try to explain my viewpoint a little differently. For that we also
>>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs on a HW
>>>>>>>>>>> board supported in QEMU should ideally run on that board inside QEMU as well
>>>>>>>>>>> (this can be a bare metal application equally well as a modified u-boot/Linux
>>>>>>>>>>> using SPI commands with a non-multiple of 8 number of dummy clock cycles).
>>>>>>>>>>>
>>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to know which
>>>>>>>>>>> intentional or unintentional features provided by the functionality are being
>>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm aware of that
>>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle modeling inside
>>>>>>>>>>> m25p80 is the ability to test drivers accurately regarding the dummy clock
>>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of dummy clock
>>>>>>>>>>> cycles), but there might be others as well. So by removing this functionality
>>>>>>>>>>> the above use case will break, since those tests will no longer be reliable.
>>>>>>>>>>> Furthermore, since users tend to be creative it is not possible to know if
>>>>>>>>>>> there are other use cases that will be affected. This means that in case [1]
>>>>>>>>>>> needs to be followed the safe path is to add functionality instead of removing it.
>>>>>>>>>>> Luckily it is also easier in this case, see below.
>>>>>>>>>>
>>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an
>>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was
>>>>>>>>>> about model behavior changes, sure I can update
>>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
>>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
>>>>>>>>>
>>>>>>>>> Yes, something like that. My concern is that since this functionality has been
>>>>>>>>> in tree for a while, users have found known or unknown features that got
>>>>>>>>> introduced by it. By removing the functionality (and the known/unknown features)
>>>>>>>>> we risk breaking our users' use cases (currently I'm aware of one
>>>>>>>>> feature/use case but it is not unlikely that there are more). [1] states that
>>>>>>>>> "In general features are intended to be supported indefinitely once introduced
>>>>>>>>> into QEMU", to me that makes very much sense because the opposite would mean
>>>>>>>>> that we were not reliable. So in case [1] needs to be honored it looks to be
>>>>>>>>> safer to add functionality instead of removing it (and risking the removal of use
>>>>>>>>> cases/features). Luckily I still believe in this case that it will be easier to
>>>>>>>>> go forward (even if I also agree on what you are saying below about what I
>>>>>>>>> proposed).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Even if the implementation is buggy and we need to keep the buggy
>>>>>>>> implementation forever? I think that's why
>>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such
>>>>>>>> feature.
>>>>>>>
>>>>>>> With the RFC I posted all commands in m25p80 are working for both the case 1
>>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
>>>>>>> Because of this, I, with all respect, will have to disagree that this is buggy.
>>>>>>
>>>>>> Well, the existing m25p80 implementation that uses dummy cycle
>>>>>> accuracy for those flashes prevents all SPI controllers that use tx
>>>>>> fifo from working with those flashes. Hence it is buggy.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and else we should
>>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack of support
>>>>>>>>>>>>
>>>>>>>>>>>> I called it "seriously broken" because current implementation only
>>>>>>>>>>>> considered one type of SPI controllers while completely ignoring the
>>>>>>>>>>>> other type.
>>>>>>>>>>>
>>>>>>>>>>> If we change view and see this from the perspective of m25p80, it models the
>>>>>>>>>>> commands a certain way and provides an API that the SPI controllers need to
>>>>>>>>>>> implement for interacting with it. It is true that there are SPI controllers
>>>>>>>>>>> referred to above that do not support the portion of that API that corresponds
>>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true that this is
>>>>>>>>>>> broken since there is also one SPI controller that has a working implementation
>>>>>>>>>>> of m25p80's full API also when transferring through a tx fifo (use case 1). But
>>>>>>>>>>> as mentioned above, by making a minor extension and improvement to m25p80's API
>>>>>>>>>>> that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
>>>>>>>>>>> will still be honored while at the same time making it possible to have full
>>>>>>>>>>> support for the API in the SPI controllers that currently do not (please reread
>>>>>>>>>>> the proposal in my previous reply that attempts to do this). I myself see this
>>>>>>>>>>> as win/win situation, also because no controller should need modifications.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
>>>>>>>>>> property 'model_dummy_bytes' to select to convert the accurate dummy
>>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify as
>>>>>>>>>> a property to the flash itself, as the behavior is tightly coupled to
>>>>>>>>>> how the SPI controller works.
>>>>>>>>>
>>>>>>>>> I agree on above. I decided though that instead of posting sample code in here
>>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
>>>>>>>>> Xilinx ZynqMP GQSPI should not need any modification in a first step.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Wait, (see below)
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both
>>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or
>>>>>>>>>> generated by the controller automatically. Please read the example
>>>>>>>>>> given in:
>>>>>>>>>>
>>>>>>>>>>     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
>>>>>>>>>> Command (EBh)
>>>>>>>>>>
>>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>>>>>>>>>>
>>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to
>>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to
>>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy cycles,
>>>>>>>>>> and this is wrong.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above
>>>>>>>> your proposal cannot support the complicated controller like Xilinx
>>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you
>>>>>>>> mandate guest software's GQSPI driver to only use hardware dummy cycle
>>>>>>>> generation, which is wrong.
>>>>>>>>
>>>>>>>
>>>>>>> First, thank you very much for looking into the RFC series, very much
>>>>>>> appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
>>>>>>> locations in the file, in 1 location the transfer referred to above is done, in
>>>>>>> another location the transfer through the txfifo is done. The location where
>>>>>>> transfer referred to above is done will not need any modifications (and will
>>>>>>> thus work equally well as it does currently).
>>>>>>
>>>>>> Please explain this a little bit. How does your RFC series handle
>>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
>>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
>>>>>> using hardware dummy cycle generation?
>>>>>
>>>>> Sorry, I misunderstood. You are right, that won't work.
>>>>
>>>> +Edgar E. Iglesias
>>>>
>>>> So it looks by far the only way to implement dummy cycles correctly to
>>>> work with all SPI controller models is what I proposed here in this
>>>> patch series.
>>>>
>>>> Maintainers are quite silent, so I would like to hear your thoughts.
>>>>
>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>>>> please share your thoughts since you are the ones who reviewed the
>>>> existing dummy implementation (based on commit history)
>>
>> I agree with Edgar, in that Francisco and Bin know this better than me
>> and that modelling things in cycles is a pain.
> 
> Hi Alistair,
> 
>>
>> As Bin points out it seems like currently we should be modelling bytes
>> (from the variable name) so it makes sense to keep it in bytes. I
>> would be in favour of this series in that case. Do we know what use
>> cases this will break? I know it's hard to answer but I don't think
>> there are too many SSI users in QEMU so it might not be too hard to
>> test most of the possible use cases.
> 
> The use case I'm aware of is regression testing of drivers. Ex: if a
> driver is using 10 dummy clock cycles with the commands and a patch
> accidentally changes the driver to use 11 dummy clock cycles QEMU currently
> finds the problem, that won't be possible with this series. It's difficult
> to say but it is not impossible there are other use cases also.


It was breaking the Aspeed machines:

  https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/

QEMU 6.1 should have acceptance tests that will help in detecting
regressions in this area.

Thanks,

C.
 



> 
> More importantly IMO though is that the current use cases can be kept
> while still adding support for commands with dummy clock cycles to
> the QEMU SPI controllers that lack it at the moment.
> 
> (If I recall correctly this series might also have another issue regarding
> the GQSPI SPI mode configuration, with that it is possible to transmit 8
> dummy clock cycles as 1 data byte, 2 data bytes or 4 data bytes, so I
> think some form of calculation might be needed inside m25p80).
> 
> Best regards,
> Francisco
> 
> 
>>
>> Alistair
>>
>>>
>>> Hello maintainers,
>>>
>>> We apparently missed the 6.0 window to address this mess of the m25p80
>>> model. Please provide your inputs on this before I start working on
>>> the v2.
>>>
>>> Regards,
>>> Bin
>>>
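
As an illustration of the cycle-versus-byte question discussed above, here
is a minimal sketch in C. It is not code from QEMU or from this series: the
helper name dummy_cycles_to_bytes and its round-up behaviour are assumptions
made for the example. It applies the formula from the cover letter
(cycles * lines / 8) to the cases mentioned in the thread.

```c
/*
 * Hypothetical helper, not part of QEMU: convert a dummy clock cycle
 * count into the number of bytes shifted while those cycles elapse.
 *
 * cycles: number of dummy clock cycles
 * lines:  number of data lines in use (1 = single, 2 = dual, 4 = quad)
 *
 * Each clock cycle moves one bit per data line, so the total is
 * cycles * lines bits. Rounding up to whole bytes is an assumption of
 * this sketch, not something the thread specifies.
 */
static unsigned int dummy_cycles_to_bytes(unsigned int cycles,
                                          unsigned int lines)
{
    return (cycles * lines + 7) / 8;
}
```

With this helper, 6 cycles on four lines give 3 bytes, and 8 cycles give
1, 2 or 4 bytes on one, two or four lines, matching the GQSPI remark above.
10 and 11 cycles on a single line both round to 2 bytes, which is the loss
of accuracy described for driver regression testing.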
Bin Meng April 28, 2021, 1:12 p.m. UTC | #23
Hi Cédric,

On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote:
>
> Hello,
>
> On 4/27/21 10:54 AM, Francisco Iglesias wrote:
> > On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
> >> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >>>
> >>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >>>>
> >>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> >>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>
> >>>>> Hi Bin,
> >>>>>
> >>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> >>>>>> Hi Francisco,
> >>>>>>
> >>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> >>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Dear Bin,
> >>>>>>>
> >>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> >>>>>>>> Hi Francisco,
> >>>>>>>>
> >>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> >>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Bin,
> >>>>>>>>>
> >>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> >>>>>>>>>> Hi Francisco,
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> >>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>
> >>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> >>>>>>>>>>>> Hi Francisco,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> >>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> >>>>>>>>>>>>>> Hi Francisco,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> >>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> >>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up
> >>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For
> >>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte address or
> >>>>>>>>>>>>>>>> 4-byte address is needed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after
> >>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted
> >>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is in byte.
> >>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the model has
> >>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The right
> >>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes based on
> >>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> >>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> While not being the original implementor I must assume that the above solution was
> >>>>>>>>>>>>>>> considered but not chosen by the developers due to its inaccuracy (it
> >>>>>>>>>>>>>>> wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> >>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to generate 7 the error
> >>>>>>>>>>>>>>> wouldn't be caught and the controller will still be considered "correct"). Now
> >>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of keeping it, this
> >>>>>>>>>>>>>>> also because the detail is already in use for catching exactly above error.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I found no clue from the commit message that my proposed solution here
> >>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models supporting
> >>>>>>>>>>>>>> software generation should have been found out seriously broken long
> >>>>>>>>>>>>>> time ago!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The controllers you are referring to might lack support for commands requiring
> >>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other commands? If so I
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special
> >>>>>>>>>>>> that needs some special support from the SPI controller. For the case
> >>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective,
> >>>>>>>>>>>> just like sending out a command, or address bytes, or data. The
> >>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it.
> >>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be
> >>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, or
> >>>>>>>>>>>> automatically generated (case 2 controller) by the hardware.
> >>>>>>>>>>>
> >>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For that we also
> >>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs on a HW
> >>>>>>>>>>> board supported in QEMU should ideally run on that board inside QEMU as well
> >>>>>>>>>>> (this can be a bare metal application equally well as a modified u-boot/Linux
> >>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock cycles).
> >>>>>>>>>>>
> >>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to know which
> >>>>>>>>>>> intentional or unintentional features provided by the functionality are being
> >>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm aware of that
> >>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle modeling inside
> >>>>>>>>>>> m25p80 is the ability to test drivers accurately regarding the dummy clock
> >>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of dummy clock
> >>>>>>>>>>> cycles), but there might be others as well. So by removing this functionality
> >>>>>>>>>>> the above use case will break, since those tests will no longer be reliable.
> >>>>>>>>>>> Furthermore, since users tend to be creative it is not possible to know if
> >>>>>>>>>>> there are other use cases that will be affected. This means that in case [1]
> >>>>>>>>>>> needs to be followed the safe path is to add functionality instead of removing.
> >>>>>>>>>>> Luckily it is also easier in this case, see below.
> >>>>>>>>>>
> >>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an
> >>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was
> >>>>>>>>>> about model behavior changes, sure I can update
> >>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
> >>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
> >>>>>>>>>
> >>>>>>>>> Yes, something like that. My concern is that since this functionality has been
> >>>>>>>>> in tree for a while, users have found known or unknown features that got
> >>>>>>>>> introduced by it. By removing the functionality (and the known/unknown features)
> >>>>>>>>> we risk breaking our users' use cases (currently I'm aware of one
> >>>>>>>>> feature/use case but it is not unlikely that there are more). [1] states that
> >>>>>>>>> "In general features are intended to be supported indefinitely once introduced
> >>>>>>>>> into QEMU", to me that makes very much sense because the opposite would mean
> >>>>>>>>> that we were not reliable. So in case [1] needs to be honored it looks to be
> >>>>>>>>> safer to add functionality instead of removing (and risking the removal of use
> >>>>>>>>> cases/features). Luckily I still believe in this case that it will be easier to
> >>>>>>>>> go forward (even if I also agree on what you are saying below about what I
> >>>>>>>>> proposed).
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Even if the implementation is buggy and we need to keep the buggy
> >>>>>>>> implementation forever? I think that's why
> >>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such
> >>>>>>>> feature.
> >>>>>>>
> >>>>>>> With the RFC I posted all commands in m25p80 are working for both the case 1
> >>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> >>>>>>> Because of this, I, with all respect, will have to disagree that this is buggy.
> >>>>>>
> >>>>>> Well, the existing m25p80 implementation that uses dummy cycle
> >>>>>> accuracy for those flashes prevents all SPI controllers that use a tx
> >>>>>> fifo from working with those flashes. Hence it is buggy.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and else we should
> >>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack of support
> >>>>>>>>>>>>
> >>>>>>>>>>>> I called it "seriously broken" because current implementation only
> >>>>>>>>>>>> considered one type of SPI controllers while completely ignoring the
> >>>>>>>>>>>> other type.
> >>>>>>>>>>>
> >>>>>>>>>>> If we change view and see this from the perspective of m25p80, it models the
> >>>>>>>>>>> commands a certain way and provides an API that the SPI controllers need to
> >>>>>>>>>>> implement for interacting with it. It is true that there are SPI controllers
> >>>>>>>>>>> referred to above that do not support the portion of that API that corresponds
> >>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true that this is
> >>>>>>>>>>> broken since there is also one SPI controller that has a working implementation
> >>>>>>>>>>> of m25p80's full API also when transferring through a tx fifo (use case 1). But
> >>>>>>>>>>> as mentioned above, by making a minor extension and improvement to m25p80's API
> >>>>>>>>>>> that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> >>>>>>>>>>> will still be honored while at the same time making it possible to have full
> >>>>>>>>>>> support for the API in the SPI controllers that currently do not (please reread
> >>>>>>>>>>> the proposal in my previous reply that attempts to do this). I myself see this
> >>>>>>>>>>> as win/win situation, also because no controller should need modifications.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
> >>>>>>>>>> property 'model_dummy_bytes' to select to convert the accurate dummy
> >>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify as
> >>>>>>>>>> a property to the flash itself, as the behavior is tightly coupled to
> >>>>>>>>>> how the SPI controller works.
> >>>>>>>>>
> >>>>>>>>> I agree on above. I decided though that instead of posting sample code in here
> >>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> >>>>>>>>> Xilinx ZynqMP GQSPI should not need any modification in a first step.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Wait, (see below)
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both
> >>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or
> >>>>>>>>>> generated by the controller automatically. Please read the example
> >>>>>>>>>> given in:
> >>>>>>>>>>
> >>>>>>>>>>     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> >>>>>>>>>> Command (EBh)
> >>>>>>>>>>
> >>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> >>>>>>>>>>
> >>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to
> >>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to
> >>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy cycles,
> >>>>>>>>>> and this is wrong.
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above
> >>>>>>>> your proposal cannot support the complicated controller like Xilinx
> >>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you
> >>>>>>>> mandate guest software's GQSPI driver to only use hardware dummy cycle
> >>>>>>>> generation, which is wrong.
> >>>>>>>>
> >>>>>>>
> >>>>>>> First, thank you very much for looking into the RFC series, very much
> >>>>>>> appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> >>>>>>> locations in the file, in 1 location the transfer referred to above is done, in
> >>>>>>> another location the transfer through the txfifo is done. The location where
> >>>>>>> transfer referred to above is done will not need any modifications (and will
> >>>>>>> thus work equally well as it does currently).
> >>>>>>
> >>>>>> Please explain this a little bit. How does your RFC series handle
> >>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
> >>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
> >>>>>> using hardware dummy cycle generation?
> >>>>>
> >>>>> Sorry, I misunderstood. You are right, that won't work.
> >>>>
> >>>> +Edgar E. Iglesias
> >>>>
> >>>> So it looks by far the only way to implement dummy cycles correctly to
> >>>> work with all SPI controller models is what I proposed here in this
> >>>> patch series.
> >>>>
> >>>> Maintainers are quite silent, so I would like to hear your thoughts.
> >>>>
> >>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> >>>> please share your thoughts since you are the ones who reviewed the
> >>>> existing dummy implementation (based on commit history)
> >>
> >> I agree with Edgar, in that Francisco and Bin know this better than me
> >> and that modelling things in cycles is a pain.
> >
> > Hi Alistair,
> >
> >>
> >> As Bin points out it seems like currently we should be modelling bytes
> >> (from the variable name) so it makes sense to keep it in bytes. I
> >> would be in favour of this series in that case. Do we know what use
> >> cases this will break? I know it's hard to answer but I don't think
> >> there are too many SSI users in QEMU so it might not be too hard to
> >> test most of the possible use cases.
> >
> > The use case I'm aware of is regression testing of drivers. Ex: if a
> > driver is using 10 dummy clock cycles with the commands and a patch
> > accidentally changes the driver to use 11 dummy clock cycles QEMU currently
> > finds the problem, that won't be possible with this series. It's difficult
> > to say but it is not impossible there are other use cases also.
>
>
> It was breaking the Aspeed machines:
>
>   https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/

Yes, as I mentioned in the series, the modification was based on a pure
guess from the existing QEMU code, as I could not find a datasheet for the
Aspeed SPI controller on the internet. Do you know if one is publicly
available?

>
> QEMU 6.1 should have acceptance tests that will help in detecting
> regressions in this area.
>

Regards,
Bin
Cédric Le Goater April 28, 2021, 1:54 p.m. UTC | #24
On 4/28/21 3:12 PM, Bin Meng wrote:
> Hi Cédric,
> 
> On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote:
>>
>> Hello,
>>
>> On 4/27/21 10:54 AM, Francisco Iglesias wrote:
>>> On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
>>>> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>>
>>>>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Bin,
>>>>>>>
>>>>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>>>>>>>> Hi Francisco,
>>>>>>>>
>>>>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Dear Bin,
>>>>>>>>>
>>>>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>>>>>>>>>> Hi Francisco,
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>
>>>>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>>>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>>>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>>>>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up
>>>>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For
>>>>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte address or
>>>>>>>>>>>>>>>>>> 4-byte address is needed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after
>>>>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted
>>>>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is in byte.
>>>>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the model has
>>>>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The right
>>>>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes based on
>>>>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
>>>>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> While not being the original implementor I must assume that the above solution was
>>>>>>>>>>>>>>>>> considered but not chosen by the developers due to its inaccuracy (it
>>>>>>>>>>>>>>>>> wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
>>>>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to generate 7 the error
>>>>>>>>>>>>>>>>> wouldn't be caught and the controller will still be considered "correct"). Now
>>>>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of keeping it, this
>>>>>>>>>>>>>>>>> also because the detail is already in use for catching exactly above error.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I found no clue from the commit message that my proposed solution here
>>>>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models supporting
>>>>>>>>>>>>>>>> software generation should have been found out seriously broken long
>>>>>>>>>>>>>>>> time ago!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The controllers you are referring to might lack support for commands requiring
>>>>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other commands? If so I
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special
>>>>>>>>>>>>>> that needs some special support from the SPI controller. For the case
>>>>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective,
>>>>>>>>>>>>>> just like sending out a command, or address bytes, or data. The
>>>>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it.
>>>>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be
>>>>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, or
>>>>>>>>>>>>>> automatically generated (case 2 controller) by the hardware.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For that we also
>>>>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs on a HW
>>>>>>>>>>>>> board supported in QEMU should ideally run on that board inside QEMU as well
>>>>>>>>>>>>> (this can be a bare metal application equally well as a modified u-boot/Linux
>>>>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock cycles).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to know which
>>>>>>>>>>>>> intentional or unintentional features provided by the functionality are being
>>>>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm aware of that
>>>>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle modeling inside
>>>>>>>>>>>>> m25p80 is the ability to test drivers accurately regarding the dummy clock
>>>>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of dummy clock
>>>>>>>>>>>>> cycles), but there might be others as well. So by removing this functionality
>>>>>>>>>>>>> the above use case will break, since those tests will no longer be reliable.
>>>>>>>>>>>>> Furthermore, since users tend to be creative it is not possible to know if
>>>>>>>>>>>>> there are other use cases that will be affected. This means that in case [1]
>>>>>>>>>>>>> needs to be followed the safe path is to add functionality instead of removing.
>>>>>>>>>>>>> Luckily it is also easier in this case, see below.
>>>>>>>>>>>>
>>>>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an
>>>>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was
>>>>>>>>>>>> about model behavior changes, sure I can update
>>>>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
>>>>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
>>>>>>>>>>>
>>>>>>>>>>> Yes, something like that. My concern is that since this functionality has been
>>>>>>>>>>> in tree for a while, users have found known or unknown features that got
>>>>>>>>>>> introduced by it. By removing the functionality (and the known/unknown features)
>>>>>>>>>>> we risk breaking our users' use cases (currently I'm aware of one
>>>>>>>>>>> feature/use case but it is not unlikely that there are more). [1] states that
>>>>>>>>>>> "In general features are intended to be supported indefinitely once introduced
>>>>>>>>>>> into QEMU", to me that makes very much sense because the opposite would mean
>>>>>>>>>>> that we were not reliable. So in case [1] needs to be honored it looks to be
>>>>>>>>>>> safer to add functionality instead of removing (and risking the removal of use
>>>>>>>>>>> cases/features). Luckily I still believe in this case that it will be easier to
>>>>>>>>>>> go forward (even if I also agree on what you are saying below about what I
>>>>>>>>>>> proposed).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Even if the implementation is buggy and we need to keep the buggy
>>>>>>>>>> implementation forever? I think that's why
>>>>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such
>>>>>>>>>> feature.
>>>>>>>>>
>>>>>>>>> With the RFC I posted all commands in m25p80 are working for both the case 1
>>>>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
>>>>>>>>> Because of this, I, with all respect, will have to disagree that this is buggy.
>>>>>>>>
>>>>>>>> Well, the existing m25p80 implementation that uses dummy cycle
>>>>>>>> accuracy for those flashes prevents all SPI controllers that use a tx
>>>>>>>> fifo from working with those flashes. Hence it is buggy.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and else we should
>>>>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack of support
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I called it "seriously broken" because current implementation only
>>>>>>>>>>>>>> considered one type of SPI controllers while completely ignoring the
>>>>>>>>>>>>>> other type.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we change view and see this from the perspective of m25p80, it models the
>>>>>>>>>>>>> commands a certain way and provides an API that the SPI controllers need to
>>>>>>>>>>>>> implement for interacting with it. It is true that there are SPI controllers
>>>>>>>>>>>>> referred to above that do not support the portion of that API that corresponds
>>>>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true that this is
>>>>>>>>>>>>> broken since there is also one SPI controller that has a working implementation
>>>>>>>>>>>>> of m25p80's full API also when transferring through a tx fifo (use case 1). But
>>>>>>>>>>>>> as mentioned above, by making a minor extension and improvement to m25p80's API
>>>>>>>>>>>>> that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
>>>>>>>>>>>>> will still be honored while at the same time making it possible to have full
>>>>>>>>>>>>> support for the API in the SPI controllers that currently do not (please reread
>>>>>>>>>>>>> the proposal in my previous reply that attempts to do this). I myself see this
>>>>>>>>>>>>> as win/win situation, also because no controller should need modifications.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
>>>>>>>>>>>> property 'model_dummy_bytes', which selects whether to convert the accurate
>>>>>>>>>>>> dummy clock cycle count to dummy bytes inside m25p80, is hard to justify as
>>>>>>>>>>>> a property of the flash itself, as the behavior is tightly coupled to
>>>>>>>>>>>> how the SPI controller works.
>>>>>>>>>>>
>>>>>>>>>>> I agree with the above. I decided though that instead of posting sample code in here
>>>>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. Regarding the below,
>>>>>>>>>>> Xilinx ZynqMP GQSPI should not need any modification in a first step.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Wait, (see below)
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both
>>>>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or
>>>>>>>>>>>> generated by the controller automatically. Please read the example
>>>>>>>>>>>> given in:
>>>>>>>>>>>>
>>>>>>>>>>>>     table 24-22, an example of Generic FIFO Contents for Quad I/O Read
>>>>>>>>>>>> Command (EBh)
>>>>>>>>>>>>
>>>>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>>>>>>>>>>>>
>>>>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to
>>>>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to
>>>>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy cycles,
>>>>>>>>>>>> and this is wrong.
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above
>>>>>>>>>> your proposal cannot support a complicated controller like the Xilinx
>>>>>>>>>> GQSPI. Please read the example in table 24-22. With your RFC, you
>>>>>>>>>> require the guest software's GQSPI driver to use only hardware dummy cycle
>>>>>>>>>> generation, which is wrong.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> First, thank you very much for looking into the RFC series, very much
>>>>>>>>> appreciated. Secondly, regarding the above, the GQSPI model in QEMU transfers
>>>>>>>>> from 2 locations in the file: in one location the transfer referred to above
>>>>>>>>> is done, and in the other the transfer through the txfifo is done. The location
>>>>>>>>> where the transfer referred to above is done will not need any modifications
>>>>>>>>> (and will thus work equally well as it does currently).
>>>>>>>>
>>>>>>>> Please explain this a little bit. How does your RFC series handle
>>>>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
>>>>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
>>>>>>>> using hardware dummy cycle generation?
>>>>>>>
>>>>>>> Sorry, I misunderstood. You are right, that won't work.
>>>>>>
>>>>>> +Edgar E. Iglesias
>>>>>>
>>>>>> So it looks like by far the only way to implement dummy cycles correctly,
>>>>>> so that they work with all SPI controller models, is what I proposed here
>>>>>> in this patch series.
>>>>>>
>>>>>> Maintainers are quite silent, so I would like to hear your thoughts.
>>>>>>
>>>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>>>>>> please share your thoughts, since you are the ones who reviewed the
>>>>>> existing dummy implementation (based on commit history)?
>>>>
>>>> I agree with Edgar, in that Francisco and Bin know this better than me
>>>> and that modelling things in cycles is a pain.
>>>
>>> Hi Alistair,
>>>
>>>>
>>>> As Bin points out it seems like currently we should be modelling bytes
>>>> (from the variable name) so it makes sense to keep it in bytes. I
>>>> would be in favour of this series in that case. Do we know what use
>>>> cases this will break? I know it's hard to answer but I don't think
>>>> there are too many SSI users in QEMU so it might not be too hard to
>>>> test most of the possible use cases.
>>>
>>> The use case I'm aware of is regression testing of drivers. Ex: if a
>>> driver is using 10 dummy clock cycles with the commands and a patch
>>> accidentally changes the driver to use 11 dummy clock cycles, QEMU currently
>>> finds the problem; that won't be possible with this series. It's difficult
>>> to say, but it is not impossible that there are other use cases as well.
>>
>>
>> It was breaking the Aspeed machines :
>>
>>   https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/
> 
> Yes, as I mentioned in the series, the modification was based on a pure
> guess from existing QEMU code, as I could not find a datasheet for the
> Aspeed SPI controller on the internet. Do you know if this is publicly
> available?

It is not, but many of the register bitfields are described in the code.
I should be able to help you in making this work.

Thanks,

C. 


>> QEMU 6.1 should have acceptance tests that will help in detecting
>> regressions in this area.
>>
> 
> Regards,
> Bin
>