mbox series

[v4,00/11] spi: spi-mem: Convert Aspeed SMC driver to spi-mem

Message ID 20220325100849.2019209-1-clg@kaod.org
Headers show
Series spi: spi-mem: Convert Aspeed SMC driver to spi-mem | expand

Message

Cédric Le Goater March 25, 2022, 10:08 a.m. UTC
Hi,

This series adds a new SPI driver using the spi-mem interface for the
Aspeed static memory controllers of the AST2600, AST2500 and AST2400
SoCs.

 * AST2600 Firmware SPI Memory Controller (FMC)
 * AST2600 SPI Flash Controller (SPI1 and SPI2)
 * AST2500 Firmware SPI Memory Controller (FMC)
 * AST2500 SPI Flash Controller (SPI1 and SPI2)
 * AST2400 New Static Memory Controller (also referred as FMC)
 * AST2400 SPI Flash Controller (SPI)

It is based on the current OpenBMC kernel driver [1], using directly
the MTD SPI-NOR interface and on a patchset [2] previously proposed
adding support for the AST2600 only. This driver takes a slightly
different approach to cover all 6 controllers.

It does not make use of the controller register disabling Address and
Data byte lanes because is not available on the AST2400 SoC. We could
introduce a specific handler for new features available on recent SoCs
if needed. As there is not much difference on performance, the driver
chooses the common denominator: "User mode" which has been heavily
tested in [1]. "User mode" is also used as a fall back method when
flash device mapping window is too small.

Problems to address with spi-mem were the configuration of the mapping
windows and the calibration of the read timings. The driver handles
them in the direct mapping handler when some knowledge on the size of
the flash device is know. It is not perfect but not incorrect either.
The algorithm is one from [1] because it doesn't require the DMA
registers which are not available on all controllers.

Direct mapping for writes is not supported (yet). I have seen some
corruption with writes and I preferred to use the safer and proven
method of the initial driver [1]. We can improve that later.

The driver supports Quad SPI RX transfers on the AST2600 SoC but it
didn't have the expected results. Therefore it is not activated yet.
There are some issues on the pinctrl to investigate first. 

Tested on:
 
 * OpenPOWER Palmetto (AST2400)
 * Facebook Wedge 100 BMC (AST2400) by Tao Ren <rentao.bupt@gmail.com>
 * Evaluation board (AST2500) 
 * Inspur FP5280G2 BMC (AST2500) by John Wang <wangzq.jn@gmail.com>
 * Facebook Backpack CMM BMC (AST2500) by Tao Ren <rentao.bupt@gmail.com>
 * OpenPOWER Witherspoon (AST2500)
 * Evaluation board (AST2600 A0 and A3)
 * Rainier board (AST2600)
 
[1] https://github.com/openbmc/linux/blob/dev-5.15/drivers/mtd/spi-nor/controllers/aspeed-smc.c
[2] https://patchwork.ozlabs.org/project/linux-aspeed/list/?series=212394

Thanks,

C. 

Changes in v4:

  - Rebased on 5.18 
  - Removal of the SPI-NOR base driver (we had enough tests)
  - Fix for small size flash devices on AST2600 (Potin)

Changes in v3:

 - Fixed compile warning on aspeed_spi_dirmap_read() prototype reported
   by kernel test robot 
 - Removed unnecessary entry in ast2600-fmc.yaml
 - New patch from Tao to set spi-max-frequency on all FMC devices

Changes in v2:

 - Fixed dt_binding_check warnings (Rob)
 - New entry in MAINTAINERS 
 - Addressed Lukas comments regarding the SPI controller registration
   and device removal. Checked with driver bind/unbind   
 - Introduced setup and cleanup handlers and removed routine looping
   on the DT children properties (Pratyush)
 - Clarified in commit log requirements for training.
 - Removed defconfig changes of patch 1 since they were reverted in
   the last patch (Joel)

Cédric Le Goater (9):
  ARM: dts: aspeed: Adjust "reg" property of FMC/SPI controllers
  dt-bindings: spi: Add Aspeed SMC controllers device tree binding
  spi: spi-mem: Convert Aspeed SMC driver to spi-mem
  spi: aspeed: Add support for direct mapping
  spi: aspeed: Adjust direct mapping to device size
  spi: aspeed: Workaround AST2500 limitations
  spi: aspeed: Add support for the AST2400 SPI controller
  spi: aspeed: Calibrate read timings
  ARM: dts: aspeed: Enable Dual SPI RX transfers

Potin Lai (1):
  mtd: spi-nor: aspeed: set the decoding size to at least 2MB for
    AST2600

Tao Ren (1):
  ARM: dts: aspeed-g4: Set spi-max-frequency for all flashes

 drivers/mtd/spi-nor/controllers/aspeed-smc.c  |  910 -------------
 drivers/spi/spi-aspeed-smc.c                  | 1197 +++++++++++++++++
 .../devicetree/bindings/mtd/aspeed-smc.txt    |   51 -
 .../bindings/spi/aspeed,ast2600-fmc.yaml      |   87 ++
 MAINTAINERS                                   |   10 +
 arch/arm/boot/dts/aspeed-g4.dtsi              |   16 +-
 arch/arm/boot/dts/aspeed-g5.dtsi              |   16 +-
 arch/arm/boot/dts/aspeed-g6.dtsi              |   17 +-
 drivers/mtd/spi-nor/controllers/Kconfig       |   10 -
 drivers/mtd/spi-nor/controllers/Makefile      |    1 -
 drivers/spi/Kconfig                           |   11 +
 drivers/spi/Makefile                          |    1 +
 12 files changed, 1339 insertions(+), 988 deletions(-)
 delete mode 100644 drivers/mtd/spi-nor/controllers/aspeed-smc.c
 create mode 100644 drivers/spi/spi-aspeed-smc.c
 delete mode 100644 Documentation/devicetree/bindings/mtd/aspeed-smc.txt
 create mode 100644 Documentation/devicetree/bindings/spi/aspeed,ast2600-fmc.yaml

Comments

Chin-Ting Kuo March 30, 2022, 11:53 a.m. UTC | #1
Hi Cédric Le,

> -----Original Message-----
> From: Cédric Le Goater <clg@kaod.org>
> Sent: Friday, March 25, 2022 6:09 PM
> To: linux-spi@vger.kernel.org; linux-mtd@lists.infradead.org
> Subject: [PATCH v4 08/11] spi: aspeed: Calibrate read timings
> 
> To accommodate the different response time of SPI transfers on different
> boards and different SPI NOR devices, the Aspeed controllers provide a set of
> Read Timing Compensation registers to tune the timing delays depending on
> the frequency being used. The AST2600 SoC has one of these registers per
> device. On the AST2500 and AST2400 SoCs, the timing register is shared by all
> devices which is problematic to get good results other than for one device.
> 
> The algorithm first reads a golden buffer at low speed and then performs reads
> with different clocks and delay cycle settings to find a breaking point. This
> selects a default good frequency for the CEx control register.
> The current settings are a bit optimistic as we pick the first delay giving good
> results. A safer approach would be to determine an interval and choose the
> middle value.
> 
> Calibration is performed when the direct mapping for reads is created.
> Since the underlying spi-nor object needs to be initialized to create the
> spi_mem operation for direct mapping, we should be fine. Having a specific
> API would clarify the requirements though.
> 
> Cc: Pratyush Yadav <p.yadav@ti.com>
> Reviewed-by: Joel Stanley <joel@jms.id.au>
> Tested-by: Joel Stanley <joel@jms.id.au>
> Tested-by: Tao Ren <rentao.bupt@gmail.com>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  drivers/spi/spi-aspeed-smc.c | 281
> +++++++++++++++++++++++++++++++++++
>  1 file changed, 281 insertions(+)
> 
> diff --git a/drivers/spi/spi-aspeed-smc.c b/drivers/spi/spi-aspeed-smc.c index
> 7f306da7c44e..660451667a39 100644
> --- a/drivers/spi/spi-aspeed-smc.c
> +++ b/drivers/spi/spi-aspeed-smc.c
> @@ -33,6 +33,8 @@
>  #define   CTRL_IO_ADDRESS_4B		BIT(13)	/* AST2400 SPI only */
>  #define   CTRL_IO_DUMMY_SET(dummy)					\
>  	(((((dummy) >> 2) & 0x1) << 14) | (((dummy) & 0x3) << 6))
> +#define   CTRL_FREQ_SEL_SHIFT		8
> +#define   CTRL_FREQ_SEL_MASK		GENMASK(11,
> CTRL_FREQ_SEL_SHIFT)
>  #define   CTRL_CE_STOP_ACTIVE		BIT(2)
>  #define   CTRL_IO_MODE_CMD_MASK		GENMASK(1, 0)
>  #define   CTRL_IO_MODE_NORMAL		0x0
> @@ -45,6 +47,9 @@
>  /* CEx Address Decoding Range Register */
>  #define CE0_SEGMENT_ADDR_REG		0x30
> 
> +/* CEx Read timing compensation register */
> +#define CE0_TIMING_COMPENSATION_REG	0x94
> +
>  enum aspeed_spi_ctl_reg_value {
>  	ASPEED_SPI_BASE,
>  	ASPEED_SPI_READ,
> @@ -70,10 +75,15 @@ struct aspeed_spi_data {
>  	bool	hastype;
>  	u32	mode_bits;
>  	u32	we0;
> +	u32	timing;
> +	u32	hclk_mask;
> +	u32	hdiv_max;
> 
>  	u32 (*segment_start)(struct aspeed_spi *aspi, u32 reg);
>  	u32 (*segment_end)(struct aspeed_spi *aspi, u32 reg);
>  	u32 (*segment_reg)(struct aspeed_spi *aspi, u32 start, u32 end);
> +	int (*calibrate)(struct aspeed_spi_chip *chip, u32 hdiv,
> +			 const u8 *golden_buf, u8 *test_buf);
>  };
> 
>  #define ASPEED_SPI_MAX_NUM_CS	5
> @@ -517,6 +527,8 @@ static int aspeed_spi_chip_adjust_window(struct
> aspeed_spi_chip *chip,
>  	return 0;
>  }
> 
> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip);
> +
>  static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)  {
>  	struct aspeed_spi *aspi =
> spi_controller_get_devdata(desc->mem->spi->master);
> @@ -565,6 +577,8 @@ static int aspeed_spi_dirmap_create(struct
> spi_mem_dirmap_desc *desc)
>  	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
>  	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
> 
> +	ret = aspeed_spi_do_calibration(chip);
> +
>  	dev_info(aspi->dev, "CE%d read buswidth:%d [0x%08x]\n",
>  		 chip->cs, op->data.buswidth, chip->ctl_val[ASPEED_SPI_READ]);
> 
> @@ -812,6 +826,249 @@ static u32 aspeed_spi_segment_ast2600_reg(struct
> aspeed_spi *aspi,
>  		((end - 1) & AST2600_SEG_ADDR_MASK);
>  }
> 
> +/*
> + * Read timing compensation sequences
> + */
> +
> +#define CALIBRATE_BUF_SIZE SZ_16K
> +
> +static bool aspeed_spi_check_reads(struct aspeed_spi_chip *chip,
> +				   const u8 *golden_buf, u8 *test_buf) {
> +	int i;
> +
> +	for (i = 0; i < 10; i++) {
> +		memcpy_fromio(test_buf, chip->ahb_base, CALIBRATE_BUF_SIZE);
> +		if (memcmp(test_buf, golden_buf, CALIBRATE_BUF_SIZE) != 0) { #if
> +defined(VERBOSE_DEBUG)
> +			print_hex_dump_bytes(DEVICE_NAME "  fail: ",
> DUMP_PREFIX_NONE,
> +					     test_buf, 0x100);
> +#endif
> +			return false;
> +		}
> +	}
> +	return true;
> +}
> +
> +#define FREAD_TPASS(i)	(((i) / 2) | (((i) & 1) ? 0 : 8))
> +
> +/*
> + * The timing register is shared by all devices. Only update for CE0.
> + */
> +static int aspeed_spi_calibrate(struct aspeed_spi_chip *chip, u32 hdiv,
> +				const u8 *golden_buf, u8 *test_buf) {
> +	struct aspeed_spi *aspi = chip->aspi;
> +	const struct aspeed_spi_data *data = aspi->data;
> +	int i;
> +	int good_pass = -1, pass_count = 0;
> +	u32 shift = (hdiv - 1) << 2;
> +	u32 mask = ~(0xfu << shift);
> +	u32 fread_timing_val = 0;
> +
> +	/* Try HCLK delay 0..5, each one with/without delay and look for a
> +	 * good pair.
> +	 */
> +	for (i = 0; i < 12; i++) {
> +		bool pass;
> +
> +		if (chip->cs == 0) {
> +			fread_timing_val &= mask;
> +			fread_timing_val |= FREAD_TPASS(i) << shift;
> +			writel(fread_timing_val, aspi->regs + data->timing);
> +		}
> +		pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
> +		dev_dbg(aspi->dev,
> +			"  * [%08x] %d HCLK delay, %dns DI delay : %s",
> +			fread_timing_val, i / 2, (i & 1) ? 0 : 4,
> +			pass ? "PASS" : "FAIL");
> +		if (pass) {
> +			pass_count++;
> +			if (pass_count == 3) {
> +				good_pass = i - 1;
> +				break;
> +			}
> +		} else {
> +			pass_count = 0;
> +		}
> +	}
> +
> +	/* No good setting for this frequency */
> +	if (good_pass < 0)
> +		return -1;
> +
> +	/* We have at least one pass of margin, let's use first pass */
> +	if (chip->cs == 0) {
> +		fread_timing_val &= mask;
> +		fread_timing_val |= FREAD_TPASS(good_pass) << shift;
> +		writel(fread_timing_val, aspi->regs + data->timing);
> +	}
> +	dev_dbg(aspi->dev, " * -> good is pass %d [0x%08x]",
> +		good_pass, fread_timing_val);
> +	return 0;
> +}
> +
> +static bool aspeed_spi_check_calib_data(const u8 *test_buf, u32 size) {
> +	const u32 *tb32 = (const u32 *)test_buf;
> +	u32 i, cnt = 0;
> +
> +	/* We check if we have enough words that are neither all 0
> +	 * nor all 1's so the calibration can be considered valid.
> +	 *
> +	 * I use an arbitrary threshold for now of 64
> +	 */
> +	size >>= 2;
> +	for (i = 0; i < size; i++) {
> +		if (tb32[i] != 0 && tb32[i] != 0xffffffff)
> +			cnt++;
> +	}
> +	return cnt >= 64;
> +}
> +
> +static const u32 aspeed_spi_hclk_divs[] = {
> +	0xf, /* HCLK */
> +	0x7, /* HCLK/2 */
> +	0xe, /* HCLK/3 */
> +	0x6, /* HCLK/4 */
> +	0xd, /* HCLK/5 */
> +};
> +
> +#define ASPEED_SPI_HCLK_DIV(i) \
> +	(aspeed_spi_hclk_divs[(i) - 1] << CTRL_FREQ_SEL_SHIFT)
> +
> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip) {
> +	struct aspeed_spi *aspi = chip->aspi;
> +	const struct aspeed_spi_data *data = aspi->data;
> +	u32 ahb_freq = aspi->clk_freq;
> +	u32 max_freq = chip->clk_freq;
> +	u32 ctl_val;
> +	u8 *golden_buf = NULL;
> +	u8 *test_buf = NULL;
> +	int i, rc, best_div = -1;
> +
> +	dev_dbg(aspi->dev, "calculate timing compensation - AHB freq: %d MHz",
> +		ahb_freq / 1000000);
> +
> +	/*
> +	 * use the related low frequency to get check calibration data
> +	 * and get golden data.
> +	 */
> +	ctl_val = chip->ctl_val[ASPEED_SPI_READ] & data->hclk_mask;
> +	writel(ctl_val, chip->ctl);
> +
> +	test_buf = kzalloc(CALIBRATE_BUF_SIZE * 2, GFP_KERNEL);
> +	if (!test_buf)
> +		return -ENOMEM;
> +
> +	golden_buf = test_buf + CALIBRATE_BUF_SIZE;
> +
> +	memcpy_fromio(golden_buf, chip->ahb_base, CALIBRATE_BUF_SIZE);
> +	if (!aspeed_spi_check_calib_data(golden_buf, CALIBRATE_BUF_SIZE)) {
> +		dev_info(aspi->dev, "Calibration area too uniform, using low speed");
> +		goto no_calib;
> +	}
> +
> +#if defined(VERBOSE_DEBUG)
> +	print_hex_dump_bytes(DEVICE_NAME "  good: ", DUMP_PREFIX_NONE,
> +			     golden_buf, 0x100);
> +#endif
> +
> +	/* Now we iterate the HCLK dividers until we find our breaking point */
> +	for (i = ARRAY_SIZE(aspeed_spi_hclk_divs); i > data->hdiv_max - 1; i--) {
> +		u32 tv, freq;
> +
> +		freq = ahb_freq / i;
> +		if (freq > max_freq)
> +			continue;
> +
> +		/* Set the timing */
> +		tv = chip->ctl_val[ASPEED_SPI_READ] | ASPEED_SPI_HCLK_DIV(i);
> +		writel(tv, chip->ctl);
> +		dev_dbg(aspi->dev, "Trying HCLK/%d [%08x] ...", i, tv);
> +		rc = data->calibrate(chip, i, golden_buf, test_buf);
> +		if (rc == 0)
> +			best_div = i;
> +	}
> +
> +	/* Nothing found ? */
> +	if (best_div < 0) {
> +		dev_warn(aspi->dev, "No good frequency, using dumb slow");
> +	} else {
> +		dev_dbg(aspi->dev, "Found good read timings at HCLK/%d",
> best_div);
> +
> +		/* Record the freq */
> +		for (i = 0; i < ASPEED_SPI_MAX; i++)
> +			chip->ctl_val[i] = (chip->ctl_val[i] & data->hclk_mask) |
> +				ASPEED_SPI_HCLK_DIV(best_div);
> +	}
> +
> +no_calib:

- Maybe, if the calibration process is not executed, the frequency setting calculated from max_frequency in the device tree can be filled in FMC10 instead of using dumb slow one, 12.5MHz, always.
 For example, except for uniform content case, the calibration process will be ignored when SPI clock frequency in the device tree is smaller than 40MHz.
- The function, aspeed_2600_spi_clk_basic_setting, in [2] can be added to support lower SPI clock frequency, e.g., 4MHz.
 For AST2600, SPI clock frequency can be calculated by HCLK/(FMC10[27:24] + FMC10[11:8]).

> +	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
> +	kfree(test_buf);
> +	return 0;
> +}
> +
> +#define TIMING_DELAY_DI		BIT(3)
> +#define TIMING_DELAY_HCYCLE_MAX	5
> +#define TIMING_REG_AST2600(chip)				\
> +	((chip)->aspi->regs + (chip)->aspi->data->timing +	\
> +	 (chip)->cs * 4)
> +
> +static int aspeed_spi_ast2600_calibrate(struct aspeed_spi_chip *chip, u32
> hdiv,
> +					const u8 *golden_buf, u8 *test_buf) {
> +	struct aspeed_spi *aspi = chip->aspi;
> +	int hcycle;
> +	u32 shift = (hdiv - 2) << 3;
> +	u32 mask = ~(0xfu << shift);
> +	u32 fread_timing_val = 0;
> +
> +	for (hcycle = 0; hcycle <= TIMING_DELAY_HCYCLE_MAX; hcycle++) {
> +		int delay_ns;
> +		bool pass = false;
> +
> +		fread_timing_val &= mask;
> +		fread_timing_val |= hcycle << shift;
> +
> +		/* no DI input delay first  */
> +		writel(fread_timing_val, TIMING_REG_AST2600(chip));
> +		pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
> +		dev_dbg(aspi->dev,
> +			"  * [%08x] %d HCLK delay, DI delay none : %s",
> +			fread_timing_val, hcycle, pass ? "PASS" : "FAIL");
> +		if (pass)
> +			return 0;
> +
> +		/* Add DI input delays  */
> +		fread_timing_val &= mask;
> +		fread_timing_val |= (TIMING_DELAY_DI | hcycle) << shift;
> +
> +		for (delay_ns = 0; delay_ns < 0x10; delay_ns++) {
> +			fread_timing_val &= ~(0xf << (4 + shift));
> +			fread_timing_val |= delay_ns << (4 + shift);
> +
> +			writel(fread_timing_val, TIMING_REG_AST2600(chip));
> +			pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
> +			dev_dbg(aspi->dev,
> +				"  * [%08x] %d HCLK delay, DI delay %d.%dns : %s",
> +				fread_timing_val, hcycle, (delay_ns + 1) / 2,
> +				(delay_ns + 1) & 1 ? 5 : 5, pass ? "PASS" : "FAIL");
> +			/*
> +			 * TODO: This is optimistic. We should look
> +			 * for a working interval and save the middle
> +			 * value in the read timing register.
> +			 */
> +			if (pass)
> +				return 0;
> +		}
> +	}
> +
> +	/* No good setting for this frequency */
> +	return -1;
> +}
> +
>  /*
>   * Platform definitions
>   */
> @@ -820,6 +1077,10 @@ static const struct aspeed_spi_data
> ast2400_fmc_data = {
>  	.hastype       = true,
>  	.we0	       = 16,
>  	.ctl0	       = CE0_CTRL_REG,
> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> +	.hclk_mask     = 0xfffff0ff,
> +	.hdiv_max      = 1,
> +	.calibrate     = aspeed_spi_calibrate,
>  	.segment_start = aspeed_spi_segment_start,
>  	.segment_end   = aspeed_spi_segment_end,
>  	.segment_reg   = aspeed_spi_segment_reg,
> @@ -830,6 +1091,10 @@ static const struct aspeed_spi_data
> ast2400_spi_data = {
>  	.hastype       = false,
>  	.we0	       = 0,
>  	.ctl0	       = 0x04,
> +	.timing	       = 0x14,
> +	.hclk_mask     = 0xfffff0ff,
> +	.hdiv_max      = 1,
> +	.calibrate     = aspeed_spi_calibrate,
>  	/* No segment registers */
>  };
> 
> @@ -838,6 +1103,10 @@ static const struct aspeed_spi_data
> ast2500_fmc_data = {
>  	.hastype       = true,
>  	.we0	       = 16,
>  	.ctl0	       = CE0_CTRL_REG,
> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> +	.hclk_mask     = 0xfffff0ff,
> +	.hdiv_max      = 1,
> +	.calibrate     = aspeed_spi_calibrate,
>  	.segment_start = aspeed_spi_segment_start,
>  	.segment_end   = aspeed_spi_segment_end,
>  	.segment_reg   = aspeed_spi_segment_reg,
> @@ -848,6 +1117,10 @@ static const struct aspeed_spi_data
> ast2500_spi_data = {
>  	.hastype       = false,
>  	.we0	       = 16,
>  	.ctl0	       = CE0_CTRL_REG,
> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> +	.hclk_mask     = 0xfffff0ff,
> +	.hdiv_max      = 1,
> +	.calibrate     = aspeed_spi_calibrate,
>  	.segment_start = aspeed_spi_segment_start,
>  	.segment_end   = aspeed_spi_segment_end,
>  	.segment_reg   = aspeed_spi_segment_reg,
> @@ -859,6 +1132,10 @@ static const struct aspeed_spi_data
> ast2600_fmc_data = {
>  	.mode_bits     = SPI_RX_QUAD | SPI_RX_QUAD,
>  	.we0	       = 16,
>  	.ctl0	       = CE0_CTRL_REG,
> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> +	.hclk_mask     = 0xf0fff0ff,
> +	.hdiv_max      = 2,
> +	.calibrate     = aspeed_spi_ast2600_calibrate,
>  	.segment_start = aspeed_spi_segment_ast2600_start,
>  	.segment_end   = aspeed_spi_segment_ast2600_end,
>  	.segment_reg   = aspeed_spi_segment_ast2600_reg,
> @@ -870,6 +1147,10 @@ static const struct aspeed_spi_data
> ast2600_spi_data = {
>  	.mode_bits     = SPI_RX_QUAD | SPI_RX_QUAD,
>  	.we0	       = 16,
>  	.ctl0	       = CE0_CTRL_REG,
> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> +	.hclk_mask     = 0xf0fff0ff,
> +	.hdiv_max      = 2,
> +	.calibrate     = aspeed_spi_ast2600_calibrate,
>  	.segment_start = aspeed_spi_segment_ast2600_start,
>  	.segment_end   = aspeed_spi_segment_ast2600_end,
>  	.segment_reg   = aspeed_spi_segment_ast2600_reg,
> --
> 2.34.1


Best Wishes,
Chin-Ting
Cédric Le Goater March 30, 2022, 12:14 p.m. UTC | #2
On 3/30/22 13:53, Chin-Ting Kuo wrote:
> Hi Cédric Le,
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@kaod.org>
>> Sent: Friday, March 25, 2022 6:09 PM
>> To: linux-spi@vger.kernel.org; linux-mtd@lists.infradead.org
>> Subject: [PATCH v4 08/11] spi: aspeed: Calibrate read timings
>>
>> To accommodate the different response time of SPI transfers on different
>> boards and different SPI NOR devices, the Aspeed controllers provide a set of
>> Read Timing Compensation registers to tune the timing delays depending on
>> the frequency being used. The AST2600 SoC has one of these registers per
>> device. On the AST2500 and AST2400 SoCs, the timing register is shared by all
>> devices which is problematic to get good results other than for one device.
>>
>> The algorithm first reads a golden buffer at low speed and then performs reads
>> with different clocks and delay cycle settings to find a breaking point. This
>> selects a default good frequency for the CEx control register.
>> The current settings are a bit optimistic as we pick the first delay giving good
>> results. A safer approach would be to determine an interval and choose the
>> middle value.
>>
>> Calibration is performed when the direct mapping for reads is created.
>> Since the underlying spi-nor object needs to be initialized to create the
>> spi_mem operation for direct mapping, we should be fine. Having a specific
>> API would clarify the requirements though.
>>
>> Cc: Pratyush Yadav <p.yadav@ti.com>
>> Reviewed-by: Joel Stanley <joel@jms.id.au>
>> Tested-by: Joel Stanley <joel@jms.id.au>
>> Tested-by: Tao Ren <rentao.bupt@gmail.com>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>   drivers/spi/spi-aspeed-smc.c | 281
>> +++++++++++++++++++++++++++++++++++
>>   1 file changed, 281 insertions(+)
>>
>> diff --git a/drivers/spi/spi-aspeed-smc.c b/drivers/spi/spi-aspeed-smc.c index
>> 7f306da7c44e..660451667a39 100644
>> --- a/drivers/spi/spi-aspeed-smc.c
>> +++ b/drivers/spi/spi-aspeed-smc.c
>> @@ -33,6 +33,8 @@
>>   #define   CTRL_IO_ADDRESS_4B		BIT(13)	/* AST2400 SPI only */
>>   #define   CTRL_IO_DUMMY_SET(dummy)					\
>>   	(((((dummy) >> 2) & 0x1) << 14) | (((dummy) & 0x3) << 6))
>> +#define   CTRL_FREQ_SEL_SHIFT		8
>> +#define   CTRL_FREQ_SEL_MASK		GENMASK(11,
>> CTRL_FREQ_SEL_SHIFT)
>>   #define   CTRL_CE_STOP_ACTIVE		BIT(2)
>>   #define   CTRL_IO_MODE_CMD_MASK		GENMASK(1, 0)
>>   #define   CTRL_IO_MODE_NORMAL		0x0
>> @@ -45,6 +47,9 @@
>>   /* CEx Address Decoding Range Register */
>>   #define CE0_SEGMENT_ADDR_REG		0x30
>>
>> +/* CEx Read timing compensation register */
>> +#define CE0_TIMING_COMPENSATION_REG	0x94
>> +
>>   enum aspeed_spi_ctl_reg_value {
>>   	ASPEED_SPI_BASE,
>>   	ASPEED_SPI_READ,
>> @@ -70,10 +75,15 @@ struct aspeed_spi_data {
>>   	bool	hastype;
>>   	u32	mode_bits;
>>   	u32	we0;
>> +	u32	timing;
>> +	u32	hclk_mask;
>> +	u32	hdiv_max;
>>
>>   	u32 (*segment_start)(struct aspeed_spi *aspi, u32 reg);
>>   	u32 (*segment_end)(struct aspeed_spi *aspi, u32 reg);
>>   	u32 (*segment_reg)(struct aspeed_spi *aspi, u32 start, u32 end);
>> +	int (*calibrate)(struct aspeed_spi_chip *chip, u32 hdiv,
>> +			 const u8 *golden_buf, u8 *test_buf);
>>   };
>>
>>   #define ASPEED_SPI_MAX_NUM_CS	5
>> @@ -517,6 +527,8 @@ static int aspeed_spi_chip_adjust_window(struct
>> aspeed_spi_chip *chip,
>>   	return 0;
>>   }
>>
>> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip);
>> +
>>   static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)  {
>>   	struct aspeed_spi *aspi =
>> spi_controller_get_devdata(desc->mem->spi->master);
>> @@ -565,6 +577,8 @@ static int aspeed_spi_dirmap_create(struct
>> spi_mem_dirmap_desc *desc)
>>   	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
>>   	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
>>
>> +	ret = aspeed_spi_do_calibration(chip);
>> +
>>   	dev_info(aspi->dev, "CE%d read buswidth:%d [0x%08x]\n",
>>   		 chip->cs, op->data.buswidth, chip->ctl_val[ASPEED_SPI_READ]);
>>
>> @@ -812,6 +826,249 @@ static u32 aspeed_spi_segment_ast2600_reg(struct
>> aspeed_spi *aspi,
>>   		((end - 1) & AST2600_SEG_ADDR_MASK);
>>   }
>>
>> +/*
>> + * Read timing compensation sequences
>> + */
>> +
>> +#define CALIBRATE_BUF_SIZE SZ_16K
>> +
>> +static bool aspeed_spi_check_reads(struct aspeed_spi_chip *chip,
>> +				   const u8 *golden_buf, u8 *test_buf) {
>> +	int i;
>> +
>> +	for (i = 0; i < 10; i++) {
>> +		memcpy_fromio(test_buf, chip->ahb_base, CALIBRATE_BUF_SIZE);
>> +		if (memcmp(test_buf, golden_buf, CALIBRATE_BUF_SIZE) != 0) { #if
>> +defined(VERBOSE_DEBUG)
>> +			print_hex_dump_bytes(DEVICE_NAME "  fail: ",
>> DUMP_PREFIX_NONE,
>> +					     test_buf, 0x100);
>> +#endif
>> +			return false;
>> +		}
>> +	}
>> +	return true;
>> +}
>> +
>> +#define FREAD_TPASS(i)	(((i) / 2) | (((i) & 1) ? 0 : 8))
>> +
>> +/*
>> + * The timing register is shared by all devices. Only update for CE0.
>> + */
>> +static int aspeed_spi_calibrate(struct aspeed_spi_chip *chip, u32 hdiv,
>> +				const u8 *golden_buf, u8 *test_buf) {
>> +	struct aspeed_spi *aspi = chip->aspi;
>> +	const struct aspeed_spi_data *data = aspi->data;
>> +	int i;
>> +	int good_pass = -1, pass_count = 0;
>> +	u32 shift = (hdiv - 1) << 2;
>> +	u32 mask = ~(0xfu << shift);
>> +	u32 fread_timing_val = 0;
>> +
>> +	/* Try HCLK delay 0..5, each one with/without delay and look for a
>> +	 * good pair.
>> +	 */
>> +	for (i = 0; i < 12; i++) {
>> +		bool pass;
>> +
>> +		if (chip->cs == 0) {
>> +			fread_timing_val &= mask;
>> +			fread_timing_val |= FREAD_TPASS(i) << shift;
>> +			writel(fread_timing_val, aspi->regs + data->timing);
>> +		}
>> +		pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
>> +		dev_dbg(aspi->dev,
>> +			"  * [%08x] %d HCLK delay, %dns DI delay : %s",
>> +			fread_timing_val, i / 2, (i & 1) ? 0 : 4,
>> +			pass ? "PASS" : "FAIL");
>> +		if (pass) {
>> +			pass_count++;
>> +			if (pass_count == 3) {
>> +				good_pass = i - 1;
>> +				break;
>> +			}
>> +		} else {
>> +			pass_count = 0;
>> +		}
>> +	}
>> +
>> +	/* No good setting for this frequency */
>> +	if (good_pass < 0)
>> +		return -1;
>> +
>> +	/* We have at least one pass of margin, let's use first pass */
>> +	if (chip->cs == 0) {
>> +		fread_timing_val &= mask;
>> +		fread_timing_val |= FREAD_TPASS(good_pass) << shift;
>> +		writel(fread_timing_val, aspi->regs + data->timing);
>> +	}
>> +	dev_dbg(aspi->dev, " * -> good is pass %d [0x%08x]",
>> +		good_pass, fread_timing_val);
>> +	return 0;
>> +}
>> +
>> +static bool aspeed_spi_check_calib_data(const u8 *test_buf, u32 size) {
>> +	const u32 *tb32 = (const u32 *)test_buf;
>> +	u32 i, cnt = 0;
>> +
>> +	/* We check if we have enough words that are neither all 0
>> +	 * nor all 1's so the calibration can be considered valid.
>> +	 *
>> +	 * I use an arbitrary threshold for now of 64
>> +	 */
>> +	size >>= 2;
>> +	for (i = 0; i < size; i++) {
>> +		if (tb32[i] != 0 && tb32[i] != 0xffffffff)
>> +			cnt++;
>> +	}
>> +	return cnt >= 64;
>> +}
>> +
>> +static const u32 aspeed_spi_hclk_divs[] = {
>> +	0xf, /* HCLK */
>> +	0x7, /* HCLK/2 */
>> +	0xe, /* HCLK/3 */
>> +	0x6, /* HCLK/4 */
>> +	0xd, /* HCLK/5 */
>> +};
>> +
>> +#define ASPEED_SPI_HCLK_DIV(i) \
>> +	(aspeed_spi_hclk_divs[(i) - 1] << CTRL_FREQ_SEL_SHIFT)
>> +
>> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip) {
>> +	struct aspeed_spi *aspi = chip->aspi;
>> +	const struct aspeed_spi_data *data = aspi->data;
>> +	u32 ahb_freq = aspi->clk_freq;
>> +	u32 max_freq = chip->clk_freq;
>> +	u32 ctl_val;
>> +	u8 *golden_buf = NULL;
>> +	u8 *test_buf = NULL;
>> +	int i, rc, best_div = -1;
>> +
>> +	dev_dbg(aspi->dev, "calculate timing compensation - AHB freq: %d MHz",
>> +		ahb_freq / 1000000);
>> +
>> +	/*
>> +	 * use the related low frequency to get check calibration data
>> +	 * and get golden data.
>> +	 */
>> +	ctl_val = chip->ctl_val[ASPEED_SPI_READ] & data->hclk_mask;
>> +	writel(ctl_val, chip->ctl);
>> +
>> +	test_buf = kzalloc(CALIBRATE_BUF_SIZE * 2, GFP_KERNEL);
>> +	if (!test_buf)
>> +		return -ENOMEM;
>> +
>> +	golden_buf = test_buf + CALIBRATE_BUF_SIZE;
>> +
>> +	memcpy_fromio(golden_buf, chip->ahb_base, CALIBRATE_BUF_SIZE);
>> +	if (!aspeed_spi_check_calib_data(golden_buf, CALIBRATE_BUF_SIZE)) {
>> +		dev_info(aspi->dev, "Calibration area too uniform, using low speed");
>> +		goto no_calib;
>> +	}
>> +
>> +#if defined(VERBOSE_DEBUG)
>> +	print_hex_dump_bytes(DEVICE_NAME "  good: ", DUMP_PREFIX_NONE,
>> +			     golden_buf, 0x100);
>> +#endif
>> +
>> +	/* Now we iterate the HCLK dividers until we find our breaking point */
>> +	for (i = ARRAY_SIZE(aspeed_spi_hclk_divs); i > data->hdiv_max - 1; i--) {
>> +		u32 tv, freq;
>> +
>> +		freq = ahb_freq / i;
>> +		if (freq > max_freq)
>> +			continue;
>> +
>> +		/* Set the timing */
>> +		tv = chip->ctl_val[ASPEED_SPI_READ] | ASPEED_SPI_HCLK_DIV(i);
>> +		writel(tv, chip->ctl);
>> +		dev_dbg(aspi->dev, "Trying HCLK/%d [%08x] ...", i, tv);
>> +		rc = data->calibrate(chip, i, golden_buf, test_buf);
>> +		if (rc == 0)
>> +			best_div = i;
>> +	}
>> +
>> +	/* Nothing found ? */
>> +	if (best_div < 0) {
>> +		dev_warn(aspi->dev, "No good frequency, using dumb slow");
>> +	} else {
>> +		dev_dbg(aspi->dev, "Found good read timings at HCLK/%d",
>> best_div);
>> +
>> +		/* Record the freq */
>> +		for (i = 0; i < ASPEED_SPI_MAX; i++)
>> +			chip->ctl_val[i] = (chip->ctl_val[i] & data->hclk_mask) |
>> +				ASPEED_SPI_HCLK_DIV(best_div);
>> +	}
>> +
>> +no_calib:
> 
> - Maybe, if the calibration process is not executed, the frequency setting calculated from max_frequency in the device tree can be filled in FMC10 instead of using dumb slow one, 12.5MHz, always.

Indeed.

>   For example, except for uniform content case, the calibration process will be ignored when SPI clock frequency in the device tree is smaller than 40MHz.
> - The function, aspeed_2600_spi_clk_basic_setting, in [2] can be added to support lower SPI clock frequency, e.g., 4MHz.
>   For AST2600, SPI clock frequency can be calculated by HCLK/(FMC10[27:24] + FMC10[11:8]).

Could you please send patches on top of this series ? Here are the branches :

   https://github.com/legoater/linux/commits/openbmc-5.15
   https://github.com/legoater/linux/commits/aspeed         (mainline)

I can include them when I resend a v5.

Thanks,

C.


> 
>> +	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
>> +	kfree(test_buf);
>> +	return 0;
>> +}
>> +
>> +#define TIMING_DELAY_DI		BIT(3)
>> +#define TIMING_DELAY_HCYCLE_MAX	5
>> +#define TIMING_REG_AST2600(chip)				\
>> +	((chip)->aspi->regs + (chip)->aspi->data->timing +	\
>> +	 (chip)->cs * 4)
>> +
>> +static int aspeed_spi_ast2600_calibrate(struct aspeed_spi_chip *chip, u32
>> hdiv,
>> +					const u8 *golden_buf, u8 *test_buf) {
>> +	struct aspeed_spi *aspi = chip->aspi;
>> +	int hcycle;
>> +	u32 shift = (hdiv - 2) << 3;
>> +	u32 mask = ~(0xfu << shift);
>> +	u32 fread_timing_val = 0;
>> +
>> +	for (hcycle = 0; hcycle <= TIMING_DELAY_HCYCLE_MAX; hcycle++) {
>> +		int delay_ns;
>> +		bool pass = false;
>> +
>> +		fread_timing_val &= mask;
>> +		fread_timing_val |= hcycle << shift;
>> +
>> +		/* no DI input delay first  */
>> +		writel(fread_timing_val, TIMING_REG_AST2600(chip));
>> +		pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
>> +		dev_dbg(aspi->dev,
>> +			"  * [%08x] %d HCLK delay, DI delay none : %s",
>> +			fread_timing_val, hcycle, pass ? "PASS" : "FAIL");
>> +		if (pass)
>> +			return 0;
>> +
>> +		/* Add DI input delays  */
>> +		fread_timing_val &= mask;
>> +		fread_timing_val |= (TIMING_DELAY_DI | hcycle) << shift;
>> +
>> +		for (delay_ns = 0; delay_ns < 0x10; delay_ns++) {
>> +			fread_timing_val &= ~(0xf << (4 + shift));
>> +			fread_timing_val |= delay_ns << (4 + shift);
>> +
>> +			writel(fread_timing_val, TIMING_REG_AST2600(chip));
>> +			pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
>> +			dev_dbg(aspi->dev,
>> +				"  * [%08x] %d HCLK delay, DI delay %d.%dns : %s",
>> +				fread_timing_val, hcycle, (delay_ns + 1) / 2,
>> +				(delay_ns + 1) & 1 ? 5 : 5, pass ? "PASS" : "FAIL");
>> +			/*
>> +			 * TODO: This is optimistic. We should look
>> +			 * for a working interval and save the middle
>> +			 * value in the read timing register.
>> +			 */
>> +			if (pass)
>> +				return 0;
>> +		}
>> +	}
>> +
>> +	/* No good setting for this frequency */
>> +	return -1;
>> +}
>> +
>>   /*
>>    * Platform definitions
>>    */
>> @@ -820,6 +1077,10 @@ static const struct aspeed_spi_data
>> ast2400_fmc_data = {
>>   	.hastype       = true,
>>   	.we0	       = 16,
>>   	.ctl0	       = CE0_CTRL_REG,
>> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
>> +	.hclk_mask     = 0xfffff0ff,
>> +	.hdiv_max      = 1,
>> +	.calibrate     = aspeed_spi_calibrate,
>>   	.segment_start = aspeed_spi_segment_start,
>>   	.segment_end   = aspeed_spi_segment_end,
>>   	.segment_reg   = aspeed_spi_segment_reg,
>> @@ -830,6 +1091,10 @@ static const struct aspeed_spi_data
>> ast2400_spi_data = {
>>   	.hastype       = false,
>>   	.we0	       = 0,
>>   	.ctl0	       = 0x04,
>> +	.timing	       = 0x14,
>> +	.hclk_mask     = 0xfffff0ff,
>> +	.hdiv_max      = 1,
>> +	.calibrate     = aspeed_spi_calibrate,
>>   	/* No segment registers */
>>   };
>>
>> @@ -838,6 +1103,10 @@ static const struct aspeed_spi_data
>> ast2500_fmc_data = {
>>   	.hastype       = true,
>>   	.we0	       = 16,
>>   	.ctl0	       = CE0_CTRL_REG,
>> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
>> +	.hclk_mask     = 0xfffff0ff,
>> +	.hdiv_max      = 1,
>> +	.calibrate     = aspeed_spi_calibrate,
>>   	.segment_start = aspeed_spi_segment_start,
>>   	.segment_end   = aspeed_spi_segment_end,
>>   	.segment_reg   = aspeed_spi_segment_reg,
>> @@ -848,6 +1117,10 @@ static const struct aspeed_spi_data
>> ast2500_spi_data = {
>>   	.hastype       = false,
>>   	.we0	       = 16,
>>   	.ctl0	       = CE0_CTRL_REG,
>> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
>> +	.hclk_mask     = 0xfffff0ff,
>> +	.hdiv_max      = 1,
>> +	.calibrate     = aspeed_spi_calibrate,
>>   	.segment_start = aspeed_spi_segment_start,
>>   	.segment_end   = aspeed_spi_segment_end,
>>   	.segment_reg   = aspeed_spi_segment_reg,
>> @@ -859,6 +1132,10 @@ static const struct aspeed_spi_data
>> ast2600_fmc_data = {
>>   	.mode_bits     = SPI_RX_QUAD | SPI_RX_QUAD,
>>   	.we0	       = 16,
>>   	.ctl0	       = CE0_CTRL_REG,
>> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
>> +	.hclk_mask     = 0xf0fff0ff,
>> +	.hdiv_max      = 2,
>> +	.calibrate     = aspeed_spi_ast2600_calibrate,
>>   	.segment_start = aspeed_spi_segment_ast2600_start,
>>   	.segment_end   = aspeed_spi_segment_ast2600_end,
>>   	.segment_reg   = aspeed_spi_segment_ast2600_reg,
>> @@ -870,6 +1147,10 @@ static const struct aspeed_spi_data
>> ast2600_spi_data = {
>>   	.mode_bits     = SPI_RX_QUAD | SPI_RX_QUAD,
>>   	.we0	       = 16,
>>   	.ctl0	       = CE0_CTRL_REG,
>> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
>> +	.hclk_mask     = 0xf0fff0ff,
>> +	.hdiv_max      = 2,
>> +	.calibrate     = aspeed_spi_ast2600_calibrate,
>>   	.segment_start = aspeed_spi_segment_ast2600_start,
>>   	.segment_end   = aspeed_spi_segment_ast2600_end,
>>   	.segment_reg   = aspeed_spi_segment_ast2600_reg,
>> --
>> 2.34.1
> 
> 
> Best Wishes,
> Chin-Ting
Pratyush Yadav March 30, 2022, 7:45 p.m. UTC | #3
On 25/03/22 11:08AM, Cédric Le Goater wrote:
> Use direct mapping to read the flash device contents. This operation
> mode is called "Command mode" on Aspeed SoC SMC controllers. It uses a
> Control Register for the settings to apply when a memory operation is
> performed on the flash device mapping window.
> 
> If the window is not big enough, fall back to the "User mode" to
> perform the read.
> 
> Since direct mapping now handles all reads of the flash device
> contents, also use memcpy_fromio for other address spaces, such as
> SFDP.
> 
> Direct mapping for writes will come later when validated.
> 
> Reviewed-by: Joel Stanley <joel@jms.id.au>
> Tested-by: Joel Stanley <joel@jms.id.au>
> Tested-by: Tao Ren <rentao.bupt@gmail.com>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  drivers/spi/spi-aspeed-smc.c | 67 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 65 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/spi/spi-aspeed-smc.c b/drivers/spi/spi-aspeed-smc.c
> index 997ec2e45118..0951766baef4 100644
> --- a/drivers/spi/spi-aspeed-smc.c
> +++ b/drivers/spi/spi-aspeed-smc.c
> @@ -322,8 +322,8 @@ static int do_aspeed_spi_exec_op(struct spi_mem *mem, const struct spi_mem_op *o
>  		if (!op->addr.nbytes)
>  			ret = aspeed_spi_read_reg(chip, op);
>  		else
> -			ret = aspeed_spi_read_user(chip, op, op->addr.val,
> -						   op->data.nbytes, op->data.buf.in);
> +			memcpy_fromio(op->data.buf.in, chip->ahb_base + op->addr.val,
> +				      op->data.nbytes);

I think I commented on this earlier too, though I failed to respond to 
your reply. Let me bring the topic back up. I think this can cause an 
invalid memory address to be accessed. Not all SPI MEM consumers will 
use dirmap APIs, and they won't use them all the time. For example, SPI 
NOR can perform some operations to reset the flash before shutting down. 
For example, SPI NOR turns off 4byte address mode during shutdown. This 
will be a register read/write operation, which usually has a different 
opcode.

So I think you should keep dirmap and exec_op() independent of each 
other.

>  	} else {
>  		if (!op->addr.nbytes)
>  			ret = aspeed_spi_write_reg(chip, op);
> @@ -403,10 +403,73 @@ static int aspeed_spi_chip_set_default_window(struct aspeed_spi_chip *chip)
>  	return chip->ahb_window_size ? 0 : -1;
>  }
>  
> +static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
> +{
> +	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
> +	struct aspeed_spi_chip *chip = &aspi->chips[desc->mem->spi->chip_select];
> +	struct spi_mem_op *op = &desc->info.op_tmpl;
> +	u32 ctl_val;
> +	int ret = 0;
> +
> +	chip->clk_freq = desc->mem->spi->max_speed_hz;
> +
> +	/* Only for reads */
> +	if (op->data.dir != SPI_MEM_DATA_IN)
> +		return -EOPNOTSUPP;
> +
> +	if (desc->info.length > chip->ahb_window_size)
> +		dev_warn(aspi->dev, "CE%d window (%dMB) too small for mapping",
> +			 chip->cs, chip->ahb_window_size >> 20);
> +
> +	/* Define the default IO read settings */
> +	ctl_val = readl(chip->ctl) & ~CTRL_IO_CMD_MASK;
> +	ctl_val |= aspeed_spi_get_io_mode(op) |
> +		op->cmd.opcode << CTRL_COMMAND_SHIFT |
> +		CTRL_IO_DUMMY_SET(op->dummy.nbytes / op->dummy.buswidth) |
> +		CTRL_IO_MODE_READ;
> +
> +	/* Tune 4BYTE address mode */
> +	if (op->addr.nbytes) {
> +		u32 addr_mode = readl(aspi->regs + CE_CTRL_REG);
> +
> +		if (op->addr.nbytes == 4)
> +			addr_mode |= (0x11 << chip->cs);
> +		else
> +			addr_mode &= ~(0x11 << chip->cs);
> +		writel(addr_mode, aspi->regs + CE_CTRL_REG);
> +	}
> +
> +	/* READ mode is the controller default setting */
> +	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
> +	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
> +
> +	dev_info(aspi->dev, "CE%d read buswidth:%d [0x%08x]\n",
> +		 chip->cs, op->data.buswidth, chip->ctl_val[ASPEED_SPI_READ]);
> +
> +	return ret;
> +}
> +
> +static ssize_t aspeed_spi_dirmap_read(struct spi_mem_dirmap_desc *desc,
> +				      u64 offset, size_t len, void *buf)
> +{
> +	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
> +	struct aspeed_spi_chip *chip = &aspi->chips[desc->mem->spi->chip_select];
> +
> +	/* Switch to USER command mode if mapping window is too small */
> +	if (chip->ahb_window_size < offset + len)
> +		aspeed_spi_read_user(chip, &desc->info.op_tmpl, offset, len, buf);
> +	else
> +		memcpy_fromio(buf, chip->ahb_base + offset, len);
> +
> +	return len;
> +}
> +
>  static const struct spi_controller_mem_ops aspeed_spi_mem_ops = {
>  	.supports_op = aspeed_spi_supports_op,
>  	.exec_op = aspeed_spi_exec_op,
>  	.get_name = aspeed_spi_get_name,
> +	.dirmap_create = aspeed_spi_dirmap_create,
> +	.dirmap_read = aspeed_spi_dirmap_read,
>  };
>  
>  static void aspeed_spi_chip_set_type(struct aspeed_spi *aspi, unsigned int cs, int type)
> -- 
> 2.34.1
>
Pratyush Yadav March 30, 2022, 7:49 p.m. UTC | #4
> Subject: [PATCH v4 11/11] mtd: spi-nor: aspeed: set the decoding size to at least 2MB for AST2600

Nitpick: s/mtd: spi-nor: aspeed:/spi: aspeed:/

On 25/03/22 11:08AM, Cédric Le Goater wrote:
> From: Potin Lai <potin.lai@quantatw.com>
> 
> In AST2600, the unit of SPI CEx decoding range register is 1MB, and end
> address offset is set to the acctual offset - 1MB. If the flash only has
> 1MB, the end address will has same value as start address, which will
> causing unexpected errors.
> 
> This patch set the decoding size to at least 2MB to avoid decoding errors.
> 
> Tested:
> root@bletchley:~# dmesg | grep "aspeed-smc 1e631000.spi: CE0 window"
> [   59.328134] aspeed-smc 1e631000.spi: CE0 window resized to 2MB (AST2600 Decoding)
> [   59.343001] aspeed-smc 1e631000.spi: CE0 window [ 0x50000000 - 0x50200000 ] 2MB
> root@bletchley:~# devmem 0x1e631030
> 0x00100000
> 
> Signed-off-by: Potin Lai <potin.lai@quantatw.com>
> [ clg : Ported on new spi-mem driver ]
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  drivers/spi/spi-aspeed-smc.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/spi/spi-aspeed-smc.c b/drivers/spi/spi-aspeed-smc.c
> index 660451667a39..227797e13997 100644
> --- a/drivers/spi/spi-aspeed-smc.c
> +++ b/drivers/spi/spi-aspeed-smc.c
> @@ -466,6 +466,8 @@ static int aspeed_spi_set_window(struct aspeed_spi *aspi,
>   *   is correct.
>   */
>  static const struct aspeed_spi_data ast2500_spi_data;
> +static const struct aspeed_spi_data ast2600_spi_data;
> +static const struct aspeed_spi_data ast2600_fmc_data;
>  
>  static int aspeed_spi_chip_adjust_window(struct aspeed_spi_chip *chip,
>  					 u32 local_offset, u32 size)
> @@ -489,6 +491,17 @@ static int aspeed_spi_chip_adjust_window(struct aspeed_spi_chip *chip,
>  			 chip->cs, size >> 20);
>  	}
>  
> +	/*
> +	 * The decoding size of AST2600 SPI controller should set at
> +	 * least 2MB.
> +	 */
> +	if ((aspi->data == &ast2600_spi_data || aspi->data == &ast2600_fmc_data) &&
> +	    size < SZ_2M) {
> +		size = SZ_2M;
> +		dev_info(aspi->dev, "CE%d window resized to %dMB (AST2600 Decoding)",
> +			 chip->cs, size >> 20);
> +	}
> +
>  	aspeed_spi_get_windows(aspi, windows);
>  
>  	/* Adjust this chip window */
> -- 
> 2.34.1
>
Chin-Ting Kuo March 31, 2022, 2:46 a.m. UTC | #5
Hi Cédric,

> -----Original Message-----
> From: Cédric Le Goater <clg@kaod.org>
> Sent: Wednesday, March 30, 2022 8:15 PM
> To: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com>; linux-spi@vger.kernel.org;
> linux-mtd@lists.infradead.org
> Subject: Re: [PATCH v4 08/11] spi: aspeed: Calibrate read timings
> 
> On 3/30/22 13:53, Chin-Ting Kuo wrote:
> > Hi Cédric Le,
> >
> >> -----Original Message-----
> >> From: Cédric Le Goater <clg@kaod.org>
> >> Sent: Friday, March 25, 2022 6:09 PM
> >> To: linux-spi@vger.kernel.org; linux-mtd@lists.infradead.org
> >> Subject: [PATCH v4 08/11] spi: aspeed: Calibrate read timings
> >>
> >> To accommodate the different response time of SPI transfers on
> >> different boards and different SPI NOR devices, the Aspeed
> >> controllers provide a set of Read Timing Compensation registers to
> >> tune the timing delays depending on the frequency being used. The
> >> AST2600 SoC has one of these registers per device. On the AST2500 and
> >> AST2400 SoCs, the timing register is shared by all devices which is
> problematic to get good results other than for one device.
> >>
> >> The algorithm first reads a golden buffer at low speed and then
> >> performs reads with different clocks and delay cycle settings to find
> >> a breaking point. This selects a default good frequency for the CEx control
> register.
> >> The current settings are a bit optimistic as we pick the first delay
> >> giving good results. A safer approach would be to determine an
> >> interval and choose the middle value.
> >>
> >> Calibration is performed when the direct mapping for reads is created.
> >> Since the underlying spi-nor object needs to be initialized to create
> >> the spi_mem operation for direct mapping, we should be fine. Having a
> >> specific API would clarify the requirements though.
> >>
> >> Cc: Pratyush Yadav <p.yadav@ti.com>
> >> Reviewed-by: Joel Stanley <joel@jms.id.au>
> >> Tested-by: Joel Stanley <joel@jms.id.au>
> >> Tested-by: Tao Ren <rentao.bupt@gmail.com>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>   drivers/spi/spi-aspeed-smc.c | 281
> >> +++++++++++++++++++++++++++++++++++
> >>   1 file changed, 281 insertions(+)
> >>
> >> diff --git a/drivers/spi/spi-aspeed-smc.c
> >> b/drivers/spi/spi-aspeed-smc.c index
> >> 7f306da7c44e..660451667a39 100644
> >> --- a/drivers/spi/spi-aspeed-smc.c
> >> +++ b/drivers/spi/spi-aspeed-smc.c
> >> @@ -33,6 +33,8 @@
> >>   #define   CTRL_IO_ADDRESS_4B		BIT(13)	/* AST2400 SPI only
> */
> >>   #define   CTRL_IO_DUMMY_SET(dummy)					\
> >>   	(((((dummy) >> 2) & 0x1) << 14) | (((dummy) & 0x3) << 6))
> >> +#define   CTRL_FREQ_SEL_SHIFT		8
> >> +#define   CTRL_FREQ_SEL_MASK		GENMASK(11,
> >> CTRL_FREQ_SEL_SHIFT)
> >>   #define   CTRL_CE_STOP_ACTIVE		BIT(2)
> >>   #define   CTRL_IO_MODE_CMD_MASK		GENMASK(1, 0)
> >>   #define   CTRL_IO_MODE_NORMAL		0x0
> >> @@ -45,6 +47,9 @@
> >>   /* CEx Address Decoding Range Register */
> >>   #define CE0_SEGMENT_ADDR_REG		0x30
> >>
> >> +/* CEx Read timing compensation register */
> >> +#define CE0_TIMING_COMPENSATION_REG	0x94
> >> +
> >>   enum aspeed_spi_ctl_reg_value {
> >>   	ASPEED_SPI_BASE,
> >>   	ASPEED_SPI_READ,
> >> @@ -70,10 +75,15 @@ struct aspeed_spi_data {
> >>   	bool	hastype;
> >>   	u32	mode_bits;
> >>   	u32	we0;
> >> +	u32	timing;
> >> +	u32	hclk_mask;
> >> +	u32	hdiv_max;
> >>
> >>   	u32 (*segment_start)(struct aspeed_spi *aspi, u32 reg);
> >>   	u32 (*segment_end)(struct aspeed_spi *aspi, u32 reg);
> >>   	u32 (*segment_reg)(struct aspeed_spi *aspi, u32 start, u32 end);
> >> +	int (*calibrate)(struct aspeed_spi_chip *chip, u32 hdiv,
> >> +			 const u8 *golden_buf, u8 *test_buf);
> >>   };
> >>
> >>   #define ASPEED_SPI_MAX_NUM_CS	5
> >> @@ -517,6 +527,8 @@ static int aspeed_spi_chip_adjust_window(struct
> >> aspeed_spi_chip *chip,
> >>   	return 0;
> >>   }
> >>
> >> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip);
> >> +
> >>   static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc
> *desc)  {
> >>   	struct aspeed_spi *aspi =
> >> spi_controller_get_devdata(desc->mem->spi->master);
> >> @@ -565,6 +577,8 @@ static int aspeed_spi_dirmap_create(struct
> >> spi_mem_dirmap_desc *desc)
> >>   	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
> >>   	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
> >>
> >> +	ret = aspeed_spi_do_calibration(chip);
> >> +
> >>   	dev_info(aspi->dev, "CE%d read buswidth:%d [0x%08x]\n",
> >>   		 chip->cs, op->data.buswidth,
> chip->ctl_val[ASPEED_SPI_READ]);
> >>
> >> @@ -812,6 +826,249 @@ static u32
> >> aspeed_spi_segment_ast2600_reg(struct
> >> aspeed_spi *aspi,
> >>   		((end - 1) & AST2600_SEG_ADDR_MASK);
> >>   }
> >>
> >> +/*
> >> + * Read timing compensation sequences  */
> >> +
> >> +#define CALIBRATE_BUF_SIZE SZ_16K
> >> +
> >> +static bool aspeed_spi_check_reads(struct aspeed_spi_chip *chip,
> >> +				   const u8 *golden_buf, u8 *test_buf) {
> >> +	int i;
> >> +
> >> +	for (i = 0; i < 10; i++) {
> >> +		memcpy_fromio(test_buf, chip->ahb_base, CALIBRATE_BUF_SIZE);
> >> +		if (memcmp(test_buf, golden_buf, CALIBRATE_BUF_SIZE) != 0) { #if
> >> +defined(VERBOSE_DEBUG)
> >> +			print_hex_dump_bytes(DEVICE_NAME "  fail: ",
> >> DUMP_PREFIX_NONE,
> >> +					     test_buf, 0x100);
> >> +#endif
> >> +			return false;
> >> +		}
> >> +	}
> >> +	return true;
> >> +}
> >> +
> >> +#define FREAD_TPASS(i)	(((i) / 2) | (((i) & 1) ? 0 : 8))
> >> +
> >> +/*
> >> + * The timing register is shared by all devices. Only update for CE0.
> >> + */
> >> +static int aspeed_spi_calibrate(struct aspeed_spi_chip *chip, u32 hdiv,
> >> +				const u8 *golden_buf, u8 *test_buf) {
> >> +	struct aspeed_spi *aspi = chip->aspi;
> >> +	const struct aspeed_spi_data *data = aspi->data;
> >> +	int i;
> >> +	int good_pass = -1, pass_count = 0;
> >> +	u32 shift = (hdiv - 1) << 2;
> >> +	u32 mask = ~(0xfu << shift);
> >> +	u32 fread_timing_val = 0;
> >> +
> >> +	/* Try HCLK delay 0..5, each one with/without delay and look for a
> >> +	 * good pair.
> >> +	 */
> >> +	for (i = 0; i < 12; i++) {
> >> +		bool pass;
> >> +
> >> +		if (chip->cs == 0) {
> >> +			fread_timing_val &= mask;
> >> +			fread_timing_val |= FREAD_TPASS(i) << shift;
> >> +			writel(fread_timing_val, aspi->regs + data->timing);
> >> +		}
> >> +		pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
> >> +		dev_dbg(aspi->dev,
> >> +			"  * [%08x] %d HCLK delay, %dns DI delay : %s",
> >> +			fread_timing_val, i / 2, (i & 1) ? 0 : 4,
> >> +			pass ? "PASS" : "FAIL");
> >> +		if (pass) {
> >> +			pass_count++;
> >> +			if (pass_count == 3) {
> >> +				good_pass = i - 1;
> >> +				break;
> >> +			}
> >> +		} else {
> >> +			pass_count = 0;
> >> +		}
> >> +	}
> >> +
> >> +	/* No good setting for this frequency */
> >> +	if (good_pass < 0)
> >> +		return -1;
> >> +
> >> +	/* We have at least one pass of margin, let's use first pass */
> >> +	if (chip->cs == 0) {
> >> +		fread_timing_val &= mask;
> >> +		fread_timing_val |= FREAD_TPASS(good_pass) << shift;
> >> +		writel(fread_timing_val, aspi->regs + data->timing);
> >> +	}
> >> +	dev_dbg(aspi->dev, " * -> good is pass %d [0x%08x]",
> >> +		good_pass, fread_timing_val);
> >> +	return 0;
> >> +}
> >> +
> >> +static bool aspeed_spi_check_calib_data(const u8 *test_buf, u32 size) {
> >> +	const u32 *tb32 = (const u32 *)test_buf;
> >> +	u32 i, cnt = 0;
> >> +
> >> +	/* We check if we have enough words that are neither all 0
> >> +	 * nor all 1's so the calibration can be considered valid.
> >> +	 *
> >> +	 * I use an arbitrary threshold for now of 64
> >> +	 */
> >> +	size >>= 2;
> >> +	for (i = 0; i < size; i++) {
> >> +		if (tb32[i] != 0 && tb32[i] != 0xffffffff)
> >> +			cnt++;
> >> +	}
> >> +	return cnt >= 64;
> >> +}
> >> +
> >> +static const u32 aspeed_spi_hclk_divs[] = {
> >> +	0xf, /* HCLK */
> >> +	0x7, /* HCLK/2 */
> >> +	0xe, /* HCLK/3 */
> >> +	0x6, /* HCLK/4 */
> >> +	0xd, /* HCLK/5 */
> >> +};
> >> +
> >> +#define ASPEED_SPI_HCLK_DIV(i) \
> >> +	(aspeed_spi_hclk_divs[(i) - 1] << CTRL_FREQ_SEL_SHIFT)
> >> +
> >> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip) {
> >> +	struct aspeed_spi *aspi = chip->aspi;
> >> +	const struct aspeed_spi_data *data = aspi->data;
> >> +	u32 ahb_freq = aspi->clk_freq;
> >> +	u32 max_freq = chip->clk_freq;
> >> +	u32 ctl_val;
> >> +	u8 *golden_buf = NULL;
> >> +	u8 *test_buf = NULL;
> >> +	int i, rc, best_div = -1;
> >> +
> >> +	dev_dbg(aspi->dev, "calculate timing compensation - AHB freq: %d MHz",
> >> +		ahb_freq / 1000000);
> >> +
> >> +	/*
> >> +	 * use the related low frequency to get check calibration data
> >> +	 * and get golden data.
> >> +	 */
> >> +	ctl_val = chip->ctl_val[ASPEED_SPI_READ] & data->hclk_mask;
> >> +	writel(ctl_val, chip->ctl);
> >> +
> >> +	test_buf = kzalloc(CALIBRATE_BUF_SIZE * 2, GFP_KERNEL);
> >> +	if (!test_buf)
> >> +		return -ENOMEM;
> >> +
> >> +	golden_buf = test_buf + CALIBRATE_BUF_SIZE;
> >> +
> >> +	memcpy_fromio(golden_buf, chip->ahb_base, CALIBRATE_BUF_SIZE);
> >> +	if (!aspeed_spi_check_calib_data(golden_buf, CALIBRATE_BUF_SIZE)) {
> >> +		dev_info(aspi->dev, "Calibration area too uniform, using low speed");
> >> +		goto no_calib;
> >> +	}
> >> +
> >> +#if defined(VERBOSE_DEBUG)
> >> +	print_hex_dump_bytes(DEVICE_NAME "  good: ", DUMP_PREFIX_NONE,
> >> +			     golden_buf, 0x100);
> >> +#endif
> >> +
> >> +	/* Now we iterate the HCLK dividers until we find our breaking point */
> >> +	for (i = ARRAY_SIZE(aspeed_spi_hclk_divs); i > data->hdiv_max - 1; i--) {
> >> +		u32 tv, freq;
> >> +
> >> +		freq = ahb_freq / i;
> >> +		if (freq > max_freq)
> >> +			continue;
> >> +
> >> +		/* Set the timing */
> >> +		tv = chip->ctl_val[ASPEED_SPI_READ] | ASPEED_SPI_HCLK_DIV(i);
> >> +		writel(tv, chip->ctl);
> >> +		dev_dbg(aspi->dev, "Trying HCLK/%d [%08x] ...", i, tv);
> >> +		rc = data->calibrate(chip, i, golden_buf, test_buf);
> >> +		if (rc == 0)
> >> +			best_div = i;
> >> +	}
> >> +
> >> +	/* Nothing found ? */
> >> +	if (best_div < 0) {
> >> +		dev_warn(aspi->dev, "No good frequency, using dumb slow");
> >> +	} else {
> >> +		dev_dbg(aspi->dev, "Found good read timings at HCLK/%d",
> >> best_div);
> >> +
> >> +		/* Record the freq */
> >> +		for (i = 0; i < ASPEED_SPI_MAX; i++)
> >> +			chip->ctl_val[i] = (chip->ctl_val[i] & data->hclk_mask) |
> >> +				ASPEED_SPI_HCLK_DIV(best_div);
> >> +	}
> >> +
> >> +no_calib:
> >
> > - Maybe, if the calibration process is not executed, the frequency setting
> calculated from max_frequency in the device tree can be filled in FMC10
> instead of using dumb slow one, 12.5MHz, always.
> 
> Indeed.
> 
> >   For example, except for uniform content case, the calibration process will
> be ignored when SPI clock frequency in the device tree is smaller than 40MHz.
> > - The function, aspeed_2600_spi_clk_basic_setting, in [2] can be added to
> support lower SPI clock frequency, e.g., 4MHz.
> >   For AST2600, SPI clock frequency can be calculated by
> HCLK/(FMC10[27:24] + FMC10[11:8]).
> 
> Could you please send patches on top of this series ? Here are the branches :
> 

Of course. How do I provide you the patch? By private mail or send a PR?
Besides, I may add a new callback function for this part due to difference between AST2500 and AST2600.

Thanks.

Chin-Ting

>    https://github.com/legoater/linux/commits/openbmc-5.15
>    https://github.com/legoater/linux/commits/aspeed         (mainline)
> 
> I can include them when I resend a v5.
> 
> Thanks,
> 
> C.
> 
> 
> >
> >> +	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
> >> +	kfree(test_buf);
> >> +	return 0;
> >> +}
> >> +
> >> +#define TIMING_DELAY_DI		BIT(3)
> >> +#define TIMING_DELAY_HCYCLE_MAX	5
> >> +#define TIMING_REG_AST2600(chip)				\
> >> +	((chip)->aspi->regs + (chip)->aspi->data->timing +	\
> >> +	 (chip)->cs * 4)
> >> +
> >> +static int aspeed_spi_ast2600_calibrate(struct aspeed_spi_chip
> >> +*chip, u32
> >> hdiv,
> >> +					const u8 *golden_buf, u8 *test_buf) {
> >> +	struct aspeed_spi *aspi = chip->aspi;
> >> +	int hcycle;
> >> +	u32 shift = (hdiv - 2) << 3;
> >> +	u32 mask = ~(0xfu << shift);
> >> +	u32 fread_timing_val = 0;
> >> +
> >> +	for (hcycle = 0; hcycle <= TIMING_DELAY_HCYCLE_MAX; hcycle++) {
> >> +		int delay_ns;
> >> +		bool pass = false;
> >> +
> >> +		fread_timing_val &= mask;
> >> +		fread_timing_val |= hcycle << shift;
> >> +
> >> +		/* no DI input delay first  */
> >> +		writel(fread_timing_val, TIMING_REG_AST2600(chip));
> >> +		pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
> >> +		dev_dbg(aspi->dev,
> >> +			"  * [%08x] %d HCLK delay, DI delay none : %s",
> >> +			fread_timing_val, hcycle, pass ? "PASS" : "FAIL");
> >> +		if (pass)
> >> +			return 0;
> >> +
> >> +		/* Add DI input delays  */
> >> +		fread_timing_val &= mask;
> >> +		fread_timing_val |= (TIMING_DELAY_DI | hcycle) << shift;
> >> +
> >> +		for (delay_ns = 0; delay_ns < 0x10; delay_ns++) {
> >> +			fread_timing_val &= ~(0xf << (4 + shift));
> >> +			fread_timing_val |= delay_ns << (4 + shift);
> >> +
> >> +			writel(fread_timing_val, TIMING_REG_AST2600(chip));
> >> +			pass = aspeed_spi_check_reads(chip, golden_buf, test_buf);
> >> +			dev_dbg(aspi->dev,
> >> +				"  * [%08x] %d HCLK delay, DI delay %d.%dns : %s",
> >> +				fread_timing_val, hcycle, (delay_ns + 1) / 2,
> >> +				(delay_ns + 1) & 1 ? 5 : 5, pass ? "PASS" : "FAIL");
> >> +			/*
> >> +			 * TODO: This is optimistic. We should look
> >> +			 * for a working interval and save the middle
> >> +			 * value in the read timing register.
> >> +			 */
> >> +			if (pass)
> >> +				return 0;
> >> +		}
> >> +	}
> >> +
> >> +	/* No good setting for this frequency */
> >> +	return -1;
> >> +}
> >> +
> >>   /*
> >>    * Platform definitions
> >>    */
> >> @@ -820,6 +1077,10 @@ static const struct aspeed_spi_data
> >> ast2400_fmc_data = {
> >>   	.hastype       = true,
> >>   	.we0	       = 16,
> >>   	.ctl0	       = CE0_CTRL_REG,
> >> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> >> +	.hclk_mask     = 0xfffff0ff,
> >> +	.hdiv_max      = 1,
> >> +	.calibrate     = aspeed_spi_calibrate,
> >>   	.segment_start = aspeed_spi_segment_start,
> >>   	.segment_end   = aspeed_spi_segment_end,
> >>   	.segment_reg   = aspeed_spi_segment_reg,
> >> @@ -830,6 +1091,10 @@ static const struct aspeed_spi_data
> >> ast2400_spi_data = {
> >>   	.hastype       = false,
> >>   	.we0	       = 0,
> >>   	.ctl0	       = 0x04,
> >> +	.timing	       = 0x14,
> >> +	.hclk_mask     = 0xfffff0ff,
> >> +	.hdiv_max      = 1,
> >> +	.calibrate     = aspeed_spi_calibrate,
> >>   	/* No segment registers */
> >>   };
> >>
> >> @@ -838,6 +1103,10 @@ static const struct aspeed_spi_data
> >> ast2500_fmc_data = {
> >>   	.hastype       = true,
> >>   	.we0	       = 16,
> >>   	.ctl0	       = CE0_CTRL_REG,
> >> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> >> +	.hclk_mask     = 0xfffff0ff,
> >> +	.hdiv_max      = 1,
> >> +	.calibrate     = aspeed_spi_calibrate,
> >>   	.segment_start = aspeed_spi_segment_start,
> >>   	.segment_end   = aspeed_spi_segment_end,
> >>   	.segment_reg   = aspeed_spi_segment_reg,
> >> @@ -848,6 +1117,10 @@ static const struct aspeed_spi_data
> >> ast2500_spi_data = {
> >>   	.hastype       = false,
> >>   	.we0	       = 16,
> >>   	.ctl0	       = CE0_CTRL_REG,
> >> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> >> +	.hclk_mask     = 0xfffff0ff,
> >> +	.hdiv_max      = 1,
> >> +	.calibrate     = aspeed_spi_calibrate,
> >>   	.segment_start = aspeed_spi_segment_start,
> >>   	.segment_end   = aspeed_spi_segment_end,
> >>   	.segment_reg   = aspeed_spi_segment_reg,
> >> @@ -859,6 +1132,10 @@ static const struct aspeed_spi_data
> >> ast2600_fmc_data = {
> >>   	.mode_bits     = SPI_RX_QUAD | SPI_RX_QUAD,
> >>   	.we0	       = 16,
> >>   	.ctl0	       = CE0_CTRL_REG,
> >> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> >> +	.hclk_mask     = 0xf0fff0ff,
> >> +	.hdiv_max      = 2,
> >> +	.calibrate     = aspeed_spi_ast2600_calibrate,
> >>   	.segment_start = aspeed_spi_segment_ast2600_start,
> >>   	.segment_end   = aspeed_spi_segment_ast2600_end,
> >>   	.segment_reg   = aspeed_spi_segment_ast2600_reg,
> >> @@ -870,6 +1147,10 @@ static const struct aspeed_spi_data
> >> ast2600_spi_data = {
> >>   	.mode_bits     = SPI_RX_QUAD | SPI_RX_QUAD,
> >>   	.we0	       = 16,
> >>   	.ctl0	       = CE0_CTRL_REG,
> >> +	.timing	       = CE0_TIMING_COMPENSATION_REG,
> >> +	.hclk_mask     = 0xf0fff0ff,
> >> +	.hdiv_max      = 2,
> >> +	.calibrate     = aspeed_spi_ast2600_calibrate,
> >>   	.segment_start = aspeed_spi_segment_ast2600_start,
> >>   	.segment_end   = aspeed_spi_segment_ast2600_end,
> >>   	.segment_reg   = aspeed_spi_segment_ast2600_reg,
> >> --
> >> 2.34.1
> >
> >
> > Best Wishes,
> > Chin-Ting
Cédric Le Goater March 31, 2022, 7:15 a.m. UTC | #6
Hello Chin-Ting,

>>> - Maybe, if the calibration process is not executed, the frequency setting
>> calculated from max_frequency in the device tree can be filled in FMC10
>> instead of using dumb slow one, 12.5MHz, always.
>>
>> Indeed.
>>
>>>    For example, except for uniform content case, the calibration process will
>> be ignored when SPI clock frequency in the device tree is smaller than 40MHz.
>>> - The function, aspeed_2600_spi_clk_basic_setting, in [2] can be added to
>> support lower SPI clock frequency, e.g., 4MHz.
>>>    For AST2600, SPI clock frequency can be calculated by
>> HCLK/(FMC10[27:24] + FMC10[11:8]).
>>
>> Could you please send patches on top of this series ? Here are the branches :
>>
> 
> Of course. How do I provide you the patch? By private mail or send a PR?

We should discuss first by email on the openbmc@ and linux-aspeed@ lists.
Please send as follow ups on top of v4.

Using the openbmc tree should be easier :

      https://github.com/legoater/linux/commits/openbmc-5.15

> Besides, I may add a new callback function for this part due to difference 
> between AST2500 and AST2600.

ok.

Given all the reviews and tests that were done on AST2400, AST2500, AST2600
platforms, I will be careful not to break the existing proposal.

Thanks,

C.
Chin-Ting Kuo March 31, 2022, 8:34 a.m. UTC | #7
Hi Cédric,

> -----Original Message-----
> From: Cédric Le Goater <clg@kaod.org>
> Sent: Thursday, March 31, 2022 3:16 PM
> To: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com>; linux-spi@vger.kernel.org;
> linux-mtd@lists.infradead.org
> Subject: Re: [PATCH v4 08/11] spi: aspeed: Calibrate read timings
> 
> Hello Chin-Ting,
> 
> >>> - Maybe, if the calibration process is not executed, the frequency
> >>> setting
> >> calculated from max_frequency in the device tree can be filled in
> >> FMC10 instead of using dumb slow one, 12.5MHz, always.
> >>
> >> Indeed.
> >>
> >>>    For example, except for uniform content case, the calibration
> >>> process will
> >> be ignored when SPI clock frequency in the device tree is smaller than
> 40MHz.
> >>> - The function, aspeed_2600_spi_clk_basic_setting, in [2] can be
> >>> added to
> >> support lower SPI clock frequency, e.g., 4MHz.
> >>>    For AST2600, SPI clock frequency can be calculated by
> >> HCLK/(FMC10[27:24] + FMC10[11:8]).
> >>
> >> Could you please send patches on top of this series ? Here are the
> branches :
> >>
> >
> > Of course. How do I provide you the patch? By private mail or send a PR?
> 
> We should discuss first by email on the openbmc@ and linux-aspeed@ lists.
> Please send as follow ups on top of v4.
> 

Okay.

> Using the openbmc tree should be easier :
> 
>       https://github.com/legoater/linux/commits/openbmc-5.15
> 
> > Besides, I may add a new callback function for this part due to
> > difference between AST2500 and AST2600.
> 
> ok.
> 
> Given all the reviews and tests that were done on AST2400, AST2500, AST2600
> platforms, I will be careful not to break the existing proposal.
> 
> Thanks,
> 
> C.
Pratyush Yadav March 31, 2022, 4:41 p.m. UTC | #8
Hi,

On 25/03/22 11:08AM, Cédric Le Goater wrote:
> To accommodate the different response time of SPI transfers on different
> boards and different SPI NOR devices, the Aspeed controllers provide a
> set of Read Timing Compensation registers to tune the timing delays
> depending on the frequency being used. The AST2600 SoC has one of these
> registers per device. On the AST2500 and AST2400 SoCs, the timing
> register is shared by all devices which is problematic to get good
> results other than for one device.
> 
> The algorithm first reads a golden buffer at low speed and then performs
> reads with different clocks and delay cycle settings to find a breaking
> point. This selects a default good frequency for the CEx control register.
> The current settings are a bit optimistic as we pick the first delay giving
> good results. A safer approach would be to determine an interval and
> choose the middle value.
> 
> Calibration is performed when the direct mapping for reads is created.
> Since the underlying spi-nor object needs to be initialized to create
> the spi_mem operation for direct mapping, we should be fine. Having a
> specific API would clarify the requirements though.
> 
> Cc: Pratyush Yadav <p.yadav@ti.com>
> Reviewed-by: Joel Stanley <joel@jms.id.au>
> Tested-by: Joel Stanley <joel@jms.id.au>
> Tested-by: Tao Ren <rentao.bupt@gmail.com>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  drivers/spi/spi-aspeed-smc.c | 281 +++++++++++++++++++++++++++++++++++
>  1 file changed, 281 insertions(+)
> 
[...]
> @@ -517,6 +527,8 @@ static int aspeed_spi_chip_adjust_window(struct aspeed_spi_chip *chip,
>  	return 0;
>  }
>  
> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip);
> +
>  static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
>  {
>  	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
> @@ -565,6 +577,8 @@ static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
>  	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
>  	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
>  
> +	ret = aspeed_spi_do_calibration(chip);
> +

I am still not convinced this is a good idea. The API does not say 
anywhere what dirmap_create must be called after the flash is completely 
initialized, though that is what is done currently in practice. I think 
an explicit API to mark flash as "ready for calibration" would be a 
better idea.

Tudor/Mark/Miquel, what do you think?

>  	dev_info(aspi->dev, "CE%d read buswidth:%d [0x%08x]\n",
>  		 chip->cs, op->data.buswidth, chip->ctl_val[ASPEED_SPI_READ]);
>  
[...]
Jae Hyun Yoo April 1, 2022, 2:12 p.m. UTC | #9
Hi Cédric,

On 3/25/2022 3:08 AM, Cédric Le Goater wrote:
> Hi,
> 
> This series adds a new SPI driver using the spi-mem interface for the
> Aspeed static memory controllers of the AST2600, AST2500 and AST2400
> SoCs.
> 
>   * AST2600 Firmware SPI Memory Controller (FMC)
>   * AST2600 SPI Flash Controller (SPI1 and SPI2)

Tested-by: Jae Hyun Yoo <quic_jaehyoo@quicinc.com>


I tested this patch series on Qualcomm DC-SCM (AST2600) board with quad
mode enabled. It has worked well so far.

The quad mode test details:
https://lore.kernel.org/linux-devicetree/e362f6dd-785f-87b3-3090-554be0fb860c@quicinc.com/T/#t

Cheers,

Jae
Cédric Le Goater April 4, 2022, 7:11 a.m. UTC | #10
On 3/30/22 21:45, Pratyush Yadav wrote:
> On 25/03/22 11:08AM, Cédric Le Goater wrote:
>> Use direct mapping to read the flash device contents. This operation
>> mode is called "Command mode" on Aspeed SoC SMC controllers. It uses a
>> Control Register for the settings to apply when a memory operation is
>> performed on the flash device mapping window.
>>
>> If the window is not big enough, fall back to the "User mode" to
>> perform the read.
>>
>> Since direct mapping now handles all reads of the flash device
>> contents, also use memcpy_fromio for other address spaces, such as
>> SFDP.
>>
>> Direct mapping for writes will come later when validated.
>>
>> Reviewed-by: Joel Stanley <joel@jms.id.au>
>> Tested-by: Joel Stanley <joel@jms.id.au>
>> Tested-by: Tao Ren <rentao.bupt@gmail.com>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>   drivers/spi/spi-aspeed-smc.c | 67 ++++++++++++++++++++++++++++++++++--
>>   1 file changed, 65 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/spi/spi-aspeed-smc.c b/drivers/spi/spi-aspeed-smc.c
>> index 997ec2e45118..0951766baef4 100644
>> --- a/drivers/spi/spi-aspeed-smc.c
>> +++ b/drivers/spi/spi-aspeed-smc.c
>> @@ -322,8 +322,8 @@ static int do_aspeed_spi_exec_op(struct spi_mem *mem, const struct spi_mem_op *o
>>   		if (!op->addr.nbytes)
>>   			ret = aspeed_spi_read_reg(chip, op);
>>   		else
>> -			ret = aspeed_spi_read_user(chip, op, op->addr.val,
>> -						   op->data.nbytes, op->data.buf.in);
>> +			memcpy_fromio(op->data.buf.in, chip->ahb_base + op->addr.val,
>> +				      op->data.nbytes);
> 
> I think I commented on this earlier too, though I failed to respond to
> your reply. Let me bring the topic back up. I think this can cause an
> invalid memory address to be accessed. Not all SPI MEM consumers will
> use dirmap APIs, and they won't use them all the time. For example, SPI
> NOR can perform some operations to reset the flash before shutting down.
> For example, SPI NOR turns off 4byte address mode during shutdown. This
> will be a register read/write operation, which usually has a different
> opcode.

It's only a small optimization for startup when the SFDP probing is done.
There are quite a few reads which are large :

   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x10
   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x10
   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x120
   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x40
   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x8

> 
> So I think you should keep dirmap and exec_op() independent of each
> other.

OK. I understand. It's not a problem as it works either way.

Thanks,

C.


>>   	} else {
>>   		if (!op->addr.nbytes)
>>   			ret = aspeed_spi_write_reg(chip, op);
>> @@ -403,10 +403,73 @@ static int aspeed_spi_chip_set_default_window(struct aspeed_spi_chip *chip)
>>   	return chip->ahb_window_size ? 0 : -1;
>>   }
>>   
>> +static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
>> +{
>> +	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
>> +	struct aspeed_spi_chip *chip = &aspi->chips[desc->mem->spi->chip_select];
>> +	struct spi_mem_op *op = &desc->info.op_tmpl;
>> +	u32 ctl_val;
>> +	int ret = 0;
>> +
>> +	chip->clk_freq = desc->mem->spi->max_speed_hz;
>> +
>> +	/* Only for reads */
>> +	if (op->data.dir != SPI_MEM_DATA_IN)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (desc->info.length > chip->ahb_window_size)
>> +		dev_warn(aspi->dev, "CE%d window (%dMB) too small for mapping",
>> +			 chip->cs, chip->ahb_window_size >> 20);
>> +
>> +	/* Define the default IO read settings */
>> +	ctl_val = readl(chip->ctl) & ~CTRL_IO_CMD_MASK;
>> +	ctl_val |= aspeed_spi_get_io_mode(op) |
>> +		op->cmd.opcode << CTRL_COMMAND_SHIFT |
>> +		CTRL_IO_DUMMY_SET(op->dummy.nbytes / op->dummy.buswidth) |
>> +		CTRL_IO_MODE_READ;
>> +
>> +	/* Tune 4BYTE address mode */
>> +	if (op->addr.nbytes) {
>> +		u32 addr_mode = readl(aspi->regs + CE_CTRL_REG);
>> +
>> +		if (op->addr.nbytes == 4)
>> +			addr_mode |= (0x11 << chip->cs);
>> +		else
>> +			addr_mode &= ~(0x11 << chip->cs);
>> +		writel(addr_mode, aspi->regs + CE_CTRL_REG);
>> +	}
>> +
>> +	/* READ mode is the controller default setting */
>> +	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
>> +	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
>> +
>> +	dev_info(aspi->dev, "CE%d read buswidth:%d [0x%08x]\n",
>> +		 chip->cs, op->data.buswidth, chip->ctl_val[ASPEED_SPI_READ]);
>> +
>> +	return ret;
>> +}
>> +
>> +static ssize_t aspeed_spi_dirmap_read(struct spi_mem_dirmap_desc *desc,
>> +				      u64 offset, size_t len, void *buf)
>> +{
>> +	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
>> +	struct aspeed_spi_chip *chip = &aspi->chips[desc->mem->spi->chip_select];
>> +
>> +	/* Switch to USER command mode if mapping window is too small */
>> +	if (chip->ahb_window_size < offset + len)
>> +		aspeed_spi_read_user(chip, &desc->info.op_tmpl, offset, len, buf);
>> +	else
>> +		memcpy_fromio(buf, chip->ahb_base + offset, len);
>> +
>> +	return len;
>> +}
>> +
>>   static const struct spi_controller_mem_ops aspeed_spi_mem_ops = {
>>   	.supports_op = aspeed_spi_supports_op,
>>   	.exec_op = aspeed_spi_exec_op,
>>   	.get_name = aspeed_spi_get_name,
>> +	.dirmap_create = aspeed_spi_dirmap_create,
>> +	.dirmap_read = aspeed_spi_dirmap_read,
>>   };
>>   
>>   static void aspeed_spi_chip_set_type(struct aspeed_spi *aspi, unsigned int cs, int type)
>> -- 
>> 2.34.1
>>
>
Cédric Le Goater April 4, 2022, 7:30 a.m. UTC | #11
On 3/31/22 18:41, Pratyush Yadav wrote:
> Hi,
> 
> On 25/03/22 11:08AM, Cédric Le Goater wrote:
>> To accommodate the different response time of SPI transfers on different
>> boards and different SPI NOR devices, the Aspeed controllers provide a
>> set of Read Timing Compensation registers to tune the timing delays
>> depending on the frequency being used. The AST2600 SoC has one of these
>> registers per device. On the AST2500 and AST2400 SoCs, the timing
>> register is shared by all devices which is problematic to get good
>> results other than for one device.
>>
>> The algorithm first reads a golden buffer at low speed and then performs
>> reads with different clocks and delay cycle settings to find a breaking
>> point. This selects a default good frequency for the CEx control register.
>> The current settings are a bit optimistic as we pick the first delay giving
>> good results. A safer approach would be to determine an interval and
>> choose the middle value.
>>
>> Calibration is performed when the direct mapping for reads is created.
>> Since the underlying spi-nor object needs to be initialized to create
>> the spi_mem operation for direct mapping, we should be fine. Having a
>> specific API would clarify the requirements though.
>>
>> Cc: Pratyush Yadav <p.yadav@ti.com>
>> Reviewed-by: Joel Stanley <joel@jms.id.au>
>> Tested-by: Joel Stanley <joel@jms.id.au>
>> Tested-by: Tao Ren <rentao.bupt@gmail.com>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>   drivers/spi/spi-aspeed-smc.c | 281 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 281 insertions(+)
>>
> [...]
>> @@ -517,6 +527,8 @@ static int aspeed_spi_chip_adjust_window(struct aspeed_spi_chip *chip,
>>   	return 0;
>>   }
>>   
>> +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip);
>> +
>>   static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
>>   {
>>   	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
>> @@ -565,6 +577,8 @@ static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
>>   	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
>>   	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
>>   
>> +	ret = aspeed_spi_do_calibration(chip);
>> +
> 
> I am still not convinced this is a good idea. The API does not say
> anywhere what dirmap_create must be called after the flash is completely
> initialized, though that is what is done currently in practice. 

Yes because we wouldn't have a correct 'spi_mem_dirmap_info' if it wasn't
the case. May be change the documentation ?

> I think
> an explicit API to mark flash as "ready for calibration" would be a
> better idea.

OK. Since the above is a oneliner, it should not be a problem to move
it under a new handler if needed.

The dirmap_create() handler expects the spi-mem descriptor and the field
'desc->info.op_tmpl' to be correctly initialized in order to compute the
control register value, which is a requirement for dirmap_read(). The
calibration sequence simply comes after.

AFAICT, there is nothing incorrect today.

> Tudor/Mark/Miquel, what do you think?


Thanks,

C.
Pratyush Yadav April 5, 2022, 7:22 p.m. UTC | #12
On 04/04/22 09:30AM, Cédric Le Goater wrote:
> On 3/31/22 18:41, Pratyush Yadav wrote:
> > Hi,
> > 
> > On 25/03/22 11:08AM, Cédric Le Goater wrote:
> > > To accommodate the different response time of SPI transfers on different
> > > boards and different SPI NOR devices, the Aspeed controllers provide a
> > > set of Read Timing Compensation registers to tune the timing delays
> > > depending on the frequency being used. The AST2600 SoC has one of these
> > > registers per device. On the AST2500 and AST2400 SoCs, the timing
> > > register is shared by all devices which is problematic to get good
> > > results other than for one device.
> > > 
> > > The algorithm first reads a golden buffer at low speed and then performs
> > > reads with different clocks and delay cycle settings to find a breaking
> > > point. This selects a default good frequency for the CEx control register.
> > > The current settings are a bit optimistic as we pick the first delay giving
> > > good results. A safer approach would be to determine an interval and
> > > choose the middle value.
> > > 
> > > Calibration is performed when the direct mapping for reads is created.
> > > Since the underlying spi-nor object needs to be initialized to create
> > > the spi_mem operation for direct mapping, we should be fine. Having a
> > > specific API would clarify the requirements though.
> > > 
> > > Cc: Pratyush Yadav <p.yadav@ti.com>
> > > Reviewed-by: Joel Stanley <joel@jms.id.au>
> > > Tested-by: Joel Stanley <joel@jms.id.au>
> > > Tested-by: Tao Ren <rentao.bupt@gmail.com>
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > > ---
> > >   drivers/spi/spi-aspeed-smc.c | 281 +++++++++++++++++++++++++++++++++++
> > >   1 file changed, 281 insertions(+)
> > > 
> > [...]
> > > @@ -517,6 +527,8 @@ static int aspeed_spi_chip_adjust_window(struct aspeed_spi_chip *chip,
> > >   	return 0;
> > >   }
> > > +static int aspeed_spi_do_calibration(struct aspeed_spi_chip *chip);
> > > +
> > >   static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
> > >   {
> > >   	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
> > > @@ -565,6 +577,8 @@ static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
> > >   	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
> > >   	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
> > > +	ret = aspeed_spi_do_calibration(chip);
> > > +
> > 
> > I am still not convinced this is a good idea. The API does not say
> > anywhere what dirmap_create must be called after the flash is completely
> > initialized, though that is what is done currently in practice.
> 
> Yes because we wouldn't have a correct 'spi_mem_dirmap_info' if it wasn't
> the case. May be change the documentation ?

SPI NOR knows what protocol and opcodes it would use before it actually 
puts the flash in that mode. So in theory it could call dirmap_create() 
before it has put the flash in say 8D-8D-8D mode. I don't see much 
reason to do so in practice, but who knows, that might change. This is 
why I would like to hear what other people think.

> 
> > I think
> > an explicit API to mark flash as "ready for calibration" would be a
> > better idea.
> 
> OK. Since the above is a oneliner, it should not be a problem to move
> it under a new handler if needed.
> 
> The dirmap_create() handler expects the spi-mem descriptor and the field
> 'desc->info.op_tmpl' to be correctly initialized in order to compute the
> control register value, which is a requirement for dirmap_read(). The
> calibration sequence simply comes after.
> 
> AFAICT, there is nothing incorrect today.

In practice, no there is nothing incorrect. But as I explained above, it 
is certainly possible to call dirmap_create() before the flash is ready.

> 
> > Tudor/Mark/Miquel, what do you think?
> 
> 
> Thanks,
> 
> C.
Pratyush Yadav April 5, 2022, 7:27 p.m. UTC | #13
On 04/04/22 09:11AM, Cédric Le Goater wrote:
> On 3/30/22 21:45, Pratyush Yadav wrote:
> > On 25/03/22 11:08AM, Cédric Le Goater wrote:
> > > Use direct mapping to read the flash device contents. This operation
> > > mode is called "Command mode" on Aspeed SoC SMC controllers. It uses a
> > > Control Register for the settings to apply when a memory operation is
> > > performed on the flash device mapping window.
> > > 
> > > If the window is not big enough, fall back to the "User mode" to
> > > perform the read.
> > > 
> > > Since direct mapping now handles all reads of the flash device
> > > contents, also use memcpy_fromio for other address spaces, such as
> > > SFDP.
> > > 
> > > Direct mapping for writes will come later when validated.
> > > 
> > > Reviewed-by: Joel Stanley <joel@jms.id.au>
> > > Tested-by: Joel Stanley <joel@jms.id.au>
> > > Tested-by: Tao Ren <rentao.bupt@gmail.com>
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > > ---
> > >   drivers/spi/spi-aspeed-smc.c | 67 ++++++++++++++++++++++++++++++++++--
> > >   1 file changed, 65 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/spi/spi-aspeed-smc.c b/drivers/spi/spi-aspeed-smc.c
> > > index 997ec2e45118..0951766baef4 100644
> > > --- a/drivers/spi/spi-aspeed-smc.c
> > > +++ b/drivers/spi/spi-aspeed-smc.c
> > > @@ -322,8 +322,8 @@ static int do_aspeed_spi_exec_op(struct spi_mem *mem, const struct spi_mem_op *o
> > >   		if (!op->addr.nbytes)
> > >   			ret = aspeed_spi_read_reg(chip, op);
> > >   		else
> > > -			ret = aspeed_spi_read_user(chip, op, op->addr.val,
> > > -						   op->data.nbytes, op->data.buf.in);
> > > +			memcpy_fromio(op->data.buf.in, chip->ahb_base + op->addr.val,
> > > +				      op->data.nbytes);
> > 
> > I think I commented on this earlier too, though I failed to respond to
> > your reply. Let me bring the topic back up. I think this can cause an
> > invalid memory address to be accessed. Not all SPI MEM consumers will
> > use dirmap APIs, and they won't use them all the time. For example, SPI
> > NOR can perform some operations to reset the flash before shutting down.
> > For example, SPI NOR turns off 4byte address mode during shutdown. This
> > will be a register read/write operation, which usually has a different
> > opcode.
> 
> It's only a small optimization for startup when the SFDP probing is done.
> There are quite a few reads which are large :
> 
>   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x10
>   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x10
>   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x120
>   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x40
>   spi-aspeed-smc 1e630000.spi: CE0 read OP 0x5a mode:1.1.1.1 naddr:0x3 ndummies:0x1 len:0x8

Is the overhead large enough to cause any significant delays? This would 
only happen once, it is not on a hot path.

Anyway, I had not noticed earlier that you set opcode, dummy, etc. in 
do_aspeed_spi_exec_op() so I was thinking you would always end up using 
the "main" read operation for this.

I am fine with this as long as you add range checks to make sure there 
are no overflows.

> 
> > 
> > So I think you should keep dirmap and exec_op() independent of each
> > other.
> 
> OK. I understand. It's not a problem as it works either way.
> 
> Thanks,
> 
> C.
> 
> 
> > >   	} else {
> > >   		if (!op->addr.nbytes)
> > >   			ret = aspeed_spi_write_reg(chip, op);
> > > @@ -403,10 +403,73 @@ static int aspeed_spi_chip_set_default_window(struct aspeed_spi_chip *chip)
> > >   	return chip->ahb_window_size ? 0 : -1;
> > >   }
> > > +static int aspeed_spi_dirmap_create(struct spi_mem_dirmap_desc *desc)
> > > +{
> > > +	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
> > > +	struct aspeed_spi_chip *chip = &aspi->chips[desc->mem->spi->chip_select];
> > > +	struct spi_mem_op *op = &desc->info.op_tmpl;
> > > +	u32 ctl_val;
> > > +	int ret = 0;
> > > +
> > > +	chip->clk_freq = desc->mem->spi->max_speed_hz;
> > > +
> > > +	/* Only for reads */
> > > +	if (op->data.dir != SPI_MEM_DATA_IN)
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	if (desc->info.length > chip->ahb_window_size)
> > > +		dev_warn(aspi->dev, "CE%d window (%dMB) too small for mapping",
> > > +			 chip->cs, chip->ahb_window_size >> 20);
> > > +
> > > +	/* Define the default IO read settings */
> > > +	ctl_val = readl(chip->ctl) & ~CTRL_IO_CMD_MASK;
> > > +	ctl_val |= aspeed_spi_get_io_mode(op) |
> > > +		op->cmd.opcode << CTRL_COMMAND_SHIFT |
> > > +		CTRL_IO_DUMMY_SET(op->dummy.nbytes / op->dummy.buswidth) |
> > > +		CTRL_IO_MODE_READ;
> > > +
> > > +	/* Tune 4BYTE address mode */
> > > +	if (op->addr.nbytes) {
> > > +		u32 addr_mode = readl(aspi->regs + CE_CTRL_REG);
> > > +
> > > +		if (op->addr.nbytes == 4)
> > > +			addr_mode |= (0x11 << chip->cs);
> > > +		else
> > > +			addr_mode &= ~(0x11 << chip->cs);
> > > +		writel(addr_mode, aspi->regs + CE_CTRL_REG);
> > > +	}
> > > +
> > > +	/* READ mode is the controller default setting */
> > > +	chip->ctl_val[ASPEED_SPI_READ] = ctl_val;
> > > +	writel(chip->ctl_val[ASPEED_SPI_READ], chip->ctl);
> > > +
> > > +	dev_info(aspi->dev, "CE%d read buswidth:%d [0x%08x]\n",
> > > +		 chip->cs, op->data.buswidth, chip->ctl_val[ASPEED_SPI_READ]);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static ssize_t aspeed_spi_dirmap_read(struct spi_mem_dirmap_desc *desc,
> > > +				      u64 offset, size_t len, void *buf)
> > > +{
> > > +	struct aspeed_spi *aspi = spi_controller_get_devdata(desc->mem->spi->master);
> > > +	struct aspeed_spi_chip *chip = &aspi->chips[desc->mem->spi->chip_select];
> > > +
> > > +	/* Switch to USER command mode if mapping window is too small */
> > > +	if (chip->ahb_window_size < offset + len)
> > > +		aspeed_spi_read_user(chip, &desc->info.op_tmpl, offset, len, buf);
> > > +	else
> > > +		memcpy_fromio(buf, chip->ahb_base + offset, len);
> > > +
> > > +	return len;
> > > +}
> > > +
> > >   static const struct spi_controller_mem_ops aspeed_spi_mem_ops = {
> > >   	.supports_op = aspeed_spi_supports_op,
> > >   	.exec_op = aspeed_spi_exec_op,
> > >   	.get_name = aspeed_spi_get_name,
> > > +	.dirmap_create = aspeed_spi_dirmap_create,
> > > +	.dirmap_read = aspeed_spi_dirmap_read,
> > >   };
> > >   static void aspeed_spi_chip_set_type(struct aspeed_spi *aspi, unsigned int cs, int type)
> > > -- 
> > > 2.34.1
> > > 
> > 
>