mtd: revert "spi-nor: intel: provide a range for poll_timout"

Message ID 20200610224652.64336-1-luisalberto@google.com
State Accepted
Delegated to: Ambarus Tudor

Commit Message

Luis Alberto Herrera June 10, 2020, 10:46 p.m. UTC
This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
for poll_timout". That change introduced a performance regression when
reading sequentially from flash. Logging calls to intel_spi_read without
this revert, we get:

Start MTD read
[   20.045527] intel_spi_read(from=1800000, len=400000)
[   20.045527] intel_spi_read(from=1800000, len=400000)
[  282.199274] intel_spi_read(from=1c00000, len=400000)
[  282.199274] intel_spi_read(from=1c00000, len=400000)
[  544.351528] intel_spi_read(from=2000000, len=400000)
[  544.351528] intel_spi_read(from=2000000, len=400000)
End MTD read

With this change:

Start MTD read
[   21.942922] intel_spi_read(from=1c00000, len=400000)
[   21.942922] intel_spi_read(from=1c00000, len=400000)
[   23.784058] intel_spi_read(from=2000000, len=400000)
[   23.784058] intel_spi_read(from=2000000, len=400000)
[   25.625006] intel_spi_read(from=2400000, len=400000)
[   25.625006] intel_spi_read(from=2400000, len=400000)
End MTD read

Signed-off-by: Luis Alberto Herrera <luisalberto@google.com>
---
 drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
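
As a rough cross-check of the two logs above, the gap between consecutive
timestamps approximates the time spent on one read. The sketch below is not
part of the patch. It reads len=400000 as hexadecimal (the from= offsets
advance by the same amount), and the helper and array names are made up for
illustration.

```c
#include <stddef.h>

/* Length of each logged read: len=400000 taken as hex, i.e. 4 MiB. */
#define CHUNK_BYTES 0x400000UL

/*
 * Average seconds between consecutive timestamps (n >= 2).
 * Each gap approximates the time spent on one CHUNK_BYTES read.
 */
static double avg_gap_seconds(const double *ts, size_t n)
{
	return (ts[n - 1] - ts[0]) / (double)(n - 1);
}

/* Throughput in bytes per second for one chunk read in `seconds`. */
static double chunk_throughput(double seconds)
{
	return (double)CHUNK_BYTES / seconds;
}

/* Timestamps copied from the two logs in the commit message. */
static const double ts_with_sleep[] = { 20.045527, 282.199274, 544.351528 };
static const double ts_reverted[]   = { 21.942922, 23.784058, 25.625006 };
```

Calling avg_gap_seconds() on the two arrays gives roughly 262 s per read
before the revert and roughly 1.84 s per read with it.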

Comments

Mika Westerberg June 11, 2020, 10:39 a.m. UTC | #1
On Wed, Jun 10, 2020 at 10:46:49PM +0000, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
> 
> Start MTD read
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
> 
> With this change:
> 
> Start MTD read
> [   21.942922] intel_spi_read(from=1c00000, len=400000)
> [   21.942922] intel_spi_read(from=1c00000, len=400000)
> [   23.784058] intel_spi_read(from=2000000, len=400000)
> [   23.784058] intel_spi_read(from=2000000, len=400000)
> [   25.625006] intel_spi_read(from=2400000, len=400000)
> [   25.625006] intel_spi_read(from=2400000, len=400000)
> End MTD read
> 
> Signed-off-by: Luis Alberto Herrera <luisalberto@google.com>

Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tudor Ambarus June 26, 2020, 10:57 a.m. UTC | #2
On 6/11/20 1:46 AM, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
> 
> Start MTD read
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
> 
> With this change:
> 
> Start MTD read
> [   21.942922] intel_spi_read(from=1c00000, len=400000)
> [   21.942922] intel_spi_read(from=1c00000, len=400000)
> [   23.784058] intel_spi_read(from=2000000, len=400000)
> [   23.784058] intel_spi_read(from=2000000, len=400000)
> [   25.625006] intel_spi_read(from=2400000, len=400000)
> [   25.625006] intel_spi_read(from=2400000, len=400000)
> End MTD read
> 
> Signed-off-by: Luis Alberto Herrera <luisalberto@google.com>
> ---
>  drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
> index 61d2a0ad2131..2b89361a0d3a 100644
> --- a/drivers/mtd/spi-nor/controllers/intel-spi.c
> +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
> @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
>         u32 val;
> 
>         return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
> -                                 !(val & HSFSTS_CTL_SCIP), 40,
> +                                 !(val & HSFSTS_CTL_SCIP), 0,

Would 10 us keep the performance as it was before?

Cheers,
ta
Alexander Sverdlin July 22, 2020, 4:37 p.m. UTC | #3
Hello Luis,

thank you for the patch!

On 11/06/2020 00:46, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
> 
> Start MTD read
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
> 
> With this change:
> 
> Start MTD read
> [   21.942922] intel_spi_read(from=1c00000, len=400000)
> [   21.942922] intel_spi_read(from=1c00000, len=400000)
> [   23.784058] intel_spi_read(from=2000000, len=400000)
> [   23.784058] intel_spi_read(from=2000000, len=400000)
> [   25.625006] intel_spi_read(from=2400000, len=400000)
> [   25.625006] intel_spi_read(from=2400000, len=400000)
> End MTD read

I've performed my testing as well and got the following results:

Vanilla Linux 4.9 (i.e. before the introduction of the offending
patch):

dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s

Vanilla 4.19 (i.e. with offending patch):

dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s

4.19 + revert:

dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s

Therefore it looks good from my PoV:

Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>

> Signed-off-by: Luis Alberto Herrera <luisalberto@google.com>
> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
> index 61d2a0ad2131..2b89361a0d3a 100644
> --- a/drivers/mtd/spi-nor/controllers/intel-spi.c
> +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
> @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
>  	u32 val;
>  
>  	return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
> -				  !(val & HSFSTS_CTL_SCIP), 40,
> +				  !(val & HSFSTS_CTL_SCIP), 0,
>  				  INTEL_SPI_TIMEOUT * 1000);
>  }
>  
> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
>  	u32 val;
>  
>  	return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
> -				  !(val & SSFSTS_CTL_SCIP), 40,
> +				  !(val & SSFSTS_CTL_SCIP), 0,
>  				  INTEL_SPI_TIMEOUT * 1000);
>  }
>  
>
Tudor Ambarus July 22, 2020, 5:03 p.m. UTC | #4
Hi, Alexander,

On 7/22/20 7:37 PM, Alexander Sverdlin wrote:
> Hello Luis,
> 
> thank you for the patch!
> 
> On 11/06/2020 00:46, Luis Alberto Herrera wrote:
>> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
>> for poll_timout". That change introduces a performance regression when
>> reading sequentially from flash. Logging calls to intel_spi_read without
>> this change we get:
>>
>> Start MTD read
>> [   20.045527] intel_spi_read(from=1800000, len=400000)
>> [   20.045527] intel_spi_read(from=1800000, len=400000)
>> [  282.199274] intel_spi_read(from=1c00000, len=400000)
>> [  282.199274] intel_spi_read(from=1c00000, len=400000)
>> [  544.351528] intel_spi_read(from=2000000, len=400000)
>> [  544.351528] intel_spi_read(from=2000000, len=400000)
>> End MTD read
>>
>> With this change:
>>
>> Start MTD read
>> [   21.942922] intel_spi_read(from=1c00000, len=400000)
>> [   21.942922] intel_spi_read(from=1c00000, len=400000)
>> [   23.784058] intel_spi_read(from=2000000, len=400000)
>> [   23.784058] intel_spi_read(from=2000000, len=400000)
>> [   25.625006] intel_spi_read(from=2400000, len=400000)
>> [   25.625006] intel_spi_read(from=2400000, len=400000)
>> End MTD read
> 
> I've performed my testing as well and got the following results:
> 
> Vanilla Linux 4.9 (i.e. before the introduction of the offending
> patch):
> 
> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s
> 
> Vanilla 4.19 (i.e. with offending patch):
> 
> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s
> 
> 4.19 + revert:
> 
> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s
> 
> Therefore it looks good from my PoV:
> 
> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
> 
>> Signed-off-by: Luis Alberto Herrera <luisalberto@google.com>
>> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>> ---
>>  drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
>> index 61d2a0ad2131..2b89361a0d3a 100644
>> --- a/drivers/mtd/spi-nor/controllers/intel-spi.c
>> +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
>> @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
>>       u32 val;
>>
>>       return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
>> -                               !(val & HSFSTS_CTL_SCIP), 40,
>> +                               !(val & HSFSTS_CTL_SCIP), 0,

would you put 10 us here
>>                                 INTEL_SPI_TIMEOUT * 1000);
>>  }
>>
>> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
>>       u32 val;
>>
>>       return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
>> -                               !(val & SSFSTS_CTL_SCIP), 40,
>> +                               !(val & SSFSTS_CTL_SCIP), 0,

also here, and re-do a test? I'm curious if the performance will be
as it was before.

Thanks!

>>                                 INTEL_SPI_TIMEOUT * 1000);
>>  }
>>
>>
> 
> --
> Best regards,
> Alexander Sverdlin.
>
Alexander Sverdlin July 23, 2020, 9:05 a.m. UTC | #5
Hello Tudor,

On 22/07/2020 19:03, Tudor.Ambarus@microchip.com wrote:
> On 7/22/20 7:37 PM, Alexander Sverdlin wrote:

[...]

>> I've performed my testing as well and got the following results:
>>
>> Vanilla Linux 4.9 (i.e. before the introduction of the offending
>> patch):
>>
>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>> 1280+0 records in
>> 1280+0 records out
>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s
>>
>> Vanilla 4.19 (i.e. with offending patch):
>>
>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>> 1280+0 records in
>> 1280+0 records out
>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s
>>
>> 4.19 + revert:
>>
>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>> 1280+0 records in
>> 1280+0 records out
>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s
>>
>> Therefore it looks good from my PoV:
>>
>> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>

[...]

> would you put 10 us here
>>>                                 INTEL_SPI_TIMEOUT * 1000);
>>>  }
>>>
>>> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
>>>       u32 val;
>>>
>>>       return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
>>> -                               !(val & SSFSTS_CTL_SCIP), 40,
>>> +                               !(val & SSFSTS_CTL_SCIP), 0,
> 
> also here, and re-do a test? I'm curious if the performance will be
> as it was before.

with 10us it looks like this:

dd if=/dev/flash/by-name/... of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 4.33816 s, 1.2 MB/s

Which means there is still a performance regression, and how bad it is
will depend on the test case...
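
To put the dd numbers in this thread side by side, here is a small sketch
that turns the elapsed times into throughput, relative slowdown and extra
time per transfer. The 64-byte transfer size is an assumption made only for
this estimate; it is not stated anywhere in this thread. The function names
are hypothetical.

```c
/* Bytes copied in every dd run quoted in this thread. */
#define DD_BYTES 5242880.0

/*
 * ASSUMPTION: data moves in 64-byte transfers and the driver waits for
 * the controller once per transfer. Not stated in this thread.
 */
#define ASSUMED_XFER_BYTES 64.0

/* Throughput in MB/s (10^6 bytes per second), the unit dd reports. */
static double throughput_mbps(double seconds)
{
	return DD_BYTES / seconds / 1e6;
}

/* Slowdown of a run relative to a baseline, in percent. */
static double slowdown_percent(double seconds, double baseline_seconds)
{
	return (seconds / baseline_seconds - 1.0) * 100.0;
}

/* Extra time per assumed transfer, in microseconds. */
static double extra_us_per_xfer(double seconds, double baseline_seconds)
{
	double xfers = DD_BYTES / ASSUMED_XFER_BYTES;

	return (seconds - baseline_seconds) * 1e6 / xfers;
}
```

With 3.90503 s (4.19 + revert) as the baseline, the 10 us run (4.33816 s) is
roughly 11% slower and the 40 us run (6.70891 s) roughly 72% slower. Under
the 64-byte assumption that is roughly 5 us and 34 us of extra time per
transfer.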
Tudor Ambarus July 28, 2020, 8:28 a.m. UTC | #6
Hi, Mika,

On 7/23/20 12:05 PM, Alexander Sverdlin wrote:
> 
> Hello Tudor,
> 
> On 22/07/2020 19:03, Tudor.Ambarus@microchip.com wrote:
>> On 7/22/20 7:37 PM, Alexander Sverdlin wrote:
> 
> [...]
> 
>>> I've performed my testing as well and got the following results:
>>>
>>> Vanilla Linux 4.9 (i.e. before the introduction of the offending
>>> patch):
>>>
>>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>>> 1280+0 records in
>>> 1280+0 records out
>>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s
>>>
>>> Vanilla 4.19 (i.e. with offending patch):
>>>
>>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>>> 1280+0 records in
>>> 1280+0 records out
>>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s
>>>
>>> 4.19 + revert:
>>>
>>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>>> 1280+0 records in
>>> 1280+0 records out
>>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s
>>>

[cut]

> with 10us it looks like this:
> 
> dd if=/dev/flash/by-name/... of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 4.33816 s, 1.2 MB/s
> 
> Which means there is still a performance regression, and how bad it is
> will depend on the test case...
> 

We need a bit of context here. Using a tight loop for polling together
with a 5 s timeout is fishy. For anything that's expected to complete in
less than a few usec, it's usually better to poll continuously, but then
a timeout of 5 s is way too big. Can we shrink the timeout to a few
msecs?

I'll queue this to spi-nor/next to fix the perf regression, but I would
like to continue the discussion and to come up with an incremental patch
on top of this one.

Cheers,
ta
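
The trade-off discussed above can be sketched in user space. This is a
simplified model of a poll-with-timeout loop, not the kernel's
readl_poll_timeout() itself: poll_until_idle(), struct fake_reg, fake_read()
and FAKE_BUSY are hypothetical names, and nanosleep() stands in for whatever
sleep primitive the kernel macro uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define FAKE_BUSY 0x1u	/* hypothetical "cycle in progress" bit */

/* Hypothetical register model: reports busy for the first busy_reads reads. */
struct fake_reg {
	unsigned long busy_reads;
	unsigned long reads;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	r->reads++;
	if (r->busy_reads) {
		r->busy_reads--;
		return FAKE_BUSY;
	}
	return 0;
}

/* Monotonic clock in microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Sleep for at least `us` microseconds. */
static void sleep_us(unsigned long us)
{
	struct timespec req = {
		.tv_sec = us / 1000000ul,
		.tv_nsec = (long)(us % 1000000ul) * 1000l,
	};

	nanosleep(&req, NULL);
}

/*
 * Poll read_reg(ctx) until (value & busy_mask) == 0.
 * delay_us == 0 busy-loops; otherwise the caller sleeps between reads.
 * timeout_us == 0 disables the timeout.
 * Returns 0 on success, -ETIMEDOUT if the bit never clears in time.
 */
static int poll_until_idle(uint32_t (*read_reg)(void *), void *ctx,
			   uint32_t busy_mask, unsigned long delay_us,
			   uint64_t timeout_us)
{
	uint64_t deadline = now_us() + timeout_us;
	uint32_t val;

	for (;;) {
		val = read_reg(ctx);
		if (!(val & busy_mask))
			return 0;
		if (timeout_us && now_us() > deadline) {
			/* One last read in case we were delayed. */
			val = read_reg(ctx);
			return (val & busy_mask) ? -ETIMEDOUT : 0;
		}
		if (delay_us)
			sleep_us(delay_us);
	}
}
```

With delay_us == 0 the loop returns as soon as the bit clears but occupies
the CPU until then, which is why a multi-second timeout looks out of place
next to it. With a non-zero delay, every wait that is not already finished
on the first read pays at least one sleep.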
Tudor Ambarus July 28, 2020, 9:55 a.m. UTC | #7
On Wed, 10 Jun 2020 22:46:49 +0000, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
> 
> Start MTD read
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [   20.045527] intel_spi_read(from=1800000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  282.199274] intel_spi_read(from=1c00000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> [  544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
> 
> [...]

Applied to spi-nor/next, thanks!

[1/1] mtd: revert "spi-nor: intel: provide a range for poll_timout"
      https://git.kernel.org/mtd/c/e93a977367b2

Best regards,

Patch

diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
index 61d2a0ad2131..2b89361a0d3a 100644
--- a/drivers/mtd/spi-nor/controllers/intel-spi.c
+++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
@@ -292,7 +292,7 @@  static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
 	u32 val;
 
 	return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
-				  !(val & HSFSTS_CTL_SCIP), 40,
+				  !(val & HSFSTS_CTL_SCIP), 0,
 				  INTEL_SPI_TIMEOUT * 1000);
 }
 
@@ -301,7 +301,7 @@  static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
 	u32 val;
 
 	return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
-				  !(val & SSFSTS_CTL_SCIP), 40,
+				  !(val & SSFSTS_CTL_SCIP), 0,
 				  INTEL_SPI_TIMEOUT * 1000);
 }