Message ID | 20200610224652.64336-1-luisalberto@google.com |
---|---|
State | Accepted |
Delegated to: | Ambarus Tudor |
Headers | show |
Series | mtd: revert "spi-nor: intel: provide a range for poll_timout" | expand |
On Wed, Jun 10, 2020 at 10:46:49PM +0000, Luis Alberto Herrera wrote: > This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range > for poll_timout". That change introduces a performance regression when > reading sequentially from flash. Logging calls to intel_spi_read without > this change we get: > > Start MTD read > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > End MTD read > > With this change: > > Start MTD read > [ 21.942922] intel_spi_read(from=1c00000, len=400000) > [ 21.942922] intel_spi_read(from=1c00000, len=400000) > [ 23.784058] intel_spi_read(from=2000000, len=400000) > [ 23.784058] intel_spi_read(from=2000000, len=400000) > [ 25.625006] intel_spi_read(from=2400000, len=400000) > [ 25.625006] intel_spi_read(from=2400000, len=400000) > End MTD read > > Signed-off-by: Luis Alberto Herrera <luisalberto@google.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
On 6/11/20 1:46 AM, Luis Alberto Herrera wrote: > This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range > for poll_timout". That change introduces a performance regression when > reading sequentially from flash. Logging calls to intel_spi_read without > this change we get: > > Start MTD read > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > End MTD read > > With this change: > > Start MTD read > [ 21.942922] intel_spi_read(from=1c00000, len=400000) > [ 21.942922] intel_spi_read(from=1c00000, len=400000) > [ 23.784058] intel_spi_read(from=2000000, len=400000) > [ 23.784058] intel_spi_read(from=2000000, len=400000) > [ 25.625006] intel_spi_read(from=2400000, len=400000) > [ 25.625006] intel_spi_read(from=2400000, len=400000) > End MTD read > > Signed-off-by: Luis Alberto Herrera <luisalberto@google.com> > --- > drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c > index 61d2a0ad2131..2b89361a0d3a 100644 > --- a/drivers/mtd/spi-nor/controllers/intel-spi.c > +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c > @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi) > u32 val; > > return readl_poll_timeout(ispi->base + HSFSTS_CTL, val, > - !(val & HSFSTS_CTL_SCIP), 40, > + !(val & HSFSTS_CTL_SCIP), 0, Would 10 us keep the performance as it was before? Cheers, ta
Hello Luis, thank you for the patch! On 11/06/2020 00:46, Luis Alberto Herrera wrote: > This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range > for poll_timout". That change introduces a performance regression when > reading sequentially from flash. Logging calls to intel_spi_read without > this change we get: > > Start MTD read > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > End MTD read > > With this change: > > Start MTD read > [ 21.942922] intel_spi_read(from=1c00000, len=400000) > [ 21.942922] intel_spi_read(from=1c00000, len=400000) > [ 23.784058] intel_spi_read(from=2000000, len=400000) > [ 23.784058] intel_spi_read(from=2000000, len=400000) > [ 25.625006] intel_spi_read(from=2400000, len=400000) > [ 25.625006] intel_spi_read(from=2400000, len=400000) > End MTD read I've performed my testing as well and got the following results: Vanilla Linux 4.9 (i.e. before the introduction of the offending patch): dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k 1280+0 records in 1280+0 records out 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s Vanilla 4.19 (i.e. with offending patch): dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k 1280+0 records in 1280+0 records out 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s 4.19 + revert: dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k 1280+0 records in 1280+0 records out 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s Therefore it looks good from my PoV: Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> > Signed-off-by: Luis Alberto Herrera <luisalberto@google.com> > Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> > --- > drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c > index 61d2a0ad2131..2b89361a0d3a 100644 > --- a/drivers/mtd/spi-nor/controllers/intel-spi.c > +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c > @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi) > u32 val; > > return readl_poll_timeout(ispi->base + HSFSTS_CTL, val, > - !(val & HSFSTS_CTL_SCIP), 40, > + !(val & HSFSTS_CTL_SCIP), 0, > INTEL_SPI_TIMEOUT * 1000); > } > > @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi) > u32 val; > > return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val, > - !(val & SSFSTS_CTL_SCIP), 40, > + !(val & SSFSTS_CTL_SCIP), 0, > INTEL_SPI_TIMEOUT * 1000); > } > >
Hi, Alexander, On 7/22/20 7:37 PM, Alexander Sverdlin wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > Hello Luis, > > thank you for the patch! > > On 11/06/2020 00:46, Luis Alberto Herrera wrote: >> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range >> for poll_timout". That change introduces a performance regression when >> reading sequentially from flash. Logging calls to intel_spi_read without >> this change we get: >> >> Start MTD read >> [ 20.045527] intel_spi_read(from=1800000, len=400000) >> [ 20.045527] intel_spi_read(from=1800000, len=400000) >> [ 282.199274] intel_spi_read(from=1c00000, len=400000) >> [ 282.199274] intel_spi_read(from=1c00000, len=400000) >> [ 544.351528] intel_spi_read(from=2000000, len=400000) >> [ 544.351528] intel_spi_read(from=2000000, len=400000) >> End MTD read >> >> With this change: >> >> Start MTD read >> [ 21.942922] intel_spi_read(from=1c00000, len=400000) >> [ 21.942922] intel_spi_read(from=1c00000, len=400000) >> [ 23.784058] intel_spi_read(from=2000000, len=400000) >> [ 23.784058] intel_spi_read(from=2000000, len=400000) >> [ 25.625006] intel_spi_read(from=2400000, len=400000) >> [ 25.625006] intel_spi_read(from=2400000, len=400000) >> End MTD read > > I've performed my testing as well and got the following results: > > Vanilla Linux 4.9 (i.e. before the introduction of the offending > patch): > > dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k > 1280+0 records in > 1280+0 records out > 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s > > Vanilla 4.19 (i.e. with offending patch): > > dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k > 1280+0 records in > 1280+0 records out > 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s > > 4.19 + revert: > > dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k > 1280+0 records in > 1280+0 records out > 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s > > Therefore it looks good from my PoV: > > Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> > >> Signed-off-by: Luis Alberto Herrera <luisalberto@google.com> >> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> >> --- >> drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c >> index 61d2a0ad2131..2b89361a0d3a 100644 >> --- a/drivers/mtd/spi-nor/controllers/intel-spi.c >> +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c >> @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi) >> u32 val; >> >> return readl_poll_timeout(ispi->base + HSFSTS_CTL, val, >> - !(val & HSFSTS_CTL_SCIP), 40,>> + !(val & HSFSTS_CTL_SCIP), 0, would you put 10 us here >> INTEL_SPI_TIMEOUT * 1000); >> } >> >> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi) >> u32 val; >> >> return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val, >> - !(val & SSFSTS_CTL_SCIP), 40, >> + !(val & SSFSTS_CTL_SCIP), 0, also here, and re-do a test? I'm curios if the performance will be as it was before. Thanks! >> INTEL_SPI_TIMEOUT * 1000); >> } >> >> > > -- > Best regards, > Alexander Sverdlin. >
Hello Tudor, On 22/07/2020 19:03, Tudor.Ambarus@microchip.com wrote: > On 7/22/20 7:37 PM, Alexander Sverdlin wrote: [...] >> I've performed my testing as well and got the following results: >> >> Vanilla Linux 4.9 (i.e. before the introduction of the offending >> patch): >> >> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k >> 1280+0 records in >> 1280+0 records out >> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s >> >> Vanilla 4.19 (i.e. with offending patch): >> >> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k >> 1280+0 records in >> 1280+0 records out >> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s >> >> 4.19 + revert: >> >> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k >> 1280+0 records in >> 1280+0 records out >> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s >> >> Therefore it looks good from my PoV: >> >> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> [...] > would you put 10 us here >>> INTEL_SPI_TIMEOUT * 1000); >>> } >>> >>> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi) >>> u32 val; >>> >>> return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val, >>> - !(val & SSFSTS_CTL_SCIP), 40, >>> + !(val & SSFSTS_CTL_SCIP), 0, > > also here, and re-do a test? I'm curios if the performance will be > as it was before. with 10us it looks like this: dd if=/dev/flash/by-name/... of=/dev/null bs=4k 1280+0 records in 1280+0 records out 5242880 bytes (5.2 MB, 5.0 MiB) copied, 4.33816 s, 1.2 MB/s Which means, there is a performance regression and it would depend on the test case, how bad it will be...
Hi, Mika, On 7/23/20 12:05 PM, Alexander Sverdlin wrote: > > Hello Tudor, > > On 22/07/2020 19:03, Tudor.Ambarus@microchip.com wrote: >> On 7/22/20 7:37 PM, Alexander Sverdlin wrote: > > [...] > >>> I've performed my testing as well and got the following results: >>> >>> Vanilla Linux 4.9 (i.e. before the introduction of the offending >>> patch): >>> >>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k >>> 1280+0 records in >>> 1280+0 records out >>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s >>> >>> Vanilla 4.19 (i.e. with offending patch): >>> >>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k >>> 1280+0 records in >>> 1280+0 records out >>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s >>> >>> 4.19 + revert: >>> >>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k >>> 1280+0 records in >>> 1280+0 records out >>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s >>> [cut] > with 10us it looks like this: > > dd if=/dev/flash/by-name/... of=/dev/null bs=4k > 1280+0 records in > 1280+0 records out > 5242880 bytes (5.2 MB, 5.0 MiB) copied, 4.33816 s, 1.2 MB/s > > Which means, there is a performance regression and it would depend on > the test case, how bad it will be... > We need a bit of a context here. Using a tight-loop for polling and having a 5 secs timeout is fishy. For anything that's expected to complete less than a few usec, it's usually better to poll continuously, but then a timeout of 5s is way too big. Can we shrink the timeout to few msecs? I'll queue this to spi-nor/next to fix the perf regression, but I would like to continue the discussion and to come up with an incremental patch on top of this one. Cheers, ta
On Wed, 10 Jun 2020 22:46:49 +0000, Luis Alberto Herrera wrote: > This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range > for poll_timout". That change introduces a performance regression when > reading sequentially from flash. Logging calls to intel_spi_read without > this change we get: > > Start MTD read > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 20.045527] intel_spi_read(from=1800000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 282.199274] intel_spi_read(from=1c00000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > [ 544.351528] intel_spi_read(from=2000000, len=400000) > End MTD read > > [...] Applied to spi-nor/next, thanks! [1/1] mtd: revert "spi-nor: intel: provide a range for poll_timout" https://git.kernel.org/mtd/c/e93a977367b2 Best regards,
diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c index 61d2a0ad2131..2b89361a0d3a 100644 --- a/drivers/mtd/spi-nor/controllers/intel-spi.c +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi) u32 val; return readl_poll_timeout(ispi->base + HSFSTS_CTL, val, - !(val & HSFSTS_CTL_SCIP), 40, + !(val & HSFSTS_CTL_SCIP), 0, INTEL_SPI_TIMEOUT * 1000); } @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi) u32 val; return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val, - !(val & SSFSTS_CTL_SCIP), 40, + !(val & SSFSTS_CTL_SCIP), 0, INTEL_SPI_TIMEOUT * 1000); }
This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range for poll_timout". That change introduces a performance regression when reading sequentially from flash. Logging calls to intel_spi_read without this change we get: Start MTD read [ 20.045527] intel_spi_read(from=1800000, len=400000) [ 20.045527] intel_spi_read(from=1800000, len=400000) [ 282.199274] intel_spi_read(from=1c00000, len=400000) [ 282.199274] intel_spi_read(from=1c00000, len=400000) [ 544.351528] intel_spi_read(from=2000000, len=400000) [ 544.351528] intel_spi_read(from=2000000, len=400000) End MTD read With this change: Start MTD read [ 21.942922] intel_spi_read(from=1c00000, len=400000) [ 21.942922] intel_spi_read(from=1c00000, len=400000) [ 23.784058] intel_spi_read(from=2000000, len=400000) [ 23.784058] intel_spi_read(from=2000000, len=400000) [ 25.625006] intel_spi_read(from=2400000, len=400000) [ 25.625006] intel_spi_read(from=2400000, len=400000) End MTD read Signed-off-by: Luis Alberto Herrera <luisalberto@google.com> --- drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)