mbox series

[V2,00/14] net: ks8851: Unify KS8851 SPI and MLL drivers

Message ID 20200325150543.78569-1-marex@denx.de
Headers show
Series net: ks8851: Unify KS8851 SPI and MLL drivers | expand

Message

Marek Vasut March 25, 2020, 3:05 p.m. UTC
The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
of silicon, except the former has an SPI interface, while the later has a
parallel bus interface. Thus far, Linux has two separate drivers for each
and they are diverging considerably.

This series unifies them into a single driver with small SPI and parallel
bus specific parts. The approach here is to first separate out the SPI
specific parts into a separate file, then add parallel bus accessors in
another separate file and then finally remove the old parallel bus driver.
The reason for replacing the old parallel bus driver is because the SPI
bus driver is much higher quality.

NOTE: This series depends on "net: ks8851-ml: Fix IO operations, again"

Marek Vasut (14):
  net: ks8851: Factor out spi->dev in probe()/remove()
  net: ks8851: Rename ndev to netdev in probe
  net: ks8851: Replace dev_err() with netdev_err() in IRQ handler
  net: ks8851: Pass device node into ks8851_init_mac()
  net: ks8851: Use devm_alloc_etherdev()
  net: ks8851: Use dev_{get,set}_drvdata()
  net: ks8851: Remove ks8851_rdreg32()
  net: ks8851: Use 16-bit writes to program MAC address
  net: ks8851: Use 16-bit read of RXFC register
  net: ks8851: Split out SPI specific entries in struct ks8851_net
  net: ks8851: Split out SPI specific code from probe() and remove()
  net: ks8851: Separate SPI operations into separate file
  net: ks8851: Implement Parallel bus operations
  net: ks8851: Remove ks8851_mll.c

 drivers/net/ethernet/micrel/Kconfig      |    2 +
 drivers/net/ethernet/micrel/Makefile     |    4 +-
 drivers/net/ethernet/micrel/ks8851.c     |  517 +-------
 drivers/net/ethernet/micrel/ks8851.h     |  127 +-
 drivers/net/ethernet/micrel/ks8851_mll.c | 1393 ----------------------
 drivers/net/ethernet/micrel/ks8851_par.c |  238 ++++
 drivers/net/ethernet/micrel/ks8851_spi.c |  305 +++++
 7 files changed, 726 insertions(+), 1860 deletions(-)
 delete mode 100644 drivers/net/ethernet/micrel/ks8851_mll.c
 create mode 100644 drivers/net/ethernet/micrel/ks8851_par.c
 create mode 100644 drivers/net/ethernet/micrel/ks8851_spi.c

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>

Comments

Lukas Wunner March 26, 2020, 7:02 p.m. UTC | #1
On Wed, Mar 25, 2020 at 04:05:29PM +0100, Marek Vasut wrote:
> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
> of silicon, except the former has an SPI interface, while the later has a
> parallel bus interface. Thus far, Linux has two separate drivers for each
> and they are diverging considerably.
> 
> This series unifies them into a single driver with small SPI and parallel
> bus specific parts. The approach here is to first separate out the SPI
> specific parts into a separate file, then add parallel bus accessors in
> another separate file and then finally remove the old parallel bus driver.
> The reason for replacing the old parallel bus driver is because the SPI
> bus driver is much higher quality.

With this series, ks8851.ko (SPI variant) failed to compile as a module.
I got it working by renaming ks8851.c to ks8851_common.c and applying
the following change to the Makefile:

--- a/drivers/net/ethernet/micrel/Makefile
+++ b/drivers/net/ethernet/micrel/Makefile
@@ -5,6 +5,8 @@
 
 obj-$(CONFIG_ARM_KS8695_ETHER) += ks8695net.o
 obj-$(CONFIG_KS8842) += ks8842.o
-obj-$(CONFIG_KS8851) += ks8851.o ks8851_spi.o
-obj-$(CONFIG_KS8851_MLL) += ks8851.o ks8851_par.o
+obj-$(CONFIG_KS8851) += ks8851.o
+ks8851-objs = ks8851_common.o ks8851_spi.o
+obj-$(CONFIG_KS8851_MLL) += ks8851_mll.o
+ks8851_mll-objs = ks8851_common.o ks8851_par.o
 obj-$(CONFIG_KSZ884X_PCI) += ksz884x.o

This series breaks reading the MAC address from an EEPROM attached to
the KSZ8851SNLI:
The MAC address stored in the EEPROM was c8:3e:a7:99:ef:aa.
The MAC address was read as 3e:c8:99:a7:ef:aa with this series.
Note: The MAC address starts at the third byte in the EEPROM and is
stored as aa:ef:99:a7:3e:c8, i.e. in reverse order.  (I think the
spec says something else but it appears to be wrong.)

Assigning a MAC address with "ifconfig eth1 hw ether <mac>" (which I
believe ends up calling ks8851_write_mac_addr()) worked fine.

The performance degredation with this series is as follows:

Latency (ping) without this series:
  rtt min/avg/max/mdev = 0.982/1.776/3.756/0.027 ms, ipg/ewma 2.001/1.761 ms
With this series:
  rtt min/avg/max/mdev = 1.084/1.811/3.546/0.040 ms, ipg/ewma 2.020/1.814 ms

Throughput (scp) without this series:
  Transferred: sent 369780976, received 66088 bytes, in 202.0 seconds
  Bytes per second: sent 1830943.5, received 327.2
With this series:
  Transferred: sent 369693896, received 67588 bytes, in 210.5 seconds
  Bytes per second: sent 1755952.6, received 321.0

SPI clock is 25 MHz.  The chip would allow up to 40 MHz, but the board
layout limits that.

I suspect the performance regression is not only caused by the
suboptimal 16 byte instead of 8 byte accesses (and 2x16 byte instead
of 32 byte accesses), but also because the accessor functions cannot
be inlined.  It would be better if they were included from a header
file as static inlines.  The performance regression would then likely
disappear.

I guess the good news is that it otherwise worked out of the box.

Thanks,

Lukas
Marek Vasut March 27, 2020, 6:18 p.m. UTC | #2
On 3/26/20 8:02 PM, Lukas Wunner wrote:
> On Wed, Mar 25, 2020 at 04:05:29PM +0100, Marek Vasut wrote:
>> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
>> of silicon, except the former has an SPI interface, while the later has a
>> parallel bus interface. Thus far, Linux has two separate drivers for each
>> and they are diverging considerably.
>>
>> This series unifies them into a single driver with small SPI and parallel
>> bus specific parts. The approach here is to first separate out the SPI
>> specific parts into a separate file, then add parallel bus accessors in
>> another separate file and then finally remove the old parallel bus driver.
>> The reason for replacing the old parallel bus driver is because the SPI
>> bus driver is much higher quality.
> 
> With this series, ks8851.ko (SPI variant) failed to compile as a module.
> I got it working by renaming ks8851.c to ks8851_common.c and applying
> the following change to the Makefile:
> 
> --- a/drivers/net/ethernet/micrel/Makefile
> +++ b/drivers/net/ethernet/micrel/Makefile
> @@ -5,6 +5,8 @@
>  
>  obj-$(CONFIG_ARM_KS8695_ETHER) += ks8695net.o
>  obj-$(CONFIG_KS8842) += ks8842.o
> -obj-$(CONFIG_KS8851) += ks8851.o ks8851_spi.o
> -obj-$(CONFIG_KS8851_MLL) += ks8851.o ks8851_par.o
> +obj-$(CONFIG_KS8851) += ks8851.o
> +ks8851-objs = ks8851_common.o ks8851_spi.o
> +obj-$(CONFIG_KS8851_MLL) += ks8851_mll.o
> +ks8851_mll-objs = ks8851_common.o ks8851_par.o
>  obj-$(CONFIG_KSZ884X_PCI) += ksz884x.o
> 
> This series breaks reading the MAC address from an EEPROM attached to
> the KSZ8851SNLI:
> The MAC address stored in the EEPROM was c8:3e:a7:99:ef:aa.
> The MAC address was read as 3e:c8:99:a7:ef:aa with this series.
> Note: The MAC address starts at the third byte in the EEPROM and is
> stored as aa:ef:99:a7:3e:c8, i.e. in reverse order.  (I think the
> spec says something else but it appears to be wrong.)
> 
> Assigning a MAC address with "ifconfig eth1 hw ether <mac>" (which I
> believe ends up calling ks8851_write_mac_addr()) worked fine.

I added fixes for those.

> The performance degredation with this series is as follows:
> 
> Latency (ping) without this series:
>   rtt min/avg/max/mdev = 0.982/1.776/3.756/0.027 ms, ipg/ewma 2.001/1.761 ms
> With this series:
>   rtt min/avg/max/mdev = 1.084/1.811/3.546/0.040 ms, ipg/ewma 2.020/1.814 ms
> 
> Throughput (scp) without this series:
>   Transferred: sent 369780976, received 66088 bytes, in 202.0 seconds
>   Bytes per second: sent 1830943.5, received 327.2
> With this series:
>   Transferred: sent 369693896, received 67588 bytes, in 210.5 seconds
>   Bytes per second: sent 1755952.6, received 321.0

Maybe some iperf would be better here ?

> SPI clock is 25 MHz.  The chip would allow up to 40 MHz, but the board
> layout limits that.
> 
> I suspect the performance regression is not only caused by the
> suboptimal 16 byte instead of 8 byte accesses (and 2x16 byte instead
> of 32 byte accesses), but also because the accessor functions cannot
> be inlined.  It would be better if they were included from a header
> file as static inlines.  The performance regression would then likely
> disappear.

I did another measurement today and I found out that while RX on the old
KS8851-MLL driver runs at ~50 Mbit/s , TX runs at ~80 Mbit/s . With this
new driver, RX still runs at ~50 Mbit/s, but TX runs also at 50 Mbit/s .
That's real bad. Any ideas how to debug/profile this one ?

> I guess the good news is that it otherwise worked out of the box.

Great
Marek Vasut March 27, 2020, 7:25 p.m. UTC | #3
On 3/27/20 7:18 PM, Marek Vasut wrote:

[...]

>> The performance degredation with this series is as follows:
>>
>> Latency (ping) without this series:
>>   rtt min/avg/max/mdev = 0.982/1.776/3.756/0.027 ms, ipg/ewma 2.001/1.761 ms
>> With this series:
>>   rtt min/avg/max/mdev = 1.084/1.811/3.546/0.040 ms, ipg/ewma 2.020/1.814 ms
>>
>> Throughput (scp) without this series:
>>   Transferred: sent 369780976, received 66088 bytes, in 202.0 seconds
>>   Bytes per second: sent 1830943.5, received 327.2
>> With this series:
>>   Transferred: sent 369693896, received 67588 bytes, in 210.5 seconds
>>   Bytes per second: sent 1755952.6, received 321.0
> 
> Maybe some iperf would be better here ?
> 
>> SPI clock is 25 MHz.  The chip would allow up to 40 MHz, but the board
>> layout limits that.
>>
>> I suspect the performance regression is not only caused by the
>> suboptimal 16 byte instead of 8 byte accesses (and 2x16 byte instead
>> of 32 byte accesses), but also because the accessor functions cannot
>> be inlined.  It would be better if they were included from a header
>> file as static inlines.  The performance regression would then likely
>> disappear.
> 
> I did another measurement today and I found out that while RX on the old
> KS8851-MLL driver runs at ~50 Mbit/s , TX runs at ~80 Mbit/s . With this
> new driver, RX still runs at ~50 Mbit/s, but TX runs also at 50 Mbit/s .
> That's real bad. Any ideas how to debug/profile this one ?

So this schedule_work in start_xmit is the problem I have. If I hack it
up to do what the ks8851-mll does -- basically write packet into HW and
wait until it's transmitted -- then I get my 75 Mbit/s back.

I think we should implement some napi here, but for TX ? Basically
buffer up a few packets and then write them to the hardware in bulk.
There has to be something like that in the network stack , no ?