mbox series

[V3,00/18] net: ks8851: Unify KS8851 SPI and MLL drivers

Message ID 20200328003148.498021-1-marex@denx.de
Headers show
Series net: ks8851: Unify KS8851 SPI and MLL drivers | expand

Message

Marek Vasut March 28, 2020, 12:31 a.m. UTC
The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
of silicon, except the former has an SPI interface, while the later has a
parallel bus interface. Thus far, Linux has two separate drivers for each
and they are diverging considerably.

This series unifies them into a single driver with small SPI and parallel
bus specific parts. The approach here is to first separate out the SPI
specific parts into a separate file, then add parallel bus accessors in
another separate file and then finally remove the old parallel bus driver.
The reason for replacing the old parallel bus driver is because the SPI
bus driver is much higher quality.

NOTE: This series depends on "net: ks8851-ml: Fix IO operations, again"

NOTE: The performance regression on KS8851-16MLL is now fixes, the TX
      throughput is back to ~75 Mbit/s , RX is still 50 Mbit/s .

Marek Vasut (18):
  net: ks8851: Factor out spi->dev in probe()/remove()
  net: ks8851: Rename ndev to netdev in probe
  net: ks8851: Replace dev_err() with netdev_err() in IRQ handler
  net: ks8851: Pass device node into ks8851_init_mac()
  net: ks8851: Use devm_alloc_etherdev()
  net: ks8851: Use dev_{get,set}_drvdata()
  net: ks8851: Remove ks8851_rdreg32()
  net: ks8851: Use 16-bit writes to program MAC address
  net: ks8851: Use 16-bit read of RXFC register
  net: ks8851: Factor out bus lock handling
  net: ks8851: Factor out SKB receive function
  net: ks8851: Split out SPI specific entries in struct ks8851_net
  net: ks8851: Split out SPI specific code from probe() and remove()
  net: ks8851: Factor out TX work flush function
  net: ks8851: Permit overridding interrupt enable register
  net: ks8851: Separate SPI operations into separate file
  net: ks8851: Implement Parallel bus operations
  net: ks8851: Remove ks8851_mll.c

 drivers/net/ethernet/micrel/Kconfig           |    2 +
 drivers/net/ethernet/micrel/Makefile          |    2 +
 drivers/net/ethernet/micrel/ks8851.h          |  142 +-
 .../micrel/{ks8851.c => ks8851_common.c}      |  688 ++------
 drivers/net/ethernet/micrel/ks8851_mll.c      | 1393 -----------------
 drivers/net/ethernet/micrel/ks8851_par.c      |  337 ++++
 drivers/net/ethernet/micrel/ks8851_spi.c      |  448 ++++++
 7 files changed, 1027 insertions(+), 1985 deletions(-)
 rename drivers/net/ethernet/micrel/{ks8851.c => ks8851_common.c} (60%)
 delete mode 100644 drivers/net/ethernet/micrel/ks8851_mll.c
 create mode 100644 drivers/net/ethernet/micrel/ks8851_par.c
 create mode 100644 drivers/net/ethernet/micrel/ks8851_spi.c

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>

Comments

Marek Vasut March 29, 2020, 4:15 p.m. UTC | #1
On 3/28/20 1:31 AM, Marek Vasut wrote:
> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
> of silicon, except the former has an SPI interface, while the later has a
> parallel bus interface. Thus far, Linux has two separate drivers for each
> and they are diverging considerably.
> 
> This series unifies them into a single driver with small SPI and parallel
> bus specific parts. The approach here is to first separate out the SPI
> specific parts into a separate file, then add parallel bus accessors in
> another separate file and then finally remove the old parallel bus driver.
> The reason for replacing the old parallel bus driver is because the SPI
> bus driver is much higher quality.
> 
> NOTE: This series depends on "net: ks8851-ml: Fix IO operations, again"
> 
> NOTE: The performance regression on KS8851-16MLL is now fixes, the TX
>       throughput is back to ~75 Mbit/s , RX is still 50 Mbit/s .

I also managed to implement RX NAPI (see attached demo patch if you want
to measure something on the SPI variant), but the RX performance didn't
improve dramatically. It's been 52 Mbit/s, with the RX NAPI it is 56
Mbit/s, on the parallel variant.

[...]
Lukas Wunner April 6, 2020, 3:16 a.m. UTC | #2
On Sat, Mar 28, 2020 at 01:31:30AM +0100, Marek Vasut wrote:
> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
> of silicon, except the former has an SPI interface, while the later has a
> parallel bus interface. Thus far, Linux has two separate drivers for each
> and they are diverging considerably.
> 
> This series unifies them into a single driver with small SPI and parallel
> bus specific parts. The approach here is to first separate out the SPI
> specific parts into a separate file, then add parallel bus accessors in
> another separate file and then finally remove the old parallel bus driver.
> The reason for replacing the old parallel bus driver is because the SPI
> bus driver is much higher quality.

Sorry for the delay Marek.

ks8851.ko (SPI variant) no longer compiles with this series.
The attached 0001 patch fixes it.

Both drivers can only be compiled as modules with this series:
If only one of them is built-in, there's a linker error because of
the module_param_named() for msg_enable.
If both are built-in, the symbol collisions you've mentioned occur.

It seems Kbuild can't support building a .o file with a different name
than the corresponding .c file because of the implicit rules used
everywhere.  However, ks8851_common.c can be renamed to be a header
file (a library of sorts) which is included by the two .c files.
I've renamed ks8851_spi.c back to the original ks8851.c and
ks8851_par.c back to ks8851_mll.c. The result is the attached 0002 patch.
Compiles without any errors regardless if one or both drivers are
built-in or modules.

I'll be back at the office this week and will conduct performance
measurements with this version.

Thanks,

Lukas
Marek Vasut April 6, 2020, 11:20 a.m. UTC | #3
On 4/6/20 5:16 AM, Lukas Wunner wrote:
> On Sat, Mar 28, 2020 at 01:31:30AM +0100, Marek Vasut wrote:
>> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
>> of silicon, except the former has an SPI interface, while the later has a
>> parallel bus interface. Thus far, Linux has two separate drivers for each
>> and they are diverging considerably.
>>
>> This series unifies them into a single driver with small SPI and parallel
>> bus specific parts. The approach here is to first separate out the SPI
>> specific parts into a separate file, then add parallel bus accessors in
>> another separate file and then finally remove the old parallel bus driver.
>> The reason for replacing the old parallel bus driver is because the SPI
>> bus driver is much higher quality.
> 
> Sorry for the delay Marek.
> 
> ks8851.ko (SPI variant) no longer compiles with this series.
> The attached 0001 patch fixes it.
> 
> Both drivers can only be compiled as modules with this series:
> If only one of them is built-in, there's a linker error because of
> the module_param_named() for msg_enable.
> If both are built-in, the symbol collisions you've mentioned occur.
> 
> It seems Kbuild can't support building a .o file with a different name
> than the corresponding .c file because of the implicit rules used
> everywhere.  However, ks8851_common.c can be renamed to be a header
> file (a library of sorts) which is included by the two .c files.
> I've renamed ks8851_spi.c back to the original ks8851.c and
> ks8851_par.c back to ks8851_mll.c. The result is the attached 0002 patch.
> Compiles without any errors regardless if one or both drivers are
> built-in or modules.
> 
> I'll be back at the office this week and will conduct performance
> measurements with this version.

This looks like a hack, I'm more inclined to go back to using callbacks
for the various functions, since I don't see any performance problems
there. We're still talking about 25 MHz SPI bus here and the SPI
subsystem is full of such indirection anyway, so I'm not even sure what
you're hoping to gain here ; it seems to me like a premature
optimization which only causes trouble.
Marek Vasut April 8, 2020, 7:49 p.m. UTC | #4
On 4/6/20 1:20 PM, Marek Vasut wrote:
> On 4/6/20 5:16 AM, Lukas Wunner wrote:
>> On Sat, Mar 28, 2020 at 01:31:30AM +0100, Marek Vasut wrote:
>>> The KS8851SNL/SNLI and KS8851-16MLL/MLLI/MLLU are very much the same pieces
>>> of silicon, except the former has an SPI interface, while the later has a
>>> parallel bus interface. Thus far, Linux has two separate drivers for each
>>> and they are diverging considerably.
>>>
>>> This series unifies them into a single driver with small SPI and parallel
>>> bus specific parts. The approach here is to first separate out the SPI
>>> specific parts into a separate file, then add parallel bus accessors in
>>> another separate file and then finally remove the old parallel bus driver.
>>> The reason for replacing the old parallel bus driver is because the SPI
>>> bus driver is much higher quality.
>>
>> Sorry for the delay Marek.
>>
>> ks8851.ko (SPI variant) no longer compiles with this series.
>> The attached 0001 patch fixes it.
>>
>> Both drivers can only be compiled as modules with this series:
>> If only one of them is built-in, there's a linker error because of
>> the module_param_named() for msg_enable.
>> If both are built-in, the symbol collisions you've mentioned occur.
>>
>> It seems Kbuild can't support building a .o file with a different name
>> than the corresponding .c file because of the implicit rules used
>> everywhere.  However, ks8851_common.c can be renamed to be a header
>> file (a library of sorts) which is included by the two .c files.
>> I've renamed ks8851_spi.c back to the original ks8851.c and
>> ks8851_par.c back to ks8851_mll.c. The result is the attached 0002 patch.
>> Compiles without any errors regardless if one or both drivers are
>> built-in or modules.
>>
>> I'll be back at the office this week and will conduct performance
>> measurements with this version.
> 
> This looks like a hack, I'm more inclined to go back to using callbacks
> for the various functions, since I don't see any performance problems
> there. We're still talking about 25 MHz SPI bus here and the SPI
> subsystem is full of such indirection anyway, so I'm not even sure what
> you're hoping to gain here ; it seems to me like a premature
> optimization which only causes trouble.

So I got the KS8851SNL development kit to test the SPI option. The
current driver in linux-next gives me SPI transfer timeouts at 1 MHz (I
also tried 6 MHz, 10 MHz, same problem, I also checked the signal
quality which is OK) both on iMX6Q and on STM32MP1 with ~2 cm long
wiring between the SoM and the KS8851SNL devkit, so that's where my
testing ends, sadly. Unless you have some idea what the problem might be ?

That said, I at least did test of the impact of the pointer indirection
on the KS8851-16MLL with [1] (direct access test is with the top 7
patches removed, indirect access test is with the branch as-is). The
KS8851-16MLL also has higher bandwidth, so it calls the IO accessors
more often than the SPI variant, and so it's likely more relevant when
testing whether their overhead matters.

- SoC is stm32mp157c at 650 MHz
- link partner
  Intel Corporation 82579LM Gigabit Network Connection (Lewisville) (rev 06)
- Test commands (repeated thrice, averaged,
                 after three discarded runs to prime the system):
  $ iperf -sei 1               # RX
  $ iperf -c 192.168.1.1 -ei 1 # TX

             | accessor |     RX       |     TX
KS8851-16MLL | indirect | 52.86 Mbit/s | 74.03 Mbit/s
KS8851-16MLL |  direct  | 53.80 Mbit/s | 73.90 Mbit/s

There is ~1 Mbit/s throughput increase on the RX average, however the
throughput fluctuates by a few Mbit/s during the iperf RX tests between
50.7 Mbit/s and 54.1 Mbit/s. There is also some decrease on TX, but it
is marginal. Interestingly enough, the TX test throughput fluctuates
less than RX, only by small hundreds of kbit/s.

I'd say that the pointer indirection doesn't play any role here and it
is better to keep the driver simple and go with it, unless the SPI
option is affected, which I doubt it is.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/marex/linux-2.6.git/log/?h=ks8851-v4
Lukas Wunner April 10, 2020, 11:01 a.m. UTC | #5
On Wed, Apr 08, 2020 at 09:49:14PM +0200, Marek Vasut wrote:
> On 4/6/20 1:20 PM, Marek Vasut wrote:
> > On 4/6/20 5:16 AM, Lukas Wunner wrote:
> > > I'll be back at the office this week and will conduct performance
> > > measurements with this version.

Latency without this series (ping -A -c 100000):

100000 packets transmitted, 100000 received, 0% packet loss, time 205ms
rtt min/avg/max/mdev = 0.790/1.695/2.624/0.046 ms, ipg/ewma 2.000/1.693 ms
                             ^^^^^
Latency with this series:

100000 packets transmitted, 100000 received, 0% packet loss, time 220ms
rtt min/avg/max/mdev = 0.880/1.717/2.428/0.059 ms, ipg/ewma 2.000/1.710 ms

Latency with this series + my patch to inline accessors:

100000 packets transmitted, 100000 received, 0% packet loss, time 219ms
rtt min/avg/max/mdev = 0.874/1.716/3.296/0.058 ms, ipg/ewma 2.000/1.719 ms


RX throughput without this series (iperf3 -f k -i 0 -c):

[  5]   0.00-10.00  sec  19.0 MBytes  15958 Kbits/sec                  receiver

RX throughput with this series:

[  5]   0.00-10.00  sec  18.4 MBytes  15452 Kbits/sec                  receiver

RX throughput with this series + my patch to inline accessors:

[  5]   0.00-10.00  sec  18.5 MBytes  15506 Kbits/sec                  receiver


TX throughput without this series (iperf3 -R -f k -i 0 -c):

[  5]   0.00-10.00  sec  18.6 MBytes  15604 Kbits/sec                  receiver

TX throughput with this series:

[  5]   0.00-10.00  sec  18.3 MBytes  15314 Kbits/sec                  receiver

TX throughput with this series + my patch to inline accessors:

[  5]   0.00-10.00  sec  18.3 MBytes  15349 Kbits/sec                  receiver


The commands were invoked from a machine with a Broadcom tg3
Gigabit Ethernet controller.

Conclusion:  The series does incur a measurable performance penalty
which should be fixed before it gets merged.  Inlining the accessors
only yields a very small improvement.  I'm wondering where the
performance penalty is originating from:  Perhaps because of the
16-bit read of RXFC in ks8851_rx_pkts()?


> So I got the KS8851SNL development kit to test the SPI option. The
> current driver in linux-next gives me SPI transfer timeouts at 1 MHz (I
> also tried 6 MHz, 10 MHz, same problem, I also checked the signal
> quality which is OK) both on iMX6Q and on STM32MP1 with ~2 cm long
> wiring between the SoM and the KS8851SNL devkit, so that's where my
> testing ends, sadly. Unless you have some idea what the problem might be ?

Check the signal quality with an oscilloscope.
Crank up drive strength of the SPI pins if supported by the pin controller.
Check if the driver for the SPI controller is buggy or somehow limits the
speed.

We use the KSZ8851SNL both with STM32-based products (at 20 MHz SPI clock,
but without an operating system) and with Raspberry Pi based products
(at up to 25 MHz SPI clock, with Linux).  It is only the latter that
I'm really familiar with.  We've invested considerable resources to
fix bugs and speed up the Raspberry Pi's SPI driver as much as possible.
By now it works pretty well with the ks8851.

A Raspberry Pi 3 is readily available for just a few Euros, so you may
want to pick up one of those.  My most recent speed improvements for the
Raspberry Pi's SPI and DMA drivers went into v5.4, so be sure to use at
least that.  Should signal quality be a problem, it's possible to
increase GPIO drive strength by putting a dt-blob.bin in the /boot
partition:

https://www.raspberrypi.org/documentation/configuration/pin-configuration.md

I forgot to test whether reading the MAC address from the external
EEPROM works with v3 of your series, but I'll be back in the office
on Tuesday to take another look.

Thanks,

Lukas
Marek Vasut April 10, 2020, 6:10 p.m. UTC | #6
On 4/10/20 1:01 PM, Lukas Wunner wrote:
> On Wed, Apr 08, 2020 at 09:49:14PM +0200, Marek Vasut wrote:
>> On 4/6/20 1:20 PM, Marek Vasut wrote:
>>> On 4/6/20 5:16 AM, Lukas Wunner wrote:
>>>> I'll be back at the office this week and will conduct performance
>>>> measurements with this version.
> 
> Latency without this series (ping -A -c 100000):
> 
> 100000 packets transmitted, 100000 received, 0% packet loss, time 205ms
> rtt min/avg/max/mdev = 0.790/1.695/2.624/0.046 ms, ipg/ewma 2.000/1.693 ms
>                              ^^^^^
> Latency with this series:
> 
> 100000 packets transmitted, 100000 received, 0% packet loss, time 220ms
> rtt min/avg/max/mdev = 0.880/1.717/2.428/0.059 ms, ipg/ewma 2.000/1.710 ms
> 
> Latency with this series + my patch to inline accessors:
> 
> 100000 packets transmitted, 100000 received, 0% packet loss, time 219ms
> rtt min/avg/max/mdev = 0.874/1.716/3.296/0.058 ms, ipg/ewma 2.000/1.719 ms
> 
> 
> RX throughput without this series (iperf3 -f k -i 0 -c):
> 
> [  5]   0.00-10.00  sec  19.0 MBytes  15958 Kbits/sec                  receiver
> 
> RX throughput with this series:
> 
> [  5]   0.00-10.00  sec  18.4 MBytes  15452 Kbits/sec                  receiver
> 
> RX throughput with this series + my patch to inline accessors:
> 
> [  5]   0.00-10.00  sec  18.5 MBytes  15506 Kbits/sec                  receiver
> 
> 
> TX throughput without this series (iperf3 -R -f k -i 0 -c):
> 
> [  5]   0.00-10.00  sec  18.6 MBytes  15604 Kbits/sec                  receiver
> 
> TX throughput with this series:
> 
> [  5]   0.00-10.00  sec  18.3 MBytes  15314 Kbits/sec                  receiver
> 
> TX throughput with this series + my patch to inline accessors:
> 
> [  5]   0.00-10.00  sec  18.3 MBytes  15349 Kbits/sec                  receiver
> 
> 
> The commands were invoked from a machine with a Broadcom tg3
> Gigabit Ethernet controller.

It would be helpful if we could at least run the same test, so the
results are comparable.

> Conclusion:  The series does incur a measurable performance penalty
> which should be fixed before it gets merged.  Inlining the accessors
> only yields a very small improvement.

OK, so we finally agree on this, thanks.

> I'm wondering where the
> performance penalty is originating from:  Perhaps because of the
> 16-bit read of RXFC in ks8851_rx_pkts()?

Can you patch that part away to see whether that's the case ?

>> So I got the KS8851SNL development kit to test the SPI option. The
>> current driver in linux-next gives me SPI transfer timeouts at 1 MHz (I
>> also tried 6 MHz, 10 MHz, same problem, I also checked the signal
>> quality which is OK) both on iMX6Q and on STM32MP1 with ~2 cm long
>> wiring between the SoM and the KS8851SNL devkit, so that's where my
>> testing ends, sadly. Unless you have some idea what the problem might be ?
> 
> Check the signal quality with an oscilloscope.
> Crank up drive strength of the SPI pins if supported by the pin controller.

I already tried both, got nothing useful.

> Check if the driver for the SPI controller is buggy or somehow limits the
> speed.

I used two different drivers -- the iMX SPI and the STM32 SPI -- I would
say that if both show the same behavior, it's unlikely to be the driver.

> We use the KSZ8851SNL both with STM32-based products (at 20 MHz SPI clock,
> but without an operating system) and with Raspberry Pi based products
> (at up to 25 MHz SPI clock, with Linux).  It is only the latter that
> I'm really familiar with.  We've invested considerable resources to
> fix bugs and speed up the Raspberry Pi's SPI driver as much as possible.
> By now it works pretty well with the ks8851.
> 
> A Raspberry Pi 3 is readily available for just a few Euros, so you may
> want to pick up one of those.  My most recent speed improvements for the
> Raspberry Pi's SPI and DMA drivers went into v5.4, so be sure to use at
> least that.

This is based off linux-next, so that shouldn't be a problem.

I'll try to set up the RPi3 and see if that's any better.

> Should signal quality be a problem, it's possible to
> increase GPIO drive strength by putting a dt-blob.bin in the /boot
> partition:
> 
> https://www.raspberrypi.org/documentation/configuration/pin-configuration.md
> 
> I forgot to test whether reading the MAC address from the external
> EEPROM works with v3 of your series, but I'll be back in the office
> on Tuesday to take another look.

While you're at it, can you take a look at the latency ?
I think you should be able to git-bisect it rather easily.
Thanks