
[RFC,5/6] net: dsa: microchip: Update tag_ksz.c to access switch driver

Message ID 1543880097-7106-6-git-send-email-Tristram.Ha@microchip.com
State RFC, archived
Delegated to: David Miller
Series net: dsa: microchip: Modify KSZ9477 DSA driver to support different tail tag formats

Commit Message

Tristram.Ha@microchip.com Dec. 3, 2018, 11:34 p.m. UTC
From: Tristram Ha <Tristram.Ha@microchip.com>

Update tag_ksz.c to access switch driver's tail tagging operations.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
---
 net/dsa/tag_ksz.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

Comments

Andrew Lunn Dec. 5, 2018, 6 p.m. UTC | #1
On Mon, Dec 03, 2018 at 03:34:56PM -0800, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Update tag_ksz.c to access switch driver's tail tagging operations.

Hi Tristram

Hmm, I'm not sure we want this, with the tagging split into two places.  I
need to take a closer look at the previous patch, to see why it cannot
be done here.

   Andrew
Andrew Lunn Dec. 5, 2018, 6:18 p.m. UTC | #2
On Wed, Dec 05, 2018 at 07:00:38PM +0100, Andrew Lunn wrote:
> On Mon, Dec 03, 2018 at 03:34:56PM -0800, Tristram.Ha@microchip.com wrote:
> > From: Tristram Ha <Tristram.Ha@microchip.com>
> > 
> > Update tag_ksz.c to access switch driver's tail tagging operations.
> 
> Hi Tristram
> 
> Hmm, I'm not sure we want this, with the tagging split into two places.  I
> need to take a closer look at the previous patch, to see why it cannot
> be done here.

O.K, i think i get what is going on.

I would however implement it differently.

One net/dsa/tag_X.c file can export two dsa_device_ops structures,
allowing you to share common code for the two taggers. You could call
these DSA_TAG_PROTO_KSZ_1_BYTE, and DSA_TAG_PROTO_KSZ_2_BYTE, and the
.get_tag_protocol call would then return the correct one for the
switch.
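A minimal userspace sketch of this suggestion, built on simplified stand-ins: `enum tag_proto`, `struct tag_ops` and every function name below are hypothetical and are not the kernel's `struct dsa_device_ops` API. The 2-byte layout (priority byte, then port bitmap) follows the format comment this patch removes from tag_ksz.c. The 1-byte variant is assumed, purely for illustration, to carry the same port bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's tag protocol enum and for
 * struct dsa_device_ops; the real definitions differ and carry more. */
enum tag_proto { TAG_PROTO_KSZ_1_BYTE, TAG_PROTO_KSZ_2_BYTE };

struct tag_ops {
	const char *name;
	size_t overhead;	/* tail-tag bytes added on transmit */
	size_t (*xmit)(uint8_t *frame, size_t len, int port);
};

/* Code shared by both taggers: zero @tag_len bytes after the frame and
 * put the destination port bitmap in the last tag byte.  The caller must
 * guarantee room for @len + @tag_len bytes. */
static size_t ksz_common_xmit(uint8_t *frame, size_t len, size_t tag_len,
			      int port)
{
	assert(tag_len >= 1 && port >= 0 && port < 8);
	memset(frame + len, 0, tag_len);
	frame[len + tag_len - 1] = (uint8_t)(1u << port);
	return len + tag_len;
}

static size_t ksz_1_byte_xmit(uint8_t *frame, size_t len, int port)
{
	return ksz_common_xmit(frame, len, 1, port);
}

static size_t ksz_2_byte_xmit(uint8_t *frame, size_t len, int port)
{
	return ksz_common_xmit(frame, len, 2, port);
}

/* One tag file, two exported ops structures. */
static const struct tag_ops ksz_1_byte_ops = { "ksz-1byte", 1, ksz_1_byte_xmit };
static const struct tag_ops ksz_2_byte_ops = { "ksz-2byte", 2, ksz_2_byte_xmit };

static const struct tag_ops *tag_ops_lookup(enum tag_proto proto)
{
	return proto == TAG_PROTO_KSZ_1_BYTE ? &ksz_1_byte_ops : &ksz_2_byte_ops;
}

/* The switch driver's .get_tag_protocol reports its format once, at setup. */
static enum tag_proto ksz_get_tag_protocol(int chip_uses_1_byte_tag)
{
	return chip_uses_1_byte_tag ? TAG_PROTO_KSZ_1_BYTE : TAG_PROTO_KSZ_2_BYTE;
}
```

`ksz_common_xmit()` is written once. Each ops structure binds it to a fixed tag length, and the format is chosen once, when the switch reports its protocol, not per frame.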

It might also be possible to merge in tag_trailer, or at least share
some code.

What i don't yet understand is how you are passing PTP information
around. The commit messages need to explain that, since it is not
obvious, and it is the first time we have needed PTP info in a tag
driver.

	Andrew
Florian Fainelli Dec. 5, 2018, 6:52 p.m. UTC | #3
On 12/5/18 10:18 AM, Andrew Lunn wrote:
> On Wed, Dec 05, 2018 at 07:00:38PM +0100, Andrew Lunn wrote:
>> On Mon, Dec 03, 2018 at 03:34:56PM -0800, Tristram.Ha@microchip.com wrote:
>>> From: Tristram Ha <Tristram.Ha@microchip.com>
>>>
>>> Update tag_ksz.c to access switch driver's tail tagging operations.
>>
>> Hi Tristram
>>
>> Hmm, I'm not sure we want this, with the tagging split into two places.  I
>> need to take a closer look at the previous patch, to see why it cannot
>> be done here.
> 
> O.K, i think i get what is going on.
> 
> I would however implement it differently.
> 
> One net/dsa/tag_X.c file can export two dsa_device_ops structures,
> allowing you to share common code for the two taggers. You could call
> these DSA_TAG_PROTO_KSZ_1_BYTE, and DSA_TAG_PROTO_KSZ_2_BYTE, and the
> .get_tag_protocol call would then return the correct one for the
> switch.

Agreed, that is what is done by net/dsa/tag_brcm.c because there are two
formats for the Broadcom tag:

- TAG_BRCM: the 4-bytes Broadcom tag is between MAC SA and Ethertype
- TAG_BRCM_PREPEND: the 4-bytes Broadcom tag is before the MAC DA

And the code to process them is basically using relative offsets from
the start of the frame to access correct data.

This is done largely for performance reasons: we have 1 or 2 Gbit/s
capable CPU ports, so we want to cause as little cache thrashing as
possible and immediately get the right rcv() function to process the
packets.
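To illustrate the relative-offset idea, here is a small sketch that locates the 4-byte tag for both formats through one shared helper. It only finds the tag bytes and does not decode them. The function names are hypothetical and are not taken from net/dsa/tag_brcm.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN	6	/* bytes in a MAC address */
#define BRCM_TAG_LEN	4	/* "4-bytes Broadcom tag" */

/* One shared helper; the two formats differ only in where the tag starts,
 * measured from the start of the frame. */
static const uint8_t *brcm_tag_at(const uint8_t *frame, size_t len, size_t off)
{
	if (len < off + BRCM_TAG_LEN)
		return NULL;	/* runt frame: no room for the tag */
	return frame + off;
}

/* TAG_BRCM: tag between MAC SA and EtherType, i.e. after DA + SA. */
static const uint8_t *brcm_rcv_tag(const uint8_t *frame, size_t len)
{
	return brcm_tag_at(frame, len, 2 * ETH_ALEN);
}

/* TAG_BRCM_PREPEND: tag before the MAC DA, i.e. at offset 0. */
static const uint8_t *brcm_prepend_rcv_tag(const uint8_t *frame, size_t len)
{
	return brcm_tag_at(frame, len, 0);
}
```

Each wrapper passes a constant offset, so the compiler can fold it. That fits the goal of dispatching straight to the right rcv() path without a per-frame format check.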

> 
> It might also be possible to merge in tag_trailer, or at least share
> some code.
> 
> What i don't yet understand is how you are passing PTP information
> around. The commit messages need to explain that, since it is not
> obvious, and it is the first time we have needed PTP info in a tag
> driver.
> 
> 	Andrew
>
Tristram.Ha@microchip.com Dec. 6, 2018, 8 p.m. UTC | #4
> >>> Update tag_ksz.c to access switch driver's tail tagging operations.
> >>
> >> Hi Tristram
> >>
> >> Hmm, I'm not sure we want this, with the tagging split into two places.  I
> >> need to take a closer look at the previous patch, to see why it cannot
> >> be done here.
> >
> > O.K, i think i get what is going on.
> >
> > I would however implement it differently.
> >
> > One net/dsa/tag_X.c file can export two dsa_device_ops structures,
> > allowing you to share common code for the two taggers. You could call
> > these DSA_TAG_PROTO_KSZ_1_BYTE, and DSA_TAG_PROTO_KSZ_2_BYTE, and the
> > .get_tag_protocol call would then return the correct one for the
> > switch.
> 
> Agreed, that is what is done by net/dsa/tag_brcm.c because there are two
> formats for the Broadcom tag:
> 
> - TAG_BRCM: the 4-bytes Broadcom tag is between MAC SA and Ethertype
> - TAG_BRCM_PREPEND: the 4-bytes Broadcom tag is before the MAC DA
>

I did try to implement it this way, but the other switches do not use the same
format even though the tag length is the same.  Then I would need to change the
following files for every new KSZ switch: include/linux/dsa.h, net/dsa/dsa.c,
net/dsa/dsa_priv.h, and finally net/dsa/tag_ksz.c.

Even then it will not work if Microchip wants to add 1588 PTP capability to the switches.

For KSZ9477 the length of the tail tag changes when the PTP function is enabled.
Typically this function is either enabled or disabled all the time, but if users
change that during normal operation to see how the switch behaves, the transmit
function stops working correctly.

The older driver implementation monitors that register change and adjusts the
length dynamically.
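One way to picture "monitor that register change and adjust the length dynamically" is a flag that the configuration path updates and the transmit path reads exactly once per frame. This is a hedged C11 sketch. The 2-byte base length comes from the tail-tag format comment removed by this patch. The 4-byte PTP extension, the struct and the function names are placeholders, not values or APIs from the KSZ9477 datasheet or driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* 2-byte tail tag, per the format comment this patch removes. */
#define KSZ_TAG_BASE_LEN	2
/* Placeholder size of the extra PTP field; an assumption, not a datasheet value. */
#define KSZ_TAG_PTP_EXTRA	4

struct ksz_tag_state {
	atomic_bool ptp_enabled;	/* mirrors the PTP enable register */
};

/* Called when the driver observes the PTP enable register changing. */
static void ksz_tag_set_ptp(struct ksz_tag_state *st, bool on)
{
	atomic_store_explicit(&st->ptp_enabled, on, memory_order_release);
}

/* Read the state once per frame, so the tailroom check, padding and tag
 * write all use the same length even if PTP is toggled concurrently. */
static size_t ksz_tag_len_snapshot(struct ksz_tag_state *st)
{
	bool on = atomic_load_explicit(&st->ptp_enabled, memory_order_acquire);

	return KSZ_TAG_BASE_LEN + (on ? KSZ_TAG_PTP_EXTRA : 0);
}
```

Taking one snapshot keeps a single frame self-consistent. It does not address frames already queued when the register flips, which is the harder locking problem raised later in this thread.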

Another problem is that the tail tag needs to include the timestamp for the
one-step Pdelay_Resp to get an accurate turnaround time when the switch sends
that message out.  This requires access to the main switch driver, which keeps
track of those PTP messages.

PTP handles the transmit timestamp in skb_tx_timestamp(), which is typically
called after the frame is sent, so it is too late.  DSA calls dsa_skb_tx_timestamp()
before sending, but it only passes a clone to drivers that implement port_txtstamp,
so the switch driver may not be able to do anything.
 
> And the code to process them is basically using relative offsets from
> the start of the frame to access correct data.
> 
> This is done largely for performance reasons: we have 1 or 2 Gbit/s
> capable CPU ports, so we want to cause as little cache thrashing as
> possible and immediately get the right rcv() function to process the
> packets.
> 

The SoC I used for this driver development actually has problems sending
gigabit traffic, so I do not see the effect of any slowdown.  The updated MAC
driver change for a hardware problem does not help either; it greatly degrades
transmit performance.

> >
> > It might also be possible to merge in tag_trailer, or at least share
> > some code.
> >

Actually, in the old DSA implementation I just hijacked this file to add the
tail tag operations instead of creating a new file like tag_ksz.c.

> > What i don't yet understand is how you are passing PTP information
> > around. The commit messages need to explain that, since it is not
> > obvious, and it is the first time we have needed PTP info in a tag
> > driver.

It seems the official 1588 PTP timestamp API for PHY drivers is implemented by
only one PHY driver in the whole kernel, net/phy/dp83640.c.  DSA uses a similar
mechanism to support 1588 PTP.  In dsa_switch_rcv() the CPU receive function is
called before dsa_skb_defer_rx_timestamp().  That means the receive tail tag
operation has to be done first to retrieve the receive timestamp so that it can
be passed on later.

It is probably not good to change the socket buffer length inside the port_rxtstamp
function, and I do not see any other way to insert that transmit timestamp.

A customer has already inquired about implementing 1588 PTP in the DSA driver.  I hope
this mechanism is approved so that I can start doing that.
Andrew Lunn Dec. 6, 2018, 8:41 p.m. UTC | #5
> I did try to implement it this way, but the other switches do not use the same
> format even though the tag length is the same.  Then I would need to change the
> following files for every new KSZ switch: include/linux/dsa.h, net/dsa/dsa.c,
> net/dsa/dsa_priv.h, and finally net/dsa/tag_ksz.c.

You can always add two different tag drivers. They don't have to share
code if it does not make sense.

> Even then it will not work if Microchip wants to add 1588 PTP
> capability to the switches.
> 
> For KSZ9477 the length of the tail tag changes when the PTP function
> is enabled.  Typically this function is either enabled or disabled
> all the time, but if users change that during normal
> operation to see how the switch behaves, the transmit function
> stops working correctly.

We should figure out how to support PTP. I think that is the main
issue here.

> The older driver implementation monitors that register change and adjusts the
> length dynamically.
> 
> Another problem is that the tail tag needs to include the timestamp for the
> one-step Pdelay_Resp to get an accurate turnaround time when the switch sends
> that message out.  This requires access to the main switch driver, which keeps
> track of those PTP messages.
> 
> PTP handles the transmit timestamp in skb_tx_timestamp(), which is typically
> called after the frame is sent, so it is too late.  DSA calls dsa_skb_tx_timestamp()
> before sending, but it only passes a clone to drivers that implement port_txtstamp,
> so the switch driver may not be able to do anything.

The current design assumes the hardware will insert the PTP timestamp
into the frame using the clock inside the hardware. You then ask it
what timestamp it actually used. 

If I understand you correctly, in your case software has to provide
the timestamp, which then gets inserted into the frame. So you want to
provide this timestamp as late as possible, when the frame reaches the
head of the queue and is about to be sent out the master interface?

> In dsa_switch_rcv() the CPU receive function is called before
> dsa_skb_defer_rx_timestamp().  That means the receive tail tag
> operation has to be done first to retrieve the receive timestamp so
> that it can be passed on later.

What I think you can do is directly add the timestamp info to the skb
in your tag rx function. The dsa driver function .port_rxtstamp can
then always return false.

Your tag function is going to need access to some driver state, but
you should be able to get at that by following pointers and placing
some of the structures in global headers.
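A userspace sketch of that suggestion, assuming a hypothetical tail layout in which a 4-byte big-endian receive timestamp precedes the 1-byte source-port tag. `struct fake_skb` stands in for `struct sk_buff`, and its `hwtstamp` field plays the role of `skb_hwtstamps(skb)->hwtstamp`. The layout, field sizes and names are all assumptions; only the `& 7` port mask comes from the original ksz_rcv().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the parts of struct sk_buff this sketch needs. */
struct fake_skb {
	uint8_t *data;
	size_t len;
	uint64_t hwtstamp;	/* raw hardware value; 0 = no timestamp */
	int source_port;
};

#define KSZ_EGRESS_TAG_LEN	1	/* port byte, per the removed format comment */
#define KSZ_RX_TS_LEN		4	/* assumed timestamp field size */

/* Hypothetical tail layout when PTP is on:
 *   ... payload | ts (4 bytes, big endian) | port byte    (FCS already stripped) */
static int ksz_rcv_sketch(struct fake_skb *skb, int ptp_enabled)
{
	size_t tag_len = KSZ_EGRESS_TAG_LEN + (ptp_enabled ? KSZ_RX_TS_LEN : 0);
	const uint8_t *tail;

	if (skb->len < tag_len)
		return -1;
	tail = skb->data + skb->len - tag_len;

	if (ptp_enabled) {
		/* Grab the timestamp before the trim below discards it. */
		skb->hwtstamp = (uint64_t)tail[0] << 24 | (uint64_t)tail[1] << 16 |
				(uint64_t)tail[2] << 8 | tail[3];
		tail += KSZ_RX_TS_LEN;
	}
	skb->source_port = tail[0] & 7;	/* same mask as the original ksz_rcv() */
	skb->len -= tag_len;		/* stands in for pskb_trim_rcsum() */
	return 0;
}
```

The timestamp has to be read before the length is trimmed, because the trim is what discards the tail tag. A real driver would also convert the raw hardware value to nanoseconds using its PTP clock.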

    Andrew
Richard Cochran Dec. 7, 2018, 3:16 a.m. UTC | #6
On Thu, Dec 06, 2018 at 08:00:26PM +0000, Tristram.Ha@microchip.com wrote:
> A customer has already inquired about implementing 1588 PTP in the DSA driver.  I hope
> this mechanism is approved so that I can start doing that.

If you need changes to the PTP core, you had better discuss this with
the PTP maintainer.

Thanks,
Richard
Pavel Machek Dec. 9, 2018, 9:17 a.m. UTC | #7
On Thu 2018-12-06 20:00:26, Tristram.Ha@microchip.com wrote:
> > >>> Update tag_ksz.c to access switch driver's tail tagging operations.
> > >>
> > >> Hi Tristram
> > >>
> > >> Hmm, I'm not sure we want this, with the tagging split into two places.  I
> > >> need to take a closer look at the previous patch, to see why it cannot
> > >> be done here.
> > >
> > > O.K, i think i get what is going on.
> > >
> > > I would however implement it differently.
> > >
> > > One net/dsa/tag_X.c file can export two dsa_device_ops structures,
> > > allowing you to share common code for the two taggers. You could call
> > > these DSA_TAG_PROTO_KSZ_1_BYTE, and DSA_TAG_PROTO_KSZ_2_BYTE, and the
> > > .get_tag_protocol call would then return the correct one for the
> > > switch.
> > 
> > Agreed, that is what is done by net/dsa/tag_brcm.c because there are two
> > formats for the Broadcom tag:
> > 
> > - TAG_BRCM: the 4-bytes Broadcom tag is between MAC SA and Ethertype
> > - TAG_BRCM_PREPEND: the 4-bytes Broadcom tag is before the MAC DA
> >
> 
> I did try to implement it this way, but the other switches do not use the same
> format even though the tag length is the same.  Then I would need to change the
> following files for every new KSZ switch: include/linux/dsa.h, net/dsa/dsa.c,
> net/dsa/dsa_priv.h, and finally net/dsa/tag_ksz.c.
> 
> Even then it will not work if Microchip wants to add 1588 PTP capability to the switches.
> 
> For KSZ9477 the length of the tail tag changes when the PTP function is enabled.
> Typically this function is either enabled or disabled all the time, but if users
> change that during normal operation to see how the switch behaves, the transmit
> function stops working correctly.

I'd be careful about locking. Seems like dsa was designed with "tag
format is static", and you want to change it dynamically...
									Pavel
Tristram.Ha@microchip.com Dec. 11, 2018, 11:59 p.m. UTC | #8
> I'd be careful about locking. Seems like dsa was designed with "tag
> format is static", and you want to change it dynamically...

I see there is now a new overhead field in the dsa_device_ops structure,
and dev_set_mtu() is called in master.c.  It does not prevent the tag size from
changing dynamically, though.  A bigger size can be used instead to make sure
the MAC controller can support it.

In practice I do not think it does anything meaningful.  Most MAC controllers
can transmit and receive more than 1518 bytes but still only advertise a 1500
MTU.  Only when they support jumbo frames do the drivers allow increasing
the MTU.  In the case of the Atmel MAC controller I only see that the MTU is
1502, but nothing is changed inside the driver.
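The arithmetic behind the 1502 figure, as a sketch: the master MTU has to cover the slave MTU plus the tag overhead, and if the tag length can change at run time, the safe choice is the largest variant. The function name is hypothetical. The 6-byte PTP variant used in the check below (a 2-byte tag plus a 4-byte timestamp) is an assumption, not a datasheet value.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_DATA_LEN	1500	/* standard Ethernet payload MTU */

/* Master MTU = slave MTU + the largest tag overhead that can occur. */
static int master_mtu_for(int slave_mtu, const int *tag_lens, size_t n)
{
	int max_overhead = 0;

	for (size_t i = 0; i < n; i++)
		if (tag_lens[i] > max_overhead)
			max_overhead = tag_lens[i];
	return slave_mtu + max_overhead;
}
```

With only the 2-byte tag this gives 1500 + 2 = 1502, matching the value observed on the Atmel MAC. The assumed 6-byte PTP tag would need 1506.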

I did find another bug in the Atmel MAC driver concerning this max_mtu
implementation.  It does not affect DSA operation, as the child devices
are still capped at a 1500 MTU, but the main device will have problems
running by itself when its MTU is increased.
Andrew Lunn Dec. 12, 2018, 8:18 a.m. UTC | #9
On Tue, Dec 11, 2018 at 11:59:34PM +0000, Tristram.Ha@microchip.com wrote:
> > I'd be careful about locking. Seems like dsa was designed with "tag
> > format is static", and you want to change it dynamically...
> 
> I see there is now a new overhead field in the dsa_device_ops structure,
> and dev_set_mtu() is called in master.c.  It does not prevent the tag size from
> changing dynamically, though.  A bigger size can be used instead to make sure
> the MAC controller can support it.
> 
> In practice I do not think it does anything meaningful.  Most MAC controllers
> can transmit and receive more than 1518 bytes but still only advertise a 1500
> MTU.

Hi Tristram

There are a few MAC devices that do enforce 1518. e1000e is one
example. You have to increase the MTU before it will receive DSA
tagged frames. I initially had similar problems with the FEC driver
when I started using it a few years ago. At that time I did not
realise it was a widespread problem and just changed the FEC. This
should be a more generic solution.
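A sketch of why a MAC that enforces 1518 bytes drops tagged frames. 1518 is the 14-byte Ethernet header plus a 1500-byte payload plus the 4-byte FCS. The KSZ egress tail tag (1 byte, per the format comment removed by this patch) sits before the FCS. The function names are hypothetical, and VLAN tags are ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN	14	/* DA + SA + EtherType */
#define ETH_FCS_LEN	4
#define ETH_DATA_LEN	1500

/* A MAC enforcing the classic limit accepts at most
 * ETH_HLEN + mtu + ETH_FCS_LEN bytes on the wire (1518 for MTU 1500). */
static bool mac_accepts(int frame_len, int mac_mtu)
{
	return frame_len <= ETH_HLEN + mac_mtu + ETH_FCS_LEN;
}

/* On-wire length of a frame from the switch, tail tag placed before the FCS. */
static int tagged_frame_len(int payload, int tail_tag_len)
{
	return ETH_HLEN + payload + tail_tag_len + ETH_FCS_LEN;
}
```

A full-size tagged frame is 14 + 1500 + 1 + 4 = 1519 bytes, one byte over the limit, so it is only accepted once the MAC's MTU is raised to cover the tag.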

       Andrew

Patch

diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 0f62eff..307e58b 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -11,37 +11,25 @@ 
 #include <linux/etherdevice.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/dsa/ksz_dsa.h>
 #include <net/dsa.h>
 #include "dsa_priv.h"
 
-/* For Ingress (Host -> KSZ), 2 bytes are added before FCS.
- * ---------------------------------------------------------------------------
- * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes)
- * ---------------------------------------------------------------------------
- * tag0 : Prioritization (not used now)
- * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5)
- *
- * For Egress (KSZ -> Host), 1 byte is added before FCS.
- * ---------------------------------------------------------------------------
- * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|FCS(4bytes)
- * ---------------------------------------------------------------------------
- * tag0 : zero-based value represents port
- *	  (eg, 0x00=port1, 0x02=port3, 0x06=port7)
- */
-
-#define	KSZ_INGRESS_TAG_LEN	2
 #define	KSZ_EGRESS_TAG_LEN	1
 
 static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct ksz_device *sw = dp->ds->priv;
 	struct sk_buff *nskb;
+	int len;
 	int padlen;
-	u8 *tag;
+
+	len = sw->tag_ops->get_len(sw);
 
 	padlen = (skb->len >= ETH_ZLEN) ? 0 : ETH_ZLEN - skb->len;
 
-	if (skb_tailroom(skb) >= padlen + KSZ_INGRESS_TAG_LEN) {
+	if (skb_tailroom(skb) >= padlen + len) {
 		/* Let dsa_slave_xmit() free skb */
 		if (__skb_put_padto(skb, skb->len + padlen, false))
 			return NULL;
@@ -49,7 +37,7 @@  static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
 		nskb = skb;
 	} else {
 		nskb = alloc_skb(NET_IP_ALIGN + skb->len +
-				 padlen + KSZ_INGRESS_TAG_LEN, GFP_ATOMIC);
+				 padlen + len, GFP_ATOMIC);
 		if (!nskb)
 			return NULL;
 		skb_reserve(nskb, NET_IP_ALIGN);
@@ -70,9 +58,8 @@  static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
 		consume_skb(skb);
 	}
 
-	tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
-	tag[0] = 0;
-	tag[1] = 1 << dp->index; /* destination port */
+	sw->tag_ops->set_tag(sw, skb_put(nskb, len), skb_mac_header(nskb),
+			     dp->index);
 
 	return nskb;
 }
@@ -80,18 +67,27 @@  static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
 			       struct packet_type *pt)
 {
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
+	struct dsa_switch_tree *dst = cpu_dp->dst;
+	struct dsa_switch *ds = dst->ds[0];
+	struct ksz_device *sw;
 	u8 *tag;
+	int len;
 	int source_port;
 
+	if (!ds)
+		return NULL;
+	sw = ds->priv;
+
 	tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN;
 
-	source_port = tag[0] & 7;
+	len = sw->tag_ops->get_tag(sw, tag, &source_port);
 
 	skb->dev = dsa_master_find_slave(dev, 0, source_port);
 	if (!skb->dev)
 		return NULL;
 
-	pskb_trim_rcsum(skb, skb->len - KSZ_EGRESS_TAG_LEN);
+	pskb_trim_rcsum(skb, skb->len - len);
 
 	return skb;
 }
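The `struct ksz_tag_ops` definition lives in include/linux/dsa/ksz_dsa.h, from an earlier patch in this series that is not shown here. The sketch below infers the three callbacks' signatures from their call sites in the patch above. It implements the tail-tag format described by the comment this patch removes. The struct layout and the `example_*` names are assumptions, and the real header may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct ksz_device;

/* Signatures inferred from the call sites in ksz_xmit()/ksz_rcv(). */
struct ksz_tag_ops {
	int (*get_len)(struct ksz_device *sw);
	void (*set_tag)(struct ksz_device *sw, void *ptr, uint8_t *addr, int p);
	int (*get_tag)(struct ksz_device *sw, uint8_t *tag, int *port);
};

/* Hypothetical minimal device: only the field the tagger touches. */
struct ksz_device {
	const struct ksz_tag_ops *tag_ops;
};

/* Format from the removed comment:
 * Ingress (host -> KSZ): tag0 = priority (unused, 0), tag1 = port bitmap.
 * Egress (KSZ -> host): one byte, zero-based source port. */
static int example_get_len(struct ksz_device *sw)
{
	(void)sw;
	return 2;
}

static void example_set_tag(struct ksz_device *sw, void *ptr, uint8_t *addr,
			    int p)
{
	uint8_t *tag = ptr;

	(void)sw;
	(void)addr;	/* start of frame; unused by this format */
	tag[0] = 0;
	tag[1] = (uint8_t)(1u << p);
}

/* @tag points at the last byte of the frame, as in ksz_rcv().
 * Returns the number of tail bytes the caller should trim. */
static int example_get_tag(struct ksz_device *sw, uint8_t *tag, int *port)
{
	(void)sw;
	*port = tag[0] & 7;
	return 1;
}

static const struct ksz_tag_ops example_tag_ops = {
	.get_len = example_get_len,
	.set_tag = example_set_tag,
	.get_tag = example_get_tag,
};
```

With `sw->tag_ops` pointing at a structure like this, tag_ksz.c no longer hard-codes the format. Each switch driver supplies its own length, tag writer and tag parser.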