Message ID | 1353950255.7553.6.camel@edumazet-glaptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2012-11-27 at 10:36 +0100, Keitel, Tino (ALC NetworX GmbH) wrote: > On Mo, 2012-11-26 at 09:17 -0800, Eric Dumazet wrote: > > On Mon, 2012-11-26 at 18:08 +0100, Keitel, Tino (ALC NetworX GmbH) > > wrote: > > > On Mo, 2012-11-26 at 08:34 -0800, Eric Dumazet wrote: > > > > On Mon, 2012-11-26 at 11:01 +0100, Tino Keitel wrote: > > > > > On Sat, Nov 24, 2012 at 15:43:36 -0800, Eric Dumazet wrote: > > > > > > > > > > [...] > > > > > > > > > > > Hmm, I wonder if BQL makes a particular bug showing more often. > > > > > > > > > > > > I see gianfar uses a very small watchdog_timeo of 1 second, while many > > > > > > drivers use 5 seconds. > > > > > > > > > > > > What happens if you change this to 5 seconds ? > > > > > > > > > > I still got the trace and a failing ptp client. > > > > > > > > > > > > > Thanks. Is this bug easy to trigger ? > > > > > > > > I suspect a core issue and a race, likely to happen on your (non x86) > > > > hardware > > > > > > > > Could you add the following debugging patch ? > > > > > > No visible difference: > > > > OK it seems you trigger the problem fast ! > > > > Please try the following as well : > > Hi, > > yes, it can be triggered within 2 minutes. > > The patch makes no difference: Can you reproduce the problem using a single cpu ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-11-27 at 13:42 +0100, Keitel, Tino (ALC NetworX GmbH) wrote: > On Di, 2012-11-27 at 04:36 -0800, Eric Dumazet wrote: > > > > Can you reproduce the problem using a single cpu ? > > Yes, it is a single-CPU system. Can you reproduce the problem without PTP running, or disabled in the driver ? (comment the "priv->hwts_tx_en = 1;" line) This looks like we miss an interrupt ( or TXBD_INTERRUPT not correctly set) And it could be a bug occurring if we try to send one skb with fragments and skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-11-27 at 05:32 -0800, Eric Dumazet wrote: > Can you reproduce the problem without PTP running, or disabled in the > driver ? > > (comment the "priv->hwts_tx_en = 1;" line) > > > This looks like we miss an interrupt ( or TXBD_INTERRUPT not correctly > set) > > And it could be a bug occurring if we try to send one skb with fragments > and skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP > > By the way are any errata flagged in gfar_detect_errata() ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 5, 2013 at 8:00 AM, Keitel, Tino (ALC NetworX GmbH) <Tino.Keitel@alcnetworx.de> wrote: > On Di, 2012-11-27 at 05:32 -0800, Eric Dumazet wrote: >> On Tue, 2012-11-27 at 13:42 +0100, Keitel, Tino (ALC NetworX GmbH) >> wrote: >> > On Di, 2012-11-27 at 04:36 -0800, Eric Dumazet wrote: >> > > >> > > Can you reproduce the problem using a single cpu ? >> > >> > Yes, it is a single-CPU system. >> >> Can you reproduce the problem without PTP running, or disabled in the >> driver ? >> >> (comment the "priv->hwts_tx_en = 1;" line) > > I can't reproduce it with that line commented. However, so far I was > only able to reproduce it when starting the ptp2 client, so maybe this > is connected. How critical is ptp2 for this? And/or the platform details? I can try and reproduce it on an mpc8349 system and/or an mpc8548 system (and even an mpc8641D system) but I'd rather know that was a meaningful chase and not a snipe hunt before going there. So, in that respect, a "If you run this, you will get this" type of error message would be good. I understand that may be too idealistic, though. Thanks, Paul. -- > >> By the way are any errata flagged in gfar_detect_errata() ? > > This is from dmesg: > > fsl-gianfar e0024000.ethernet: enabled errata workarounds, flags: 0x7 > > 0x7 would be GFAR_ERRATA_74, GFAR_ERRATA_76 and GFAR_ERRATA_A002 > according to drivers/net/ethernet/freescale/gianfar.h. > > Regards, > Tino > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Di, 2013-02-05 at 20:55 -0500, Paul Gortmaker wrote: > How critical is ptp2 for this? And/or the platform details? I can > try and reproduce it on an mpc8349 system and/or an mpc8548 > system (and even an mpc8641D system) but I'd rather know that > was a meaningful chase and not a snipe hunt before going there. > > So, in that respect, a "If you run this, you will get this" type of > error message would be good. I understand that may be too > idealistic, though. Hi, I'm using an MPC8313ERDB board and was able to reproduce it with a vanilla 3.6.7 kernel. There are some modifications to the device tree, though. I'll try to reproduce this with as few changes to the device tree as possible. Ptp2 is critical for the desired use. So far I also only was able to reproduce it with this ptp2 client. If it is a possible option for you, I could provide an affected device to make the issue easy to reproduce. Regards, Tino
On 13-02-05 08:00 AM, Keitel, Tino (ALC NetworX GmbH) wrote: > On Di, 2012-11-27 at 05:32 -0800, Eric Dumazet wrote: >> On Tue, 2012-11-27 at 13:42 +0100, Keitel, Tino (ALC NetworX GmbH) >> wrote: >>> On Di, 2012-11-27 at 04:36 -0800, Eric Dumazet wrote: >>>> >>>> Can you reproduce the problem using a single cpu ? >>> >>> Yes, it is a single-CPU system. >> >> Can you reproduce the problem without PTP running, or disabled in the >> driver ? >> >> (comment the "priv->hwts_tx_en = 1;" line) > > I can't reproduce it with that line commented. However, so far I was > only able to reproduce it when starting the ptp2 client, so maybe this > is connected. I found an mpc8315erdb, and built the default yocto build (v3.4.20, which should have the issue based on your earlier reports.) > >> By the way are any errata flagged in gfar_detect_errata() ? > > This is from dmesg: > > fsl-gianfar e0024000.ethernet: enabled errata workarounds, flags: 0x7 > > 0x7 would be GFAR_ERRATA_74, GFAR_ERRATA_76 and GFAR_ERRATA_A002 > according to drivers/net/ethernet/freescale/gianfar.h. The MPC8315ECE.pdf lists the 8315 as having 76 and A002 (amongst a lot of others.) However the driver only does errata checks for the 8313 it seems. I also then manually forced the driver to enable the errata and confirmed I saw 0x7 flag in dmesg. Finally I added the meta-networking layer to the yocto build to get a ptpd2 (2.2.0-r1) On another board, I ran a server as: ptpd2 -G -C -c -b eth0 On the 8315, I ran the client as: ptpd2 -g -C -c -b eth0 In neither case (errata off, and errata manually enabled) I did not manage to get the tx timeout that you got. There must be something more specific to your environment, your ptpd client (and args) and perhaps the board itself. Paul. -- > > Regards, > Tino > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, I think I found a way to reproduce this easily, by forcing timestamping on Tx with the following code change: """ drivers/net/ethernet/freescale/gianfar.c: @@ -1210,6 +1210,8 @@ static int gfar_probe(struct platform_device *ofdev) /* Create all the sysfs files */ gfar_init_sysfs(dev); + priv->hwts_tx_en = 1; + /* Print out the device info */ netdev_info(dev, "mac: %pM\n", dev->dev_addr); @@ -2084,6 +2087,7 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) txq = netdev_get_tx_queue(dev, rq); base = tx_queue->tx_bd_base; regs = tx_queue->grp->regs; + skb_shinfo(skb)->tx_flags |= SKBTX_HW_TSTAMP; /* check if time stamp should be generated */ if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP && """ With this change, tx timeout triggers with the first packet sent. I tested it on p1020/ p1010 boards, so it's not related to a specific board. The proposed fix (passes this test): http://patchwork.ozlabs.org/patch/240365/ Does it work for you? I'm a bit concerned about BQL support being added to Gianfar. Seems that it only adds an overhead to this driver and possible failure points. I'm not sure whether Gianfar has the issue of excessive H/W queuing. I'm also wondering why only 5 or 6 eth drivers integrated BQL to date. Regards, Claudiu On 2/6/2013 5:20 PM, Keitel, Tino (ALC NetworX GmbH) wrote: > On Di, 2013-02-05 at 20:55 -0500, Paul Gortmaker wrote: >> How critical is ptp2 for this? And/or the platform details? I can >> try and reproduce it on an mpc8349 system and/or an mpc8548 >> system (and even an mpc8641D system) but I'd rather know that >> was a meaningful chase and not a snipe hunt before going there. >> >> So, in that respect, a "If you run this, you will get this" type of >> error message would be good. I understand that may be too >> idealistic, though. > > Hi, > > I'm using an MPC8313ERDB board and was able to reproduce it with a > vanilla 3.6.7 kernel. There are some modifications to the device tree, > though. I'll try to reproduce this with as few changes to the device > tree as possible. > > Ptp2 is critical for the desired use. So far I also only was able to > reproduce it with this ptp2 client. > > If it is a possible option for you, I could provide an affected device > to make the issue easy to reproduce. > > Regards, > Tino -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mo, 2013-04-29 at 15:14 +0200, Claudiu Manoil wrote: > The proposed fix (passes this test): > http://patchwork.ozlabs.org/patch/240365/ > > Does it work for you? Hi, thanks a lot. I'll try the patch and report back. Regards, Tino -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mo, 2013-04-29 at 15:20 +0200, Tino Keitel wrote: > On Mo, 2013-04-29 at 15:14 +0200, Claudiu Manoil wrote: > > > The proposed fix (passes this test): > > http://patchwork.ozlabs.org/patch/240365/ > > > > Does it work for you? Hi, I tested the patch over the weekend and it looks fine. I then re-checked without the patch and got the bug immediately. So the patch seems to work fine. Thanks a lot and regards, Tino Keitel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 19ac096..77190bf 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -101,7 +101,7 @@ #include "gianfar.h" -#define TX_TIMEOUT (1*HZ) +#define TX_TIMEOUT (5*HZ) const char gfar_driver_version[] = "1.3"; @@ -2465,9 +2465,9 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue) struct sk_buff *skb; int skb_dirtytx; int tx_ring_size = tx_queue->tx_ring_size; - int frags = 0, nr_txbds = 0; + int frags, nr_txbds; int i; - int howmany = 0; + int howmany = 0, total_txbds = 0; int tqi = tx_queue->qindex; unsigned int bytes_sent = 0; u32 lstatus; @@ -2479,7 +2479,6 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue) skb_dirtytx = tx_queue->skb_dirtytx; while ((skb = tx_queue->tx_skbuff[skb_dirtytx])) { - unsigned long flags; frags = skb_shinfo(skb)->nr_frags; @@ -2541,20 +2540,24 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue) TX_RING_MOD_MASK(tx_ring_size); howmany++; - spin_lock_irqsave(&tx_queue->txlock, flags); - tx_queue->num_txbdfree += nr_txbds; - spin_unlock_irqrestore(&tx_queue->txlock, flags); + total_txbds += nr_txbds; } - /* If we freed a buffer, we can restart transmission, if necessary */ - if (netif_tx_queue_stopped(txq) && tx_queue->num_txbdfree) - netif_wake_subqueue(dev, tqi); + if (howmany) { + unsigned long flags; + spin_lock_irqsave(&tx_queue->txlock, flags); + tx_queue->num_txbdfree += total_txbds; + /* If we freed a buffer, we can restart transmission, if necessary */ + if (netif_tx_queue_stopped(txq)) + netif_tx_wake_queue(txq); + netdev_tx_completed_queue(txq, howmany, bytes_sent); + spin_unlock_irqrestore(&tx_queue->txlock, flags); + } /* Update dirty indicators */ tx_queue->skb_dirtytx = skb_dirtytx; tx_queue->dirty_tx = bdp; - netdev_tx_completed_queue(txq, howmany, bytes_sent); return howmany; }