Patchwork 8139cp: set ring address after enabling C+ mode

login
register
mail settings
Submitter David Woodhouse
Date Nov. 21, 2012, 8:27 p.m.
Message ID <1353529639.26346.164.camel@shinybook.infradead.org>
Download mbox | patch
Permalink /patch/200840/
State Accepted
Delegated to: David Miller
Headers show

Comments

David Woodhouse - Nov. 21, 2012, 8:27 p.m.
This fixes (for me) a regression introduced by commit b01af457 ("8139cp:
set ring address before enabling receiver"). That commit configured the
descriptor ring addresses earlier in the initialisation sequence, in
order to avoid the possibility of triggering stray DMA before the
correct address had been set up.

Unfortunately, it seems that the hardware will scribble garbage into the
TxRingAddr registers when we enable "plus mode" Tx in the CpCmd
register. Observed on a Traverse Geos router board.

To deal with this, while not reintroducing the problem which led to the
original commit, we augment cp_start_hw() to write to the CpCmd register
*first*, then set the descriptor ring addresses, and then finally to
enable Rx and Tx in the original 8139 Cmd register. The datasheet
actually indicates that we should enable Tx/Rx in the Cmd register
*before* configuring the descriptor addresses, but that would appear to
re-introduce the problem that the offending commit b01af457 was trying
to solve. And this variant appears to work fine on real hardware.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org [3.5+]

---
How about this? I'm still somewhat confused about when it actually
*does* start doing DMA, given what the datasheet says.
françois romieu - Nov. 21, 2012, 8:40 p.m.
David Woodhouse <dwmw2@infradead.org> :
> This fixes (for me) a regression introduced by commit b01af457 ("8139cp:
> set ring address before enabling receiver"). That commit configured the
> descriptor ring addresses earlier in the initialisation sequence, in
> order to avoid the possibility of triggering stray DMA before the
> correct address had been set up.
> 
> Unfortunately, it seems that the hardware will scribble garbage into the
> TxRingAddr registers when we enable "plus mode" Tx in the CpCmd
> register. Observed on a Traverse Geos router board.
> 
> To deal with this, while not reintroducing the problem which led to the
> original commit, we augment cp_start_hw() to write to the CpCmd register
> *first*, then set the descriptor ring addresses, and then finally to
> enable Rx and Tx in the original 8139 Cmd register. The datasheet
> actually indicates that we should enable Tx/Rx in the Cmd register
> *before* configuring the descriptor addresses, but that would appear to
> re-introduce the problem that the offending commit b01af457 was trying
> to solve. And this variant appears to work fine on real hardware.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Cc: stable@kernel.org [3.5+]
> 
> ---
> How about this? I'm still somewhat confused about when it actually
> *does* start doing DMA, given what the datasheet says.

Straight to -stable ?

Afaik nobody complained from the original (pre b01af457) problem on
real hardware.

May be someone @realtek (hi Hayes) can give an explanation regarding
the CpCmd, RingAddr, Cmd init sequence and the start of DMA.
David Woodhouse - Nov. 21, 2012, 10:32 p.m.
On Wed, 2012-11-21 at 21:40 +0100, Francois Romieu wrote:
> Straight to -stable ?

That's the way it works. You put the Cc: stable on the *original* commit
that goes upstream. There's no sane way to retroactively add that tag
after it's already been merged and tested.

Yes, you can bug Greg manually to 'please add this upstream commit which
we forgot to mark as Cc: stable' but that isn't the way it's usually
done.

> Afaik nobody complained from the original (pre b01af457) problem on
> real hardware.
>
> May be someone @realtek (hi Hayes) can give an explanation regarding
> the CpCmd, RingAddr, Cmd init sequence and the start of DMA.

That would be really useful; thanks. To recap for Hayes' benefit: the
concern is that if we follow the instructions in §6.33 of the data
sheet:

Recommendation to C+ mode programming: Enable C+ mode functions in C+CR
register first, => Enable transmit/receive in Command register (offset
37h), => Configure other related registers (ex. Descriptor start
address, TCR, RCR, ...).

... then we appear to be starting up the DMA before we actually tell it
the descriptor ring addresses, which will cause stray DMA to random
unconfigured addresses!

Is there some detail of the hardware which prevents this from actually
happening? Or if not, is my proposed workaround (enabling Tx/Rx in the
C+ Command Register *first*, then setting the descriptor addresses, and
enabling Tx/Rx in the old-style Command register last) OK?

It was observed that when setting the descriptor addresses *first*, the
Transmit Descriptor Start Address Register was getting overwritten with
garbage when we enabled Tx in the C+ Command Register.

I note that we're also setting a bunch of other things in the Rx and Tx
config registers *after* operation all seems to have started up... is
that OK too?
David Miller - Nov. 21, 2012, 10:40 p.m.
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 21 Nov 2012 22:32:11 +0000

> On Wed, 2012-11-21 at 21:40 +0100, Francois Romieu wrote:
>> Straight to -stable ?
> 
> That's the way it works. You put the Cc: stable on the *original* commit
> that goes upstream. There's no sane way to retroactively add that tag
> after it's already been merged and tested.
> 
> Yes, you can bug Greg manually to 'please add this upstream commit which
> we forgot to mark as Cc: stable' but that isn't the way it's usually
> done.

On the contrary, for networking I submit everything manually and I
remove the CC: tags.

I have a queue on patchwork that I add such patches to, so that they
do not get lost.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Woodhouse - Nov. 21, 2012, 10:52 p.m.
On Wed, 2012-11-21 at 17:40 -0500, David Miller wrote:
> On the contrary, for networking I submit everything manually and I
> remove the CC: tags.
> 
> I have a queue on patchwork that I add such patches to, so that they
> do not get lost.

Ah, right. Thanks for the correction. Is it even worth giving the hint
that this should be for the stable tree (from v3.5 onwards), or should I
leave you to work that all out for yourself? And if it *is* worth giving
that hint, is it better to do it in a comment after --- at the end of
the commit comment, rather than the "normal" 'Cc: stable' tag?
David Miller - Nov. 21, 2012, 11:12 p.m.
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 21 Nov 2012 22:52:54 +0000

> On Wed, 2012-11-21 at 17:40 -0500, David Miller wrote:
>> On the contrary, for networking I submit everything manually and I
>> remove the CC: tags.
>> 
>> I have a queue on patchwork that I add such patches to, so that they
>> do not get lost.
> 
> Ah, right. Thanks for the correction. Is it even worth giving the hint
> that this should be for the stable tree (from v3.5 onwards), or should I
> leave you to work that all out for yourself? And if it *is* worth giving
> that hint, is it better to do it in a comment after --- at the end of
> the commit comment, rather than the "normal" 'Cc: stable' tag?

The more information you give in the commit message the better, that
way I don't have to guess :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Nov. 22, 2012, 3:47 a.m.
On 11/21/2012 03:27 PM, David Woodhouse wrote:
> This fixes (for me) a regression introduced by commit b01af457 ("8139cp:
> set ring address before enabling receiver"). That commit configured the
> descriptor ring addresses earlier in the initialisation sequence, in
> order to avoid the possibility of triggering stray DMA before the
> correct address had been set up.
>
> Unfortunately, it seems that the hardware will scribble garbage into the
> TxRingAddr registers when we enable "plus mode" Tx in the CpCmd
> register. Observed on a Traverse Geos router board.
>
> To deal with this, while not reintroducing the problem which led to the
> original commit, we augment cp_start_hw() to write to the CpCmd register
> *first*, then set the descriptor ring addresses, and then finally to
> enable Rx and Tx in the original 8139 Cmd register. The datasheet
> actually indicates that we should enable Tx/Rx in the Cmd register
> *before* configuring the descriptor addresses, but that would appear to
> re-introduce the problem that the offending commit b01af457 was trying
> to solve. And this variant appears to work fine on real hardware.
>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Cc: stable@kernel.org [3.5+]
>
> ---
> How about this? I'm still somewhat confused about when it actually
> *does* start doing DMA, given what the datasheet says.

Well, we have three logical code states:

State A:  pre-b01af457, known working
State B:  b01af457, known broken
State C:  dwmw2 proposed fix, tested on 1 hardware, new technique, query 
open with Realtek

State A seems safer for late -rc, which is where we are.  Fix the 
regression by reverting to well-tested, widely deployed state.

Then apply your patch here as an immediate candidate for net-next.

	Jeff






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Nov. 22, 2012, 4:39 a.m.
From: Jeff Garzik <jgarzik@pobox.com>
Date: Wed, 21 Nov 2012 22:47:39 -0500

> State A:  pre-b01af457, known working
> State B:  b01af457, known broken

State A is also known buggy on the largest consumer of this driver,
the emulated hardware.

Please evaluate this realistically.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Nov. 22, 2012, 4:53 a.m.
On 11/21/2012 11:39 PM, David Miller wrote:
> From: Jeff Garzik <jgarzik@pobox.com>
> Date: Wed, 21 Nov 2012 22:47:39 -0500
>
>> State A:  pre-b01af457, known working
>> State B:  b01af457, known broken
>
> State A is also known buggy on the largest consumer of this driver,
> the emulated hardware.
>
> Please evaluate this realistically.

If the simulator fails to match the hardware, that is a simulator bug.

It is disappointing to work around someone else's software bug in the 
kernel.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Wang - Nov. 22, 2012, 5:30 a.m.
On 11/22/2012 12:53 PM, Jeff Garzik wrote:
> On 11/21/2012 11:39 PM, David Miller wrote:
>> From: Jeff Garzik <jgarzik@pobox.com>
>> Date: Wed, 21 Nov 2012 22:47:39 -0500
>>
>>> State A:  pre-b01af457, known working
>>> State B:  b01af457, known broken
>>
>> State A is also known buggy on the largest consumer of this driver,
>> the emulated hardware.
>>
>> Please evaluate this realistically.
>
> If the simulator fails to match the hardware, that is a simulator bug.
CC realtek linux driver mainter (nic_swsd@realtek.com)

The problem the behaviour of the hardware is subtle, and we could not 
just infer it from the datasheet. Another issue is in some situation, 
the datasheet is conflict with what real hardware does, one example is  
the cfg9364 issue mentioned by David ( I also meet it during qemu 
development).

If the hardware always fit garbage into the TxRingAddr register when 
"plus mode" were enabled, it may send something from memory to the wire 
unexpectedly which looks really strange. If it does not change the 
RxRingAddr when enabling C+, another method is to keep setting the rx 
address before C+ enabling but does the tx after.
>
> It is disappointing to work around someone else's software bug in the 
> kernel.

Qemu also has some workarounds for the legacy kernels and even in this 
case: it initialize RxRingAddr to 0 and check it during receiving, it 
the addr is still zero ( which may mean the rx ring addr were set after 
the c+ is enabled), it won't do the receiving to prevent the corruption. 
So reverting is safe for rx now.
>
>     Jeff
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
françois romieu - Nov. 22, 2012, 9:39 p.m.
Jeff Garzik <jgarzik@pobox.com> :
> On 11/21/2012 11:39 PM, David Miller wrote:
> >From: Jeff Garzik <jgarzik@pobox.com>
> >Date: Wed, 21 Nov 2012 22:47:39 -0500
> >
> >>State A:  pre-b01af457, known working
> >>State B:  b01af457, known broken
> >
> >State A is also known buggy on the largest consumer of this driver,
> >the emulated hardware.
>
> >Please evaluate this realistically.
> 
> If the simulator fails to match the hardware, that is a simulator bug.

Yes.

> It is disappointing to work around someone else's software bug in
> the kernel.

Yes. :o/

I like David Woodhouse's C (attached patch) since 1) Realtek does
not seem to care about oldies 2) the emulation will not be fixed in a
decent timeframe 3) real 8139cp users care.

It would be nice if gilboad could give it a try (users Cced).

Btw David W., could consider adding artificial delays between the writes
and see if / when things start to fail (CpCmd write in cp_start_hw is an
unflushed posted write for instance).
David Woodhouse - Nov. 22, 2012, 11:12 p.m.
On Thu, 2012-11-22 at 22:39 +0100, Francois Romieu wrote:
> Btw David W., could consider adding artificial delays between the
> writes and see if / when things start to fail (CpCmd write in
> cp_start_hw is an unflushed posted write for instance).

That's how I tracked it down to the CpCmd write. I littered the whole of
the init path with
 printk("at line %d TxRingAddr %08x%08 (sb %08x)\n", __LINE__,
         cpr32(TxRingAddr+4), cpr32(TxRingAddr), cp->ring_dma + whatever);
... until the output looked something like this:

root@geos:~# insmod ./8139cp.ko 
[ 1331.492486] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[ 1331.500388] 8139cp 0000:00:0a.0: eth0: RTL-8139Cx at 0xd10a6000, 00:0a:fa:22:
00:96, IRQ 10
[ 1331.509608] 8139cp 0000:00:0b.0: eth1: RTL-8139Cx at 0xd10a8100, 00:0a:fa:22:
00:97, IRQ 11
root@geos:~# [ 1331.644393] at line 995 TxRingAddr   000000000f3c6400 (sb f3c6400)
[ 1331.650579] at line 960 TxRingAddr   000000000f3c6400 (sb f3c6400)
[ 1331.656820] at line 962 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.663020] at line 964 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.669205] at line 998 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.675412] at line 1001 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.681706] at line 1003 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.687977] at line 1005 TxRingAddr   000000000f3e4400 (sb f3c6400)

Each of those printks will have effectively flushed any prior posted
writes... not that this AMD Geode platform actually *does* post writes,
to my knowledge. And at 115200 baud, each one was about a 6ms delay.
Jason Wang - Nov. 23, 2012, 3:53 a.m.
On 11/22/2012 12:53 PM, Jeff Garzik wrote:
> On 11/21/2012 11:39 PM, David Miller wrote:
>> From: Jeff Garzik <jgarzik@pobox.com>
>> Date: Wed, 21 Nov 2012 22:47:39 -0500
>>
>>> State A:  pre-b01af457, known working
>>> State B:  b01af457, known broken
>>
>> State A is also known buggy on the largest consumer of this driver,
>> the emulated hardware.
>>
>> Please evaluate this realistically.
>
> If the simulator fails to match the hardware, that is a simulator bug.
Resend the mail because it's fail to post to the list yesterday.

CC realtek linux driver mainter (nic_swsd@realtek.com)

The problem the behaviour of the hardware is subtle, and we could not 
just infer it from the datasheet. Another issue is in some situation, 
the datasheet is conflict with what real hardware does, one example is 
the cfg9364 issue mentioned by David ( I also meet it during qemu 
development).

If the hardware always fit garbage into the TxRingAddr register when 
"plus mode" were enabled, it may send something from memory to the wire 
unexpectedly which looks really strange. If it does not change the 
RxRingAddr when enabling C+, another method is to keep setting the rx 
address before C+ enabling but does the tx after.
>
> It is disappointing to work around someone else's software bug in the 
> kernel.
>

Qemu also has some workarounds for the buggy kernels and even in this 
case: it initialize RxRingAddr to 0 and check it during receiving, it  
check whether the addr is still zero ( which may mean the rx ring addr 
were set after the c+ is enabled), it won't do the receiving to prevent 
the corruption. So reverting is safe for rx now.
>     Jeff
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gilboa Davara - Nov. 23, 2012, 12:37 p.m.
On Thu, Nov 22, 2012 at 11:39 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> It would be nice if gilboad could give it a try (users Cced).
>

Applied it against 3.6.6.
Seems to be working just fine.

> --
> Ueimor

- Gilboa
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Nov. 25, 2012, 8:54 p.m.
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 21 Nov 2012 20:27:19 +0000

> This fixes (for me) a regression introduced by commit b01af457 ("8139cp:
> set ring address before enabling receiver"). That commit configured the
> descriptor ring addresses earlier in the initialisation sequence, in
> order to avoid the possibility of triggering stray DMA before the
> correct address had been set up.
> 
> Unfortunately, it seems that the hardware will scribble garbage into the
> TxRingAddr registers when we enable "plus mode" Tx in the CpCmd
> register. Observed on a Traverse Geos router board.
> 
> To deal with this, while not reintroducing the problem which led to the
> original commit, we augment cp_start_hw() to write to the CpCmd register
> *first*, then set the descriptor ring addresses, and then finally to
> enable Rx and Tx in the original 8139 Cmd register. The datasheet
> actually indicates that we should enable Tx/Rx in the Cmd register
> *before* configuring the descriptor addresses, but that would appear to
> re-introduce the problem that the offending commit b01af457 was trying
> to solve. And this variant appears to work fine on real hardware.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Applied to net-next.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 1c81825..5166d94 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -957,7 +957,35 @@  static void cp_reset_hw (struct cp_private *cp)
 
 static inline void cp_start_hw (struct cp_private *cp)
 {
+	dma_addr_t ring_dma;
+
 	cpw16(CpCmd, cp->cpcmd);
+
+	/*
+	 * These (at least TxRingAddr) need to be configured after the
+	 * corresponding bits in CpCmd are enabled. Datasheet v1.6 §6.33
+	 * (C+ Command Register) recommends that these and more be configured
+	 * *after* the [RT]xEnable bits in CpCmd are set. And on some hardware
+	 * it's been observed that the TxRingAddr is actually reset to garbage
+	 * when C+ mode Tx is enabled in CpCmd.
+	 */
+	cpw32_f(HiTxRingAddr, 0);
+	cpw32_f(HiTxRingAddr + 4, 0);
+
+	ring_dma = cp->ring_dma;
+	cpw32_f(RxRingAddr, ring_dma & 0xffffffff);
+	cpw32_f(RxRingAddr + 4, (ring_dma >> 16) >> 16);
+
+	ring_dma += sizeof(struct cp_desc) * CP_RX_RING_SIZE;
+	cpw32_f(TxRingAddr, ring_dma & 0xffffffff);
+	cpw32_f(TxRingAddr + 4, (ring_dma >> 16) >> 16);
+
+	/*
+	 * Strictly speaking, the datasheet says this should be enabled
+	 * *before* setting the descriptor addresses. But what, then, would
+	 * prevent it from doing DMA to random unconfigured addresses?
+	 * This variant appears to work fine.
+	 */
 	cpw8(Cmd, RxOn | TxOn);
 }
 
@@ -969,7 +997,6 @@  static void cp_enable_irq(struct cp_private *cp)
 static void cp_init_hw (struct cp_private *cp)
 {
 	struct net_device *dev = cp->dev;
-	dma_addr_t ring_dma;
 
 	cp_reset_hw(cp);
 
@@ -979,17 +1006,6 @@  static void cp_init_hw (struct cp_private *cp)
 	cpw32_f (MAC0 + 0, le32_to_cpu (*(__le32 *) (dev->dev_addr + 0)));
 	cpw32_f (MAC0 + 4, le32_to_cpu (*(__le32 *) (dev->dev_addr + 4)));
 
-	cpw32_f(HiTxRingAddr, 0);
-	cpw32_f(HiTxRingAddr + 4, 0);
-
-	ring_dma = cp->ring_dma;
-	cpw32_f(RxRingAddr, ring_dma & 0xffffffff);
-	cpw32_f(RxRingAddr + 4, (ring_dma >> 16) >> 16);
-
-	ring_dma += sizeof(struct cp_desc) * CP_RX_RING_SIZE;
-	cpw32_f(TxRingAddr, ring_dma & 0xffffffff);
-	cpw32_f(TxRingAddr + 4, (ring_dma >> 16) >> 16);
-
 	cp_start_hw(cp);
 	cpw8(TxThresh, 0x06); /* XXX convert magic num to a constant */