diff mbox

Suspend/resume - slow resume

Message ID 20110417101731.GA17986@electric-eye.fr.zoreil.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Francois Romieu April 17, 2011, 10:17 a.m. UTC
Linus Torvalds <torvalds@linux-foundation.org> :
[...]
> So Francois, can we please not load the firmware at resume time when
> it wasn't loaded when suspended!

One can try the patch below. It is completely untested yet.

Subject: [PATCH] r8169: don't request firmware when there's no userspace.

The firmware is cached during open() and released during close().
The driver uses the cached firmware between open() and close().

Don't bother with rtl8169_pcierr_interrupt. It is special anyway.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/r8169.c |   76 +++++++++++++++++++++++++++++++++++----------------
 1 files changed, 52 insertions(+), 24 deletions(-)

Comments

Linus Torvalds April 17, 2011, 4:42 p.m. UTC | #1
On Sun, Apr 17, 2011 at 3:17 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
>
> One can try the patch below. It is completely untested yet.

Looks _fairly_ sane to me.  The "request firmware in open, release
firmware in close" approach would seem to be a fairly obvious one.

HOWEVER.

I think it's broken in two ways:

 - it causes too much re-loading for no good reason. I'm looking at
rtl8169_reinit_task() in particular, and if I read that correctly,
then if there are any problems, that crazy function will end up
loading and unloading the firmware EVERY FOUR TIMER TICKS!

   That's just totally broken.

   But I'd also worry about some init scripts in particular, maybe
that whole "open to test something, close immediately" is common. I
don't know.

 - it seems to leak the open function fails after requesting the
firmware - nothing will ever close it, and if you unload the module
the firmware will never be released as far as I can tell.

I might be missing some failure path (maybe the network device open
will call close even if the open failed), but it looks real.

So I think your patch _approaches_ being sane, but no, it's not working as-is.

I really think that this kind of "ad-hoc random firmware loading"
stuff should go away. The device layer (or maybe the network layer)
should have some clear and unambiguous rules. Exactly so that drivers
don't make these kinds of mistakes.

My gut feel for what the rules should be is roughly (but please take
this as a starting point for discussion rather than some final thing):

 - try to load the firmware at each time the user tries to activate
the device. IOW, not like this, where the rtl8169_open() function is
called from many different contexts, only one of them being the
"network layer tried to open the device".

   I do think we need to do it potentially multiple times: the main
issue being something like this:

      ifconfig eth0 up
      ** oops, that failed because we didn't have the firmware files
installed **
      ... install firmware files, the 'dmesg' gave good error messages
that the user could use to know what was going on ..
      ifconfig eth0 up
      ** this needs to trigger another firmware load attempt **

   In other words, doing firmware load at module load time - or at
device scan time - is fundamentally broken for a network driver.

 - unload not on close, but on device unregister (iow not when you do
"ifconfig eth0 down", but when the "eth0" device really goes away)

But as mentioned, the above is (a) just my gut feel - please discuss -
and (b) I really think that leaving this to the network driver has
been shown to continually result in the drivers doing the firmware
load/unload in all the wrong places.

So I do wonder whether we could add a "ndo_firmware_load()" and
"ndo_firmware_unload()" callback to the network device operations, and
have the network layer make the above rules. A network driver
obviously _could_ do its firmware load from other places instead, but
such a network driver would basically be "obviously broken".

Comments?

(That said, I think that Francois' patch could be made acceptable
fairly trivially:

 - avoiding the load/unload from rtl8169_reinit_task() and that
horrible "every four timer ticks" issue. That just seems crazy. Maybe
by just having an internal open helper function that does everything
but the firmware load)

 - adding a rtl_release_firmware on open failure so that you don't leak stuff

but I do think that this whole "firmware load in random places" has
been such a common bug that I think we should have some real rules
about it)

Hmm?

                                Linus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu April 18, 2011, 6:08 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> :
[lots of silly bugs]

Ok. it should be fixed for wednesday.

[...]
>  - unload not on close, but on device unregister (iow not when you do
> "ifconfig eth0 down", but when the "eth0" device really goes away)

Without further action, the firmware(s) will thus be locked in until the
driver is removed. 

> But as mentioned, the above is (a) just my gut feel - please discuss -
> and (b) I really think that leaving this to the network driver has
> been shown to continually result in the drivers doing the firmware
> load/unload in all the wrong places.

As long as it can be fixed... If the 60s delay is removed and the firmware
loading emits some messages for programmer barbie, I am more than happy.

If someone can tell me where Realtek's firmware should be sent to as David
W. seems to be busy, it will be perfect.
diff mbox

Patch

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 493b0de..ccc25cd 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -170,6 +170,16 @@  static const struct {
 };
 #undef _R
 
+static const struct rtl_firmware_info {
+	int mac_version;
+	const char *fw_name;
+} rtl_firmware_infos[] = {
+	{ .mac_version = RTL_GIGA_MAC_VER_25, .fw_name = FIRMWARE_8168D_1 },
+	{ .mac_version = RTL_GIGA_MAC_VER_26, .fw_name = FIRMWARE_8168D_2 },
+	{ .mac_version = RTL_GIGA_MAC_VER_29, .fw_name = FIRMWARE_8105E_1 },
+	{ .mac_version = RTL_GIGA_MAC_VER_30, .fw_name = FIRMWARE_8105E_1 }
+};
+
 enum cfg_version {
 	RTL_CFG_0 = 0x00,
 	RTL_CFG_1,
@@ -1793,21 +1803,21 @@  static void rtl_release_firmware(struct rtl8169_private *tp)
 	tp->fw = NULL;
 }
 
-static int rtl_apply_firmware(struct rtl8169_private *tp, const char *fw_name)
+static void rtl_apply_firmware(struct rtl8169_private *tp)
 {
-	const struct firmware **fw = &tp->fw;
-	int rc = !*fw;
-
-	if (rc) {
-		rc = request_firmware(fw, fw_name, &tp->pci_dev->dev);
-		if (rc < 0)
-			goto out;
-	}
+	const struct firmware *fw = tp->fw;
 
 	/* TODO: release firmware once rtl_phy_write_fw signals failures. */
-	rtl_phy_write_fw(tp, *fw);
-out:
-	return rc;
+	if (fw)
+		rtl_phy_write_fw(tp, fw);
+}
+
+static void rtl_apply_firmware_cond(struct rtl8169_private *tp, u8 reg, u16 val)
+{
+	if (rtl_readphy(tp, reg) != val)
+		netif_warn(tp, hw, tp->dev, "chipset not ready for firmware\n");
+	else
+		rtl_apply_firmware(tp);
 }
 
 static void rtl8169s_hw_phy_config(struct rtl8169_private *tp)
@@ -2246,10 +2256,8 @@  static void rtl8168d_1_hw_phy_config(struct rtl8169_private *tp)
 
 	rtl_writephy(tp, 0x1f, 0x0005);
 	rtl_writephy(tp, 0x05, 0x001b);
-	if ((rtl_readphy(tp, 0x06) != 0xbf00) ||
-	    (rtl_apply_firmware(tp, FIRMWARE_8168D_1) < 0)) {
-		netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
-	}
+
+	rtl_apply_firmware_cond(tp, MII_EXPANSION, 0xbf00);
 
 	rtl_writephy(tp, 0x1f, 0x0000);
 }
@@ -2351,10 +2359,8 @@  static void rtl8168d_2_hw_phy_config(struct rtl8169_private *tp)
 
 	rtl_writephy(tp, 0x1f, 0x0005);
 	rtl_writephy(tp, 0x05, 0x001b);
-	if ((rtl_readphy(tp, 0x06) != 0xb300) ||
-	    (rtl_apply_firmware(tp, FIRMWARE_8168D_2) < 0)) {
-		netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
-	}
+
+	rtl_apply_firmware_cond(tp, MII_EXPANSION, 0xb300);
 
 	rtl_writephy(tp, 0x1f, 0x0000);
 }
@@ -2474,8 +2480,7 @@  static void rtl8105e_hw_phy_config(struct rtl8169_private *tp)
 	rtl_writephy(tp, 0x18, 0x0310);
 	msleep(100);
 
-	if (rtl_apply_firmware(tp, FIRMWARE_8105E_1) < 0)
-		netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
+	rtl_apply_firmware(tp);
 
 	rtl_writephy_batch(tp, phy_reg_init, ARRAY_SIZE(phy_reg_init));
 }
@@ -3288,8 +3293,6 @@  static void __devexit rtl8169_remove_one(struct pci_dev *pdev)
 
 	cancel_delayed_work_sync(&tp->task);
 
-	rtl_release_firmware(tp);
-
 	unregister_netdev(dev);
 
 	if (pci_dev_run_wake(pdev))
@@ -3303,6 +3306,27 @@  static void __devexit rtl8169_remove_one(struct pci_dev *pdev)
 	pci_set_drvdata(pdev, NULL);
 }
 
+static void rtl_request_firmware(struct rtl8169_private *tp)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(rtl_firmware_infos); i++) {
+		const struct rtl_firmware_info *info = rtl_firmware_infos + i;
+
+		if (info->mac_version == tp->mac_version) {
+			const char *name = info->fw_name;
+			int rc;
+
+			rc = request_firmware(&tp->fw, name, &tp->pci_dev->dev);
+			if (rc < 0) {
+				netif_warn(tp, ifup, tp->dev, "unable to load "
+					"firmware patch %s (%d)\n", name, rc);
+			}
+			break;
+		}
+	}
+}
+
 static int rtl8169_open(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
@@ -3334,6 +3358,8 @@  static int rtl8169_open(struct net_device *dev)
 
 	smp_mb();
 
+	rtl_request_firmware(tp);
+
 	retval = request_irq(dev->irq, rtl8169_interrupt,
 			     (tp->features & RTL_FEATURE_MSI) ? 0 : IRQF_SHARED,
 			     dev->name, dev);
@@ -4891,6 +4917,8 @@  static int rtl8169_close(struct net_device *dev)
 
 	free_irq(dev->irq, dev);
 
+	rtl_release_firmware(tp);
+
 	dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
 			  tp->RxPhyAddr);
 	dma_free_coherent(&pdev->dev, R8169_TX_RING_BYTES, tp->TxDescArray,