diff mbox

[v2] ucc_geth: fix ethtool set ring param bug

Message ID 1283443364-2136-1-git-send-email-liang.li@windriver.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Liang Li Sept. 2, 2010, 4:02 p.m. UTC
It's common sense that when we should do change to driver ring
desc/buffer etc only after 'stop/shutdown' the device. When we
do change while devices/driver is running, kernel oops occur:

[
root@fsl_8569mds:/root> ethtool -G eth0 tx 256
root@fsl_8569mds:/root> Oops: Kernel access of bad area, sig: 11 [#1]
MPC8569 MDS
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in:
NIP: 00000000 LR: c0072fbc CTR: 00000000
REGS: effefef0 TRAP: 0400   Not tainted  (2.6.36-rc3-00002-g6c3b118-dirty)
MSR: 00021000 <ME,CE>  CR: 24442048  XER: 00000000
TASK = c0518350[0] 'swapper' THREAD: c0544000
GPR00: 00000000 effeffa0 c0518350 00000020 ef0be000 ef005000 80000000 00000200
GPR08: c03b5d00 00000000 f1010080 ef08d458 000dda96 00000000 3ffb2900 00000000
GPR16: 00000000 3ffa8948 3fff1314 3ffac3f8 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 c0530000 00000000 00000000 00000020 ef3709c0
NIP [00000000] (null)
LR [c0072fbc] handle_IRQ_event+0x4c/0x12c
Call Trace:
[effeffa0] [c0019414] qe_ic_mask_irq+0x1c/0x90 (unreliable)
[effeffc0] [c0075b74] handle_level_irq+0x88/0x128
[effeffd0] [c001ca44] qe_ic_cascade_muxed_mpic+0x50/0x88
[effefff0] [c000d5fc] call_handle_irq+0x18/0x28
[c0545ea0] [c0004f3c] do_IRQ+0xa8/0x140
[c0545ed0] [c000e2bc] ret_from_except+0x0/0x18
-- Exception: 501 at cpu_idle+0x9c/0xdc
    LR = cpu_idle+0x9c/0xdc
[c0545f90] [c0008318] cpu_idle+0x54/0xdc (unreliable)
[c0545fb0] [c000231c] rest_init+0x68/0x7c
[c0545fc0] [c04e686c] start_kernel+0x230/0x2b0
[c0545ff0] [c000039c] skpinv+0x2b4/0x2f0
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..
]

Then the natural solution would be 'stop driver/device then adjust
ring buffer parameter then reactivate driver/device'.

v1: add roll back state for 'auto reopen fail'
v2: just do dev_close when 'reopen fail'

Signed-off-by: Liang Li <liang.li@windriver.com>
---
 drivers/net/ucc_geth.c         |    4 ++--
 drivers/net/ucc_geth.h         |    2 ++
 drivers/net/ucc_geth_ethtool.c |   27 +++++++++++++++++++++------
 3 files changed, 25 insertions(+), 8 deletions(-)

Comments

Ben Hutchings Sept. 2, 2010, 6:04 p.m. UTC | #1
On Fri, 2010-09-03 at 00:02 +0800, Liang Li wrote:
> It's common sense that when we should do change to driver ring
> desc/buffer etc only after 'stop/shutdown' the device. When we
> do change while devices/driver is running, kernel oops occur:
[...]
> +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> +		ret = ucc_geth_open(netdev);
> +		if (ret) {
> +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> +					" interface %s failed. Please try again.\n", netdev->name);
> +			dev_close(netdev);
[...]

If ucc_geth_open() failed you MUST NOT call ucc_geth_close(), but that
is what dev_close() is going to do.  But the device is still flagged as
running so 'ifconfig down' is going to call dev_close() as well.  There
is no way out.

This is why I said you must call dev_close() and then dev_open()
instead.  Then if dev_open() fails, just print the error, e.g.:

               dev_close(netdev);
               ret = dev_open(netdev);
               if (ret)
                       netdev_err(netdev,
                                  "uec_set_ringparam: failed to restart"
                                  " interface with new ring parameters\n");

(And I think this really is a serious error, hence the 'err' rather than
'warning' severity.)

(By the way, I noticed there are other places where ucc_geth_close() and
ucc_geth_open() are called, without error checking.  These are also
bugs, but that doesn't justify adding new bugs.)

Ben.
Liang Li Sept. 3, 2010, 1:20 a.m. UTC | #2
On Thu, Sep 02, 2010 at 07:04:59PM +0100, Ben Hutchings wrote:
> On Fri, 2010-09-03 at 00:02 +0800, Liang Li wrote:
> > It's common sense that when we should do change to driver ring
> > desc/buffer etc only after 'stop/shutdown' the device. When we
> > do change while devices/driver is running, kernel oops occur:
> [...]
> > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > +		ret = ucc_geth_open(netdev);
> > +		if (ret) {
> > +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> > +					" interface %s failed. Please try again.\n", netdev->name);
> > +			dev_close(netdev);
> [...]
> 
> If ucc_geth_open() failed you MUST NOT call ucc_geth_close(), but that
> is what dev_close() is going to do.  But the device is still flagged as
> running so 'ifconfig down' is going to call dev_close() as well.  There
> is no way out.

dev_close is safe enough IMHO. Call dev_close repeatly won't cause
problem though.

> 
> This is why I said you must call dev_close() and then dev_open()
> instead.  Then if dev_open() fails, just print the error, e.g.:
> 
>                dev_close(netdev);
>                ret = dev_open(netdev);
>                if (ret)
>                        netdev_err(netdev,
>                                   "uec_set_ringparam: failed to restart"
>                                   " interface with new ring parameters\n");
> 
> (And I think this really is a serious error, hence the 'err' rather than
> 'warning' severity.)

I checked NIC drivers in drivers/net, there is no such:

	dev_close(netdev);
	ret = dev_open(netdev)
	if (ret)
		netdev_err(...);

Instead, there are:

	nic_driver_close/down(netdev);
	ret = nic_driver_open/restart(netdev);
	if (ret) {
		waring;
		dev_close(netdev);
	}

> 
> (By the way, I noticed there are other places where ucc_geth_close() and
> ucc_geth_open() are called, without error checking.  These are also
> bugs, but that doesn't justify adding new bugs.)

I think I did not invite new bugs, as I mentioned before, can you show
scenario that the reopen fail and perfect cleanup way?

Thanks,
				-Liang Li

> 
> Ben.
> 
> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 6, 2010, 8:22 p.m. UTC | #3
From: Liang Li <liang.li@windriver.com>
Date: Fri,  3 Sep 2010 00:02:44 +0800

> +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
 ...
> +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);

Please get rid of these log messages.

Also, this isn't the way to implement this.

Abstract out the one and only operation that can fail when you change
the ring size, which is the memory allocations, into seperate code.

Allocate the newly sized rings first, and if that fails abort the
ring size change attempt.  This will let the device continue to
function and operate properly even if the ring resizing fails.

Then stop the RX/TX DMA, switch the ring pointers and configuration
inside of the device, restart RX/TX DMA, and finally free up the
old ring memory.

I know why you want to use *_close() and *_open(), it requires less
coding and work on your part.  But this is why the error handling
behavior is difficult and what happens on failure (device goes down
and becomes inoperative) is extremely unpleasant for the user.

The user should be able to make this kind of change over an SSH
shell that uses the interface itself, and not expect to lose
connectivity completely just because the ring allocation fails.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 7d2e33a..5eadfa3 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3485,7 +3485,7 @@  err:
 
 /* Called when something needs to use the ethernet device */
 /* Returns 0 for success. */
-static int ucc_geth_open(struct net_device *dev)
+int ucc_geth_open(struct net_device *dev)
 {
 	struct ucc_geth_private *ugeth = netdev_priv(dev);
 	int err;
@@ -3542,7 +3542,7 @@  err:
 }
 
 /* Stops the kernel queue, and halts the controller */
-static int ucc_geth_close(struct net_device *dev)
+int ucc_geth_close(struct net_device *dev)
 {
 	struct ucc_geth_private *ugeth = netdev_priv(dev);
 
diff --git a/drivers/net/ucc_geth.h b/drivers/net/ucc_geth.h
index 05a9558..1cfb400 100644
--- a/drivers/net/ucc_geth.h
+++ b/drivers/net/ucc_geth.h
@@ -1235,5 +1235,7 @@  int init_flow_control_params(u32 automatic_flow_control_mode,
 		u32 __iomem *upsmr_register, u32 __iomem *uempr_register,
 		u32 __iomem *maccfg1_register);
 
+extern int ucc_geth_open(struct net_device *);
+extern int ucc_geth_close(struct net_device *);
 
 #endif				/* __UCC_GETH_H__ */
diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c
index 6f92e48..2a615ee 100644
--- a/drivers/net/ucc_geth_ethtool.c
+++ b/drivers/net/ucc_geth_ethtool.c
@@ -255,13 +255,28 @@  uec_set_ringparam(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	ug_info->bdRingLenRx[queue] = ring->rx_pending;
-	ug_info->bdRingLenTx[queue] = ring->tx_pending;
-
 	if (netif_running(netdev)) {
-		/* FIXME: restart automatically */
-		printk(KERN_INFO
-			"Please re-open the interface.\n");
+		u16 rx_t;
+		u16 tx_t;
+		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
+		ucc_geth_close(netdev);
+
+		rx_t = ug_info->bdRingLenRx[queue];
+		tx_t = ug_info->bdRingLenTx[queue];
+
+		ug_info->bdRingLenRx[queue] = ring->rx_pending;
+		ug_info->bdRingLenTx[queue] = ring->tx_pending;
+
+		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
+		ret = ucc_geth_open(netdev);
+		if (ret) {
+			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
+					" interface %s failed. Please try again.\n", netdev->name);
+			dev_close(netdev);
+		}
+	} else {
+		ug_info->bdRingLenRx[queue] = ring->rx_pending;
+		ug_info->bdRingLenTx[queue] = ring->tx_pending;
 	}
 
 	return ret;