diff mbox

[v1] ucc_geth: fix ethtool set ring param bug

Message ID 1283305429-8426-1-git-send-email-liang.li@windriver.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Liang Li Sept. 1, 2010, 1:43 a.m. UTC
It's common sense that when we should do change to driver ring
desc/buffer etc only after 'stop/shutdown' the device. When we
do change while devices/driver is running, kernel oops occur:

[
root@fsl_8569mds:/root> ethtool -G eth0 tx 256
root@fsl_8569mds:/root> Oops: Kernel access of bad area, sig: 11 [#1]
MPC8569 MDS
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in:
NIP: 00000000 LR: c0072fbc CTR: 00000000
REGS: effefef0 TRAP: 0400   Not tainted  (2.6.36-rc3-00002-g6c3b118-dirty)
MSR: 00021000 <ME,CE>  CR: 24442048  XER: 00000000
TASK = c0518350[0] 'swapper' THREAD: c0544000
GPR00: 00000000 effeffa0 c0518350 00000020 ef0be000 ef005000 80000000 00000200
GPR08: c03b5d00 00000000 f1010080 ef08d458 000dda96 00000000 3ffb2900 00000000
GPR16: 00000000 3ffa8948 3fff1314 3ffac3f8 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 c0530000 00000000 00000000 00000020 ef3709c0
NIP [00000000] (null)
LR [c0072fbc] handle_IRQ_event+0x4c/0x12c
Call Trace:
[effeffa0] [c0019414] qe_ic_mask_irq+0x1c/0x90 (unreliable)
[effeffc0] [c0075b74] handle_level_irq+0x88/0x128
[effeffd0] [c001ca44] qe_ic_cascade_muxed_mpic+0x50/0x88
[effefff0] [c000d5fc] call_handle_irq+0x18/0x28
[c0545ea0] [c0004f3c] do_IRQ+0xa8/0x140
[c0545ed0] [c000e2bc] ret_from_except+0x0/0x18
-- Exception: 501 at cpu_idle+0x9c/0xdc
    LR = cpu_idle+0x9c/0xdc
[c0545f90] [c0008318] cpu_idle+0x54/0xdc (unreliable)
[c0545fb0] [c000231c] rest_init+0x68/0x7c
[c0545fc0] [c04e686c] start_kernel+0x230/0x2b0
[c0545ff0] [c000039c] skpinv+0x2b4/0x2f0
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..
]

Then the natural solution would be 'stop driver/device then adjust
ring buffer parameter then reactivate driver/device'.

v1: add roll back branch for 'auto reopen fail' statement

Signed-off-by: Liang Li <liang.li@windriver.com>
---
 drivers/net/ucc_geth.c         |    4 ++--
 drivers/net/ucc_geth.h         |    2 ++
 drivers/net/ucc_geth_ethtool.c |   29 +++++++++++++++++++++++------
 3 files changed, 27 insertions(+), 8 deletions(-)

Comments

Ben Hutchings Sept. 1, 2010, 1:42 p.m. UTC | #1
On Wed, 2010-09-01 at 09:43 +0800, Liang Li wrote:
> It's common sense that when we should do change to driver ring
> desc/buffer etc only after 'stop/shutdown' the device. When we
> do change while devices/driver is running, kernel oops occur:
[...]
> -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> -
>  	if (netif_running(netdev)) {
> -		/* FIXME: restart automatically */
> -		printk(KERN_INFO
> -			"Please re-open the interface.\n");
> +		u16 rx_t;
> +		u16 tx_t;
> +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> +		ucc_geth_close(netdev);
> +
> +		rx_t = ug_info->bdRingLenRx[queue];
> +		tx_t = ug_info->bdRingLenTx[queue];
> +
> +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> +
> +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> +		ret = ucc_geth_open(netdev);
> +		if (ret) {
> +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> +					" interface %s failed. Please try to make the interface "
> +					" down, then try again.\n", netdev->name);
> +			ug_info->bdRingLenRx[queue] = rx_t;
> +			ug_info->bdRingLenTx[queue] = tx_t;
> +		}
[...]

Bringing the interface down will call ucc_geth_close(), which will try
to free resources that have not been allocated!

If you cannot roll back a failed change then at least use dev_close()
and dev_open() rather than calling ucc_geth_{close,open}() directly, so
that the interface state is updated correctly.  Or just refuse to make
the change if the interface is up.

Ben.
Liang Li Sept. 2, 2010, 12:50 a.m. UTC | #2
On Wed, Sep 01, 2010 at 02:42:30PM +0100, Ben Hutchings wrote:
> On Wed, 2010-09-01 at 09:43 +0800, Liang Li wrote:
> > It's common sense that when we should do change to driver ring
> > desc/buffer etc only after 'stop/shutdown' the device. When we
> > do change while devices/driver is running, kernel oops occur:
> [...]
> > -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > -
> >  	if (netif_running(netdev)) {
> > -		/* FIXME: restart automatically */
> > -		printk(KERN_INFO
> > -			"Please re-open the interface.\n");
> > +		u16 rx_t;
> > +		u16 tx_t;
> > +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> > +		ucc_geth_close(netdev);
> > +
> > +		rx_t = ug_info->bdRingLenRx[queue];
> > +		tx_t = ug_info->bdRingLenTx[queue];
> > +
> > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > +
> > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > +		ret = ucc_geth_open(netdev);
> > +		if (ret) {
> > +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> > +					" interface %s failed. Please try to make the interface "
> > +					" down, then try again.\n", netdev->name);
> > +			ug_info->bdRingLenRx[queue] = rx_t;
> > +			ug_info->bdRingLenTx[queue] = tx_t;
> > +		}
> [...]
> 
> Bringing the interface down will call ucc_geth_close(), which will try
> to free resources that have not been allocated!

Sorry, I did not understand you on this point. There is no
ucc_geth_close when 'open fail'. What you mean here exactly?

> 
> If you cannot roll back a failed change then at least use dev_close()
> and dev_open() rather than calling ucc_geth_{close,open}() directly, so
> that the interface state is updated correctly.  Or just refuse to make
> the change if the interface is up.

That make things more simply but I do not think that is necessary.
Since there is no such restriction exist in most NIC drivers. :)
Actually I did not see the 'fail of reopen' case. So I assume you
may witnessed similar 'open fail' case in some rare cases. May you
please give more input on this then I can 'make re-open safer' here.

Thanks,
				-Liang Li

> 
> Ben.
> 
> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
Ben Hutchings Sept. 2, 2010, 11:11 a.m. UTC | #3
On Thu, 2010-09-02 at 08:50 +0800, Liang Li wrote:
> On Wed, Sep 01, 2010 at 02:42:30PM +0100, Ben Hutchings wrote:
> > On Wed, 2010-09-01 at 09:43 +0800, Liang Li wrote:
> > > It's common sense that when we should do change to driver ring
> > > desc/buffer etc only after 'stop/shutdown' the device. When we
> > > do change while devices/driver is running, kernel oops occur:
> > [...]
> > > -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > -
> > >  	if (netif_running(netdev)) {
> > > -		/* FIXME: restart automatically */
> > > -		printk(KERN_INFO
> > > -			"Please re-open the interface.\n");
> > > +		u16 rx_t;
> > > +		u16 tx_t;
> > > +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> > > +		ucc_geth_close(netdev);
> > > +
> > > +		rx_t = ug_info->bdRingLenRx[queue];
> > > +		tx_t = ug_info->bdRingLenTx[queue];
> > > +
> > > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > +
> > > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > > +		ret = ucc_geth_open(netdev);
> > > +		if (ret) {
> > > +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> > > +					" interface %s failed. Please try to make the interface "
> > > +					" down, then try again.\n", netdev->name);
> > > +			ug_info->bdRingLenRx[queue] = rx_t;
> > > +			ug_info->bdRingLenTx[queue] = tx_t;
> > > +		}
> > [...]
> > 
> > Bringing the interface down will call ucc_geth_close(), which will try
> > to free resources that have not been allocated!
> 
> Sorry, I did not understand you on this point. There is no
> ucc_geth_close when 'open fail'. What you mean here exactly?
[...]

Read your own warning.

Ben.
Liang Li Sept. 2, 2010, 3:48 p.m. UTC | #4
On Thu, Sep 02, 2010 at 12:11:47PM +0100, Ben Hutchings wrote:
> On Thu, 2010-09-02 at 08:50 +0800, Liang Li wrote:
> > On Wed, Sep 01, 2010 at 02:42:30PM +0100, Ben Hutchings wrote:
> > > On Wed, 2010-09-01 at 09:43 +0800, Liang Li wrote:
> > > > It's common sense that when we should do change to driver ring
> > > > desc/buffer etc only after 'stop/shutdown' the device. When we
> > > > do change while devices/driver is running, kernel oops occur:
> > > [...]
> > > > -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > > -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > > -
> > > >  	if (netif_running(netdev)) {
> > > > -		/* FIXME: restart automatically */
> > > > -		printk(KERN_INFO
> > > > -			"Please re-open the interface.\n");
> > > > +		u16 rx_t;
> > > > +		u16 tx_t;
> > > > +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> > > > +		ucc_geth_close(netdev);
> > > > +
> > > > +		rx_t = ug_info->bdRingLenRx[queue];
> > > > +		tx_t = ug_info->bdRingLenTx[queue];
> > > > +
> > > > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > > +
> > > > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > > > +		ret = ucc_geth_open(netdev);
> > > > +		if (ret) {
> > > > +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> > > > +					" interface %s failed. Please try to make the interface "
> > > > +					" down, then try again.\n", netdev->name);
> > > > +			ug_info->bdRingLenRx[queue] = rx_t;
> > > > +			ug_info->bdRingLenTx[queue] = tx_t;
> > > > +		}
> > > [...]
> > > 
> > > Bringing the interface down will call ucc_geth_close(), which will try
> > > to free resources that have not been allocated!
> > 
> > Sorry, I did not understand you on this point. There is no
> > ucc_geth_close when 'open fail'. What you mean here exactly?
> [...]
> 
> Read your own warning.

So the warning is mis-leading... We may not need it... Still, let's talk
about the rare case that 'fail of open', may you please show me what is
the right thing to do 'clean up work' when ever the open fail. Maybe
dev_close(net_dev)?

Thanks,
				-Liang Li

> 
> Ben.
> 
> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
Ben Hutchings Sept. 2, 2010, 4:32 p.m. UTC | #5
On Thu, 2010-09-02 at 23:48 +0800, Liang Li wrote:
> On Thu, Sep 02, 2010 at 12:11:47PM +0100, Ben Hutchings wrote:
> > On Thu, 2010-09-02 at 08:50 +0800, Liang Li wrote:
> > > On Wed, Sep 01, 2010 at 02:42:30PM +0100, Ben Hutchings wrote:
> > > > On Wed, 2010-09-01 at 09:43 +0800, Liang Li wrote:
> > > > > It's common sense that when we should do change to driver ring
> > > > > desc/buffer etc only after 'stop/shutdown' the device. When we
> > > > > do change while devices/driver is running, kernel oops occur:
> > > > [...]
> > > > > -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > > > -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > > > -
> > > > >  	if (netif_running(netdev)) {
> > > > > -		/* FIXME: restart automatically */
> > > > > -		printk(KERN_INFO
> > > > > -			"Please re-open the interface.\n");
> > > > > +		u16 rx_t;
> > > > > +		u16 tx_t;
> > > > > +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> > > > > +		ucc_geth_close(netdev);
> > > > > +
> > > > > +		rx_t = ug_info->bdRingLenRx[queue];
> > > > > +		tx_t = ug_info->bdRingLenTx[queue];
> > > > > +
> > > > > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > > > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > > > +
> > > > > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > > > > +		ret = ucc_geth_open(netdev);
> > > > > +		if (ret) {
> > > > > +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> > > > > +					" interface %s failed. Please try to make the interface "
> > > > > +					" down, then try again.\n", netdev->name);
> > > > > +			ug_info->bdRingLenRx[queue] = rx_t;
> > > > > +			ug_info->bdRingLenTx[queue] = tx_t;
> > > > > +		}
> > > > [...]
> > > > 
> > > > Bringing the interface down will call ucc_geth_close(), which will try
> > > > to free resources that have not been allocated!
> > > 
> > > Sorry, I did not understand you on this point. There is no
> > > ucc_geth_close when 'open fail'. What you mean here exactly?
> > [...]
> > 
> > Read your own warning.
> 
> So the warning is mis-leading... We may not need it... Still, let's talk
> about the rare case that 'fail of open', may you please show me what is
> the right thing to do 'clean up work' when ever the open fail. Maybe
> dev_close(net_dev)?

I give up.

Just s/ucc_geth_open/dev_open/ and s/ucc_geth_close/dev_close/.

Ben.
diff mbox

Patch

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 7d2e33a..5eadfa3 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3485,7 +3485,7 @@  err:
 
 /* Called when something needs to use the ethernet device */
 /* Returns 0 for success. */
-static int ucc_geth_open(struct net_device *dev)
+int ucc_geth_open(struct net_device *dev)
 {
 	struct ucc_geth_private *ugeth = netdev_priv(dev);
 	int err;
@@ -3542,7 +3542,7 @@  err:
 }
 
 /* Stops the kernel queue, and halts the controller */
-static int ucc_geth_close(struct net_device *dev)
+int ucc_geth_close(struct net_device *dev)
 {
 	struct ucc_geth_private *ugeth = netdev_priv(dev);
 
diff --git a/drivers/net/ucc_geth.h b/drivers/net/ucc_geth.h
index 05a9558..1cfb400 100644
--- a/drivers/net/ucc_geth.h
+++ b/drivers/net/ucc_geth.h
@@ -1235,5 +1235,7 @@  int init_flow_control_params(u32 automatic_flow_control_mode,
 		u32 __iomem *upsmr_register, u32 __iomem *uempr_register,
 		u32 __iomem *maccfg1_register);
 
+extern int ucc_geth_open(struct net_device *);
+extern int ucc_geth_close(struct net_device *);
 
 #endif				/* __UCC_GETH_H__ */
diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c
index 6f92e48..644a6db 100644
--- a/drivers/net/ucc_geth_ethtool.c
+++ b/drivers/net/ucc_geth_ethtool.c
@@ -255,13 +255,30 @@  uec_set_ringparam(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	ug_info->bdRingLenRx[queue] = ring->rx_pending;
-	ug_info->bdRingLenTx[queue] = ring->tx_pending;
-
 	if (netif_running(netdev)) {
-		/* FIXME: restart automatically */
-		printk(KERN_INFO
-			"Please re-open the interface.\n");
+		u16 rx_t;
+		u16 tx_t;
+		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
+		ucc_geth_close(netdev);
+
+		rx_t = ug_info->bdRingLenRx[queue];
+		tx_t = ug_info->bdRingLenTx[queue];
+
+		ug_info->bdRingLenRx[queue] = ring->rx_pending;
+		ug_info->bdRingLenTx[queue] = ring->tx_pending;
+
+		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
+		ret = ucc_geth_open(netdev);
+		if (ret) {
+			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
+					" interface %s failed. Please try to make the interface "
+					" down, then try again.\n", netdev->name);
+			ug_info->bdRingLenRx[queue] = rx_t;
+			ug_info->bdRingLenTx[queue] = tx_t;
+		}
+	} else {
+		ug_info->bdRingLenRx[queue] = ring->rx_pending;
+		ug_info->bdRingLenTx[queue] = ring->tx_pending;
 	}
 
 	return ret;