Message ID | 1283443364-2136-1-git-send-email-liang.li@windriver.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kumar Gala |
Headers | show |
On Fri, 2010-09-03 at 00:02 +0800, Liang Li wrote: > It's common sense that when we should do change to driver ring > desc/buffer etc only after 'stop/shutdown' the device. When we > do change while devices/driver is running, kernel oops occur: [...] > + printk(KERN_INFO "Reactivating interface %s.\n", netdev->name); > + ret = ucc_geth_open(netdev); > + if (ret) { > + printk(KERN_WARNING "uec_set_ringparam: set ring param for running" > + " interface %s failed. Please try again.\n", netdev->name); > + dev_close(netdev); [...] If ucc_geth_open() failed you MUST NOT call ucc_geth_close(), but that is what dev_close() is going to do. But the device is still flagged as running so 'ifconfig down' is going to call dev_close() as well. There is no way out. This is why I said you must call dev_close() and then dev_open() instead. Then if dev_open() fails, just print the error, e.g.: dev_close(netdev); ret = dev_open(netdev); if (ret) netdev_err(netdev, "uec_set_ringparam: failed to restart" " interface with new ring parameters\n"); (And I think this really is a serious error, hence the 'err' rather than 'warning' severity.) (By the way, I noticed there are other places where ucc_geth_close() and ucc_geth_open() are called, without error checking. These are also bugs, but that doesn't justify adding new bugs.) Ben.
On Thu, Sep 02, 2010 at 07:04:59PM +0100, Ben Hutchings wrote: > On Fri, 2010-09-03 at 00:02 +0800, Liang Li wrote: > > It's common sense that when we should do change to driver ring > > desc/buffer etc only after 'stop/shutdown' the device. When we > > do change while devices/driver is running, kernel oops occur: > [...] > > + printk(KERN_INFO "Reactivating interface %s.\n", netdev->name); > > + ret = ucc_geth_open(netdev); > > + if (ret) { > > + printk(KERN_WARNING "uec_set_ringparam: set ring param for running" > > + " interface %s failed. Please try again.\n", netdev->name); > > + dev_close(netdev); > [...] > > If ucc_geth_open() failed you MUST NOT call ucc_geth_close(), but that > is what dev_close() is going to do. But the device is still flagged as > running so 'ifconfig down' is going to call dev_close() as well. There > is no way out. dev_close is safe enough IMHO. Call dev_close repeatly won't cause problem though. > > This is why I said you must call dev_close() and then dev_open() > instead. Then if dev_open() fails, just print the error, e.g.: > > dev_close(netdev); > ret = dev_open(netdev); > if (ret) > netdev_err(netdev, > "uec_set_ringparam: failed to restart" > " interface with new ring parameters\n"); > > (And I think this really is a serious error, hence the 'err' rather than > 'warning' severity.) I checked NIC drivers in drivers/net, there is no such: dev_close(netdev); ret = dev_open(netdev) if (ret) netdev_err(...); Instead, there are: nic_driver_close/down(netdev); ret = nic_driver_open/restart(netdev); if (ret) { waring; dev_close(netdev); } > > (By the way, I noticed there are other places where ucc_geth_close() and > ucc_geth_open() are called, without error checking. These are also > bugs, but that doesn't justify adding new bugs.) I think I did not invite new bugs, as I mentioned before, can you show scenario that the reopen fail and perfect cleanup way? Thanks, -Liang Li > > Ben. > > -- > Ben Hutchings, Senior Software Engineer, Solarflare Communications > Not speaking for my employer; that's the marketing department's job. > They asked us to note that Solarflare product names are trademarked.
From: Liang Li <liang.li@windriver.com> Date: Fri, 3 Sep 2010 00:02:44 +0800 > + printk(KERN_INFO "Stopping interface %s.\n", netdev->name); ... > + printk(KERN_INFO "Reactivating interface %s.\n", netdev->name); Please get rid of these log messages. Also, this isn't the way to implement this. Abstract out the one and only operation that can fail when you change the ring size, which is the memory allocations, into seperate code. Allocate the newly sized rings first, and if that fails abort the ring size change attempt. This will let the device continue to function and operate properly even if the ring resizing fails. Then stop the RX/TX DMA, switch the ring pointers and configuration inside of the device, restart RX/TX DMA, and finally free up the old ring memory. I know why you want to use *_close() and *_open(), it requires less coding and work on your part. But this is why the error handling behavior is difficult and what happens on failure (device goes down and becomes inoperative) is extremely unpleasant for the user. The user should be able to make this kind of change over an SSH shell that uses the interface itself, and not expect to lose connectivity completely just because the ring allocation fails. Thanks.
diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c index 7d2e33a..5eadfa3 100644 --- a/drivers/net/ucc_geth.c +++ b/drivers/net/ucc_geth.c @@ -3485,7 +3485,7 @@ err: /* Called when something needs to use the ethernet device */ /* Returns 0 for success. */ -static int ucc_geth_open(struct net_device *dev) +int ucc_geth_open(struct net_device *dev) { struct ucc_geth_private *ugeth = netdev_priv(dev); int err; @@ -3542,7 +3542,7 @@ err: } /* Stops the kernel queue, and halts the controller */ -static int ucc_geth_close(struct net_device *dev) +int ucc_geth_close(struct net_device *dev) { struct ucc_geth_private *ugeth = netdev_priv(dev); diff --git a/drivers/net/ucc_geth.h b/drivers/net/ucc_geth.h index 05a9558..1cfb400 100644 --- a/drivers/net/ucc_geth.h +++ b/drivers/net/ucc_geth.h @@ -1235,5 +1235,7 @@ int init_flow_control_params(u32 automatic_flow_control_mode, u32 __iomem *upsmr_register, u32 __iomem *uempr_register, u32 __iomem *maccfg1_register); +extern int ucc_geth_open(struct net_device *); +extern int ucc_geth_close(struct net_device *); #endif /* __UCC_GETH_H__ */ diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c index 6f92e48..2a615ee 100644 --- a/drivers/net/ucc_geth_ethtool.c +++ b/drivers/net/ucc_geth_ethtool.c @@ -255,13 +255,28 @@ uec_set_ringparam(struct net_device *netdev, return -EINVAL; } - ug_info->bdRingLenRx[queue] = ring->rx_pending; - ug_info->bdRingLenTx[queue] = ring->tx_pending; - if (netif_running(netdev)) { - /* FIXME: restart automatically */ - printk(KERN_INFO - "Please re-open the interface.\n"); + u16 rx_t; + u16 tx_t; + printk(KERN_INFO "Stopping interface %s.\n", netdev->name); + ucc_geth_close(netdev); + + rx_t = ug_info->bdRingLenRx[queue]; + tx_t = ug_info->bdRingLenTx[queue]; + + ug_info->bdRingLenRx[queue] = ring->rx_pending; + ug_info->bdRingLenTx[queue] = ring->tx_pending; + + printk(KERN_INFO "Reactivating interface %s.\n", netdev->name); + ret = ucc_geth_open(netdev); + if (ret) { + printk(KERN_WARNING "uec_set_ringparam: set ring param for running" + " interface %s failed. Please try again.\n", netdev->name); + dev_close(netdev); + } + } else { + ug_info->bdRingLenRx[queue] = ring->rx_pending; + ug_info->bdRingLenTx[queue] = ring->tx_pending; } return ret;
It's common sense that when we should do change to driver ring desc/buffer etc only after 'stop/shutdown' the device. When we do change while devices/driver is running, kernel oops occur: [ root@fsl_8569mds:/root> ethtool -G eth0 tx 256 root@fsl_8569mds:/root> Oops: Kernel access of bad area, sig: 11 [#1] MPC8569 MDS last sysfs file: /sys/kernel/uevent_seqnum Modules linked in: NIP: 00000000 LR: c0072fbc CTR: 00000000 REGS: effefef0 TRAP: 0400 Not tainted (2.6.36-rc3-00002-g6c3b118-dirty) MSR: 00021000 <ME,CE> CR: 24442048 XER: 00000000 TASK = c0518350[0] 'swapper' THREAD: c0544000 GPR00: 00000000 effeffa0 c0518350 00000020 ef0be000 ef005000 80000000 00000200 GPR08: c03b5d00 00000000 f1010080 ef08d458 000dda96 00000000 3ffb2900 00000000 GPR16: 00000000 3ffa8948 3fff1314 3ffac3f8 00000000 00000000 00000000 00000000 GPR24: 00000000 00000000 00000000 c0530000 00000000 00000000 00000020 ef3709c0 NIP [00000000] (null) LR [c0072fbc] handle_IRQ_event+0x4c/0x12c Call Trace: [effeffa0] [c0019414] qe_ic_mask_irq+0x1c/0x90 (unreliable) [effeffc0] [c0075b74] handle_level_irq+0x88/0x128 [effeffd0] [c001ca44] qe_ic_cascade_muxed_mpic+0x50/0x88 [effefff0] [c000d5fc] call_handle_irq+0x18/0x28 [c0545ea0] [c0004f3c] do_IRQ+0xa8/0x140 [c0545ed0] [c000e2bc] ret_from_except+0x0/0x18 -- Exception: 501 at cpu_idle+0x9c/0xdc LR = cpu_idle+0x9c/0xdc [c0545f90] [c0008318] cpu_idle+0x54/0xdc (unreliable) [c0545fb0] [c000231c] rest_init+0x68/0x7c [c0545fc0] [c04e686c] start_kernel+0x230/0x2b0 [c0545ff0] [c000039c] skpinv+0x2b4/0x2f0 Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Kernel panic - not syncing: Fatal exception in interrupt Rebooting in 180 seconds.. ] Then the natural solution would be 'stop driver/device then adjust ring buffer parameter then reactivate driver/device'. v1: add roll back state for 'auto reopen fail' v2: just do dev_close when 'reopen fail' Signed-off-by: Liang Li <liang.li@windriver.com> --- drivers/net/ucc_geth.c | 4 ++-- drivers/net/ucc_geth.h | 2 ++ drivers/net/ucc_geth_ethtool.c | 27 +++++++++++++++++++++------ 3 files changed, 25 insertions(+), 8 deletions(-)