| Message ID | 20141002201203.GA6001@oracle.com |
|---|---|
| State | RFC, archived |
| Delegated to | David Miller |
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 2 Oct 2014 16:12:03 -0400

> The patch is attached to the end of this email.

There is an explosion of simplifications and optimizations possible
once you've done a NAPI conversion, which I haven't seen you perform
here in this patch.

For example, you can now move everything into software IRQ context:
just disable the VIO interrupt and unconditionally go into NAPI
context from the VIO event. No more irqsave/irqrestore.

Then the TX path can even run mostly lockless; it just needs to hold
the VIO lock for a minute period of time. The caller holds the
xmit_lock of the network device to prevent re-entry into the
->ndo_start_xmit() path.

Really, what you've done here as a NAPI conversion is just the
beginning.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On (10/02/14 13:43), David Miller wrote:
> For example, you can now move everything into software IRQ context,
> just disable the VIO interrupt and unconditionally go into NAPI
> context from the VIO event.
> No more irqsave/irqrestore.
> Then the TX path even can run mostly lockless, it just needs
> to hold the VIO lock for a minute period of time. The caller
> holds the xmit_lock of the network device to prevent re-entry
> into the ->ndo_start_xmit() path.

Let me check into this and get back. I think the xmit path may also
need some kind of locking for the dring access and ldc_write? I think
you are suggesting that I should also move the control-packet
processing to vnet_event_napi(), which I have not done in my patch. I
will examine where that leads.

But there is one thing that I do not understand: why does my hack of
lying to net_rx_action() by always returning "budget" make such a
difference to throughput? Even if I set the budget as low as 64 (so I
would get called repeatedly under NAPI's polling infra), I have to
turn on the "liar" commented code in my patch before the throughput
shoots up to 2-2.5 Gbps (it's otherwise around 300 Mbps). Eyeballing
the net_rx_action() code quickly did not make the explanation for this
pop out at me (yet).

Pure polling (i.e., workq) gives me about 1.5 Gbps, and pure tasklet
(i.e., just setting up a tasklet from vnet_event to handle data) gives
me approx 2 Gbps. So I don't understand why NAPI doesn't give me
something similar if I adhere strictly to the documentation.

--Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 3 Oct 2014 10:40:24 -0400

> On (10/02/14 13:43), David Miller wrote:
>> For example, you can now move everything into software IRQ context,
>> just disable the VIO interrupt and unconditionally go into NAPI
>> context from the VIO event.
>> No more irqsave/irqrestore.
>> Then the TX path even can run mostly lockless, it just needs
>> to hold the VIO lock for a minute period of time. The caller
>> holds the xmit_lock of the network device to prevent re-entry
>> into the ->ndo_start_xmit() path.
>
> let me check into this and get back. I think the xmit path
> may also need to have some kind of locking for the dring access
> and ldc_write? I think you are suggesting that I should also
> move the control-packet processing to vnet_event_napi(), which
> I have not done in my patch. I will examine where that leads.

I think you should be able to get rid of all of the in-driver locking
in the fast paths.

NAPI ->poll() is non-reentrant, so all RX processing occurs strictly
in a serialized environment.

Once you do TX reclaim in NAPI context, then all you have to do is
take the generic netdev TX queue lock during the evaluation of whether
to wake up the TX queue or not. Worst case, you need to hold the TX
netdev queue lock across the whole TX reclaim operation.

The VIO lock really ought to be entirely superfluous in the data
paths.
> I think you should be able to get rid of all of the in-driver
> locking in the fast paths.
>
> NAPI ->poll() is non-reentrant, so all RX processing occurs
> strictly in a serialized environment.
>
> Once you do TX reclaim in NAPI context, then all you have to do is
> take the generic netdev TX queue lock during the evaluation of whether
> to wakeup the TX queue or not. Worst case you need to hold the
> TX netdev queue lock across the whole TX reclaim operation.
>
> The VIO lock really ought to be entirely superfluous in the data
> paths.

A few clarifications, since there are more driver examples using NAPI
for Rx than for Tx reclaim:

I can move the LDC_EVENT_RESET/LDC_EVENT_UP processing code into the
NAPI callback, and that enables the removal of irqsave/restore for
vio.lock from vio_port_up at the least (I do this conditionally on
in_softirq() so as not to perturb the vdc code at the moment). But
there are still a lot of irqsaves at the LDC layer for the lp lock; I
don't know if these can/should be optimized out.

I looked at tg3 for a template on how to use NAPI in the TX path. The
analog of the tg3_poll_work->tg3_tx invocation is probably the
maybe_tx_wakeup call triggered from the Rx-side vnet processing,
which, with NAPI, happens naturally from softirq context (no need for
an extra tasklet).

Regarding RCU locking of port_list and the hash in struct vnet_port,
the thorn here is that vnet_set_rx_mode may end up allocating a
vnet_mcast_entry as part of __update_mc_list (there may be a different
bug here, in that it assumes that the first entry is the switch_port,
and that this is the only switch_port). I don't know of a simple way
to avoid that (a rwlock just for this function?!).

But we still need to hold the vio lock around the ldc_write (and also
around the dring write) in vnet_start_xmit, right?
--Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Mon, 6 Oct 2014 12:04:18 -0400

>> I think you should be able to get rid of all of the in-driver
>> locking in the fast paths.
>>
>> NAPI ->poll() is non-reentrant, so all RX processing occurs
>> strictly in a serialized environment.
>>
>> Once you do TX reclaim in NAPI context, then all you have to do is
>> take the generic netdev TX queue lock during the evaluation of whether
>> to wakeup the TX queue or not. Worst case you need to hold the
>> TX netdev queue lock across the whole TX reclaim operation.
>>
>> The VIO lock really ought to be entirely superfluous in the data
>> paths.
>
> A few clarifications, since there are more driver-examples using NAPI for
> Rx than for Tx reclaim

Those drivers fully go against our recommendations; we always say that
TX reclaim should also run from NAPI, because it liberates SKBs that
thereby become available for RX processing.

> But we still need to hold the vio lock around the ldc_write
> (and also around dring write) in vnet_start_xmit, right?

You might be able to avoid it, you're fully serialized by the TX queue
lock.
On (10/06/14 15:25), David Miller wrote:
>
> > But we still need to hold the vio lock around the ldc_write
> > (and also around dring write) in vnet_start_xmit, right?
>
> You might be able to avoid it, you're fully serialized by the TX queue
> lock.

Yes, I was just noticing that. The only place where I believe I need
to hold the vio spin-lock is to sync with the dr->cons checks (the
"should I send a start_cons LDC message?" check in vnet_start_xmit()
vs the vnet_ack() updates).

But isn't it better in general to declare NETIF_F_LLTX and have finer
lock granularity in the driver?

--Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Mon, 6 Oct 2014 15:31:11 -0400

> On (10/06/14 15:25), David Miller wrote:
>>
>> > But we still need to hold the vio lock around the ldc_write
>> > (and also around dring write) in vnet_start_xmit, right?
>>
>> You might be able to avoid it, you're fully serialized by the TX queue
>> lock.
>
> yes, I was just noticing that. The only place where I believe I need
> to hold the vio spin-lock is to sync with the dr->cons checks
> (the "should I send a start_cons LDC message?" check in vnet_start_xmit()
> vs the vnet_ack() updates).

I don't see how that is any different from the netif_queue_wake()
checks; it should be just as easy to cover it with the generic xmit
lock.

> But isn't it better in general to declare NETIF_F_LLTX and have finer lock
> granularity in the driver?

No, NETIF_F_LLTX drivers are heavily discouraged. On the contrary,
it's easier to optimize the locking when we can consolidate what is
covered at both the mid-level of the networking send paths and what
the drivers need in their ->ndo_start_xmit() routines.
I would like to share my experience: on SPARC we need more parallelism
to get high performance; that's just the nature of CMT processors (at
least today).

A lockless Tx and Rx implementation is very nice, but it requires the
code path to be single-threaded, which may limit throughput to
single-thread performance, and that may not be very high for
standard-MTU packets. Large-MTU packets (like 64K MTU) may have some
advantage due to less processing in both the stack and the driver, but
a single thread would still limit the maximum performance. I would
suggest exploring both lockless and high-parallelism methods to see
which one gives the best performance.

-Raghuram

On Oct 6, 2014, at 12:31 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

> On (10/06/14 15:25), David Miller wrote:
>>
>>> But we still need to hold the vio lock around the ldc_write
>>> (and also around dring write) in vnet_start_xmit, right?
>>
>> You might be able to avoid it, you're fully serialized by the TX queue
>> lock.
>
> yes, I was just noticing that. The only place where I believe I need
> to hold the vio spin-lock is to sync with the dr->cons checks
> (the "should I send a start_cons LDC message?" check in vnet_start_xmit()
> vs the vnet_ack() updates).
>
> But isn't it better in general to declare NETIF_F_LLTX and have finer lock
> granularity in the driver?
>
> --Sowmini
From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
Date: Thu, 9 Oct 2014 18:10:24 -0700

> Lockless Tx and Rx implementation is very nice, but requires the
> code path to be single-threaded to achieve it.

I would like to know how you believe the Linux LLTX "lockless TX"
facility works.

It's not truly lockless, it just turns off the generic networking TX
path per-queue spinlock and puts the full burden on the driver. It's
just moving the locking from one place to another, not eliminating it.
On Oct 9, 2014, at 9:36 PM, David Miller <davem@davemloft.net> wrote:

> From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
> Date: Thu, 9 Oct 2014 18:10:24 -0700
>
>> Lockless Tx and Rx implementation is very nice, but requires the
>> code path to be single-threaded to achieve it.
>
> I would like to know how you believe the Linux LLTX "lockless TX"
> facility works.

Sorry, I used incorrect terminology in my email. My knowledge of LLTX
is limited and I am still learning. I was not referring to LLTX, but
to the implementation of the sunvnet transmit and receive paths
without locks. To me that means only one thread of execution exists at
a given time, and I was referring to that as single-threadedness,
which limits performance on SPARC CMT processors today. Using methods
to increase parallelism will help, especially when the traffic
involves multiple connections, mainly from the point of view of using
multiple vCPUs to perform the processing where possible.

-Raghuram

> It's not truly lockless, it just turns off the generic networking TX
> path per-queue spinlock and puts the full burden on the driver. It's
> just moving the locking from one place to another, not eliminating it.
From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
Date: Thu, 9 Oct 2014 21:56:45 -0700

> Sorry, I used incorrect terminology in my email. My knowledge of LLTX
> is limited and I am still learning. I was not referring to LLTX, but
> to the implementation of the sunvnet transmit and receive paths
> without locks. To me that means only one thread of execution exists at
> a given time, and I was referring to that as single-threadedness,
> which limits performance on SPARC CMT processors today. Using methods
> to increase parallelism will help, especially when the traffic
> involves multiple connections, mainly from the point of view of using
> multiple vCPUs to perform the processing where possible.

Let's not speak in generalities, but rather about specific
implementations of specific things.

Linux's TX path is fully parallelized and multiqueue, so you can have
as many parallel TX threads of control executing over a specific
device as you can provide TX queues.
> Linux's TX path is fully parallelized and multiqueue, so you can have
> as many parallel TX threads of control executing over a specific
> device as you can provide TX queues.

Thanks, I guess we need to explore Tx queues as well.

-Raghuram
I have the NAPIfication-of-sunvnet patch set ready for code review.

AFAIK net-next is not currently open for new features this week, but
does it make sense for me to go ahead and send the patch set to the
list (I'm sure it will take some time to review), or will that just
create confusion for the netdev maintainers?

--Sowmini

Previously discussed:

> >> For example, you can now move everything into software IRQ context,
> >> just disable the VIO interrupt and unconditionally go into NAPI
> >> context from the VIO event.
> >> No more irqsave/irqrestore.
> >> Then the TX path even can run mostly lockless, it just needs
> >> to hold the VIO lock for a minute period of time. The caller
> >> holds the xmit_lock of the network device to prevent re-entry
> >> into the ->ndo_start_xmit() path.
>
> I think you should be able to get rid of all of the in-driver
> locking in the fast paths.
>
> NAPI ->poll() is non-reentrant, so all RX processing occurs
> strictly in a serialized environment.
>
> Once you do TX reclaim in NAPI context, then all you have to do is
> take the generic netdev TX queue lock during the evaluation of whether
> to wakeup the TX queue or not. Worst case you need to hold the
> TX netdev queue lock across the whole TX reclaim operation.
>
> The VIO lock really ought to be entirely superfluous in the data
> paths.
diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 1539672..da05d68 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -33,6 +33,9 @@
 #define DRV_MODULE_VERSION	"1.0"
 #define DRV_MODULE_RELDATE	"June 25, 2007"
 
+/* #define VNET_BUDGET	2048 */ /* see comments in vnet_poll() */
+#define VNET_BUDGET	64 /* typical budget */
+
 static char version[] =
 	DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
 MODULE_AUTHOR("David S. Miller (davem@davemloft.net)");
@@ -274,6 +277,7 @@ static struct sk_buff *alloc_and_align_skb(struct net_device *dev,
 	return skb;
 }
 
+/* reads in exactly one sk_buff */
 static int vnet_rx_one(struct vnet_port *port, unsigned int len,
 		       struct ldc_trans_cookie *cookies, int ncookies)
 {
@@ -311,9 +315,7 @@ static int vnet_rx_one(struct vnet_port *port, unsigned int len,
 
 	dev->stats.rx_packets++;
 	dev->stats.rx_bytes += len;
-
-	netif_rx(skb);
-
+	napi_gro_receive(&port->napi, skb);
 	return 0;
 
 out_free_skb:
@@ -444,6 +446,7 @@ static int vnet_walk_rx_one(struct vnet_port *port,
 		   desc->cookies[0].cookie_addr,
 		   desc->cookies[0].cookie_size);
 
+	/* read in one packet */
 	err = vnet_rx_one(port, desc->size, desc->cookies, desc->ncookies);
 	if (err == -ECONNRESET)
 		return err;
@@ -456,10 +459,11 @@ static int vnet_walk_rx_one(struct vnet_port *port,
 }
 
 static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
-			u32 start, u32 end)
+			u32 start, u32 end, int *npkts, int budget)
 {
 	struct vio_driver_state *vio = &port->vio;
 	int ack_start = -1, ack_end = -1;
+	int send_ack = 1;
 
 	end = (end == (u32) -1) ? prev_idx(start, dr) : next_idx(end, dr);
@@ -471,6 +475,7 @@ static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
 			return err;
 		if (err != 0)
 			break;
+		(*npkts)++;
 		if (ack_start == -1)
 			ack_start = start;
 		ack_end = start;
@@ -482,13 +487,28 @@ static int vnet_walk_rx(struct vnet_port *port, struct vio_dring_state *dr,
 				return err;
 			ack_start = -1;
 		}
+		if ((*npkts) >= budget) {
+			send_ack = 0;
+			break;
+		}
 	}
 	if (unlikely(ack_start == -1))
 		ack_start = ack_end = prev_idx(start, dr);
-	return vnet_send_ack(port, dr, ack_start, ack_end, VIO_DRING_STOPPED);
+	if (send_ack) {
+		int ret;
+
+		port->napi_resume = false;
+		ret = vnet_send_ack(port, dr, ack_start, ack_end,
+				    VIO_DRING_STOPPED);
+		return ret;
+	} else {
+		port->napi_resume = true;
+		port->napi_stop_idx = ack_end;
+		return (56);
+	}
 }
 
-static int vnet_rx(struct vnet_port *port, void *msgbuf)
+static int vnet_rx(struct vnet_port *port, void *msgbuf, int *npkts,
+		   int budget)
 {
 	struct vio_dring_data *pkt = msgbuf;
 	struct vio_dring_state *dr = &port->vio.drings[VIO_DRIVER_RX_RING];
@@ -505,11 +525,13 @@ static int vnet_rx(struct vnet_port *port, void *msgbuf)
 		return 0;
 	}
 
-	dr->rcv_nxt++;
+	if (!port->napi_resume)
+		dr->rcv_nxt++;
 
 	/* XXX Validate pkt->start_idx and pkt->end_idx XXX */
 
-	return vnet_walk_rx(port, dr, pkt->start_idx, pkt->end_idx);
+	return vnet_walk_rx(port, dr, pkt->start_idx, pkt->end_idx,
+			    npkts, budget);
 }
 
 static int idx_is_pending(struct vio_dring_state *dr, u32 end)
@@ -534,7 +556,10 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	struct net_device *dev;
 	struct vnet *vp;
 	u32 end;
+	unsigned long flags;
 	struct vio_net_desc *desc;
+	bool need_trigger = false;
+
 	if (unlikely(pkt->tag.stype_env != VIO_DRING_DATA))
 		return 0;
 
@@ -545,21 +570,17 @@ static int vnet_ack(struct vnet_port *port, void *msgbuf)
 	/* sync for race conditions with vnet_start_xmit() and tell xmit it
 	 * is time to send a trigger.
 	 */
+	spin_lock_irqsave(&port->vio.lock, flags);
 	dr->cons = next_idx(end, dr);
 	desc = vio_dring_entry(dr, dr->cons);
-	if (desc->hdr.state == VIO_DESC_READY && port->start_cons) {
-		/* vnet_start_xmit() just populated this dring but missed
-		 * sending the "start" LDC message to the consumer.
-		 * Send a "start" trigger on its behalf.
-		 */
-		if (__vnet_tx_trigger(port, dr->cons) > 0)
-			port->start_cons = false;
-		else
-			port->start_cons = true;
-	} else {
-		port->start_cons = true;
-	}
+	if (desc->hdr.state == VIO_DESC_READY && !port->start_cons)
+		need_trigger = true;
+	else
+		port->start_cons = true; /* vnet_start_xmit will send trigger */
+	spin_unlock_irqrestore(&port->vio.lock, flags);
+	if (need_trigger && __vnet_tx_trigger(port, dr->cons) <= 0)
+		port->start_cons = true;
 
 	vp = port->vp;
 	dev = vp->dev;
@@ -617,33 +638,12 @@ static void maybe_tx_wakeup(unsigned long param)
 	netif_tx_unlock(dev);
 }
 
-static void vnet_event(void *arg, int event)
+static int vnet_event_napi(struct vnet_port *port, int budget)
 {
-	struct vnet_port *port = arg;
 	struct vio_driver_state *vio = &port->vio;
-	unsigned long flags;
 	int tx_wakeup, err;
-
-	spin_lock_irqsave(&vio->lock, flags);
-
-	if (unlikely(event == LDC_EVENT_RESET ||
-		     event == LDC_EVENT_UP)) {
-		vio_link_state_change(vio, event);
-		spin_unlock_irqrestore(&vio->lock, flags);
-
-		if (event == LDC_EVENT_RESET) {
-			port->rmtu = 0;
-			vio_port_up(vio);
-		}
-		return;
-	}
-
-	if (unlikely(event != LDC_EVENT_DATA_READY)) {
-		pr_warn("Unexpected LDC event %d\n", event);
-		spin_unlock_irqrestore(&vio->lock, flags);
-		return;
-	}
-
+	int npkts = 0;
 	tx_wakeup = err = 0;
 	while (1) {
 		union {
@@ -651,6 +651,20 @@ static void vnet_event(void *arg, int event)
 			u64 raw[8];
 		} msgbuf;
 
+		if (port->napi_resume) {
+			struct vio_dring_data *pkt =
+				(struct vio_dring_data *)&msgbuf;
+			struct vio_dring_state *dr =
+				&port->vio.drings[VIO_DRIVER_RX_RING];
+
+			pkt->tag.type = VIO_TYPE_DATA;
+			pkt->tag.stype = VIO_SUBTYPE_INFO;
+			pkt->tag.stype_env = VIO_DRING_DATA;
+			pkt->seq = dr->rcv_nxt;
+			pkt->start_idx = next_idx(port->napi_stop_idx, dr);
+			pkt->end_idx = -1;
+			goto napi_resume;
+		}
 		err = ldc_read(vio->lp, &msgbuf, sizeof(msgbuf));
 		if (unlikely(err < 0)) {
 			if (err == -ECONNRESET)
@@ -667,10 +681,12 @@ static void vnet_event(void *arg, int event)
 		err = vio_validate_sid(vio, &msgbuf.tag);
 		if (err < 0)
 			break;
-
+napi_resume:
 		if (likely(msgbuf.tag.type == VIO_TYPE_DATA)) {
 			if (msgbuf.tag.stype == VIO_SUBTYPE_INFO) {
-				err = vnet_rx(port, &msgbuf);
+				err = vnet_rx(port, &msgbuf, &npkts, budget);
+				if (npkts >= budget || npkts == 0)
+					break;
 			} else if (msgbuf.tag.stype == VIO_SUBTYPE_ACK) {
 				err = vnet_ack(port, &msgbuf);
 				if (err > 0)
@@ -691,15 +707,54 @@ static void vnet_event(void *arg, int event)
 		if (err == -ECONNRESET)
 			break;
 	}
-	spin_unlock(&vio->lock);
-	/* Kick off a tasklet to wake the queue.  We cannot call
-	 * maybe_tx_wakeup directly here because we could deadlock on
-	 * netif_tx_lock() with dev_watchdog()
-	 */
 	if (unlikely(tx_wakeup && err != -ECONNRESET))
 		tasklet_schedule(&port->vp->vnet_tx_wakeup);
+	return npkts;
+}
+
+static int vnet_poll(struct napi_struct *napi, int budget)
+{
+	struct vnet_port *port = container_of(napi, struct vnet_port, napi);
+	struct vio_driver_state *vio = &port->vio;
+	int processed = vnet_event_napi(port, budget);
+
+	if (processed < budget) {
+		napi_complete(napi);
+		napi_reschedule(napi);
+		vio_set_intr(vio->vdev->rx_ino, HV_INTR_ENABLED);
+	}
+	/* return budget; */ /* liar! but better throughput! */
+	return processed;
+}
+
+static void vnet_event(void *arg, int event)
+{
+	struct vnet_port *port = arg;
+	struct vio_driver_state *vio = &port->vio;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vio->lock, flags);
+
+	if (unlikely(event == LDC_EVENT_RESET ||
+		     event == LDC_EVENT_UP)) {
+		vio_link_state_change(vio, event);
+		spin_unlock_irqrestore(&vio->lock, flags);
+
+		if (event == LDC_EVENT_RESET)
+			vio_port_up(vio);
+		return;
+	}
+
+	if (unlikely(event != LDC_EVENT_DATA_READY)) {
+		pr_warning("Unexpected LDC event %d\n", event);
+		spin_unlock_irqrestore(&vio->lock, flags);
+		return;
+	}
+
+	spin_unlock(&vio->lock);
+	local_irq_restore(flags);
+	vio_set_intr(vio->vdev->rx_ino, HV_INTR_DISABLED);
+	napi_schedule(&port->napi);
 }
 
 static int __vnet_tx_trigger(struct vnet_port *port, u32 start)
@@ -1342,6 +1397,22 @@ err_out:
 	return err;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void vnet_poll_controller(struct net_device *dev)
+{
+	struct vnet *vp = netdev_priv(dev);
+	struct vnet_port *port;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vp->lock, flags);
+	if (!list_empty(&vp->port_list)) {
+		port = list_entry(vp->port_list.next, struct vnet_port, list);
+		napi_schedule(&port->napi);
+	}
+	spin_unlock_irqrestore(&vp->lock, flags);
+}
+#endif
 static LIST_HEAD(vnet_list);
 static DEFINE_MUTEX(vnet_list_mutex);
@@ -1354,6 +1425,9 @@ static const struct net_device_ops vnet_ops = {
 	.ndo_tx_timeout		= vnet_tx_timeout,
 	.ndo_change_mtu		= vnet_change_mtu,
 	.ndo_start_xmit		= vnet_start_xmit,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller	= vnet_poll_controller,
+#endif
 };
 
 static struct vnet *vnet_new(const u64 *local_mac)
@@ -1536,6 +1610,9 @@ static int vnet_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	if (err)
 		goto err_out_free_port;
 
+	netif_napi_add(port->vp->dev, &port->napi, vnet_poll, VNET_BUDGET);
+	napi_enable(&port->napi);
+
 	err = vnet_port_alloc_tx_bufs(port);
 	if (err)
 		goto err_out_free_ldc;
@@ -1592,6 +1669,8 @@ static int vnet_port_remove(struct vio_dev *vdev)
 		del_timer_sync(&port->vio.timer);
 		del_timer_sync(&port->clean_timer);
 
+		napi_disable(&port->napi);
+
 		spin_lock_irqsave(&vp->lock, flags);
 		list_del(&port->list);
 		hlist_del(&port->hash);
diff --git a/drivers/net/ethernet/sun/sunvnet.h b/drivers/net/ethernet/sun/sunvnet.h
index c911045..3c745d0 100644
--- a/drivers/net/ethernet/sun/sunvnet.h
+++ b/drivers/net/ethernet/sun/sunvnet.h
@@ -56,6 +56,10 @@ struct vnet_port {
 	struct timer_list  clean_timer;
 
 	u64		rmtu;
+
+	struct napi_struct	napi;
+	u32		napi_stop_idx;
+	bool		napi_resume;
 };
 
 static inline struct vnet_port *to_vnet_port(struct vio_driver_state *vio)